R-Squared and Adjusted R-Squared
Goodness of Fit, the Bias of Adding Variables, and What R² Doesn't Tell You — A TLDR Primer
Your stats textbook spends dozens of pages on regression before it ever explains what R² actually means — and even then the explanation is buried in notation. If you have an exam coming up, a problem set due, or you just need to understand why your professor keeps talking about "goodness of fit," this guide cuts straight to it.
**R-Squared and Adjusted R-Squared** is a focused, no-filler primer that covers exactly what the title promises. You'll learn what R² actually measures (the proportion of variance explained, not accuracy and not causation), how to build it from the three sums of squares (SST, SSR, and SSE) with a fully worked numerical example, and why R² equals the square of Pearson's correlation coefficient in simple linear regression. Then comes the problem every student eventually hits: add any variable to a regression model and R² never goes down, and in practice it almost always creeps up, even if that variable is useless. The guide explains why this happens and how adjusted R² applies a complexity penalty to fix it, complete with the formula, a worked computation, and the edge cases (yes, adjusted R² can go negative).
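For orientation, the quantities named above are usually written as follows. This is a sketch in standard textbook notation, not necessarily the guide's own: the symbols $n$ (sample size) and $k$ (number of predictors, excluding the intercept) are assumptions, and the decomposition $SST = SSR + SSE$ assumes ordinary least squares with an intercept.

```latex
\[
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\]
\[
SST = SSR + SSE, \qquad
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}
\]
\[
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
\]
```

Because the factor $(n-1)/(n-k-1)$ exceeds 1 whenever $k \ge 1$, adjusted R² sits below R² (unless R² is exactly 1), and it turns negative when R² is small relative to the number of predictors.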
The final section catalogs the most common misuses of R²: mistaking a high value for evidence of causation, assuming it means good predictions, or ignoring whether the model form is even appropriate for the data.
Written for high school and early college students in statistics, AP Statistics, introductory econometrics, or any course that touches regression. Concise and stripped to essentials, with definitions, worked examples, and misconception callouts throughout.
If R² has felt slippery, pick this up and get clear on it today.
- Define R² as the proportion of variance explained and compute it from sums of squares
- Derive R² from SST, SSR, and SSE and connect it to the correlation coefficient in simple regression
- Explain why R² never decreases when predictors are added, and how adjusted R² corrects for this
- Compute adjusted R² from R², sample size, and number of predictors
- Identify common misuses of R² (causation, model adequacy, prediction accuracy) and know when to use adjusted R² instead
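
The objectives above can be tried out in a few lines of code. The following is a minimal sketch, assuming NumPy is available; the helper names (`fit_ols`, `r_squared`, `adjusted_r_squared`) and the small dataset are made up for illustration and are not taken from the guide.

```python
import numpy as np

def fit_ols(X, y):
    """Fit least squares with an intercept and return the fitted values."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return X1 @ beta

def r_squared(y, y_hat):
    """R² = 1 - SSE / SST."""
    sse = np.sum((y - y_hat) ** 2)       # error (residual) sum of squares
    sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return 1.0 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Adjusted R² for n observations and k predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical data: one real predictor x and one arbitrary extra column z.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
z = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# Simple regression: y on x.
r2_simple = r_squared(y, fit_ols(x, y))
adj_simple = adjusted_r_squared(r2_simple, n=len(y), k=1)

# In simple linear regression, R² equals Pearson's r squared.
r = np.corrcoef(x, y)[0, 1]

# Multiple regression: adding z cannot lower R²; adjusted R² may rise or fall.
r2_two = r_squared(y, fit_ols(np.column_stack([x, z]), y))
adj_two = adjusted_r_squared(r2_two, n=len(y), k=2)

print(f"simple:  R² = {r2_simple:.4f}, r² = {r**2:.4f}, adjusted = {adj_simple:.4f}")
print(f"with z:  R² = {r2_two:.4f}, adjusted = {adj_two:.4f}")
```

For this dataset the simple regression gives SST = 6 and SSE = 2.4, so R² = 0.6 and adjusted R² is about 0.467. Comparing the two printed lines shows the pattern the objectives describe: R² with the extra column is at least as large as before, while adjusted R² is free to move in either direction.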
- **1. What R² Actually Measures.** Introduces R² as the proportion of variance in the response variable explained by the regression model, with intuition before formulas.
- **2. The Sums of Squares: SST, SSR, and SSE.** Builds R² from its components (total, regression (explained), and error (residual) sums of squares) with a fully worked numerical example.
- **3. R² and the Correlation Coefficient.** Shows that in simple linear regression R² equals the square of Pearson's r, and clarifies what changes once you move to multiple regression.
- **4. Why R² Keeps Going Up: The Need for Adjusted R².** Explains why adding any predictor, even a useless one, never decreases R², motivating a penalty for model complexity.
- **5. Computing and Interpreting Adjusted R².** Presents the adjusted R² formula, walks through computation, and shows when it rises, falls, or even goes negative.
- **6. What R² Doesn't Tell You.** Catalogs the common misuses (equating high R² with causation, good prediction, or correct model form) and gives rules of thumb for using R² responsibly.