SOLID STATE PRESS
Coming soon to Amazon
This title is in our publishing queue.
Mathematics

The Least Squares Method

Residuals, the Normal Equations, and the Line That Best Fits — A TLDR Primer

Least squares regression shows up in statistics classes, AP courses, college math, and data science, yet most textbooks bury the core idea under pages of theory before you see a single worked number. This guide cuts straight to what you need.

**The Least Squares Method: A TLDR Primer** walks you through the complete picture of linear regression, concisely and without bloat. You'll start with the core problem: real data is messy, and you need a principled way to draw the best possible line through it. From there, the guide explains why we square residuals rather than taking absolute values or signed sums, derives the slope and intercept formulas using basic calculus, and then works through a complete numerical example by hand, including residuals, R-squared, and a prediction.

The guide doesn't stop at the simple case. It names the failure modes that trip students up — outliers, nonlinearity, heteroscedasticity — explains what each one actually does to your results, and points to the standard fixes. The final section generalizes to multiple regression through the matrix normal equations, giving you a clean on-ramp to the statistics and machine learning courses that come next.

Written for high school students in statistics or precalculus, college freshmen and sophomores meeting regression for the first time, and anyone who needs a fast, honest explanation of how the line of best fit actually works. The guide is short by design, and every section earns its place.

If least squares is on your next exam or assignment, start here.

What you'll learn
  • Define residuals and explain why we minimize their squares rather than their absolute values or signed sums
  • Derive and apply the formulas for the slope and intercept of the least-squares regression line
  • Compute a best-fit line by hand from a small data set and interpret slope, intercept, and R-squared
  • Recognize when least squares is appropriate and when outliers, nonlinearity, or heteroscedasticity break it
  • Connect the one-variable case to the general normal equations used in multiple regression
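As a preview of the second and third objectives, here is a minimal Python sketch of the standard closed-form least-squares formulas, $m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b = \bar{y} - m\bar{x}$, together with $R^2 = 1 - \frac{\sum e_i^2}{\sum (y_i - \bar{y})^2}$. The function names `fit_line` and `r_squared` and the data values are hypothetical, made up for illustration; this is not the book's own worked example.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y_hat = m*x + b."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = sxy / sxx
    b = y_bar - m * x_bar  # the fitted line passes through (x_bar, y_bar)
    return m, b


def r_squared(xs, ys, m, b):
    """R^2 = 1 - SSR/SST: the share of variation in y explained by the line."""
    y_bar = sum(ys) / len(ys)
    ssr = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ssr / sst


# Hypothetical (hours studied, test score) data, invented for illustration.
hours = [1, 2, 3, 4, 5]
scores = [61, 68, 74, 77, 85]

m, b = fit_line(hours, scores)
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
print(f"R^2 = {r_squared(hours, scores, m, b):.3f}")
print(f"predicted score at x = 3.5: {m * 3.5 + b:.1f}")
```

For these made-up numbers the slope works out to $57/10 = 5.7$ and the intercept to $73 - 5.7 \cdot 3 = 55.9$. The prediction at $x = 3.5$ stays inside the observed range of $x$; predicting far outside that range assumes the linear trend continues, which the data alone cannot confirm.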
What's inside
  1. The Problem: Fitting a Line to Messy Data
    Sets up the core problem least squares solves and introduces residuals and the cost function.
  2. Why Squares? The Logic Behind the Choice
    Explains why we square residuals instead of taking absolute values or signed sums, with both geometric and statistical reasons.
  3. Deriving the Slope and Intercept Formulas
    Uses calculus to minimize the sum of squared residuals and derive the closed-form formulas for slope and intercept.
  4. A Worked Example from Scratch
    Computes a best-fit line by hand on a small data set, including residuals, R-squared, and prediction.
  5. When Least Squares Fails (and What to Do About It)
    Names the common failure modes (outliers, nonlinearity, heteroscedasticity) and the standard fixes.
  6. Beyond One Variable: The General Picture
    Generalizes to multiple regression via the matrix normal equations and points to where this leads in statistics and machine learning.
Published by Solid State Press
TLDR STUDY GUIDES

The Least Squares Method

Residuals, the Normal Equations, and the Line That Best Fits — A TLDR Primer
Solid State Press

Contents

  1. The Problem: Fitting a Line to Messy Data
  2. Why Squares? The Logic Behind the Choice
  3. Deriving the Slope and Intercept Formulas
  4. A Worked Example from Scratch
  5. When Least Squares Fails (and What to Do About It)
  6. Beyond One Variable: The General Picture
Chapter 1

The Problem: Fitting a Line to Messy Data

Imagine you record how many hours five students studied for a test and the score each one earned. You plot the points on a graph, hours on the horizontal axis and score on the vertical, and you see a rough upward trend, but the points are scattered. They don't all lie on one straight line, so no line describes the data perfectly. Which line comes closest?

That question is the entire problem least squares is built to answer.

A scatterplot is simply a graph where each observation becomes a point $(x_i, y_i)$: one value you measure or control ($x$, the input), and one value you observe ($y$, the output). Real data is messy. Measurement error, individual variation, factors you didn't record — all of these push points away from any clean pattern. The goal is to find the line that captures the underlying trend as faithfully as possible, despite the mess.

A line is described by two numbers: a slope $m$ and an intercept $b$, giving the equation $\hat{y} = mx + b$. The hat on $\hat{y}$ is standard notation meaning "the value the line predicts," as opposed to $y$, the value you actually observed. For a given $x_i$, the line predicts $\hat{y}_i = mx_i + b$. The data gives you $y_i$. Those two numbers are almost never equal.

The gap between them is the residual.

$e_i = y_i - \hat{y}_i$

A residual is the vertical distance from a data point down (or up) to the line. If the point sits above the line, $e_i$ is positive. If below, $e_i$ is negative. Residuals are signed — they carry information about direction, not just size. A common mistake is to think of residuals as horizontal distances or as distances measured perpendicularly to the line. They are neither: residuals are always measured vertically, because we're asking "how wrong is the line's prediction of $y$?"
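A minimal Python sketch of these definitions: it computes $\hat{y}_i = mx_i + b$ and $e_i = y_i - \hat{y}_i$ for a candidate line and reports whether each point sits above, below, or on the line. The function name `residuals`, the slope and intercept, and the (hours, score) pairs are hypothetical values chosen for illustration, not data from the book.

```python
def residuals(xs, ys, m, b):
    """Return the signed residuals e_i = y_i - y_hat_i for the line y_hat = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


# Hypothetical (hours studied, test score) pairs, invented for illustration.
hours = [1, 2, 3, 4, 5]
scores = [61, 68, 74, 77, 85]

# A candidate line picked by eye; it is not claimed to be the best fit.
m, b = 5.0, 58.0

for x, y, e in zip(hours, scores, residuals(hours, scores, m, b)):
    y_hat = m * x + b  # the line's prediction at x
    # The sign of the residual says which side of the line the point is on.
    if e > 0:
        side = "above"
    elif e < 0:
        side = "below"
    else:
        side = "on"
    print(f"x={x}: observed y={y}, predicted y_hat={y_hat}, residual e={e:+}, point is {side} the line")
```

With these values the residuals come out to $-2, 0, 1, -1, 2$: the magnitude says how far off the prediction is, and the sign says which side of the line the point falls on.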

Now the core question: given that every candidate line produces a different set of residuals, how do we declare one line the best?

We need a single number — a cost function — that measures how poorly a given line fits the data as a whole. The lower the cost, the better the fit. The best-fit line is the one that makes the cost as small as possible.
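The method this book is named for uses the sum of squared residuals, $\sum_i e_i^2$, as that cost; Chapter 2 explains the reasoning behind squaring. The minimal Python sketch below, on made-up data with hypothetical function names, compares two candidate lines. It also shows why simply adding up the signed residuals cannot serve as the cost: positive and negative residuals cancel.

```python
def residuals(xs, ys, m, b):
    """Signed residuals e_i = y_i - (m*x_i + b)."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, m, b):
    """Least-squares cost: the sum of e_i^2. Lower means a better fit."""
    return sum(e * e for e in residuals(xs, ys, m, b))


def signed_sum(xs, ys, m, b):
    """A naive 'cost' shown for contrast: positive and negative residuals cancel."""
    return sum(residuals(xs, ys, m, b))


# Hypothetical (hours studied, test score) data, invented for illustration.
hours = [1, 2, 3, 4, 5]
scores = [61, 68, 74, 77, 85]

# Two hypothetical candidate lines, given as (slope m, intercept b).
candidates = {
    "line A: y_hat = 5x + 58": (5.0, 58.0),
    "line B: y_hat = 73 (flat)": (0.0, 73.0),
}

for name, (m, b) in candidates.items():
    ssr = sum_squared_residuals(hours, scores, m, b)
    naive = signed_sum(hours, scores, m, b)
    print(f"{name}: sum of squares = {ssr:.1f}, signed sum = {naive:.1f}")
```

Both lines pass through the point $(\bar{x}, \bar{y}) = (3, 73)$, so their signed residuals sum to zero even though line B, a flat line at the mean score, fits far worse. The squared cost separates them: 10 for line A versus 330 for line B.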

About This Book

If you're taking AP Statistics, a college intro to statistics course, or any class where you need to understand linear regression, this book is for you. It's also for the student who has stared at a scatterplot and wondered how to find the best-fit line by hand, without just punching numbers into a calculator and hoping for the best.

This primer covers the core ideas for beginners: what residuals are, why we minimize squared error, the calculus derivation behind the slope and intercept formulas, and how to read the results, including residuals and R-squared. It also gives students who need to go beyond one variable a first look at the general framework for multiple regression. It is concise and focused, with no filler.

Read straight through to build the logic in order. Work through each example yourself before reading the solution. Then use the problem set at the end to confirm you can apply the ideas on your own.

Keep reading

You've read the first half of Chapter 1. The complete book covers 6 chapters in roughly fifteen pages — readable in one sitting.
