SOLID STATE PRESS
US list price $2.99
Mathematics

Correlation vs. Causation

Confounders, Spurious Patterns, and Why Randomized Trials Win — A TLDR Primer

You've seen the headlines: "Coffee drinkers live longer" or "Ice cream sales linked to drowning deaths." But does that mean coffee causes longevity, or that ice cream causes drowning? Probably not. Knowing the difference between a real cause and a misleading pattern in data is one of the most useful skills in modern life, and most students never get a clear explanation of it.

This TLDR primer fills exactly that gap. You'll learn what correlation actually measures and how to read a scatterplot, then walk through the four main reasons a correlation can appear without any causal link: confounding variables, reverse causation, selection bias, and plain old chance. From there, the guide explains how randomized controlled trials solve the causation problem and why randomization is such a powerful tool. A final section gives you a practical checklist for evaluating causal claims you encounter in textbooks, news articles, and exam questions.

Designed for high school and early college students in statistics, AP courses, or any class that involves reading research, this guide is short by design and stripped to essentials. No filler, no detours: just the core logic you need to stop nodding along to bad reasoning and start asking the right questions.

If you can tell a confounder from a cause, you're already ahead of most adults. Grab your copy and start thinking more clearly about data.

What you'll learn
  • Define correlation precisely and interpret the correlation coefficient r
  • Distinguish correlation from causation and identify common ways the two get confused
  • Recognize confounding variables, reverse causation, selection bias, and chance as alternative explanations
  • Explain why randomized controlled experiments can establish causation while observational studies usually cannot
  • Apply Bradford Hill–style reasoning to evaluate causal claims in news, science, and everyday arguments
What's inside
  1. What Correlation Actually Means
    Defines correlation, introduces the correlation coefficient r, and shows how to read scatterplots.
  2. Why Correlation Doesn't Imply Causation
    Lays out the core logical gap and walks through famous spurious correlations to build intuition.
  3. The Four Suspects: Confounding, Reverse Causation, Selection Bias, and Chance
    Names and unpacks the four main reasons a correlation can appear without a direct causal link.
  4. How Scientists Establish Causation
    Explains randomized controlled trials, control groups, and why randomization neutralizes confounders.
  5. A Toolkit for Evaluating Causal Claims
    Gives the reader practical questions and Bradford Hill–style criteria to apply to headlines and studies.
Published by Solid State Press · June 2026
TLDR STUDY GUIDES

Contents

  1. What Correlation Actually Means
  2. Why Correlation Doesn't Imply Causation
  3. The Four Suspects: Confounding, Reverse Causation, Selection Bias, and Chance
  4. How Scientists Establish Causation
  5. A Toolkit for Evaluating Causal Claims
Chapter 1

What Correlation Actually Means

Two variables are correlated when they tend to move together — when one goes up, the other tends to go up (or down) in a predictable way. That "tends to" is doing real work. Correlation is a statement about a statistical pattern across many observations, not a guarantee about any single one.

The cleanest way to see a correlation is a scatterplot: a graph where each dot represents one observation, the horizontal axis ($x$) shows one variable, and the vertical axis ($y$) shows another. If you plot the heights and weights of 200 people, taller people will tend to cluster toward the upper right and shorter people toward the lower left. That diagonal drift from lower-left to upper-right is positive correlation. If you plotted hours of TV watched per day against GPA, the drift would likely run from upper-left to lower-right — more TV, lower GPA on average. That's negative correlation.
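To make the idea of "drift" concrete, the short Python sketch below checks the sign of the sample covariance, a standard statistic that is positive when points tend to run from lower-left to upper-right and negative when they run from upper-left to lower-right. The heights, weights, TV hours, and GPAs are invented numbers chosen only for illustration, not real data, and the helper names `covariance` and `drift` are hypothetical.

```python
# Sketch: the sign of the sample covariance matches the direction a scatterplot drifts.
# All data below are made-up illustrative numbers, not real measurements.

def covariance(xs, ys):
    """Sample covariance: sum of products of deviations from each mean, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def drift(xs, ys):
    """Label the direction of the pattern: 'positive', 'negative', or 'none'."""
    c = covariance(xs, ys)
    if c > 0:
        return "positive"
    if c < 0:
        return "negative"
    return "none"


# Hypothetical heights (cm) and weights (kg): taller people tend to weigh more.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 66, 71, 83]

# Hypothetical daily TV hours and GPAs: more TV, lower GPA on average.
tv_hours = [0, 1, 2, 3, 4]
gpas = [3.8, 3.6, 3.1, 3.2, 2.7]

print(drift(heights, weights))  # positive
print(drift(tv_hours, gpas))    # negative
```

Notice that the made-up GPA rises slightly between 2 and 3 hours of TV even though the overall drift is negative: correlation describes the pattern across many observations, not every individual pair.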

Direction and Strength

Every correlation has two properties worth separating: direction and strength.

Direction is simple: positive correlation means the two variables tend to rise together; negative correlation (also called an inverse correlation) means one tends to rise as the other falls. Neither direction is inherently "better"; they're just patterns.

Strength is about how tightly the data cluster around that pattern. A strong correlation means the dots hug a line closely; a weak one means they're scattered loosely and the trend is barely visible. You can have a weak positive or a strong negative — direction and strength are independent.

To capture both in a single number, statisticians use the Pearson correlation coefficient, written $r$. It always falls between $-1$ and $+1$:

$r \in [-1,\ 1]$

  • $r = +1$: perfect positive linear relationship — every dot sits exactly on an upward line.
  • $r = -1$: perfect negative linear relationship — every dot sits exactly on a downward line.
  • $r = 0$: no linear relationship. Knowing one variable does not help you predict the other along a straight line, though a curved (nonlinear) relationship may still exist.
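The three cases above can be checked numerically. Below is a minimal from-scratch Python sketch of the usual sample formula for Pearson's $r$, assuming $n$ paired observations $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$:

$r = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$

The function name `pearson_r` and the toy data are illustrative choices, not taken from the book. The last example is a caveat worth knowing: $y = x^2$ is perfectly predictable from $x$, yet $r = 0$, because $r$ measures only straight-line patterns.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations of x from its mean
    dy = [y - mean_y for y in ys]  # deviations of y from its mean
    numerator = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    # sqrt(sxx * syy) equals sqrt(sxx) * sqrt(syy); one square root keeps toy results exact.
    return numerator / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]

print(pearson_r(xs, [2 * x + 1 for x in xs]))   # 1.0  (every point on an upward line)
print(pearson_r(xs, [-3 * x + 4 for x in xs]))  # -1.0 (every point on a downward line)
print(pearson_r(xs, [x * x for x in xs]))       # 0.0  (a perfect curve, but no linear trend)
```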

Real data almost never produce $r = \pm 1$. Rules of thumb vary by field, but in the social sciences correlations above $|r| = 0.5$ are usually considered fairly strong, while correlations around $|r| = 0.2$–$0.3$ show up constantly and are often described as moderate. The sign tells you direction; the absolute value tells you strength.
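The sign-and-size reading of $r$ can be written as a small Python helper. This is a sketch, not a standard: the cutoffs are hypothetical, loosely following the rule of thumb above, and the band between 0.3 and 0.5 (which the text does not label) is lumped in with "moderate" as an assumption. The function name `describe_correlation` is also illustrative.

```python
def describe_correlation(r):
    """Give a rough verbal label for r: the sign sets direction, |r| sets strength."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    # Hypothetical cutoffs loosely based on the rule of thumb in the text;
    # different fields and textbooks draw these lines differently.
    if size == 1:
        strength = "perfect"
    elif size >= 0.5:
        strength = "fairly strong"
    elif size >= 0.2:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


print(describe_correlation(0.25))  # moderate positive
print(describe_correlation(-0.8))  # fairly strong negative
print(describe_correlation(0.05))  # weak positive
```

The last two examples echo the earlier point that direction and strength are independent: a fairly strong negative and a weak positive are both perfectly ordinary results.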

About This Book

If you are a high school student who wants correlation vs. causation explained in plain English, whether for AP Statistics, AP Biology, a dual-enrollment research methods course, or a standardized exam, this book is for you. It is equally useful for college freshmen in introductory statistics, data science, or psychology who keep hearing "correlation does not imply causation" without ever getting a satisfying explanation of why.

The book covers the core ideas: what a correlation coefficient actually measures, how confounding variables can produce misleading conclusions, spurious correlations, reverse causation, selection bias, and how randomized controlled trials serve as the gold standard for establishing causation. Each section builds toward applying these ideas to causal questions in data science and research, giving you the critical-thinking skills to evaluate studies in any subject. Short by design, with no filler.

Read straight through, follow the worked examples carefully, then attempt the practice problems at the end to test what you have learned.

The complete book covers 5 chapters in roughly fifteen pages, readable in one sitting.