SOLID STATE PRESS
Coming soon to Amazon

Reinforcement Learning: From Atari to AlphaGo

A High School & College Primer on Learning by Trial, Error, and Reward

You've heard that AI taught itself to beat the world champion at Go, and that it learned to play Atari games better than humans — without being told the rules. But when you open a textbook or watch a lecture, the math hits fast and the intuition never arrives. This guide fixes that.

**TLDR: Reinforcement Learning** walks you through the core ideas — from the basic agent-environment loop to the Bellman equation, Q-learning, and the deep neural network breakthroughs behind DeepMind's Atari results and Google's AlphaGo — in plain language, with worked examples and concrete numbers. It's written for high school and early college students who want a genuine understanding, not just buzzwords.

In about 15 focused pages, you'll learn how an RL agent decides what to do, how it estimates future reward using the Bellman equation, and why exploration vs. exploitation is a real tension rather than a talking point. You'll also see how a convolutional neural network replaced a lookup table to make Atari playable from raw pixels, and how self-play and Monte Carlo Tree Search scaled those ideas to the game of Go. If you're looking for a machine learning study guide for college freshmen or a clean intro to AI and machine learning for high school, this is the shortest path to actually understanding how these systems work.

No calculus prerequisite. No fluff. Pick it up before a class, an exam, or a conversation you want to follow.

Grab your copy and get oriented today.

What you'll learn
  • Define agents, environments, states, actions, rewards, and policies, and explain how they fit together in the RL loop.
  • Use the Bellman equation and Q-learning to reason about value and optimal action in small problems.
  • Explain the exploration vs. exploitation tradeoff and standard strategies like epsilon-greedy.
  • Describe how Deep Q-Networks let agents learn directly from pixels in Atari games.
  • Outline how policy gradients, self-play, and Monte Carlo Tree Search combined to produce AlphaGo and AlphaZero.
What's inside
  1. The RL Setup: Agents, Environments, and Rewards
    Introduces the core RL loop and vocabulary using a simple gridworld and a video game example.
  2. Value, the Bellman Equation, and Q-Learning
    Develops state values, action values, discounting, and the tabular Q-learning update rule with a worked gridworld example.
  3. Exploration vs. Exploitation
    Explains why an RL agent must sometimes act suboptimally to learn, using the multi-armed bandit and epsilon-greedy strategies.
  4. Deep Q-Networks: Playing Atari from Pixels
    Shows how DeepMind's DQN replaced the Q-table with a neural network and learned to play Atari games end-to-end from raw frames.
  5. Policy Gradients and Self-Play: The Road to AlphaGo
    Introduces policy-based methods, self-play, and Monte Carlo Tree Search, then walks through how AlphaGo and AlphaZero defeated top human players.
TLDR STUDY GUIDES

Reinforcement Learning: From Atari to AlphaGo

A High School & College Primer on Learning by Trial, Error, and Reward
Solid State Press

Who This Book Is For

If you are a high school student looking for an artificial intelligence primer that actually explains how machines learn, or a college freshman who just enrolled in an intro to AI and machine learning course and needs a foothold fast, this guide was written for you. It also works for AP Computer Science students, self-directed learners preparing for technical interviews, and parents helping a teenager navigate a school AI project.

The book explains reinforcement learning for beginners, from agents and reward signals through the Bellman equation and Q-learning to a deep Q-network tutorial built around Atari games. It closes with a clear, simple explanation of how AlphaGo works, rooted in policy gradients and self-play. Roughly 15 focused pages, no filler.

Think of this as a machine learning study guide for college freshmen who need the core ideas locked in before the midterm hits. Read straight through once, work every numbered example, then tackle the problem set at the end to confirm you have it.

Contents

  1. The RL Setup: Agents, Environments, and Rewards
  2. Value, the Bellman Equation, and Q-Learning
  3. Exploration vs. Exploitation
  4. Deep Q-Networks: Playing Atari from Pixels
  5. Policy Gradients and Self-Play: The Road to AlphaGo
Chapter 1

The RL Setup: Agents, Environments, and Rewards

Imagine you are a dog learning a new trick. You do something — sit, spin, bark — and sometimes you get a treat. Over thousands of repetitions, you figure out which actions in which situations earn the most treats. You are not given a rule book. You learn entirely from the feedback of the environment. That is the core idea behind reinforcement learning (RL).

RL is a branch of machine learning in which a software program learns to make decisions by interacting with its surroundings and collecting rewards or penalties. Unlike supervised learning, where a model is trained on labeled examples, an RL agent is never told directly what the right action is. It has to discover good behavior by trying things and observing what happens.

The Main Players

Every RL problem has two key pieces: an agent and an environment.

The agent is the learner and decision-maker — the dog, the video-game character, the robot arm. The environment is everything the agent interacts with: the room, the game, the physical world. The agent cannot control the environment directly; it can only choose actions and observe what comes back.

What comes back is twofold. First, the environment returns a state (sometimes called an observation), which is a description of the current situation. In a chess game, the state is the board position. In a racing game, the state might be the pixel image on screen. Second, the environment returns a reward — a number that signals how good or bad the last action was. Rewards can be positive (scored a point), negative (crashed the car), or zero (nothing interesting happened yet).

The agent's goal is simple to state and hard to achieve: collect as much total reward as possible over time.
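
To make "as much total reward as possible" concrete: if an episode lasts $T$ steps and produces rewards $r_1, r_2, \dots, r_T$, the agent is trying to maximize the sum

$$G = r_1 + r_2 + \cdots + r_T.$$

Chapter 2 refines this with discounting, which weights near-term rewards more heavily than distant ones.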

The RL Loop

These pieces connect in a cycle that repeats continuously:

  1. The agent observes the current state $s$.
  2. The agent chooses an action $a$.
  3. The environment transitions to a new state $s'$ and hands the agent a reward $r$.
  4. Repeat.

This cycle is the RL loop. Everything in reinforcement learning — every equation, every algorithm — is an attempt to make an agent loop through this cycle more intelligently.

A finite run of this loop is called an episode. In a video game, one episode is one playthrough from start to game-over. In a maze, one episode is one attempt to find the exit. Some problems (like controlling a power grid) have no natural endpoint; those are called continuing tasks. Most introductory examples use episodic tasks, and so will this book.
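
To make the loop concrete, here is a minimal sketch in Python. The five-cell corridor environment and its reset/step interface are assumptions for illustration (loosely following the convention popularized by OpenAI Gym), not code from the book:

```python
import random

class CorridorWorld:
    """Hypothetical environment: a one-dimensional corridor of 5 cells (0 to 4).
    The agent starts at cell 0 and earns +1 for reaching cell 4."""

    def reset(self):
        self.state = 0                      # back to the left end
        return self.state

    def step(self, action):
        # action: 0 = step left, 1 = step right
        if action == 1:
            self.state = min(4, self.state + 1)
        else:
            self.state = max(0, self.state - 1)
        done = (self.state == 4)            # the episode ends at the goal cell
        reward = 1.0 if done else 0.0       # reward signal from the environment
        return self.state, reward, done

env = CorridorWorld()
state = env.reset()                          # 1. observe the current state s
total_reward, done = 0.0, False
while not done:
    action = random.choice([0, 1])           # 2. choose an action a (here: at random)
    state, reward, done = env.step(action)   # 3. receive new state s' and reward r
    total_reward += reward                   # 4. repeat until the episode ends
print(f"Episode finished with total reward {total_reward}")
```

A real agent would replace the random choice with a learned policy; Chapter 2 shows how Q-learning builds one from exactly these state, action, and reward signals.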

Policies: The Agent's Strategy

You've read the first half of Chapter 1. The complete book covers 5 chapters in roughly fifteen pages — readable in one sitting.
