Inspired by AlphaGo’s Move 37 — learn how agents explore, exploit, and win

Reinforcement Learning Made Simple: Build a Q-Learning Agent in Python

Data Science Espresso by Sarah

Jun 01, 2025

In 2016, Go world champion Lee Sedol faced an opponent made not of flesh and blood — but of lines of code.

It soon became clear: The human would lose.

In the end, Lee Sedol lost 4:1.

That was the moment the world realized: This AI doesn’t play like a human — it plays better.

But what does that kind of learning actually look like?

Picture of a Go game by Elena Popova on Unsplash

Last week, I rewatched the AlphaGo documentary — and found it just as fascinating as the first time.

Like many others, I often don’t have the time to dive deep into topics like reinforcement learning.

So I set out to build a project that’s simple enough to start — and real enough to matter.

Picture from AlphaGo – The Movie | Full award-winning documentary on YouTube

When you’re working a regular 8-to-5 job, the hardest part isn’t curiosity — it’s time.

Time to immerse yourself in a new topic.
Time to finally dig into something you’ve been meaning to explore.

And when you do find that time?

You need the right entry point.
Something practical.
Something you can tackle in an afternoon or evening.

At least, that’s how it is for me.

In this article, I walk you through a step-by-step project to build a simple Q-learning agent using Tic Tac Toe.

Grab your ☕ and the code and let’s dive in!

👉 Full article: Reinforcement Learning Made Simple: Build a Q-Learning Agent in Python

👉 You can find the code in this GitHub repository.

👉 A great introductory book on RL: Reinforcement Learning: An Introduction by Richard S. Sutton & Andrew G. Barto
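
If you'd like a taste before opening the full article, here is a minimal sketch of the core idea behind a tabular Q-learning agent: pick moves epsilon-greedily (explore vs. exploit) and nudge a table of values toward the reward plus the discounted best future value. This is not the code from the repository. The names `q_table`, `choose_action`, and `update`, the board encoding (a tuple of 9 cells), and the values of `ALPHA`, `GAMMA`, and `EPSILON` are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict

# Hyperparameters (illustrative values, not taken from the article)
ALPHA = 0.1    # learning rate: how strongly each update shifts the estimate
GAMMA = 0.9    # discount factor: how much future rewards count
EPSILON = 0.2  # exploration rate: probability of trying a random move

# Q-table mapping (state, action) to an estimated value.
# Pairs the agent has never seen default to 0.0.
q_table = defaultdict(float)


def choose_action(state, legal_actions, rng=random):
    """Epsilon-greedy choice: explore with probability EPSILON, else exploit."""
    if rng.random() < EPSILON:
        return rng.choice(legal_actions)
    # Exploit: take the action with the highest current Q-value.
    return max(legal_actions, key=lambda a: q_table[(state, a)])


def update(state, action, reward, next_state, next_legal_actions):
    """One Q-learning step.

    Moves Q(state, action) toward reward + GAMMA * max Q(next_state, a').
    For a finished game, pass an empty list of next actions so that the
    future value counts as 0.
    """
    best_next = max(
        (q_table[(next_state, a)] for a in next_legal_actions),
        default=0.0,
    )
    target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])


# Hypothetical usage: the board is a tuple of 9 cells, an action is a cell index.
board = ("X", "O", " ",
         "X", "O", " ",
         " ", " ", " ")
legal = [i for i, cell in enumerate(board) if cell == " "]
move = choose_action(board, legal)
```

In a two-player game like Tic Tac Toe, one reasonable assumption is that `next_state` is the board the agent sees after the opponent has replied. The full article and repository may handle this differently.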

☕ P.S. Yesterday, the 100th person signed up for Data Science Espresso. Whether you’ve been here from the start or just joined this week: I just wanted to say a big thank you.

Have a great Sunday, Sarah💕

New here? Or curious what other readers love?

Because you’re subscribed to Data Science Espresso, you get free access to my Medium articles through Medium Friend Links. No paywall, no limits, just coffee-fueled content ☕️.

Q-Learning Agent with Python and Tic Tac Toe. Own visualization; illustrations from unDraw.com.
