I invite you to upgrade to a paid subscription. Paid subscribers have told me they appreciate me creating the programming projects and would like to see more of them in the future.
Hi, this is John with this week’s Coding Challenge. 🙏 Thank you for being a subscriber, I’m honoured to have you as a reader. 🎉 If there is a Coding Challenge you’d like to see, please let me know by replying to this email. 📧

Coding Challenge #119 - AI Pong Player

This challenge is to build your own reinforcement learning agent that learns to play Atari Pong directly from the pixels on the screen.

Pong is one of the oldest video games ever made, and it has a special place in the history of artificial intelligence. In 2013, DeepMind used Pong (and a handful of other Atari games) to show that a single algorithm could learn to play games at a human level, just by watching the screen and being told the score. That work kicked off the modern era of deep reinforcement learning.

Pong is the friendliest of the Atari games to start with: the rules are simple, the screen is mostly empty, and the agent only needs to choose between moving the paddle up or down. That makes it the perfect first project for going from “I’ve read about reinforcement learning” to “I’ve actually trained an agent from raw pixels and watched it learn to win.”

Building this project will introduce you to ideas you’ll come across again and again throughout your career: turning observations into features, sampling from a stochastic policy, computing returns, reducing variance, and the policy gradient itself.

If You Enjoy Coding Challenges Here Are Four Ways You Can Help Support It
The Challenge - Building Your Own AI Pong Player

In this challenge you’re going to build a policy gradient agent that learns to play Pong from raw pixels using the REINFORCE algorithm. Your agent will start out playing randomly, lose 21-0 over and over, and then, if you’ve wired everything up correctly, gradually start scoring points, then winning rallies, and eventually beating the built-in opponent more often than it loses.

This challenge is a good fit if you’ve written some Python before, are comfortable with NumPy, and have at least a passing acquaintance with neural networks. You don’t need to be a reinforcement learning expert. REINFORCE is one of the simplest deep reinforcement learning algorithms there is, and the version we’ll build here is famously the one Andrej Karpathy described in his “Pong from Pixels“ blog post. A small policy network, no value function, no replay buffer, no target network. Just a policy, some episodes, and a gradient.

A word of warning before you start: training from pixels is slow. Even on a sensible setup, you should expect a few hours of CPU training before the agent really starts to win, and you may want to leave it running overnight. That’s part of the experience; watching the score curve crawl upwards over many thousands of episodes is genuinely exciting once you’ve built the thing yourself.

Step Zero

In this introductory step you’re going to set your environment up ready to begin developing and testing your solution.

Python is the natural choice for this challenge because the reinforcement learning ecosystem lives there, but the ideas transfer cleanly to any language with a deep learning framework. You’ll need three things installed: Gymnasium (the maintained successor to OpenAI Gym), the Atari environments via ale-py, and a deep learning framework - PyTorch, TensorFlow, or JAX are all fine, pick whichever you’d like to practise with. You’ll also want NumPy and Matplotlib.

Before you write any code, spend a few minutes playing Pong yourself if you’ve never seen it. Notice that the only things that matter are your paddle’s vertical position, the ball’s position, and the ball’s direction of travel. Your agent will need to work this out from the screen, with no idea what any of those concepts mean.

Step 1

In this step your goal is to get a Pong environment running and have a “random agent” play a full game so you can see the data flowing.

Create the Pong environment (ALE/Pong-v5 in Gymnasium), reset it, and loop until the episode ends, picking a random action at every step. Have a look at the action space (it’s discrete with six actions, although only “up” and “down” really matter in Pong) and at the observation. The observation is a 210 x 160 x 3 array of uint8 values - the raw RGB screen.

Testing: Run your random agent for one episode and confirm that:

- the episode runs to completion without errors and ends when one side reaches 21 points,
- the rewards you see are only ever -1, 0, or +1 (a point against you, nothing, or a point for you),
- the total reward for the episode is close to -21, because a random agent loses almost every point.
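Here’s a minimal sketch of Step 1 in Python, assuming Gymnasium and ale-py are installed (ALE/Pong-v5 is the standard environment id; depending on your ale-py version the explicit register_envs call may or may not be needed):

```python
import gymnasium as gym
import ale_py

# Newer versions of ale-py need the Atari environments registered explicitly;
# older versions register them when the module is imported.
gym.register_envs(ale_py)

env = gym.make("ALE/Pong-v5")
print(env.action_space)        # Discrete(6) - only "up" and "down" matter for Pong

obs, info = env.reset(seed=0)
print(obs.shape, obs.dtype)    # (210, 160, 3) uint8 - the raw RGB screen

total_reward = 0.0
steps = 0
done = False
while not done:
    action = env.action_space.sample()          # random agent
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    steps += 1
    done = terminated or truncated

print(f"episode finished after {steps} steps, total reward {total_reward}")
env.close()
```

If the total reward comes out around -21, the plumbing is in place and you’re ready to start on the agent proper.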
Step 2

In this step your goal is to turn the raw 210 x 160 x 3 screen into a small, clean input your network can actually learn from. There are four things to do here, and they should all happen inside a single function that takes a raw frame and returns the preprocessed observation: crop away the scoreboard and the borders so only the playing field remains, downsample the image by a factor of two, erase the background so that only the paddles and the ball show up as bright pixels, and flatten the result into a single vector of floats. (There’s a sketch of this below.)
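Here’s a minimal sketch of that crop/downsample/binarise/flatten recipe. The crop rows and the background values 144 and 109 are specific to Pong’s default palette, so treat them as assumptions to check against your own saved frames:

```python
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Raw 210x160x3 Pong frame -> flat vector of 80*80 = 6400 floats."""
    img = frame[35:195]                          # crop to the playing field
    img = img[::2, ::2, 0].astype(np.float32)    # downsample by 2, keep one channel -> 80x80 copy
    img[img == 144] = 0                          # erase background shade 1
    img[img == 109] = 0                          # erase background shade 2
    img[img != 0] = 1                            # paddles and ball -> 1, everything else -> 0
    return img.ravel()                           # flatten to length 6400
```

The frame-difference trick described next is then just a subtraction of two of these vectors.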
A static frame doesn’t tell your agent anything about which way the ball is moving, and direction is the most important thing in Pong. The classic trick - and the one used in the original Karpathy write-up - is to feed in the difference between the current preprocessed frame and the previous one. Pixels that didn’t change become zero, and pixels that did change show up as positive or negative values. The ball appears as a little bright streak pointing the way it’s travelling. Add this difference computation on top of your preprocessing function.

Testing: Save a few raw frames and their preprocessed versions to disk and look at them with an image viewer. The preprocessed frame should clearly show the two paddles and the ball as bright pixels on a dark background, with nothing else. Display a frame difference - it should be almost entirely black except for the ball and the moving paddle. A good sanity check: the output of your preprocessing function should be a 1D NumPy array of length 6400 (an 80 x 80 image, flattened).

Step 3

In this step your goal is to build the neural network that maps a preprocessed observation to a probability distribution over actions, and use it to pick actions.

The policy network Karpathy describes is a tiny network - a single hidden layer with about 200 ReLU units, then an output layer that produces one number per action. Pass that output through a softmax (or a sigmoid if you’ve reduced things to a single output for “probability of moving up”) and you have a probability distribution. To pick an action, sample from that distribution rather than taking the most likely one.

Wire up an “act” function that takes a preprocessed frame, runs it through the network, and returns a sampled action plus whatever extra information you’ll need later for training (typically the log-probability of the action that was taken, or the network output itself). Once that’s working, run another full episode - this time with your untrained network choosing the actions instead of sampling them at random from the action space.

Testing: Run a single episode with the untrained policy. The total reward should be in the same ballpark as your random agent’s (around -21), because an untrained network is effectively a random policy.

Step 4

In this step your goal is to collect a complete episode of experience and turn the rewards into the returns that will drive learning.

For each step in an episode, store three things: the observation that was fed in, the action that was taken (or its log-probability), and the reward that came back from the environment. At the end of the episode you’ll have three lists, all the same length.

Now compute the discounted return for each step. The return at step t is the discounted sum of every reward from that step onwards: the reward at step t, plus gamma times the reward at step t+1, plus gamma squared times the reward at step t+2, and so on. A discount factor gamma of 0.99 is the usual choice for Pong.

Once you have the per-step returns, normalise them across the whole episode by subtracting the mean and dividing by the standard deviation. Normalised returns put roughly half the actions on the “this was better than average” side and half on the “this was worse” side, which gives the policy gradient a much more stable signal.

Testing: Run an episode, compute the returns, and have a look: the returns on the steps just before a point is scored should be close to +1 or -1, steps a long way from any point should have noticeably smaller returns because of the discounting, and after normalisation the returns should have a mean of roughly zero and a standard deviation of roughly one.
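As a sketch of the Step 4 bookkeeping in NumPy - the reset of the running sum whenever a reward is non-zero is a Pong-specific trick from the Karpathy write-up (each point starts a fresh rally), so treat it as optional:

```python
import numpy as np

def discounted_returns(rewards: list[float], gamma: float = 0.99) -> np.ndarray:
    """Compute the discounted return G_t for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0          # Pong-specific: a point was just scored, start the sum afresh
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def normalise(returns: np.ndarray) -> np.ndarray:
    """Zero-mean, unit-variance returns give the policy gradient a steadier signal."""
    return (returns - returns.mean()) / (returns.std() + 1e-8)
```

With an episode’s rewards in hand, normalise(discounted_returns(rewards)) is the per-step weight you’ll multiply the log-probabilities by in the next step.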
A nice sanity print is to show, for the last twenty steps of an episode, the reward at that step and the discounted return - you’ll see the return building up smoothly and then jumping when a point is scored.

Step 5

In this step your goal is to actually update the policy in the direction that makes good actions more likely and bad actions less likely. This is the heart of the whole challenge.

The REINFORCE update is delightfully simple. For each step in your collected rollout, compute the loss as the negative log-probability of the action that was taken, multiplied by that step’s normalised return. Sum that over the rollout and minimise it with an ordinary optimiser: actions that led to better-than-average returns become more likely, and actions that led to worse-than-average ones become less likely.

A single episode’s worth of gradient is very noisy. Batch up the gradients over multiple episodes - ten is a sensible starting point - before you actually call the optimiser. You can either accumulate gradients across episodes or concatenate the per-step data and do one bigger update; both work. There’s a sketch of the network, the act function and the batched update below.

Now wrap the whole thing in a training loop that runs for thousands of episodes, prints a running average of the score after each one, and just leaves it going. Be patient. For the first few hundred episodes the score will hover around -21, with the agent losing nearly every point, before the running average starts to creep upwards.

Testing: This is the step where things either work or they very visibly don’t. A few things to check as training progresses: the running average of the score should drift upwards over hundreds of episodes, from around -21 towards zero and, eventually, beyond; the loss and the gradients should stay finite (a NaN usually means a log of zero somewhere, so clamp the probabilities or add a small epsilon); and the action probabilities shouldn’t collapse to exactly 0 or 1 early on, or the agent has stopped exploring. If you’d like a stronger signal that things are alive, log the average length of an episode (in steps). Random play produces short episodes; an agent that’s learning to actually rally produces longer ones, well before the score itself starts to go up.
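Here’s a sketch of the policy network, the act function, and the batched REINFORCE update in PyTorch, using the single-output “probability of moving up” formulation and the preprocess, discounted_returns and normalise helpers from the earlier sketches. The network size, Adam with a learning rate of 1e-3, and the batch of ten episodes are just the starting points mentioned above, not requirements:

```python
import numpy as np
import torch
import torch.nn as nn
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)

UP, DOWN = 2, 3            # ALE Pong: action 2 moves the paddle up, action 3 moves it down

class Policy(nn.Module):
    def __init__(self, n_inputs=80 * 80, n_hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, 1),                      # one output: the logit for "move up"
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x)).squeeze(-1)    # probability of moving up

def act(policy, observation):
    """Sample an action and return it with the log-probability of that choice."""
    x = torch.from_numpy(observation).float()
    dist = torch.distributions.Bernoulli(policy(x))
    sample = dist.sample()
    action = UP if sample.item() == 1.0 else DOWN
    return action, dist.log_prob(sample)

policy = Policy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
env = gym.make("ALE/Pong-v5")
batch_size = 10            # episodes per gradient step

log_probs, advantages, scores = [], [], []
for episode in range(1, 20001):
    obs, _ = env.reset()
    prev, rewards, done = None, [], False
    while not done:
        cur = preprocess(obs)                            # from the Step 2 sketch
        x = cur - prev if prev is not None else np.zeros_like(cur)
        prev = cur
        action, log_prob = act(policy, x)
        obs, reward, terminated, truncated, _ = env.step(action)
        rewards.append(reward)
        log_probs.append(log_prob)
        done = terminated or truncated

    returns = normalise(discounted_returns(rewards))     # from the Step 4 sketch
    advantages.extend(returns.tolist())
    scores.append(sum(rewards))

    if episode % batch_size == 0:
        # REINFORCE: push up the log-probability of actions with above-average returns
        loss = -(torch.stack(log_probs) * torch.tensor(advantages)).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        log_probs, advantages = [], []
        print(f"episode {episode}, running mean score {np.mean(scores[-100:]):.2f}")
```

Accumulating ten episodes of per-step data before each optimiser call is the “concatenate and do one bigger update” option from above; accumulating gradients episode by episode works just as well and uses less memory.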
Step 6

In this step your goal is to make your training run something you can show off, not just a console of numbers scrolling by. There are four things to add:
Testing: Once you have all four bits in place, start a fresh training run and confirm each of them works while it’s going.
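As one example of the kind of quality-of-life tooling Step 6 is pointing at, here’s a small sketch of periodic checkpointing and a learning-curve plot, assuming a PyTorch policy and the running list of per-episode scores from the training loop above. The file names and the 100-episode smoothing window are my choices, not part of the challenge:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")          # render to files so this works on a headless machine
import matplotlib.pyplot as plt
import torch

def save_checkpoint(policy, optimizer, episode, scores, path="pong_checkpoint.pt"):
    """Save everything needed to resume a long training run."""
    torch.save({
        "episode": episode,
        "model_state": policy.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "scores": scores,
    }, path)

def plot_learning_curve(scores, path="learning_curve.png", window=100):
    """Plot the per-episode score alongside a smoothed running average."""
    scores = np.asarray(scores, dtype=np.float32)
    smoothed = (np.convolve(scores, np.ones(window) / window, mode="valid")
                if len(scores) >= window else scores)
    plt.figure(figsize=(8, 4))
    plt.plot(scores, alpha=0.3, label="episode score")
    plt.plot(np.arange(len(smoothed)) + (len(scores) - len(smoothed)), smoothed,
             label=f"{window}-episode average")
    plt.axhline(0, color="grey", linewidth=0.5)   # above this line the agent is winning
    plt.xlabel("episode")
    plt.ylabel("score")
    plt.legend()
    plt.tight_layout()
    plt.savefig(path)
    plt.close()
```

Calling both of these every hundred episodes or so from the training loop gives you a curve you can watch grow and a checkpoint you can resume from (torch.load plus load_state_dict on the policy and the optimiser).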
Going Further

Here are some ideas to take your Pong agent further:
P.S. If You Enjoy Coding Challenges Here Are Four Ways You Can Help Support It
Share Your Solutions!

If you think your solution is an example other developers can learn from, please share it: put it on GitHub, GitLab or elsewhere. Then let me know via Bluesky or LinkedIn, or just post about it there and tag me. Alternatively, please add a link to it in the Coding Challenges Shared Solutions GitHub repo.

Request for Feedback

I’m writing these challenges to help you develop your skills as a software engineer based on how I’ve approached my own personal learning and development. What works for me might not be the best way for you, so if you have suggestions for how I can make these challenges more useful to you and others, please get in touch and let me know. All feedback is greatly appreciated. You can reach me on Bluesky, LinkedIn or through Substack.

Thanks and happy coding!

John

Invite your friends and earn rewards
If you enjoy Coding Challenges, share it with your friends and earn rewards when they subscribe.