Deep Q-Learning playground built around custom game environments and MLX-based DQN variants.
Current environments:
| Flappy Bird | Breakout |
|---|---|
| ![]() | ![]() |
Current training setup:
- DQN
- Double DQN
- Prioritized Experience Replay
- `ParallelRunner` with distributed actor/learner layout
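The Double DQN variant decouples action selection from evaluation: the online network picks the greedy next action, and the target network scores it. A minimal numpy sketch of the target computation (function name, array shapes, and the worked example are illustrative, not this project's actual code):

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Compute Double DQN TD targets for a batch of transitions.

    q_online_next: (B, A) online-network Q-values for next states
    q_target_next: (B, A) target-network Q-values for next states
    rewards: (B,) rewards; dones: (B,) terminal flags (1.0 = episode ended)
    """
    # Online network selects the greedy next action...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...target network evaluates that action.
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    # Bootstrapped target; zeroed at terminal states.
    return rewards + gamma * (1.0 - dones.astype(np.float32)) * evaluated

# Tiny worked example: one transition, two actions.
q_on = np.array([[1.0, 2.0]])   # online net picks action 1
q_tg = np.array([[5.0, 3.0]])   # target net evaluates action 1 -> 3.0
t = double_dqn_targets(q_on, q_tg, rewards=np.array([1.0]), dones=np.array([0.0]))
# t[0] == 1.0 + 0.99 * 3.0 == 3.97
```

Using the target network only for evaluation (not selection) is what removes the overestimation bias of vanilla DQN's max operator.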
This project is written for Python 3.10 and uses MLX, so it is intended for Apple Silicon machines.
Install dependencies:
```sh
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install mlx
pip install -r requirements.txt
```

`requirements.txt` currently contains the non-MLX runtime dependencies:

- `pygame`
- `numpy`
For video recording during testing, also install `opencv-python` (`pip install opencv-python`).
Train:

```sh
python -m experiments.breakout.cnn_dqn
```

Test latest checkpoint:

```sh
python -m experiments.breakout.cnn_dqn --test
```

Test best checkpoint (by Avg100):

```sh
python -m experiments.breakout.cnn_dqn --test --best
```

Test best single-episode score checkpoint:

```sh
python -m experiments.breakout.cnn_dqn --test --best-score
```

Test with full 5-life game (no terminal on life loss):

```sh
python -m experiments.breakout.cnn_dqn --test --best --full-game
```

Test with epsilon-greedy evaluation (e.g. ε=0.05):

```sh
python -m experiments.breakout.cnn_dqn --test --best --epsilon=0.05
```

Test without rendering (faster, for benchmarking):

```sh
python -m experiments.breakout.cnn_dqn --test --best --no-render
```

Run a fixed number of test episodes:

```sh
python -m experiments.breakout.cnn_dqn --test --best --episodes=100
```

Flags can be combined freely:

```sh
python -m experiments.breakout.cnn_dqn --test --best --full-game --no-render --episodes=100 --epsilon=0.05
```

State-vector DQN:
```sh
python -m experiments.flappy.dqn
python -m experiments.flappy.dqn --test
python -m experiments.flappy.dqn --test --best
```

State-vector Double DQN:
```sh
python -m experiments.flappy.double_dqn
python -m experiments.flappy.double_dqn --test
python -m experiments.flappy.double_dqn --test --best
```

Record the best-scoring episode as a video to `runs/`:

```sh
python -m experiments.flappy.double_dqn --test --best --record
```

Pixel-based CNN Double DQN:
```sh
python -m experiments.flappy.cnn_dqn
python -m experiments.flappy.cnn_dqn --test
python -m experiments.flappy.cnn_dqn --test --best
```

Play the games manually:

```sh
python play.py --game breakout
python play.py --game flappy
```

See each environment's README for controls.
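The `--epsilon` flag used in the test commands above switches evaluation from pure greedy play to epsilon-greedy action selection. The rule itself is standard; a sketch (function name and RNG handling are illustrative, not this project's code):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action;
    otherwise pick the greedy (argmax-Q) action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q = np.array([0.1, 0.9, 0.3])

# With epsilon=0.0 the choice is always greedy (action 1 here).
greedy_actions = [epsilon_greedy(q, epsilon=0.0, rng=rng) for _ in range(100)]
# With epsilon=1.0 every action is sampled uniformly at random.
random_actions = [epsilon_greedy(q, epsilon=1.0, rng=rng) for _ in range(300)]
```

A small evaluation epsilon such as 0.05 is commonly used to keep an agent from getting stuck in deterministic loops during testing.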
Checkpoints are written under checkpoints/<experiment_name>/ as:
- `latest.npz` / `latest.json`
- `best.npz` / `best.json`
- `best_score.npz` / `best_score.json`
For the parallel runner:
- `latest` is always the most recent learner state
- `best` is selected using the best rolling `Avg100`, not the best single episode
- `best_score` is selected by the highest single-episode score
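The `.npz`/`.json` pairing suggests array weights in the `.npz` and scalar metadata (such as the rolling average) in the JSON. A hedged sketch of round-tripping such a pair with numpy (the directory layout matches the README, but the key names `w1` and `avg100` are assumptions, not the project's actual schema):

```python
import json
from pathlib import Path

import numpy as np

def save_checkpoint(directory, name, weights, meta):
    """Write weights to <name>.npz and metadata to <name>.json."""
    d = Path(directory)
    d.mkdir(parents=True, exist_ok=True)
    np.savez(d / f"{name}.npz", **weights)   # one array per key
    (d / f"{name}.json").write_text(json.dumps(meta))

def load_checkpoint(directory, name):
    """Read the matching .npz/.json pair back into Python objects."""
    d = Path(directory)
    weights = dict(np.load(d / f"{name}.npz"))
    meta = json.loads((d / f"{name}.json").read_text())
    return weights, meta

# Round-trip a toy "latest" checkpoint in a temp directory.
import tempfile
tmp = tempfile.mkdtemp()
save_checkpoint(tmp, "latest", {"w1": np.ones((2, 2))}, {"avg100": 42.5})
w, m = load_checkpoint(tmp, "latest")
```

Keeping metadata in JSON rather than inside the `.npz` makes checkpoints inspectable with a text editor, which is handy when deciding which of `latest`/`best`/`best_score` to test.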
Pass `--record` during test to capture the best-scoring episode as an MP4:

```sh
python -m experiments.flappy.double_dqn --test --best --record
```
