
MuZero

This project is a Python implementation of the MuZero algorithm created by Google DeepMind, designed to be used with Gymnasium (formerly OpenAI Gym) environments. It is built to be modular and extensible, allowing for easy integration with different Gymnasium environments and neural network configurations.


Check out our presentation video below for an in-depth overview:

Watch MuZero Presentation

Our MuZero in action, demonstrating performance on the Car Racing environment from Gymnasium:

Watch Results

Overview

MuZero is an advanced model-based reinforcement learning algorithm that jointly learns a dynamics model, a value function, and a policy through self-play training. Unlike earlier systems such as AlphaZero, MuZero does not require explicit knowledge of the true environment dynamics. Instead, the model receives the last 32 frames as input and learns its own abstract internal representation of the environment. Monte Carlo Tree Search (MCTS) can then plan ahead entirely in this latent space, enabling sophisticated planning and superhuman performance across a range of complex environments.
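
As a rough illustration (not this repository's actual architecture), the sketch below shows the three learned functions MuZero relies on: a representation network h, a dynamics network g, and a prediction network f. It uses PyTorch with illustrative names and sizes; during MCTS, recurrent_inference is what lets the search step forward in latent space without touching the real environment.

```python
import torch
import torch.nn as nn


class MuZeroNets(nn.Module):
    """Toy representation (h), dynamics (g) and prediction (f) networks."""

    def __init__(self, obs_dim: int, num_actions: int, latent_dim: int = 64):
        super().__init__()
        # h: stacked past observations -> abstract latent state
        self.representation = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        # g: (latent state, action) -> next latent state, plus a predicted reward
        self.dynamics = nn.Sequential(nn.Linear(latent_dim + num_actions, latent_dim), nn.ReLU())
        self.reward_head = nn.Linear(latent_dim, 1)
        # f: latent state -> policy logits and value estimate
        self.policy_head = nn.Linear(latent_dim, num_actions)
        self.value_head = nn.Linear(latent_dim, 1)

    def initial_inference(self, stacked_obs: torch.Tensor):
        """Encode real observations once, at the root of the search tree."""
        state = self.representation(stacked_obs)
        return state, self.policy_head(state), self.value_head(state)

    def recurrent_inference(self, state: torch.Tensor, action_onehot: torch.Tensor):
        """Step the latent state forward for one imagined action during MCTS."""
        next_state = self.dynamics(torch.cat([state, action_onehot], dim=-1))
        return (next_state, self.reward_head(next_state),
                self.policy_head(next_state), self.value_head(next_state))
```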

Because the model does not require the true environment dynamics, it can be applied to a much wider range of problems. This is especially useful in real-world applications, where the environment is almost never fully known or is too complex to model explicitly. However, the lack of an explicit environment model makes training much harder: the model has to learn the dynamics of the environment from scratch, and at the start it essentially garbles the signals it receives from the environment.

To improve sampling efficiency, we train the model preferentially on the data it finds most surprising and on the episodes where it performed best. This is done with Prioritized Experience Replay, which lets the model focus on the most important experiences and learn from them more effectively. The model is trained on a large number of episodes and learns to prioritize the experiences with the greatest impact on the learning process.
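
As a rough sketch of the idea (names and hyperparameters here are illustrative, not this repository's API), a prioritized buffer stores a priority alongside each experience, samples in proportion to that priority, and corrects for the skewed sampling with importance-sampling weights:

```python
import numpy as np


class PrioritizedReplayBuffer:
    def __init__(self, capacity: int, alpha: float = 0.6):
        self.capacity = capacity
        self.alpha = alpha                    # how strongly priorities skew sampling
        self.buffer = []                      # stored experiences / game histories
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def add(self, experience, priority: float):
        if len(self.buffer) < self.capacity:
            self.buffer.append(experience)
        else:
            self.buffer[self.pos] = experience    # overwrite oldest entry
        self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        prios = self.priorities[: len(self.buffer)] ** self.alpha
        probs = prios / prios.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct for the non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, new_priorities):
        # Called after training so surprising experiences stay likely to be replayed.
        self.priorities[idx] = new_priorities
```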

Distributed Prioritized Experience Replay

Because the model needs a lot of training data, we scaled up training with Distributed Prioritized Experience Replay: several instances of the program continuously generate training data in parallel. Each instance runs its own environment and collects experiences, which are stored in a shared experience replay buffer. A parameter server manages the model parameters and synchronizes them across all instances, while a trainer samples from the shared replay buffer and updates the model parameters based on the sampled experiences. A conceptual sketch of this layout follows below.
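
The sketch below illustrates the layout with Python's multiprocessing module. It is only an assumption for illustration; the repository may coordinate its processes differently, and generate_game is a toy placeholder for a self-play episode. In the full system the finished games would land in the prioritized replay buffer sketched above, and the trainer would sample from it rather than from a plain queue.

```python
import multiprocessing as mp
import random
import time


def generate_game(weights_version: int):
    # Placeholder for one self-play episode played with the current weights.
    time.sleep(0.01)
    return {"weights_version": weights_version, "return": random.random()}


def self_play_worker(experience_queue, shared_weights):
    # Each worker runs its own environment and keeps pushing finished games
    # into the shared experience queue.
    while True:
        game = generate_game(shared_weights["version"])
        experience_queue.put(game)


def trainer(experience_queue, shared_weights, num_updates: int = 100):
    # The trainer consumes games and bumps the weight version so workers
    # pick up the new parameters (stands in for a real gradient step).
    for _ in range(num_updates):
        experience_queue.get()
        shared_weights["version"] += 1


if __name__ == "__main__":
    manager = mp.Manager()
    shared_weights = manager.dict({"version": 0})   # tiny stand-in for a parameter server
    experience_queue = mp.Queue()                   # stand-in for the shared replay buffer
    workers = [
        mp.Process(target=self_play_worker, args=(experience_queue, shared_weights), daemon=True)
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    trainer(experience_queue, shared_weights)
    print("final weights version:", shared_weights["version"])
```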

Prerequisites

Usage

To start, run the following command in the root directory of the project:

python main.py

Testing

To run the tests, run the following command in the root directory of the project:

pytest

To run the tests with coverage, run the following command in the root directory of the project:

coverage run -m pytest

To see the coverage report, run the following command in the root directory of the project:

coverage html -i

📖 Documentation

Contributors

Kristoffer Nohr Olaisen

Olav Selnes Lorentzen

Simon Sandvik Lee

Sverre Nystad
