My solutions for Project 1 from Udacity's Deep Reinforcement Learning Nanodegree Program.
The goal of the Navigation Project is to train an agent to navigate (and collect bananas!) in a large, square world. A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, the goal of the agent is to collect as many yellow bananas as possible while avoiding blue bananas.
The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions. Four discrete actions are available, corresponding to:
- [0] move forward.
- [1] move backward.
- [2] turn left.
- [3] turn right.
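As a rough illustration of how an agent might pick among these four actions, here is a minimal epsilon-greedy sketch. It is not taken from the notebooks: the `select_action` name, the `q_values` input, and the `ACTIONS` mapping are assumptions made for this example.

```python
import random

# Action indices as listed above.
ACTIONS = {0: "move forward", 1: "move backward", 2: "turn left", 3: "turn right"}


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over the four discrete actions.

    q_values: sequence of 4 estimated action values (hypothetical input).
    epsilon:  probability of picking a uniformly random action instead.
    """
    if len(q_values) != len(ACTIONS):
        raise ValueError("expected one value per action")
    if rng.random() < epsilon:
        # Explore: any of the four actions with equal probability.
        return rng.randrange(len(ACTIONS))
    # Exploit: index of the largest estimated value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the choice is purely greedy; training typically starts with a larger `epsilon` and decays it, though the schedule used here is whatever the notebooks define.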
The task is episodic, and to solve the environment the agent must achieve an average score of at least +13 over 100 consecutive episodes.
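The solve criterion can be checked with a rolling window over episode scores. The sketch below is an illustration only; the function name `first_solved_episode` and its parameters are assumptions, not part of this repository.

```python
from collections import deque


def first_solved_episode(scores, target=13.0, window=100):
    """Return the 1-based episode at which the mean of the last `window`
    scores first reaches `target`, or None if that never happens."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None
```

Note that this returns the episode at which the 100-episode window completes; some reports subtract the window length and quote the episode at which the window started.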
If you would like to run this code locally, follow the instructions below.
- Set up your Python environment as described in the dependencies section of the readme from the Deep Reinforcement Learning Nanodegree program.
- Clone this repository.
- Create a directory called "data" at the root of the cloned repository.
- Select the environment that matches your operating system from the list below:
- Place the file in the data folder you created above.
- Unzip (or decompress) the file.
This repo has four Jupyter notebooks containing increasingly interesting solutions to the Unity ML Banana-Collector environment using extensions to Q-learning.
- Deep Q-Learning for Navigation
- Double Deep Q-Learning for Navigation
- Dueling Deep Q-Learning for Navigation
- Dueling Double Deep Q-Learning for Navigation
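For orientation, the variants named above differ mainly in how the learning target is computed and how Q-values are assembled. The NumPy sketch below follows the standard formulations from the literature and is not necessarily how the notebooks implement them; every function and argument name is an assumption made for this example.

```python
import numpy as np


def dqn_target(reward, done, gamma, q_next_target):
    """Standard DQN: the target network both selects and evaluates
    the next action. q_next_target has shape (batch, n_actions)."""
    return reward + gamma * (1.0 - done) * q_next_target.max(axis=1)


def double_dqn_target(reward, done, gamma, q_next_online, q_next_target):
    """Double DQN: the online network selects the next action and the
    target network evaluates it, which reduces overestimation."""
    best = q_next_online.argmax(axis=1)
    evaluated = q_next_target[np.arange(len(best)), best]
    return reward + gamma * (1.0 - done) * evaluated


def dueling_q(value, advantage):
    """Dueling head: Q = V + (A - mean(A)).
    value has shape (batch, 1), advantage has shape (batch, n_actions)."""
    return value + advantage - advantage.mean(axis=1, keepdims=True)
```

The dueling double variant combines the two ideas: Q-values come from a dueling head, and the target is computed as in `double_dqn_target`.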
You can train an agent to solve the Navigation Environment by executing the cells in the corresponding notebooks.
To avoid repeating too much code, I've also included the following three Python modules, which capture code implemented in the first notebook that is also needed in later notebooks:
- model.py: implements an actor policy model as a simple neural network.
- dqn-agent.py: defines an abstract RL agent for deep Q-learning which will be subclassed in the specific solutions.
- trainer.py: implements the method used to train the agents.