Ramin Anushiravani
This repo contains a value-based method for the banana navigation challenge on Mac. You can train the agent by running "python env.py 1", and you can watch the smart agent navigate the banana field by running "python env.py 0".
The same code is also included in "Navigation.ipynb". I had problems running the environment locally, so I run the Python script "env.py" instead. I also included the notebook's report, "report.html".
Install dependencies: pip install -r requirements.txt
Inside "util/" there are three scripts:
- agent.py : Contains the learning algorithm which implements a dual Deep-QN.
- qn.py : Contains the Q-Network which implements a double Deep-QN.
- replay.py : Contains the replay buffer. I didn't make any improvements to this code.
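For readers unfamiliar with double DQN, the target computation can be sketched as below. This is a minimal plain-Python illustration, not the code in "util/": the function name `double_dqn_targets` and its arguments are hypothetical, and the real agent presumably works on tensors rather than lists. The idea is that the online network picks the best next action and the target network supplies the value of that action.

```python
def double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    """Compute double DQN targets for a batch of transitions.

    next_q_online / next_q_target hold, for each next state, a list of
    per-action Q-values from the online and target networks respectively.
    All names here are illustrative assumptions.
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, next_q_online, next_q_target):
        # Action selection uses the online network...
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        # ...while the value of that action comes from the target network.
        bootstrap = 0.0 if done else gamma * q_tg[best_action]
        targets.append(r + bootstrap)
    return targets


if __name__ == "__main__":
    print(double_dqn_targets(
        rewards=[1.0, 0.0],
        dones=[False, True],
        next_q_online=[[0.1, 0.9], [0.5, 0.2]],
        next_q_target=[[2.0, 1.0], [3.0, 4.0]],
        gamma=0.5,
    ))
```

Separating action selection from action evaluation is what distinguishes this from the plain DQN target, which takes the maximum over the target network's own values.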
The final model is saved in "artifact/checkpoint.pth", along with a plot of the rewards over all episodes, "artifact/scores.png". The reward exceeds 14.
You can see a short video of the smart agent looking for yellow bananas in "artifact/smart_banana.mov".
One possible improvement to this value-based method would be prioritized experience replay, which should also help smooth the reward; as you can see, it's very noisy. Rainbow DQN or a deeper Q-network would also help, as would running it for more episodes or generating more data.
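A rough sketch of what a prioritized replay buffer could look like is given below. It is not part of this repo: the class name `PrioritizedReplayBuffer`, its methods, and the default values of `alpha`, `beta`, and `eps` are all assumptions for illustration. It uses proportional prioritization, where transitions with larger TD error are sampled more often, and importance-sampling weights to correct for the resulting bias.

```python
import random


class PrioritizedReplayBuffer:
    """Hypothetical proportional prioritized replay buffer (list-based sketch)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once full

    def add(self, transition):
        # New transitions get the current max priority so they are seen at least once.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # Priority is the absolute TD error plus a small constant.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, the returned weights would scale each sample's loss, and `update_priorities` would be called with the new TD errors after each learning step. A production version would normally use a sum tree instead of recomputing the probabilities on every call.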
