This project (in collaboration with Noa Menashe) solves a Reinforcement Learning task: finding the optimal path from an initial state to a goal state (in a deterministic environment with deterministic actions). We built a value-function matrix, a state-value function, and a policy, each trained with several classic RL algorithms: Dynamic Programming, Monte Carlo, Q-Learning, and SARSA. All of the algorithms were trained in a stochastic environment. All except DP also used an epsilon-decay scheme, and their parameters were tuned to achieve the best results.
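To illustrate how an epsilon-decay scheme fits into one of these algorithms, here is a minimal sketch of tabular Q-Learning. It is not the repository's code: the one-dimensional corridor environment, the exponential decay schedule, and every name and parameter value (`epsilon_schedule`, `step`, `q_learning`, `alpha`, `gamma`, `decay`) are assumptions chosen only for illustration.

```python
import random


def epsilon_schedule(episode, eps_start=1.0, eps_min=0.05, decay=0.99):
    """Exploration rate for a given episode: exponential decay with a floor.

    The schedule shape and constants are assumptions, not the project's values.
    """
    return max(eps_min, eps_start * decay ** episode)


def step(state, action, n_states):
    """Hypothetical deterministic corridor: action 0 moves left, 1 moves right.

    The rightmost cell is the goal and yields a reward of 1.
    """
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(n_states - 1, state + 1)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(n_states=6, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-Learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for episode in range(episodes):
        eps = epsilon_schedule(episode)
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < eps:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            next_state, reward, done = step(state, action, n_states)
            # Off-policy target: bootstrap from the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy action in each non-goal state (0 = left, 1 = right).
greedy_policy = [max(range(2), key=lambda a: q_table[s][a])
                 for s in range(len(q_table) - 1)]
```

In this toy setting the greedy policy read off `q_table` should move right in every non-goal state. SARSA would differ only in the target, which bootstraps from the action the epsilon-greedy policy actually takes next rather than from the maximum.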
danbogu/Reinforcement-Learning-Algorithms