Tkurth/rl tests #15
Conversation
… internally from the RL system and the user does not have to bother with it
romerojosh left a comment
Added mostly nitpick comments here. Otherwise changes LGTM. Thanks for adding some tests!
…, enabled building examples by default
I have addressed all of your comments and agree with all of them. Thanks for the careful review.
romerojosh left a comment
LGTM!
This PR fixes a bug in SAC where rho was read from the file and used as the entropy coefficient instead of alpha. This made training unstable, since alpha should be around 0.1 while rho is close to 1.0. In addition to this fix, this PR adds a trainable entropy coefficient, as it can improve training stability.
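For reference, a trainable entropy coefficient in SAC is usually implemented by optimizing log(alpha) against a target entropy. The following is a minimal PyTorch sketch of that idea, not the code added in this PR; the names `log_alpha`, `target_entropy`, and `update_alpha` are hypothetical.

```python
import torch

# Hedged sketch: trainable entropy coefficient (alpha) for SAC.
# Common heuristic target entropy: -|action_dim|.
action_dim = 4
target_entropy = -float(action_dim)

# Optimize log(alpha) so that alpha = exp(log_alpha) stays positive.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def update_alpha(log_prob: torch.Tensor) -> torch.Tensor:
    """One gradient step on alpha given log-probabilities of sampled actions."""
    alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    # The detached alpha is then used to weight the entropy term in the
    # actor and critic losses.
    return log_alpha.exp().detach()

# Example usage with placeholder log-probs from the policy:
alpha = update_alpha(torch.randn(256))
```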
Furthermore, this PR adds a test suite covering many parts of the RL stack, plus a full training test for each algorithm (DDPG, TD3, SAC, PPO) on simple test environments. These tests take some time to run, but they are helpful for verifying that the environment works. The only test currently failing is the state-dependent action reward test for DDPG, and only for DDPG. This may be related to the fact that DDPG sometimes overfits rapidly on simple environments, so I would not conclude that the DDPG implementation itself is wrong. In any case, since TD3 is essentially always superior to DDPG and appears to work in all cases, I recommend using that instead.
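To illustrate the kind of sanity check described above, here is a hypothetical sketch of a "state-dependent action reward" toy environment; the actual test environments in this PR may differ in dimensions, reward shaping, and interface.

```python
import numpy as np

class StateDependentActionEnv:
    """Toy environment: reward is highest when the action matches a simple
    function of the current state (here, the negated state)."""

    def __init__(self, dim: int = 2, seed: int = 0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.state = None

    def reset(self) -> np.ndarray:
        self.state = self.rng.uniform(-1.0, 1.0, size=self.dim)
        return self.state

    def step(self, action: np.ndarray):
        # Optimal action is the negated state; reward penalizes the deviation.
        target = -self.state
        reward = -float(np.sum((action - target) ** 2))
        self.state = self.rng.uniform(-1.0, 1.0, size=self.dim)
        return self.state, reward, False, {}

# A converged agent should drive the average reward close to zero:
env = StateDependentActionEnv()
obs = env.reset()
obs, rew, done, info = env.step(-obs)  # the optimal action yields reward 0
```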
In order for the tests to work, I included a gtest dependency and also added some simple RL-related models as base models to TorchFort. These include a SAC policy and an actor-critic model in which the actor and critic share the same feature extractor.
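As a rough illustration of the shared-feature-extractor layout mentioned above, here is a minimal PyTorch sketch; layer sizes, activations, and names are assumptions rather than the definitions added to TorchFort in this PR.

```python
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """Actor-critic module where both heads read from one shared trunk."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        # Shared feature extractor used by both the actor and the critic.
        self.features = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.actor = nn.Linear(hidden, action_dim)  # policy head
        self.critic = nn.Linear(hidden, 1)          # value head

    def forward(self, state: torch.Tensor):
        h = self.features(state)
        return torch.tanh(self.actor(h)), self.critic(h)

# Example usage:
model = SharedActorCritic(state_dim=8, action_dim=2)
actions, values = model(torch.randn(16, 8))
```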