Skip to content

Conversation

@azrael417
Copy link
Collaborator

@azrael417 azrael417 commented Aug 16, 2024

This PR fixes a bug in SAC, where instead of alpha was read from the file and used as entropy coefficient, rho was used instead which made training unstable since alpha should be around 0.1 and rho is close to 1.0. In addition to this fix, this PR adds trainable entropy coefficient as it can improve stability of training.

Furthermore, this PR adds a test suite for many parts of the RL stack and in addition a full test for each algorithm (DDPG, TD3, SAC, PPO) on simple tests environments. Those tests run for some time but it is helpful to verify that the environment works. The only test which is currently failing is the one for the state dependent action reward for DDPG, and only for DDPG. This could have something to do with the fact that DDPG sometimes overfits rapidly on simple environments, so I would not assume that the DDPG implementation by itself is wrong. In any case, since TD3 is basically always superior to DDPG and it seems to work in all cases, I recommend using that instead.

In order for the tests to work, I included a gtest dependency and also added some simple RL related models as base models to torch fort. Those include a sac-policy and an actor-critic model, where actor and critic share the same feature extractor.

@azrael417 azrael417 requested a review from romerojosh August 16, 2024 07:23
Copy link
Collaborator

@romerojosh romerojosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added mostly nitpick comments here. Otherwise changes LGTM. Thanks for adding some tests!

@azrael417
Copy link
Collaborator Author

I have addressed all your comments and agree to all of them, thanks for the careful review.

Copy link
Collaborator

@romerojosh romerojosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants