[RLlib] Fix RNNSAC example failing on CI + fixes for recurrent models for other Q-learning algos #24923
Conversation
Cool, thanks for this fix @ArturNiederfahrenhorst. Could you make sure all tests are passing?
```python
def Replay(
    *,
    local_buffer: Optional[MultiAgentReplayBuffer] = None,
    num_items_to_replay: int = 1,
```
Wait, do we even still need this class?
Hmm, I guess for users that still use the execution_plan_api. :(
Yes, I fumbled around with this a little, but figured that without `num_items_to_replay` we would not support users combining the execution plan API with the replay buffer API. So it has to stay.
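To illustrate why the parameter matters for execution-plan users, here is a minimal sketch of a `Replay`-style op that pulls a fixed number of items from a buffer. `SimpleReplayBuffer` and `replay_op` are hypothetical stand-ins, not RLlib's actual classes:

```python
import random
from typing import Any, List, Optional

class SimpleReplayBuffer:
    """Toy stand-in for a replay buffer (not RLlib's actual class)."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self._storage: List[Any] = []

    def add(self, item: Any) -> None:
        # Evict oldest item once capacity is reached (FIFO).
        if len(self._storage) >= self.capacity:
            self._storage.pop(0)
        self._storage.append(item)

    def sample(self, num_items: int) -> List[Any]:
        # Sample with replacement, as replay buffers typically do.
        return [random.choice(self._storage) for _ in range(num_items)]

def replay_op(
    *,
    local_buffer: Optional[SimpleReplayBuffer] = None,
    num_items_to_replay: int = 1,
) -> List[Any]:
    """Mimics the Replay op's signature: pull a fixed number of items."""
    assert local_buffer is not None, "A local buffer must be provided."
    return local_buffer.sample(num_items_to_replay)
```

Keeping `num_items_to_replay` on the op keeps the execution-plan call site in control of how much data each replay step yields.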
```yaml
stop:
  episode_reward_mean: 150
  timesteps_total: 1000000
  episode_reward_mean: 100
```
Actually, increasing num_workers to e.g. 3 or 4 may also help speed up learning here.
Again, just making 100% sure we didn't introduce a bigger learning regression.
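For reference, the suggested change would look roughly like this in the example's config (an illustrative fragment, not the actual tuned file):

```yaml
stop:
  episode_reward_mean: 150
  timesteps_total: 1000000
config:
  num_workers: 3  # more rollout workers to speed up sample collection
```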
| """ | ||
| assert batch.count > 0, batch | ||
| warn_replay_capacity(item=batch, num_items=self.capacity / batch.count) | ||
| if not batch.count > 0: |
Cool. Agree! Should be resilient to len==0 batches, which have their justification in some cases.
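A minimal sketch of what "resilient to len==0 batches" means: skip empty batches with a warning instead of asserting, and only do the capacity division when the count is non-zero. `Batch`, `add_batch`, and the threshold are hypothetical, not RLlib's implementation:

```python
import warnings
from typing import List

class Batch:
    """Toy batch with only a sample count (hypothetical)."""

    def __init__(self, count: int):
        self.count = count

def add_batch(storage: List[Batch], batch: Batch, capacity: int) -> None:
    """Add a batch to storage, tolerating zero-length batches."""
    if batch.count <= 0:
        # Empty batches are legal in some cases; discard them quietly
        # instead of crashing with an assertion.
        warnings.warn("Discarding empty batch (count=0).")
        return
    # The division is only safe now that batch.count > 0.
    if capacity / batch.count < 10:  # arbitrary illustrative threshold
        warnings.warn("Buffer capacity is small relative to batch size.")
    storage.append(batch)
```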
| "prioritized_replay_beta", | ||
| "prioritized_replay_eps", | ||
| "no_local_replay_buffer", | ||
| "replay_batch_size", |
Can we leave a warning here if users are using this setting? Happy to make this an error, but we should produce some meaningful explanation as to what happened to this setting.
It's been a warning only so far, but per your request I've made it an error.
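A sketch of what such an error could look like. This is a hypothetical `validate_config` helper, not RLlib's actual validation code; it assumes the settings moved into a nested `replay_buffer_config` dict:

```python
# Top-level replay settings that were removed with the new replay buffer API.
DEPRECATED_REPLAY_KEYS = [
    "prioritized_replay_beta",
    "prioritized_replay_eps",
    "no_local_replay_buffer",
    "replay_batch_size",
]

def validate_config(config: dict) -> None:
    """Raise a descriptive error for removed top-level replay settings."""
    for key in DEPRECATED_REPLAY_KEYS:
        if key in config:
            raise ValueError(
                f"Config key `{key}` is no longer a top-level setting. "
                "With the new replay buffer API, set it inside "
                "`config['replay_buffer_config']` instead."
            )
```

Erroring out with the new location of the setting gives users a clear migration path instead of silently ignoring their config.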
sven1977 left a comment:
Looks great! Thanks for these fixes and enhancements @ArturNiederfahrenhorst . The API is getting more and more polished now.
I do have a few questions and requests before we can merge this. :)
Thanks!
…r2d2 use training intensity
Why are these changes needed?
RNNSAC samples sequences, and its example script now uses the new replay buffer API, which caused it to fail because the buffer would sample by timestep by default. This fix makes recurrent policies for Q-learning possible.
This PR also changes the original example script to be closer to a learning test (RNNSAC has no learning test); learning tests usually aim to reach at least 150 on CartPole-v0.
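To illustrate why recurrent policies must replay whole sequences rather than independent timesteps: the RNN state only makes sense across contiguous steps. A minimal sketch of grouping per-timestep records into fixed-length sequences (illustrative only, not RLlib code; `max_seq_len` mirrors the idea of a sequence-length setting):

```python
from typing import Dict, List

def slice_into_sequences(
    timesteps: List[Dict], max_seq_len: int
) -> List[List[Dict]]:
    """Group per-timestep records into contiguous sequences of up to
    max_seq_len steps, preserving temporal order within each sequence."""
    sequences = []
    for start in range(0, len(timesteps), max_seq_len):
        seq = timesteps[start:start + max_seq_len]
        if seq:
            sequences.append(seq)
    return sequences
```

A buffer that stores and samples such sequences lets the recurrent model replay each one from a consistent starting state, instead of receiving shuffled, unrelated timesteps.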
Checks
I've run scripts/format.sh to lint the changes in this PR.