
[RLLib] Fix RNNSAC example failing on CI + fixes for recurrent models for other Q Learning Algos #24923

Merged
sven1977 merged 42 commits into ray-project:master from ArturNiederfahrenhorst:rnnsacfix on May 24, 2022
Conversation

@ArturNiederfahrenhorst
Contributor

Why are these changes needed?

RNNSAC samples sequences, and its example script now uses the new replay buffer API, which caused the script to fail because the buffer samples by timestep by default. This fix makes recurrent policies possible for Q-learning algorithms.
This PR also changes the original example script to be closer to a learning test (RNNSAC has no learning test), which usually aims for an episode reward of at least 150 on CartPole-v0.
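The failure mode can be illustrated with a toy buffer: a recurrent policy must replay whole contiguous sequences to rebuild its hidden state, while sampling by timestep scatters them. A minimal sketch in plain Python (hypothetical class, method, and parameter names, not RLlib's actual replay buffer API):

```python
import random


class SequenceReplayBuffer:
    """Toy buffer that can sample either whole stored sequences or
    individual timesteps (illustrative only, not RLlib's buffer)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._sequences = []  # each item is a list of timesteps

    def add(self, sequence):
        # Store the whole sequence as one unit so recurrent state
        # boundaries are preserved.
        self._sequences.append(sequence)
        if len(self._sequences) > self.capacity:
            self._sequences.pop(0)

    def sample(self, num_items, storage_unit="sequences"):
        if storage_unit == "sequences":
            # Recurrent policies need contiguous sequences during
            # training; sample whole sequences (with replacement).
            return random.choices(self._sequences, k=num_items)
        # Sampling by timestep breaks sequences apart: fine for
        # feed-forward models, wrong for RNNs like RNNSAC.
        flat = [t for seq in self._sequences for t in seq]
        return random.choices(flat, k=num_items)
```

With `storage_unit="timesteps"` (the analogue of the old default), an RNN would receive unrelated single steps instead of trainable sequences.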

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Member

@gjoliver gjoliver left a comment


like the new sample(n) api.

@sven1977
Contributor

sven1977 commented May 20, 2022

Cool, thanks for this fix @ArturNiederfahrenhorst . Could you make sure all tests are passing?

def Replay(
    *,
    local_buffer: Optional[MultiAgentReplayBuffer] = None,
    num_items_to_replay: int = 1,
Contributor


Wait, do we even still need this class?
Hmm, I guess for users that still use the execution_plan_api. :(

Contributor Author


Yes, I fumbled around with this a little, but figured that without num_items_to_replay we would not support users of both the execution plan API and the replay buffer API. So it has to stay.

 stop:
+    episode_reward_mean: 150
     timesteps_total: 1000000
-    episode_reward_mean: 100
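For context, `stop` criteria in such tuning configs are typically disjunctive: the trial ends as soon as any one threshold is reached. A minimal sketch of that semantics (illustrative only, not Ray Tune's actual implementation):

```python
def should_stop(stop_criteria, result):
    """Return True if ANY stop criterion is met, mirroring the
    'stop' dict semantics in the config above (sketch only)."""
    return any(
        result.get(metric, float("-inf")) >= threshold
        for metric, threshold in stop_criteria.items()
    )
```

So raising `episode_reward_mean` to 150 means the trial runs until that reward is reached, or until the `timesteps_total` budget is exhausted, whichever comes first.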
Contributor


Actually, increasing num_workers to e.g. 3 or 4 may also help speed up learning here.

Again, just making 100% sure we didn't introduce a bigger learning regression.

"""
assert batch.count > 0, batch
warn_replay_capacity(item=batch, num_items=self.capacity / batch.count)
if not batch.count > 0:
Contributor


Cool. Agree! Should be resilient to len==0 batches, which have their justification in some cases.
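The pattern under discussion, replacing a hard assert with a graceful skip for zero-length batches, can be sketched as follows (hypothetical helper, not the exact RLlib code):

```python
import logging

logger = logging.getLogger(__name__)


def add_batch(buffer, batch_count, batch):
    """Skip zero-length batches instead of asserting on them
    (sketch of the guard pattern discussed above)."""
    if batch_count <= 0:
        # Empty batches can legitimately occur, e.g. when a rollout
        # worker returns before collecting any complete timesteps.
        logger.debug("Skipping empty batch.")
        return False
    buffer.append(batch)
    return True
```

The early return also keeps any later arithmetic that divides by the batch count (as `warn_replay_capacity` does above) safe from division by zero.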

"prioritized_replay_beta",
"prioritized_replay_eps",
"no_local_replay_buffer",
"replay_batch_size",
Contributor


Can we leave a warning here if users are using this setting? Happy to make this an error, but we should produce some meaningful explanation as to what happened to this setting.

Contributor Author


It's been a warning only so far, but per your request I've made it an error.
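Promoting the warning to an error amounts to an explicit check against the removed keys from the diff above; a minimal sketch (hypothetical function name and message, not RLlib's actual deprecation helper):

```python
# Keys taken from the diff above; the error message is illustrative.
DEPRECATED_REPLAY_KEYS = {
    "prioritized_replay_beta",
    "prioritized_replay_eps",
    "no_local_replay_buffer",
    "replay_batch_size",
}


def validate_config(config):
    """Raise with a meaningful message if a removed replay setting
    is still present (sketch, not RLlib's actual validation)."""
    for key in DEPRECATED_REPLAY_KEYS:
        if key in config:
            raise ValueError(
                f"Config key '{key}' is no longer supported at the "
                "top level; configure the replay buffer via its own "
                "config dict instead."
            )
```

Raising instead of warning guarantees the user sees the explanation rather than a silently ignored setting.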

Contributor

@sven1977 sven1977 left a comment


Looks great! Thanks for these fixes and enhancements @ArturNiederfahrenhorst . The API is getting more and more polished now.
I do have a few questions and requests before we can merge this. :)
Thanks!

