[RLlib] Fix RNNSAC example failing on CI + fixes for recurrent models for other Q-learning algos #24923
Conversation
Cool, thanks for this fix @ArturNiederfahrenhorst. Could you make sure all tests are passing?
```python
def Replay(
    *,
    local_buffer: Optional[MultiAgentReplayBuffer] = None,
    num_items_to_replay: int = 1,
```
Wait, do we even still need this class?
Hmm, I guess for users that still use the execution_plan_api. :(
Yes, I fumbled around with this a little, but figured that without `num_items_to_replay` we would not support users combining the execution plan API with the replay buffer API. So it has to stay.
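To illustrate why the parameter matters for execution-plan users, here is a minimal sketch of a `Replay`-style op that pulls a fixed number of items from a buffer. `SimpleReplayBuffer` and `replay_op` are hypothetical stand-ins, not RLlib's actual classes:

```python
import random
from typing import Any, List, Optional

class SimpleReplayBuffer:
    """Toy stand-in for a replay buffer (not RLlib's actual class)."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self._storage: List[Any] = []

    def add(self, item: Any) -> None:
        # Evict oldest item once capacity is reached (FIFO).
        if len(self._storage) >= self.capacity:
            self._storage.pop(0)
        self._storage.append(item)

    def sample(self, num_items: int) -> List[Any]:
        # Sample with replacement, as replay buffers typically do.
        return [random.choice(self._storage) for _ in range(num_items)]

def replay_op(
    *,
    local_buffer: Optional[SimpleReplayBuffer] = None,
    num_items_to_replay: int = 1,
) -> List[Any]:
    """Mimics the Replay op's signature: pull a fixed number of items."""
    assert local_buffer is not None, "A local buffer must be provided."
    return local_buffer.sample(num_items_to_replay)
```

Keeping `num_items_to_replay` on the op keeps the execution-plan call site in control of how much data each replay step yields.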
```yaml
stop:
  episode_reward_mean: 150
  timesteps_total: 1000000
  episode_reward_mean: 100
```
Actually, increasing num_workers to e.g. 3 or 4 may also help speed up learning here.
Again, just making 100% sure we didn't introduce a bigger learning regression.
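For reference, the suggested change would look roughly like this in the example's config (an illustrative fragment, not the actual tuned file):

```yaml
stop:
  episode_reward_mean: 150
  timesteps_total: 1000000
config:
  num_workers: 3  # more rollout workers to speed up sample collection
```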
| """ | ||
| assert batch.count > 0, batch | ||
| warn_replay_capacity(item=batch, num_items=self.capacity / batch.count) | ||
| if not batch.count > 0: |
Cool. Agree! Should be resilient to len==0 batches, which have their justification in some cases.
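A minimal sketch of what "resilient to len==0 batches" means: skip empty batches with a warning instead of asserting, and only do the capacity division when the count is non-zero. `Batch`, `add_batch`, and the threshold are hypothetical, not RLlib's implementation:

```python
import warnings
from typing import List

class Batch:
    """Toy batch with only a sample count (hypothetical)."""

    def __init__(self, count: int):
        self.count = count

def add_batch(storage: List[Batch], batch: Batch, capacity: int) -> None:
    """Add a batch to storage, tolerating zero-length batches."""
    if batch.count <= 0:
        # Empty batches are legal in some cases; discard them quietly
        # instead of crashing with an assertion.
        warnings.warn("Discarding empty batch (count=0).")
        return
    # The division is only safe now that batch.count > 0.
    if capacity / batch.count < 10:  # arbitrary illustrative threshold
        warnings.warn("Buffer capacity is small relative to batch size.")
    storage.append(batch)
```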
| "prioritized_replay_beta", | ||
| "prioritized_replay_eps", | ||
| "no_local_replay_buffer", | ||
| "replay_batch_size", |
Can we leave a warning here if users are using this setting? Happy to make this an error, but we should produce some meaningful explanation as to what happened to this setting.
It's been a warning only so far, but per your request I've made it an error.
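A sketch of what such an error could look like. This is a hypothetical `validate_config` helper, not RLlib's actual validation code; it assumes the settings moved into a nested `replay_buffer_config` dict:

```python
# Top-level replay settings that were removed with the new replay buffer API.
DEPRECATED_REPLAY_KEYS = [
    "prioritized_replay_beta",
    "prioritized_replay_eps",
    "no_local_replay_buffer",
    "replay_batch_size",
]

def validate_config(config: dict) -> None:
    """Raise a descriptive error for removed top-level replay settings."""
    for key in DEPRECATED_REPLAY_KEYS:
        if key in config:
            raise ValueError(
                f"Config key `{key}` is no longer a top-level setting. "
                "With the new replay buffer API, set it inside "
                "`config['replay_buffer_config']` instead."
            )
```

Erroring out with the new location of the setting gives users a clear migration path instead of silently ignoring their config.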
sven1977 left a comment:
Looks great! Thanks for these fixes and enhancements @ArturNiederfahrenhorst . The API is getting more and more polished now.
I do have a few questions and requests before we can merge this. :)
Thanks!
…r2d2 use training intensity
Why are these changes needed?
RNNSAC samples sequences, and its example script now uses the new replay buffer API, which caused it to fail because the buffer would sample by timestep by default. This fix makes recurrent policies for Q-learning possible.
This PR also changes the original example script to be closer to a learning test (RNNSAC has no learning test); learning tests usually aim to reach at least 150 on CartPole-v0.
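To illustrate why recurrent policies must replay whole sequences rather than independent timesteps: the RNN state only makes sense across contiguous steps. A minimal sketch of grouping per-timestep records into fixed-length sequences (illustrative only, not RLlib code; `max_seq_len` mirrors the idea of a sequence-length setting):

```python
from typing import Dict, List

def slice_into_sequences(
    timesteps: List[Dict], max_seq_len: int
) -> List[List[Dict]]:
    """Group per-timestep records into contiguous sequences of up to
    max_seq_len steps, preserving temporal order within each sequence."""
    sequences = []
    for start in range(0, len(timesteps), max_seq_len):
        seq = timesteps[start:start + max_seq_len]
        if seq:
            sequences.append(seq)
    return sequences
```

A buffer that stores and samples such sequences lets the recurrent model replay each one from a consistent starting state, instead of receiving shuffled, unrelated timesteps.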
Checks
I've run scripts/format.sh to lint the changes in this PR.