Conversation
RuofanKong
left a comment
There was a problem hiding this comment.
Just have one question. otherwise. looks good to me! Thank you @AdityaGudimella!
Edilmo
left a comment
There was a problem hiding this comment.
I just took a quick look, so maybe I missed important details, but my thought is the following:
- Is there a simplest way to have this flag available without duplicating all these code? (I think it's possible).
- Why do you want to turn off the prioritized replay in first place?
- Why is this reducing the sensitivity to episode horizon?
rllib/agents/sac/apex.py
Outdated
| # yapf: enable | ||
|
|
||
|
|
||
| class LocalVanillaReplayBuffer(LocalReplayBuffer): |
There was a problem hiding this comment.
What is the difference here with the LocalReplayBuffer beside not including the weights and the indices in the Batch?
There was a problem hiding this comment.
The three changes are
- PrioritizedReplayBuffer vs VanillaReplayBuffer
- SampleBatch containing different elements
- Stats not containing update_priorities time.
rllib/agents/sac/apex.py
Outdated
| "rewards": rewards, | ||
| "new_obs": obses_tp1, | ||
| "dones": dones, | ||
| # "weights": weights, |
There was a problem hiding this comment.
Why are we excluding the weights and the indices?
There was a problem hiding this comment.
What you want is to avoid the update_priorities method to perform its job, right?
There was a problem hiding this comment.
Yes. update_priorities should not do anything, but also, weights and batch_idxs are not returned by the vanilla replay buffer. These are only returned by the PrioritizedReplayBuffer.
@Edilmo First, Tuning "no-done-at-end" shouldn't be the right way to go. Two major reasons: Second, back to question Why turn-off prioritized replay buffer and why reducing the sensitivity to episode horizon? This high-level details may not give you full understanding, if you don't have deep understanding on the energy based RL algorithm, prioritized replay buffer, etc. So I listed all recommended papers to help deeply understand:
|
I think you totally misunderstood my comments. They are just about avoid code duplication. |
Why are these changes needed?
Fixes Bug in execution plan where setting prioritized_replay in config to False does not actually turn it off.
Related issue number
Checks
scripts/format.shto lint the changes in this PR.