Is your feature request related to a problem? Please describe.
Kudos for getting fully async rollout generation working! 🚀
Now that actors sample trajectories in parallel, the learner still waits for an entire batch before it can start back-prop. This leaves GPUs idle and stretches overall wall-clock training time.
Describe the solution you’d like
Add a simple replay buffer between trajectory collection and training so the two stages run concurrently.
Actors call push(traj) as soon as a rollout finishes. There's even a Ray Queue that could be used for this.
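A minimal sketch of the idea, using stdlib threads and `queue.Queue` to stand in for Ray actors (Ray's `ray.util.queue.Queue` exposes the same blocking put/get API across processes); the trajectory contents and batch size here are placeholders:

```python
import queue
import threading

BATCH_SIZE = 4

# Bounded buffer between collection and training: actors block (back off)
# if the learner falls behind, instead of growing memory without limit.
buffer = queue.Queue(maxsize=64)

def actor(actor_id, n_rollouts):
    # Each actor pushes a trajectory the moment its rollout finishes,
    # rather than waiting for a synchronized generation round.
    for i in range(n_rollouts):
        traj = {"actor": actor_id, "step": i}  # placeholder trajectory
        buffer.put(traj)

def learner(n_batches, results):
    # The learner pulls a batch as soon as enough trajectories arrive,
    # so back-prop overlaps with ongoing rollout generation.
    for _ in range(n_batches):
        batch = [buffer.get() for _ in range(BATCH_SIZE)]
        results.append(batch)  # back-prop would happen here

results = []
threads = [threading.Thread(target=actor, args=(a, 4)) for a in range(2)]
threads.append(threading.Thread(target=learner, args=(2, results)))
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), [len(b) for b in results])  # → 2 [4, 4]
```

With Ray, the actor and learner would run as separate processes sharing one distributed queue, but the decoupling is the same: collection never waits on training, and vice versa, except through the buffer's bounds.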