cp: Support DAPO dynamic sampling and reward shaping (#602) into r0.4.0 #1458
Force-pushed from 7bd853a to 4b1f07f.
ℹ️ File Consistency Check
Check based on commit: 4b1f07f (PR #1458)
✅ DTensor Policy Worker Synchronization Check
Both DTensor policy worker files were modified in this PR.
Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
Signed-off-by: Dheeraj Peri <peri.dheeraj@gmail.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Force-pushed from 4b1f07f to d7a3586.
ℹ️ File Consistency Check
Check based on commit: d7a3586 (PR #1458)
✅ DTensor Policy Worker Synchronization Check
Both DTensor policy worker files were modified in this PR.
Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
What does this PR do?
cp: feat: Support DAPO dynamic sampling and reward shaping (#602) into r0.4.0
Issues
List issues that this PR closes (syntax):
Usage
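As a rough illustration of the two DAPO techniques named in the title, here is a minimal, library-agnostic sketch. It is not this repository's API: the function names, argument names, and data layout are all hypothetical. Dynamic sampling is modeled as dropping prompts whose sampled responses all received the same reward (so the group carries no learning signal), and reward shaping is modeled as the soft overlong penalty described in the DAPO paper.

```python
from typing import Dict, List


def keep_informative_groups(groups: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Dynamic sampling (hypothetical helper).

    `groups` maps a prompt id to the rewards of its sampled responses.
    Prompts whose rewards are all identical are dropped, because their
    group-normalized advantages would all be zero.
    """
    return {pid: rewards for pid, rewards in groups.items() if len(set(rewards)) > 1}


def soft_overlong_penalty(length: int, max_len: int, cache_len: int) -> float:
    """Soft overlong punishment (hypothetical helper).

    Returns 0 for responses up to `max_len - cache_len` tokens, a penalty
    that falls linearly from 0 to -1 across the last `cache_len` tokens
    before `max_len`, and -1 beyond `max_len`.
    """
    if length <= max_len - cache_len:
        return 0.0
    if length <= max_len:
        return (max_len - cache_len - length) / cache_len
    return -1.0


if __name__ == "__main__":
    sampled = {"a": [1.0, 1.0, 1.0], "b": [0.0, 1.0, 0.0], "c": [0.0, 0.0, 0.0]}
    print(keep_informative_groups(sampled))
    print(soft_overlong_penalty(175, max_len=200, cache_len=50))
```

In a training loop, the filtered groups would typically be accumulated across generation rounds until a full batch is available, and the length penalty would be added to each response's task reward; both details depend on the actual implementation in #602.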
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
Release Notes
New Features
Documentation
Configuration