
cp: Support DAPO dynamic sampling and reward shaping (#602) into r0.4.0 #1458

Merged
terrykong merged 1 commit into r0.4.0 from chtruong/cp-602-r0.4.0 on Nov 1, 2025

Conversation

@chtruong814
Contributor

@chtruong814 chtruong814 commented Oct 31, 2025

What does this PR do?

cp: feat: Support DAPO dynamic sampling and reward shaping (#602) into r0.4.0

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

Release Notes

  • New Features

    • Added DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) with dynamic sampling, reward scaling, and reward shaping capabilities for improved policy training.
    • Introduced multiple math verification backends for flexible reward evaluation.
    • Added HuggingFace model configuration override support for customized model loading.
  • Documentation

    • New comprehensive DAPO guide with quickstart, configuration examples, and algorithm details.
  • Configuration

    • New example configurations for DAPO training with Qwen2.5-7B model and updated GRPO configs.
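
The two DAPO ideas named in the summary above can be illustrated with a minimal sketch. This follows the published DAPO description (drop prompt groups whose sampled responses all receive the same reward, and apply a soft length penalty near the response length limit); it is not taken from this PR's diff, and the function names `keep_prompt_group` and `soft_overlong_penalty`, along with their parameters, are hypothetical and not NeMo RL APIs.

```python
from statistics import pstdev
from typing import Sequence


def keep_prompt_group(rewards: Sequence[float], eps: float = 1e-8) -> bool:
    """Dynamic sampling filter (hypothetical helper).

    A prompt whose sampled responses all share one reward yields zero
    group-relative advantage, so it contributes no gradient signal.
    Keep the prompt only when its rewards actually differ.
    """
    return len(rewards) > 1 and pstdev(rewards) > eps


def soft_overlong_penalty(length: int, max_length: int, buffer_length: int) -> float:
    """Soft overlong reward shaping (hypothetical helper).

    Returns a value in [-1, 0] to add to the task reward:
    0 within budget, a linear ramp inside the buffer, -1 past the limit.
    """
    if buffer_length <= 0 or buffer_length > max_length:
        raise ValueError("buffer_length must be in (0, max_length]")
    threshold = max_length - buffer_length
    if length <= threshold:
        return 0.0
    if length <= max_length:
        return (threshold - length) / buffer_length
    return -1.0


# Toy batch: rewards for four sampled responses per prompt (made-up values).
groups = {
    "p0": [1.0, 1.0, 1.0, 1.0],  # all correct: filtered out
    "p1": [1.0, 0.0, 0.0, 1.0],  # mixed: kept
    "p2": [0.0, 0.0, 0.0, 0.0],  # all wrong: filtered out
}
kept = [pid for pid, rewards in groups.items() if keep_prompt_group(rewards)]
print(kept)
print(soft_overlong_penalty(length=175, max_length=200, buffer_length=50))
```

In a training loop, filtered prompts would typically be replaced by sampling further prompts until the batch is full; how this PR handles that refill is not shown in this conversation.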

@chtruong814 chtruong814 requested review from a team as code owners October 31, 2025 23:10
@github-actions github-actions Bot added the Documentation (Improvements or additions to documentation) and CI (Relating to CI) labels Oct 31, 2025
@chtruong814 chtruong814 changed the title from "cp: feat: Support DAPO dynamic sampling and reward shaping (#602) into r0.4.0" to "cp: Support DAPO dynamic sampling and reward shaping (#602) into r0.4.0" Oct 31, 2025
@terrykong terrykong force-pushed the chtruong/cp-602-r0.4.0 branch from 7bd853a to 4b1f07f Compare November 1, 2025 06:17
@github-actions

github-actions Bot commented Nov 1, 2025

ℹ️ File Consistency Check

Check based on commit: 4b1f07f (PR #1458 from chtruong/cp-602-r0.4.0)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

Signed-off-by: Dheeraj Peri <peri.dheeraj@gmail.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
@terrykong terrykong force-pushed the chtruong/cp-602-r0.4.0 branch from 4b1f07f to d7a3586 Compare November 1, 2025 06:19
@github-actions github-actions Bot removed the CI (Relating to CI) label Nov 1, 2025
@github-actions

github-actions Bot commented Nov 1, 2025

ℹ️ File Consistency Check

Check based on commit: d7a3586 (PR #1458 from chtruong/cp-602-r0.4.0)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@terrykong terrykong added the CI:L1 (Run doctests, unit tests, and functional tests) label Nov 1, 2025
@terrykong terrykong enabled auto-merge (squash) November 1, 2025 06:19
@coderabbitai
Contributor

coderabbitai Bot commented Nov 1, 2025

Caution

Review failed

The head commit changed during the review from 4b1f07f to d7a3586.


@terrykong terrykong merged commit e941b4e into r0.4.0 Nov 1, 2025
60 of 64 checks passed
@terrykong terrykong deleted the chtruong/cp-602-r0.4.0 branch November 1, 2025 21:04

Labels

  • CI:L1: Run doctests, unit tests, and functional tests
  • Documentation: Improvements or additions to documentation


3 participants