
cp: feat: Using mcore cpu optimizer (1242) into r0.4.0#1329

Merged
terrykong merged 1 commit into r0.4.0 from cherry-pick-1242-r0.4.0 on Oct 10, 2025

Conversation

@chtruong814 (Contributor) commented Oct 9, 2025

beep boop [🤖]: Hi @guyueh1 👋,

we've cherry-picked #1242 into r0.4.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • New Features
    • Added CPU optimizer offload support with configurable offload fraction.
    • Training/refit now respects offload by keeping optimizer state on CPU when enabled.
    • Validation ensures offload fraction is 1.0 when CPU offload is enabled.
  • Configuration
    • Example configs and refit tool updated to include new offload options (disabled by default).
  • Tests
    • Unit tests updated to cover the new optimizer offload configuration.
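The validation rule summarized above (offload fraction must be 1.0 whenever CPU offload is enabled) can be sketched as a small standalone check. The function name and dict shape here are illustrative assumptions, not the actual NeMo-RL API:

```python
def validate_offload_cfg(optimizer_cfg: dict) -> None:
    """Illustrative check mirroring the constraint described above.

    When CPU offload is enabled, partial offload is unsupported, so the
    offload fraction must be exactly 1.0.
    """
    if optimizer_cfg["optimizer_cpu_offload"]:
        assert optimizer_cfg["optimizer_offload_fraction"] == 1.0, (
            "optimizer_offload_fraction must be 1.0 when "
            "optimizer_cpu_offload is enabled"
        )


# Disabled offload with fraction 0.0 (the shipped defaults) passes.
validate_offload_cfg(
    {"optimizer_cpu_offload": False, "optimizer_offload_fraction": 0.0}
)
```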

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@chtruong814 chtruong814 requested review from guyueh1 and removed request for guyueh1 October 9, 2025 22:02
@chtruong814 chtruong814 requested a review from a team as a code owner October 9, 2025 22:02
@chtruong814 chtruong814 requested review from a team as code owners October 9, 2025 22:02
coderabbitai Bot (Contributor) commented Oct 9, 2025

📝 Walkthrough

Adds two optimizer CPU offload configuration keys (optimizer_cpu_offload, optimizer_offload_fraction) across example configs, tests, and a tool. Extends MegatronOptimizerConfig TypedDict to include these fields. Updates Megatron policy worker to enforce fraction==1.0 when CPU offload is enabled and to avoid moving optimizer state to CUDA in that mode.

Changes

Cohort / File(s) / Summary
  • Example configs: add optimizer offload keys
    examples/configs/dpo.yaml, examples/configs/grpo_math_1B.yaml, examples/configs/rm.yaml, examples/configs/sft.yaml, examples/configs/sft_openmathinstruct2_megatron.yaml
    Add optimizer_cpu_offload: false and optimizer_offload_fraction: 0.0 under Megatron optimizer blocks (some in both policy.megatron_cfg.optimizer and megatron_cfg.optimizer).
  • Policy config typing
    nemo_rl/models/policy/__init__.py
    Extend MegatronOptimizerConfig TypedDict with optimizer_cpu_offload: bool and optimizer_offload_fraction: float (note: offload requires fraction 1.0).
  • Policy worker control flow
    nemo_rl/models/policy/megatron_policy_worker.py
    Add assertion: if optimizer_cpu_offload is True then optimizer_offload_fraction == 1.0. Skip moving optimizer state to CUDA when CPU offload is enabled; apply the same guard in the refit path.
  • Tests config updates
    tests/unit/models/generation/test_vllm_generation.py, tests/unit/models/policy/test_megatron_worker.py
    Inject new optimizer offload config keys (optimizer_cpu_offload: False, optimizer_offload_fraction: 0.0) into test configs.
  • Tooling: refit verifier config
    tools/refit_verifier.py
    Add optimizer_cpu_offload and optimizer_offload_fraction to Megatron Adam optimizer kwargs.
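The worker-side control flow described above (assert the fraction, then skip the CUDA migration when offload is on) can be sketched roughly as follows. The class and function names are hypothetical stand-ins for the real megatron_policy_worker logic, not its actual API:

```python
class OptimizerStateStub:
    """Hypothetical stand-in for Megatron optimizer state placement."""

    def __init__(self) -> None:
        self.device = "cpu"  # optimizer state starts on CPU


def prepare_for_training(state: OptimizerStateStub, cfg: dict) -> str:
    """Place optimizer state for a training step, honoring CPU offload."""
    if cfg["optimizer_cpu_offload"]:
        # Enforce the PR's constraint: only full offload is supported.
        assert cfg["optimizer_offload_fraction"] == 1.0
        # Skip the CUDA migration entirely; state stays resident on CPU.
        return state.device
    state.device = "cuda"  # default path: move optimizer state to GPU
    return state.device
```

The same guard is mirrored in the refit path, so a refit never undoes the CPU residency.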

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Trainer
  participant PolicyWorker
  participant Optimizer
  participant CUDA as GPU Memory
  participant CPU as CPU Memory

  rect rgba(230,245,255,0.6)
  note over Trainer,PolicyWorker: Setup / Train preparation
  Trainer->>PolicyWorker: initialize(train_cfg)
  alt optimizer_cpu_offload == True
    PolicyWorker->>PolicyWorker: assert optimizer_offload_fraction == 1.0
    PolicyWorker--xCUDA: do not move optimizer state to CUDA
    PolicyWorker->>CPU: keep optimizer state on CPU
  else
    PolicyWorker->>CUDA: move optimizer state to CUDA
  end
  end

  rect rgba(235,255,235,0.6)
  note over PolicyWorker,Optimizer: Training step
  PolicyWorker->>Optimizer: step(...)
  alt CPU offload enabled
    Optimizer->>CPU: operate on CPU state
  else
    Optimizer->>CUDA: operate on CUDA state
  end
  end

  rect rgba(255,245,230,0.6)
  note over PolicyWorker,Optimizer: Refit/offload_before_refit
  alt CPU offload enabled
    PolicyWorker--xCUDA: skip state move to CUDA
    PolicyWorker->>CPU: maintain CPU residency
  else
    PolicyWorker->>CUDA: move state to CUDA as needed
  end
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Suggested labels

CI:L1

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Title Check (⚠️ Warning): The title “cp: feat: Using mcore cpu optimizer (1242) into r0.4.0” is confusing and does not clearly summarize the actual changeset of adding CPU offload configuration options; it also includes internal references and backticks that do not convey the main feature. Resolution: rewrite the title to clearly and concisely reflect the primary change, for example “Cherry-pick CPU offloading support for Megatron optimizer (PR 1242) into r0.4.0” or “Add optimizer_cpu_offload and optimizer_offload_fraction config keys for CPU offload”.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 66.67%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit’s high-level summary is enabled.
  • Test Results For Major Changes (✅ Passed): The PR only introduces configuration flags and corresponding guardrails for optional CPU optimizer offloading, without altering default behavior or existing training flows, and the cherry-pick description contains no test information; given the limited, non-breaking scope, these changes qualify as minor, so the absence of documented test results is acceptable.


coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (7)
examples/configs/dpo.yaml (1)

139-141: Consider documenting the configuration constraints and purpose.

The comment provides minimal guidance on these new CPU offload settings. Consider adding documentation that explains:

  • What optimizer CPU offload does and when to enable it
  • The constraint that optimizer_offload_fraction must be 1.0 when optimizer_cpu_offload is enabled (as enforced in the policy worker)
  • Performance trade-offs or use cases

Example:

-      # optimizer cpu offload
+      # Optimizer CPU offload: offloads optimizer state to CPU to reduce GPU memory usage
+      # When enabled, optimizer_offload_fraction must be 1.0 (fully offloaded)
       optimizer_cpu_offload: false
       optimizer_offload_fraction: 0.0

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."

examples/configs/grpo_math_1B.yaml (1)

113-115: Consider documenting the configuration constraints and purpose.

Similar to other config files in this PR, the comment provides minimal guidance. Consider adding documentation that explains:

  • What optimizer CPU offload does and when to enable it
  • The constraint that optimizer_offload_fraction must be 1.0 when optimizer_cpu_offload is enabled
  • Performance trade-offs or use cases

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."

examples/configs/sft.yaml (1)

117-119: Consider documenting the configuration constraints and purpose.

The comment provides minimal guidance on these new CPU offload settings. Consider adding documentation that explains:

  • What optimizer CPU offload does and when to enable it
  • The constraint that optimizer_offload_fraction must be 1.0 when optimizer_cpu_offload is enabled
  • Performance trade-offs or use cases

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."

examples/configs/sft_openmathinstruct2_megatron.yaml (1)

65-67: Consider documenting the configuration constraints and purpose.

The comment provides minimal guidance on these new CPU offload settings. Consider adding documentation that explains:

  • What optimizer CPU offload does and when to enable it
  • The constraint that optimizer_offload_fraction must be 1.0 when optimizer_cpu_offload is enabled
  • Performance trade-offs or use cases

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."

tools/refit_verifier.py (1)

235-237: Consider enhancing the comment for clarity.

While this is a tool file rather than an exemplar config, adding a brief note about the constraint (fraction must be 1.0 when offload is enabled) would help users understand the expected configuration when verifying refitted policies.

Example:

-                # Optimizer CPU offload settings
+                # Optimizer CPU offload settings (when enabled, fraction must be 1.0)
                 "optimizer_cpu_offload": False,
                 "optimizer_offload_fraction": 0.0,
examples/configs/rm.yaml (1)

108-110: Consider documenting the configuration constraints and purpose.

The comment provides minimal guidance on these new CPU offload settings. Consider adding documentation that explains:

  • What optimizer CPU offload does and when to enable it
  • The constraint that optimizer_offload_fraction must be 1.0 when optimizer_cpu_offload is enabled
  • Performance trade-offs or use cases

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."

nemo_rl/models/policy/__init__.py (1)

64-68: Consider documenting recommended defaults in comments.

The new configuration keys are well-documented with clear purpose and constraints. However, per the coding guidelines for nemo_rl/**/*.py, when adding new config keys to a TypedDict, you should "document the key's purpose, valid values/types, and recommended default in code." Consider adding recommended default values to the comments.

Apply this diff to enhance the documentation:

-    # knob to enable optimizer cpu offload
+    # knob to enable optimizer cpu offload (recommended default: false)
     optimizer_cpu_offload: bool
-    # knob to set the fraction of parameters to keep on CPU
+    # knob to set the fraction of parameters to keep on CPU (recommended default: 0.0)
     # currently if optimizer_cpu_offload is true, this knob must be 1.0
     optimizer_offload_fraction: float

Based on coding guidelines
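Putting the suggestion together, the extended TypedDict could look like this minimal sketch. The two field names come from the PR; the rest of the class body is elided here:

```python
from typing import TypedDict


class MegatronOptimizerConfig(TypedDict):
    # ... existing optimizer keys elided for brevity ...

    # knob to enable optimizer cpu offload (recommended default: false)
    optimizer_cpu_offload: bool
    # knob to set the fraction of parameters to keep on CPU
    # (recommended default: 0.0); currently if optimizer_cpu_offload
    # is true, this knob must be 1.0
    optimizer_offload_fraction: float
```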

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4c0103 and 78d838a.

📒 Files selected for processing (10)
  • examples/configs/dpo.yaml (1 hunks)
  • examples/configs/grpo_math_1B.yaml (1 hunks)
  • examples/configs/rm.yaml (1 hunks)
  • examples/configs/sft.yaml (1 hunks)
  • examples/configs/sft_openmathinstruct2_megatron.yaml (1 hunks)
  • nemo_rl/models/policy/__init__.py (1 hunks)
  • nemo_rl/models/policy/megatron_policy_worker.py (3 hunks)
  • tests/unit/models/generation/test_vllm_generation.py (1 hunks)
  • tests/unit/models/policy/test_megatron_worker.py (1 hunks)
  • tools/refit_verifier.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
examples/configs/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml

Files:

  • examples/configs/grpo_math_1B.yaml
  • examples/configs/sft_openmathinstruct2_megatron.yaml
  • examples/configs/rm.yaml
  • examples/configs/dpo.yaml
  • examples/configs/sft.yaml
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • tools/refit_verifier.py
  • nemo_rl/models/policy/__init__.py
  • tests/unit/models/policy/test_megatron_worker.py
  • nemo_rl/models/policy/megatron_policy_worker.py
  • tests/unit/models/generation/test_vllm_generation.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/models/policy/__init__.py
  • nemo_rl/models/policy/megatron_policy_worker.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (5)
tests/unit/models/generation/test_vllm_generation.py (1)

195-196: LGTM!

The test configuration correctly includes the new CPU offload settings with appropriate default values.

tests/unit/models/policy/test_megatron_worker.py (1)

117-118: LGTM!

The test configuration correctly includes the new CPU offload settings with appropriate default values.

nemo_rl/models/policy/megatron_policy_worker.py (3)

617-628: LGTM! Clear constraint enforcement.

The assertion correctly enforces that CPU optimizer offload requires optimizer_offload_fraction=1.0, preventing unsupported hybrid configurations. The comment clearly explains why partial offloading conflicts with the framework's generation/training transition logic.


1769-1773: LGTM! Proper gating of optimizer state movement.

The condition correctly prevents moving optimizer state to CUDA when CPU offload is enabled, aligning with the feature's design. The check properly combines with existing safety guards for optimizer existence.


1800-1804: LGTM! Consistent conditional logic.

The guard mirrors the pattern from prepare_for_training (lines 1769-1773), appropriately preventing redundant optimizer state movement when CPU offload is already enabled. This maintains consistency across the codebase.

@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Oct 9, 2025
@terrykong terrykong enabled auto-merge (squash) October 9, 2025 22:46
@terrykong terrykong merged commit e8a7473 into r0.4.0 Oct 10, 2025
68 of 71 checks passed
@terrykong terrykong deleted the cherry-pick-1242-r0.4.0 branch October 10, 2025 03:42

Labels

cherry-pick CI:L1 Run doctests, unit tests, and functional tests Run CICD


3 participants