
cp: feat: Using mcore cpu optimizer (1242) into r0.4.0#1329

Merged
terrykong merged 1 commit into r0.4.0 from cherry-pick-1242-r0.4.0 on Oct 10, 2025

Conversation

@chtruong814 (Contributor) commented Oct 9, 2025

beep boop [🤖]: Hi @guyueh1 👋,

we've cherry-picked #1242 into r0.4.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • New Features
    • Added CPU optimizer offload support with configurable offload fraction.
    • Training/refit now respects offload by keeping optimizer state on CPU when enabled.
    • Validation ensures offload fraction is 1.0 when CPU offload is enabled.
  • Configuration
    • Example configs and refit tool updated to include new offload options (disabled by default).
  • Tests
    • Unit tests updated to cover the new optimizer offload configuration.
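The validation rule summarized above (offload fraction must be 1.0 whenever CPU offload is enabled) can be sketched as a small standalone check. The function name and dict shape here are illustrative assumptions, not the actual NeMo-RL API:

```python
def validate_offload_cfg(optimizer_cfg: dict) -> None:
    """Illustrative check mirroring the constraint described above.

    When CPU offload is enabled, partial offload is unsupported, so the
    offload fraction must be exactly 1.0.
    """
    if optimizer_cfg["optimizer_cpu_offload"]:
        assert optimizer_cfg["optimizer_offload_fraction"] == 1.0, (
            "optimizer_offload_fraction must be 1.0 when "
            "optimizer_cpu_offload is enabled"
        )


# Disabled offload with fraction 0.0 (the shipped defaults) passes.
validate_offload_cfg(
    {"optimizer_cpu_offload": False, "optimizer_offload_fraction": 0.0}
)
```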

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@chtruong814 chtruong814 requested review from guyueh1 and removed request for guyueh1 October 9, 2025 22:02
@chtruong814 chtruong814 requested a review from a team as a code owner October 9, 2025 22:02
@chtruong814 chtruong814 requested review from a team as code owners October 9, 2025 22:02
coderabbitai Bot (Contributor) commented Oct 9, 2025

📝 Walkthrough

Adds two optimizer CPU offload configuration keys (optimizer_cpu_offload, optimizer_offload_fraction) across example configs, tests, and a tool. Extends MegatronOptimizerConfig TypedDict to include these fields. Updates Megatron policy worker to enforce fraction==1.0 when CPU offload is enabled and to avoid moving optimizer state to CUDA in that mode.

Changes

Cohort / File(s) / Summary
  • Example configs: add optimizer offload keys
    examples/configs/dpo.yaml, examples/configs/grpo_math_1B.yaml, examples/configs/rm.yaml, examples/configs/sft.yaml, examples/configs/sft_openmathinstruct2_megatron.yaml
    Add optimizer_cpu_offload: false and optimizer_offload_fraction: 0.0 under Megatron optimizer blocks (some in both policy.megatron_cfg.optimizer and megatron_cfg.optimizer).
  • Policy config typing
    nemo_rl/models/policy/__init__.py
    Extend MegatronOptimizerConfig TypedDict with optimizer_cpu_offload: bool and optimizer_offload_fraction: float (note: offload requires fraction 1.0).
  • Policy worker control flow
    nemo_rl/models/policy/megatron_policy_worker.py
    Add assertion: if optimizer_cpu_offload is True then optimizer_offload_fraction == 1.0. Skip moving optimizer state to CUDA when CPU offload is enabled; apply the same guard in the refit path.
  • Tests config updates
    tests/unit/models/generation/test_vllm_generation.py, tests/unit/models/policy/test_megatron_worker.py
    Inject new optimizer offload config keys (optimizer_cpu_offload: False, optimizer_offload_fraction: 0.0) into test configs.
  • Tooling: refit verifier config
    tools/refit_verifier.py
    Add optimizer_cpu_offload and optimizer_offload_fraction to Megatron Adam optimizer kwargs.
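The worker-side control flow described above (assert the fraction, then skip the CUDA migration when offload is on) can be sketched roughly as follows. The class and function names are hypothetical stand-ins for the real megatron_policy_worker logic, not its actual API:

```python
class OptimizerStateStub:
    """Hypothetical stand-in for Megatron optimizer state placement."""

    def __init__(self) -> None:
        self.device = "cpu"  # optimizer state starts on CPU


def prepare_for_training(state: OptimizerStateStub, cfg: dict) -> str:
    """Place optimizer state for a training step, honoring CPU offload."""
    if cfg["optimizer_cpu_offload"]:
        # Enforce the PR's constraint: only full offload is supported.
        assert cfg["optimizer_offload_fraction"] == 1.0
        # Skip the CUDA migration entirely; state stays resident on CPU.
        return state.device
    state.device = "cuda"  # default path: move optimizer state to GPU
    return state.device
```

The same guard is mirrored in the refit path, so a refit never undoes the CPU residency.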

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Trainer
  participant PolicyWorker
  participant Optimizer
  participant CUDA as GPU Memory
  participant CPU as CPU Memory

  rect rgba(230,245,255,0.6)
  note over Trainer,PolicyWorker: Setup / Train preparation
  Trainer->>PolicyWorker: initialize(train_cfg)
  alt optimizer_cpu_offload == True
    PolicyWorker->>PolicyWorker: assert optimizer_offload_fraction == 1.0
    PolicyWorker--xCUDA: do not move optimizer state to CUDA
    PolicyWorker->>CPU: keep optimizer state on CPU
  else
    PolicyWorker->>CUDA: move optimizer state to CUDA
  end
  end

  rect rgba(235,255,235,0.6)
  note over PolicyWorker,Optimizer: Training step
  PolicyWorker->>Optimizer: step(...)
  alt CPU offload enabled
    Optimizer->>CPU: operate on CPU state
  else
    Optimizer->>CUDA: operate on CUDA state
  end
  end

  rect rgba(255,245,230,0.6)
  note over PolicyWorker,Optimizer: Refit/offload_before_refit
  alt CPU offload enabled
    PolicyWorker--xCUDA: skip state move to CUDA
    PolicyWorker->>CPU: maintain CPU residency
  else
    PolicyWorker->>CUDA: move state to CUDA as needed
  end
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Suggested labels

CI:L1

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Title Check (⚠️ Warning): The title “cp: feat: Using mcore cpu optimizer (1242) into r0.4.0” is confusing and does not clearly summarize the actual changeset of adding CPU offload configuration options; it also includes internal references and backticks that do not convey the main feature. Resolution: rewrite the title to clearly and concisely reflect the primary change, for example “Cherry-pick CPU offloading support for Megatron optimizer (PR 1242) into r0.4.0” or “Add optimizer_cpu_offload and optimizer_offload_fraction config keys for CPU offload”.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 66.67%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit’s high-level summary is enabled.
  • Test Results For Major Changes (✅ Passed): The PR only introduces configuration flags and corresponding guardrails for optional CPU optimizer offloading, without altering default behavior or existing training flows, and the cherry-pick description contains no test information; given the limited, non-breaking scope, these changes qualify as minor, so the absence of documented test results is acceptable.


coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (7)
examples/configs/dpo.yaml (1)

139-141: Consider documenting the configuration constraints and purpose.

The comment provides minimal guidance on these new CPU offload settings. Consider adding documentation that explains:

  • What optimizer CPU offload does and when to enable it
  • The constraint that optimizer_offload_fraction must be 1.0 when optimizer_cpu_offload is enabled (as enforced in the policy worker)
  • Performance trade-offs or use cases

Example:

-      # optimizer cpu offload
+      # Optimizer CPU offload: offloads optimizer state to CPU to reduce GPU memory usage
+      # When enabled, optimizer_offload_fraction must be 1.0 (fully offloaded)
       optimizer_cpu_offload: false
       optimizer_offload_fraction: 0.0

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."

examples/configs/grpo_math_1B.yaml (1)

113-115: Consider documenting the configuration constraints and purpose.

Similar to other config files in this PR, the comment provides minimal guidance. Consider adding documentation that explains:

  • What optimizer CPU offload does and when to enable it
  • The constraint that optimizer_offload_fraction must be 1.0 when optimizer_cpu_offload is enabled
  • Performance trade-offs or use cases

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."

examples/configs/sft.yaml (1)

117-119: Consider documenting the configuration constraints and purpose.

The comment provides minimal guidance on these new CPU offload settings. Consider adding documentation that explains:

  • What optimizer CPU offload does and when to enable it
  • The constraint that optimizer_offload_fraction must be 1.0 when optimizer_cpu_offload is enabled
  • Performance trade-offs or use cases

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."

examples/configs/sft_openmathinstruct2_megatron.yaml (1)

65-67: Consider documenting the configuration constraints and purpose.

The comment provides minimal guidance on these new CPU offload settings. Consider adding documentation that explains:

  • What optimizer CPU offload does and when to enable it
  • The constraint that optimizer_offload_fraction must be 1.0 when optimizer_cpu_offload is enabled
  • Performance trade-offs or use cases

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."

tools/refit_verifier.py (1)

235-237: Consider enhancing the comment for clarity.

While this is a tool file rather than an exemplar config, adding a brief note about the constraint (fraction must be 1.0 when offload is enabled) would help users understand the expected configuration when verifying refitted policies.

Example:

-                # Optimizer CPU offload settings
+                # Optimizer CPU offload settings (when enabled, fraction must be 1.0)
                 "optimizer_cpu_offload": False,
                 "optimizer_offload_fraction": 0.0,
examples/configs/rm.yaml (1)

108-110: Consider documenting the configuration constraints and purpose.

The comment provides minimal guidance on these new CPU offload settings. Consider adding documentation that explains:

  • What optimizer CPU offload does and when to enable it
  • The constraint that optimizer_offload_fraction must be 1.0 when optimizer_cpu_offload is enabled
  • Performance trade-offs or use cases

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."

nemo_rl/models/policy/__init__.py (1)

64-68: Consider documenting recommended defaults in comments.

The new configuration keys are well-documented with clear purpose and constraints. However, per the coding guidelines for nemo_rl/**/*.py, when adding new config keys to a TypedDict, you should "document the key's purpose, valid values/types, and recommended default in code." Consider adding recommended default values to the comments.

Apply this diff to enhance the documentation:

-    # knob to enable optimizer cpu offload
+    # knob to enable optimizer cpu offload (recommended default: false)
     optimizer_cpu_offload: bool
-    # knob to set the fraction of parameters to keep on CPU
+    # knob to set the fraction of parameters to keep on CPU (recommended default: 0.0)
     # currently if optimizer_cpu_offload is true, this knob must be 1.0
     optimizer_offload_fraction: float

Based on coding guidelines
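Putting the suggestion together, the extended TypedDict could look like this minimal sketch. The two field names come from the PR; the rest of the class body is elided here:

```python
from typing import TypedDict


class MegatronOptimizerConfig(TypedDict):
    # ... existing optimizer keys elided for brevity ...

    # knob to enable optimizer cpu offload (recommended default: false)
    optimizer_cpu_offload: bool
    # knob to set the fraction of parameters to keep on CPU
    # (recommended default: 0.0); currently if optimizer_cpu_offload
    # is true, this knob must be 1.0
    optimizer_offload_fraction: float
```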

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4c0103 and 78d838a.

📒 Files selected for processing (10)
  • examples/configs/dpo.yaml (1 hunks)
  • examples/configs/grpo_math_1B.yaml (1 hunks)
  • examples/configs/rm.yaml (1 hunks)
  • examples/configs/sft.yaml (1 hunks)
  • examples/configs/sft_openmathinstruct2_megatron.yaml (1 hunks)
  • nemo_rl/models/policy/__init__.py (1 hunks)
  • nemo_rl/models/policy/megatron_policy_worker.py (3 hunks)
  • tests/unit/models/generation/test_vllm_generation.py (1 hunks)
  • tests/unit/models/policy/test_megatron_worker.py (1 hunks)
  • tools/refit_verifier.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
examples/configs/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml

Files:

  • examples/configs/grpo_math_1B.yaml
  • examples/configs/sft_openmathinstruct2_megatron.yaml
  • examples/configs/rm.yaml
  • examples/configs/dpo.yaml
  • examples/configs/sft.yaml
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • tools/refit_verifier.py
  • nemo_rl/models/policy/__init__.py
  • tests/unit/models/policy/test_megatron_worker.py
  • nemo_rl/models/policy/megatron_policy_worker.py
  • tests/unit/models/generation/test_vllm_generation.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/models/policy/__init__.py
  • nemo_rl/models/policy/megatron_policy_worker.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (5)
tests/unit/models/generation/test_vllm_generation.py (1)

195-196: LGTM!

The test configuration correctly includes the new CPU offload settings with appropriate default values.

tests/unit/models/policy/test_megatron_worker.py (1)

117-118: LGTM!

The test configuration correctly includes the new CPU offload settings with appropriate default values.

nemo_rl/models/policy/megatron_policy_worker.py (3)

617-628: LGTM! Clear constraint enforcement.

The assertion correctly enforces that CPU optimizer offload requires optimizer_offload_fraction=1.0, preventing unsupported hybrid configurations. The comment clearly explains why partial offloading conflicts with the framework's generation/training transition logic.


1769-1773: LGTM! Proper gating of optimizer state movement.

The condition correctly prevents moving optimizer state to CUDA when CPU offload is enabled, aligning with the feature's design. The check properly combines with existing safety guards for optimizer existence.


1800-1804: LGTM! Consistent conditional logic.

The guard mirrors the pattern from prepare_for_training (lines 1769-1773), appropriately preventing redundant optimizer state movement when CPU offload is already enabled. This maintains consistency across the codebase.

@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Oct 9, 2025
@terrykong terrykong enabled auto-merge (squash) October 9, 2025 22:46
@terrykong terrykong merged commit e8a7473 into r0.4.0 Oct 10, 2025
68 of 71 checks passed
@terrykong terrykong deleted the cherry-pick-1242-r0.4.0 branch October 10, 2025 03:42

Labels

cherry-pick CI:L1 Run doctests, unit tests, and functional tests Run CICD


3 participants