cp: feat: Using mcore cpu optimizer (1242) into r0.4.0#1329
Conversation
Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
📝 Walkthrough

Adds two optimizer CPU offload configuration keys (`optimizer_cpu_offload`, `optimizer_offload_fraction`) across example configs, tests, and a tool. Extends the `MegatronOptimizerConfig` TypedDict to include these fields. Updates the Megatron policy worker to enforce `optimizer_offload_fraction == 1.0` when CPU offload is enabled and to avoid moving optimizer state to CUDA in that mode.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Trainer
    participant PolicyWorker
    participant Optimizer
    participant CUDA as GPU Memory
    participant CPU as CPU Memory
    rect rgba(230,245,255,0.6)
        note over Trainer,PolicyWorker: Setup / Train preparation
        Trainer->>PolicyWorker: initialize(train_cfg)
        alt optimizer_cpu_offload == True
            PolicyWorker->>PolicyWorker: assert optimizer_offload_fraction == 1.0
            PolicyWorker--xCUDA: do not move optimizer state to CUDA
            PolicyWorker->>CPU: keep optimizer state on CPU
        else
            PolicyWorker->>CUDA: move optimizer state to CUDA
        end
    end
    rect rgba(235,255,235,0.6)
        note over PolicyWorker,Optimizer: Training step
        PolicyWorker->>Optimizer: step(...)
        alt CPU offload enabled
            Optimizer->>CPU: operate on CPU state
        else
            Optimizer->>CUDA: operate on CUDA state
        end
    end
    rect rgba(255,245,230,0.6)
        note over PolicyWorker,Optimizer: Refit/offload_before_refit
        alt CPU offload enabled
            PolicyWorker--xCUDA: skip state move to CUDA
            PolicyWorker->>CPU: maintain CPU residency
        else
            PolicyWorker->>CUDA: move state to CUDA as needed
        end
    end
```
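The setup and refit flow in the diagram can be sketched in plain Python. This is a minimal illustration only; the class, method, and return values here are assumptions for demonstration, not the actual `MegatronPolicyWorker` API:

```python
class PolicyWorker:
    """Illustrative sketch of the offload gating shown in the diagram."""

    def __init__(self, optimizer_cfg: dict):
        offload = optimizer_cfg["optimizer_cpu_offload"]
        fraction = optimizer_cfg["optimizer_offload_fraction"]
        if offload:
            # Partial offload is unsupported: when CPU offload is on,
            # the optimizer state must live entirely on CPU.
            assert fraction == 1.0, (
                "optimizer_offload_fraction must be 1.0 when "
                "optimizer_cpu_offload is enabled"
            )
        self.cpu_offload = offload

    def prepare_for_training(self) -> str:
        # Move optimizer state to GPU only when CPU offload is disabled.
        if not self.cpu_offload:
            return "state -> CUDA"
        return "state stays on CPU"
```

A hybrid config such as `optimizer_cpu_offload: true` with `optimizer_offload_fraction: 0.5` fails the assertion up front rather than producing an inconsistent train/generation transition later.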
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 0
🧹 Nitpick comments (7)
examples/configs/dpo.yaml (1)
139-141: Consider documenting the configuration constraints and purpose.

The comment provides minimal guidance on these new CPU offload settings. Consider adding documentation that explains:

- What optimizer CPU offload does and when to enable it
- The constraint that `optimizer_offload_fraction` must be 1.0 when `optimizer_cpu_offload` is enabled (as enforced in the policy worker)
- Performance trade-offs or use cases

Example:

```diff
- # optimizer cpu offload
+ # Optimizer CPU offload: offloads optimizer state to CPU to reduce GPU memory usage
+ # When enabled, optimizer_offload_fraction must be 1.0 (fully offloaded)
  optimizer_cpu_offload: false
  optimizer_offload_fraction: 0.0
```

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."
examples/configs/grpo_math_1B.yaml (1)
113-115: Consider documenting the configuration constraints and purpose.

Similar to other config files in this PR, the comment provides minimal guidance. Consider adding documentation that explains:

- What optimizer CPU offload does and when to enable it
- The constraint that `optimizer_offload_fraction` must be 1.0 when `optimizer_cpu_offload` is enabled
- Performance trade-offs or use cases

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."
examples/configs/sft.yaml (1)
117-119: Consider documenting the configuration constraints and purpose.

The comment provides minimal guidance on these new CPU offload settings. Consider adding documentation that explains:

- What optimizer CPU offload does and when to enable it
- The constraint that `optimizer_offload_fraction` must be 1.0 when `optimizer_cpu_offload` is enabled
- Performance trade-offs or use cases

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."
examples/configs/sft_openmathinstruct2_megatron.yaml (1)
65-67: Consider documenting the configuration constraints and purpose.

The comment provides minimal guidance on these new CPU offload settings. Consider adding documentation that explains:

- What optimizer CPU offload does and when to enable it
- The constraint that `optimizer_offload_fraction` must be 1.0 when `optimizer_cpu_offload` is enabled
- Performance trade-offs or use cases

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."
tools/refit_verifier.py (1)
235-237: Consider enhancing the comment for clarity.

While this is a tool file rather than an exemplar config, adding a brief note about the constraint (fraction must be 1.0 when offload is enabled) would help users understand the expected configuration when verifying refitted policies.

Example:

```diff
- # Optimizer CPU offload settings
+ # Optimizer CPU offload settings (when enabled, fraction must be 1.0)
  "optimizer_cpu_offload": False,
  "optimizer_offload_fraction": 0.0,
```

examples/configs/rm.yaml (1)
108-110: Consider documenting the configuration constraints and purpose.

The comment provides minimal guidance on these new CPU offload settings. Consider adding documentation that explains:

- What optimizer CPU offload does and when to enable it
- The constraint that `optimizer_offload_fraction` must be 1.0 when `optimizer_cpu_offload` is enabled
- Performance trade-offs or use cases

As per coding guidelines: "Exemplar configs under examples/configs/*.yaml must include documented defaults."
nemo_rl/models/policy/__init__.py (1)
64-68: Consider documenting recommended defaults in comments.

The new configuration keys are well-documented with clear purpose and constraints. However, per the coding guidelines for `nemo_rl/**/*.py`, when adding new config keys to a TypedDict, you should "document the key's purpose, valid values/types, and recommended default in code." Consider adding recommended default values to the comments.

Apply this diff to enhance the documentation:

```diff
- # knob to enable optimizer cpu offload
+ # knob to enable optimizer cpu offload (recommended default: false)
  optimizer_cpu_offload: bool
- # knob to set the fraction of parameters to keep on CPU
+ # knob to set the fraction of parameters to keep on CPU (recommended default: 0.0)
  # currently if optimizer_cpu_offload is true, this knob must be 1.0
  optimizer_offload_fraction: float
```

Based on coding guidelines
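Put together, the suggested style looks roughly like the following sketch. The field set is trimmed to the two new keys (the real `MegatronOptimizerConfig` has more fields), and the literal values below are the recommended defaults from the review, not defaults set in code:

```python
from typing import TypedDict


class MegatronOptimizerConfig(TypedDict):
    # knob to enable optimizer cpu offload (recommended default: false)
    optimizer_cpu_offload: bool
    # knob to set the fraction of parameters to keep on CPU (recommended default: 0.0)
    # currently if optimizer_cpu_offload is true, this knob must be 1.0
    optimizer_offload_fraction: float


# YAML remains the single source of truth for defaults; the comments above
# only document the recommended values.
cfg: MegatronOptimizerConfig = {
    "optimizer_cpu_offload": False,
    "optimizer_offload_fraction": 0.0,
}
```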
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- examples/configs/dpo.yaml (1 hunks)
- examples/configs/grpo_math_1B.yaml (1 hunks)
- examples/configs/rm.yaml (1 hunks)
- examples/configs/sft.yaml (1 hunks)
- examples/configs/sft_openmathinstruct2_megatron.yaml (1 hunks)
- nemo_rl/models/policy/__init__.py (1 hunks)
- nemo_rl/models/policy/megatron_policy_worker.py (3 hunks)
- tests/unit/models/generation/test_vllm_generation.py (1 hunks)
- tests/unit/models/policy/test_megatron_worker.py (1 hunks)
- tools/refit_verifier.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
examples/configs/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml
Files:
- examples/configs/grpo_math_1B.yaml
- examples/configs/sft_openmathinstruct2_megatron.yaml
- examples/configs/rm.yaml
- examples/configs/dpo.yaml
- examples/configs/sft.yaml
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
- tools/refit_verifier.py
- nemo_rl/models/policy/__init__.py
- tests/unit/models/policy/test_megatron_worker.py
- nemo_rl/models/policy/megatron_policy_worker.py
- tests/unit/models/generation/test_vllm_generation.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
- nemo_rl/models/policy/__init__.py
- nemo_rl/models/policy/megatron_policy_worker.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (5)
tests/unit/models/generation/test_vllm_generation.py (1)
195-196: LGTM!

The test configuration correctly includes the new CPU offload settings with appropriate default values.
tests/unit/models/policy/test_megatron_worker.py (1)
117-118: LGTM!

The test configuration correctly includes the new CPU offload settings with appropriate default values.
nemo_rl/models/policy/megatron_policy_worker.py (3)
617-628: LGTM! Clear constraint enforcement.

The assertion correctly enforces that CPU optimizer offload requires `optimizer_offload_fraction=1.0`, preventing unsupported hybrid configurations. The comment clearly explains why partial offloading conflicts with the framework's generation/training transition logic.
1769-1773: LGTM! Proper gating of optimizer state movement.

The condition correctly prevents moving optimizer state to CUDA when CPU offload is enabled, aligning with the feature's design. The check properly combines with existing safety guards for optimizer existence.
1800-1804: LGTM! Consistent conditional logic.

The guard mirrors the pattern from `prepare_for_training` (lines 1769-1773), appropriately preventing redundant optimizer state movement when CPU offload is already enabled. This maintains consistency across the codebase.
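The shared guard reviewed in both locations reduces to a small predicate. The function name below is a hypothetical stand-in for illustration; the actual worker inlines this condition rather than exposing a helper:

```python
def should_move_optimizer_state_to_cuda(
    has_optimizer: bool, optimizer_cpu_offload: bool
) -> bool:
    # Skip the CUDA move when there is no optimizer, or when its state is
    # intentionally kept resident on CPU by the offload feature.
    return has_optimizer and not optimizer_cpu_offload
```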
beep boop [🤖]: Hi @guyueh1 👋,