cp: append to hf_overrides rather than overwriting (#1413) into r0.4.0 #1460
📝 Walkthrough

This PR extends configuration support for multiple training algorithms (distillation, DPO, SFT, GRPO, RM) and introduces an internal padding key migration from `pad_token_id` to `_pad_token_id`.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (2 warnings) | ✅ Passed checks (2 passed)
79269af to 110685e
ℹ️ File Consistency Check

Check based on commit: 110685e (PR #1460)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR. Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
110685e to a89b4d0
should go in after #1458
ℹ️ File Consistency Check

Check based on commit: a89b4d0 (PR #1460)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR. Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
a89b4d0 to 967c6d9
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
967c6d9 to 17c3e37
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
nemo_rl/models/generation/vllm/vllm_generation.py (1)
92-103: Avoid in-code defaults; require sampling params from YAML
`top_p` is fetched with a default (1.0). Per nemo_rl guidelines, avoid non-None defaults in code; YAML should be the single source of truth.

```diff
- top_p: float = self.cfg.get("top_p", 1.0)
+ top_p: float = self.cfg["top_p"]
```

Also consider adding a short assertion message mirroring the `top_k` check if missing.

examples/configs/vlm_grpo_3B_megatron.yaml (1)

59-77: Config will fail validation once disabled schema ships

The new disabled `TypedDict` variants only accept the `enabled` flag, so keeping `cpu_offload`, `sequence_parallel`, `tensor_parallel_size`, etc. while `enabled: false` will now trigger an `extra_forbidden` validation error when `PolicyConfig` is parsed. Please either strip these fields when the feature is disabled or relax the schema so the existing knobs remain optional, otherwise this config won't load. (hugovk-typing.readthedocs.io)
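To illustrate the failure mode, here is a stdlib-only mimic of the schema check (the real validation runs through pydantic; only the key names are taken from the config above):

```python
def check_disabled_section(cfg: dict) -> None:
    # Mimic of an extra-forbidden schema for a disabled feature section:
    # once 'enabled' is the only accepted key, leftover tuning knobs fail.
    allowed = {"enabled"}
    extras = sorted(set(cfg) - allowed)
    if extras:
        raise ValueError(f"extra_forbidden: {extras}")

check_disabled_section({"enabled": False})  # passes

try:
    check_disabled_section(
        {"enabled": False, "tensor_parallel_size": 8, "sequence_parallel": True}
    )
except ValueError as e:
    print(e)  # extra_forbidden: ['sequence_parallel', 'tensor_parallel_size']
```

Stripping the extra keys from the YAML when `enabled: false`, or keeping them `NotRequired` in the schema, avoids this error.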
🧹 Nitpick comments (2)
nemo_rl/models/generation/__init__.py (1)
30-36: Make the warning actionable with stacklevel

Add `stacklevel=2` so the warning points at the caller site.

```diff
  warnings.warn(
      "'_pad_token_id' found in generation config and will be overridden with tokenizer.pad_token_id. "
      "Note: '_pad_token_id' is intended for internal use and has no effect when set in user-provided configs.",
      UserWarning,
+     stacklevel=2,
  )
```

nemo_rl/models/generation/vllm/vllm_worker.py (1)
378-382: hf_overrides: merge instead of overwrite — good

Initializing to dict and updating preserves existing overrides. If nested sub-maps (e.g., rope_scaling) need merging, consider a shallow+deep update helper:

```python
def deep_update(dst, src):
    for k, v in src.items():
        if isinstance(v, dict) and isinstance(dst.get(k), dict):
            deep_update(dst[k], v)
        else:
            dst[k] = v
```

Use instead of plain `update` when required.
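As a concrete illustration (the override contents here are hypothetical, not actual vLLM config keys), a shallow `update` replaces nested sub-maps wholesale, while the deep variant preserves keys set only on the destination:

```python
def deep_update(dst, src):
    # Recursively merge src into dst, keeping nested keys present only in dst.
    for k, v in src.items():
        if isinstance(v, dict) and isinstance(dst.get(k), dict):
            deep_update(dst[k], v)
        else:
            dst[k] = v

hf_overrides = {"rope_scaling": {"factor": 2.0}}
incoming = {"rope_scaling": {"rope_type": "linear"}, "max_position_embeddings": 8192}

shallow = dict(hf_overrides)
shallow.update(incoming)  # "factor" is lost: rope_scaling replaced wholesale
assert shallow["rope_scaling"] == {"rope_type": "linear"}

deep_update(hf_overrides, incoming)  # "factor" survives alongside the new key
assert hf_overrides["rope_scaling"] == {"factor": 2.0, "rope_type": "linear"}
```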
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (42)
- .pre-commit-config.yaml (1 hunks)
- docs/design-docs/generation.md (1 hunks)
- examples/configs/distillation_math.yaml (1 hunks)
- examples/configs/distillation_math_megatron.yaml (1 hunks)
- examples/configs/dpo.yaml (2 hunks)
- examples/configs/grpo_math_1B.yaml (2 hunks)
- examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v3.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml (0 hunks)
- examples/configs/sft.yaml (3 hunks)
- examples/configs/sft_openmathinstruct2_megatron.yaml (1 hunks)
- examples/configs/vlm_grpo_3B.yaml (2 hunks)
- examples/configs/vlm_grpo_3B_megatron.yaml (2 hunks)
- nemo_rl/algorithms/loss_functions.py (1 hunks)
- nemo_rl/data/__init__.py (2 hunks)
- nemo_rl/environments/math_environment.py (2 hunks)
- nemo_rl/evals/eval.py (2 hunks)
- nemo_rl/experience/rollouts.py (1 hunks)
- nemo_rl/models/generation/__init__.py (2 hunks)
- nemo_rl/models/generation/interfaces.py (2 hunks)
- nemo_rl/models/generation/vllm/vllm_generation.py (4 hunks)
- nemo_rl/models/generation/vllm/vllm_worker.py (3 hunks)
- nemo_rl/models/generation/vllm/vllm_worker_async.py (2 hunks)
- nemo_rl/models/policy/__init__.py (5 hunks)
- nemo_rl/models/policy/lm_policy.py (2 hunks)
- nemo_rl/models/policy/megatron_policy_worker.py (3 hunks)
- nemo_rl/utils/checkpoint.py (2 hunks)
- pyproject.toml (1 hunks)
- tests/unit/models/generation/test_vllm_generation.py (1 hunks)
- tests/unit/models/generation/test_vllm_large_model.py (1 hunks)
- tests/unit/test_config_validation.py (1 hunks)
- tests/unit/test_recipes_and_test_suites.py (1 hunks)
- tools/config_cli.py (1 hunks)
💤 Files with no reviewable changes (12)
- examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml
- examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml
- examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v3.yaml
🧰 Additional context used
📓 Path-based instructions (4)
examples/configs/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml
Files:
- examples/configs/grpo_math_1B.yaml
- examples/configs/sft_openmathinstruct2_megatron.yaml
- examples/configs/vlm_grpo_3B_megatron.yaml
- examples/configs/distillation_math.yaml
- examples/configs/vlm_grpo_3B.yaml
- examples/configs/dpo.yaml
- examples/configs/distillation_math_megatron.yaml
- examples/configs/sft.yaml
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
- nemo_rl/models/policy/lm_policy.py
- tests/unit/test_recipes_and_test_suites.py
- nemo_rl/models/generation/vllm/vllm_worker.py
- nemo_rl/models/generation/vllm/vllm_worker_async.py
- tools/config_cli.py
- nemo_rl/evals/eval.py
- nemo_rl/utils/checkpoint.py
- nemo_rl/models/generation/__init__.py
- tests/unit/models/generation/test_vllm_generation.py
- tests/unit/models/generation/test_vllm_large_model.py
- nemo_rl/algorithms/loss_functions.py
- tests/unit/test_config_validation.py
- nemo_rl/environments/math_environment.py
- nemo_rl/experience/rollouts.py
- nemo_rl/models/generation/interfaces.py
- nemo_rl/data/__init__.py
- nemo_rl/models/policy/megatron_policy_worker.py
- nemo_rl/models/policy/__init__.py
- nemo_rl/models/generation/vllm/vllm_generation.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
- nemo_rl/models/policy/lm_policy.py
- nemo_rl/models/generation/vllm/vllm_worker.py
- nemo_rl/models/generation/vllm/vllm_worker_async.py
- nemo_rl/evals/eval.py
- nemo_rl/utils/checkpoint.py
- nemo_rl/models/generation/__init__.py
- nemo_rl/algorithms/loss_functions.py
- nemo_rl/environments/math_environment.py
- nemo_rl/experience/rollouts.py
- nemo_rl/models/generation/interfaces.py
- nemo_rl/data/__init__.py
- nemo_rl/models/policy/megatron_policy_worker.py
- nemo_rl/models/policy/__init__.py
- nemo_rl/models/generation/vllm/vllm_generation.py
docs/**/*.md
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
When a markdown doc under docs/**/*.md is added or renamed, update docs/index.md to include it in the appropriate section
Files:
docs/design-docs/generation.md
🧠 Learnings (15)
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.
Applied to files:
examples/configs/sft_openmathinstruct2_megatron.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : VLM recipe YAML filenames must follow: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml
Applied to files:
- tests/unit/test_recipes_and_test_suites.py
- tools/config_cli.py
- .pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml
Applied to files:
- tests/unit/test_recipes_and_test_suites.py
- tools/config_cli.py
- .pre-commit-config.yaml
📚 Learning: 2025-09-10T05:29:34.349Z
Learnt from: bxyu-nvidia
Repo: NVIDIA-NeMo/RL PR: 1110
File: nemo_rl/models/generation/vllm/vllm_worker_async.py:98-105
Timestamp: 2025-09-10T05:29:34.349Z
Learning: In the _maybe_correct_merged_tokens function in nemo_rl/models/generation/vllm/vllm_worker_async.py, the loop condition `len(candidate_token_ids) < len(actual_token_ids) - 1` is intentionally designed to prevent accessing the final token in actual_token_ids, likely to handle specific tokenization edge cases in the vLLM HTTP server integration.
Applied to files:
- nemo_rl/models/generation/vllm/vllm_worker_async.py
- tests/unit/models/generation/test_vllm_generation.py
- tests/unit/models/generation/test_vllm_large_model.py
📚 Learning: 2025-09-10T05:34:35.406Z
Learnt from: bxyu-nvidia
Repo: NVIDIA-NeMo/RL PR: 1110
File: nemo_rl/models/generation/vllm/vllm_worker_async.py:346-359
Timestamp: 2025-09-10T05:34:35.406Z
Learning: In nemo_rl/models/generation/vllm/vllm_worker_async.py, the HTTP server intentionally uses different path structures: `/v1/chat/completions` is under the `/v1` prefix while `/tokenize` is at the root level without the `/v1` prefix. This is the intended design.
Applied to files:
- nemo_rl/models/generation/vllm/vllm_worker_async.py
- tests/unit/models/generation/test_vllm_generation.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name
Applied to files:
- tools/config_cli.py
- .pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.{yaml,sh} : Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script
Applied to files:
- tools/config_cli.py
- .pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
Applied to files:
- nemo_rl/evals/eval.py
- nemo_rl/utils/checkpoint.py
- nemo_rl/algorithms/loss_functions.py
- nemo_rl/environments/math_environment.py
- nemo_rl/data/__init__.py
- nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-17T01:52:21.399Z
Learnt from: ffrujeri
Repo: NVIDIA-NeMo/RL PR: 1023
File: nemo_rl/utils/checkpoint.py:58-65
Timestamp: 2025-09-17T01:52:21.399Z
Learning: model_state_dict_keys is not intended to be part of the nemo-rl CheckpointingConfig TypedDict - it's handled at the automodel implementation layer, not as a general checkpointing configuration parameter.
Applied to files:
- nemo_rl/utils/checkpoint.py
- nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Express configuration optionality via TypedDict using typing.NotRequired
Applied to files:
- nemo_rl/utils/checkpoint.py
- nemo_rl/environments/math_environment.py
- nemo_rl/models/generation/interfaces.py
- nemo_rl/data/__init__.py
- nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.
Applied to files:
- nemo_rl/models/generation/__init__.py
- tests/unit/models/generation/test_vllm_large_model.py
- nemo_rl/models/generation/vllm/vllm_generation.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Applied to files:
- nemo_rl/data/__init__.py
- nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
Applied to files:
.pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/vlm/*.sh : VLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension
Applied to files:
.pre-commit-config.yaml
📚 Learning: 2025-09-19T02:44:38.451Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:73-84
Timestamp: 2025-09-19T02:44:38.451Z
Learning: The scheduler configuration format with a separate "milestones: [20]" entry (not wrapped under name/kwargs) is a valid and established pattern used across GRPO, DPO, and distillation configs in the NeMo RL codebase. This format specifies transition points between different schedulers (e.g., LinearLR for warmup steps, then ConstantLR).
Applied to files:
nemo_rl/models/policy/__init__.py
🧬 Code graph analysis (9)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
nemo_rl/models/generation/interfaces.py (1)
verify_right_padding(23-99)
nemo_rl/models/generation/vllm/vllm_worker_async.py (1)
nemo_rl/models/generation/interfaces.py (1)
verify_right_padding(23-99)
nemo_rl/evals/eval.py (3)
nemo_rl/environments/math_environment.py (1)
- MathEnvConfig (42-46)

nemo_rl/models/generation/interfaces.py (1)
- GenerationConfig (118-131)

nemo_rl/models/policy/__init__.py (1)
- TokenizerConfig (129-133)
nemo_rl/models/generation/__init__.py (2)
tests/unit/models/generation/test_vllm_generation.py (1)
- tokenizer (238-241)

tests/unit/models/generation/test_vllm_large_model.py (1)
- tokenizer (82-85)
tests/unit/models/generation/test_vllm_generation.py (2)
tests/unit/environments/test_retriever.py (1)
- tokenizer (84-93)

tests/unit/environments/test_code_environment.py (1)
- tokenizer (85-94)
tests/unit/models/generation/test_vllm_large_model.py (2)
tests/unit/environments/test_retriever.py (1)
- tokenizer (84-93)

tests/unit/environments/test_code_environment.py (1)
- tokenizer (85-94)
tests/unit/test_config_validation.py (5)
tests/unit/data/packing/test_algorithms.py (1)
- algorithms (97-104)

nemo_rl/evals/eval.py (1)
- MasterConfig (57-63)

nemo_rl/algorithms/distillation.py (1)
- MasterConfig (110-121)

nemo_rl/algorithms/grpo.py (1)
- MasterConfig (161-169)

tools/config_cli.py (1)
- load_config_with_inheritance (100-141)
nemo_rl/experience/rollouts.py (2)
tests/unit/environments/test_retriever.py (1)
- tokenizer (84-93)

tests/unit/environments/test_code_environment.py (1)
- tokenizer (85-94)
nemo_rl/models/policy/__init__.py (2)
nemo_rl/models/generation/interfaces.py (1)
- GenerationConfig (118-131)

nemo_rl/models/policy/megatron_policy_worker.py (1)
- freeze_moe_router (251-263)
🪛 Ruff (0.14.2)
nemo_rl/models/generation/__init__.py
31-31: No explicit stacklevel keyword argument found
Set stacklevel=2
(B028)
tests/unit/test_config_validation.py
49-49: Avoid specifying long messages outside the exception class
(TRY003)
104-104: Avoid specifying long messages outside the exception class
(TRY003)
127-127: Local variable config_type is assigned to but never used
Remove assignment to unused variable config_type
(F841)
129-131: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Docs_Tests
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (25)
tests/unit/models/generation/test_vllm_generation.py (1)
440-440: LGTM! Padding key migration aligned with PR objectives.

The change from `"pad_token_id"` to `"_pad_token_id"` correctly reflects the broader codebase migration to use an internal padding key. The fallback to `tokenizer.pad_token_id` ensures safety if the key is missing.

tests/unit/models/generation/test_vllm_large_model.py (1)

171-171: LGTM! Consistent padding key migration.

The update to use `"_pad_token_id"` with a fallback to `tokenizer.pad_token_id` is consistent with the parallel change in test_vllm_generation.py and aligns with the PR's internal padding key migration.

nemo_rl/models/generation/vllm/vllm_worker_async.py (2)
531-531: LGTM! Padding verification updated to use internal key.

The right-padding verification now correctly uses `self.cfg["_pad_token_id"]` instead of `"pad_token_id"`. This aligns with the PR's migration to the internal padding key, which is populated by `configure_generation_config`.

639-639: LGTM! Output tensor initialization uses internal padding key.

The change to use `self.cfg["_pad_token_id"]` for filling the output tensor is consistent with the broader padding key migration across the vLLM generation code.

examples/configs/vlm_grpo_3B.yaml (2)
96-96: LGTM! Explicit boolean default for defer_fp32_logits.

Changing from `null` to explicit `False` improves config clarity and aligns with the TypedDict typing updates across the codebase.

119-122: LGTM! New optimizer CPU offload configuration.

The addition of `optimizer_cpu_offload` and `optimizer_offload_fraction` with explicit defaults provides clear control over optimizer offloading behavior. The section comment "# optimizer cpu offload" helps document the purpose.

examples/configs/distillation_math_megatron.yaml (1)
60-60: LGTM! Explicit boolean default for defer_fp32_logits.

Changing from `null` to `False` provides an explicit default for the Megatron configuration, consistent with similar updates across distillation and GRPO configs.

examples/configs/distillation_math.yaml (1)

106-106: LGTM! Explicit boolean default for defer_fp32_logits.

The explicit `False` default is clearer than `null` and aligns with the parallel change in distillation_math_megatron.yaml and broader Megatron configuration standardization.

.pre-commit-config.yaml (1)

74-74: LGTM! Distillation workflow added to minimize-check.

The new distillation minimize-check workflow correctly follows the established pattern for dpo, grpo, and sft recipes. This ensures distillation recipes under `examples/configs/recipes/llm/distillation-*.yaml` are properly minimized before merge, consistent with the new distillation base config at `examples/configs/distillation_math.yaml`.

examples/configs/dpo.yaml (2)
116-116: LGTM! New defer_fp32_logits configuration field.

Adding `defer_fp32_logits: False` provides explicit control over FP32 logits behavior in the Megatron configuration, consistent with similar additions across DPO, SFT, GRPO, and distillation configs.

159-159: LGTM! New use_custom_fsdp configuration field.

The addition of `use_custom_fsdp: false` under `distributed_data_parallel_config` provides explicit control over custom FSDP usage, consistent with the parallel addition in `examples/configs/sft_openmathinstruct2_megatron.yaml`.

tools/config_cli.py (1)
49-55: Example loop: distillation inclusion LGTM

The added algo and base_config branch look correct and consistent with the other cases. No action needed.
nemo_rl/models/generation/__init__.py (1)
36-36: Internal key assignment LGTM

Overriding to tokenizer.pad_token_id is correct; keeps user config surface clean.
nemo_rl/models/generation/interfaces.py (1)
125-132: GenerationConfig shape updates LGTM

Nullable top_k/top_p fields and internal `_pad_token_id` are coherent with the new flow. Please ensure docs reflect these shapes. Confirm docs mention that `_pad_token_id` is internal-only and set by configure_generation_config.

nemo_rl/models/policy/lm_policy.py (2)
734-741: Checkpoint format guard LGTM

Condition correctly blocks `model_save_format` for DTensorPolicyWorker (`_v2=False`).
583-586: Verify that `_pad_token_id` is properly injected before `generate()` is called

The code at line 585 directly accesses `self.cfg["generation"]["_pad_token_id"]` without defensive fallback. While `configure_generation_config()` reliably sets this value and current usage patterns (across all algorithms and tests) consistently call it before `Policy` instantiation, the direct dict access creates a brittleness risk.

Verification confirmed:
- `configure_generation_config()` properly sets `config["_pad_token_id"] = tokenizer.pad_token_id`
- All algorithm paths (sft.py, grpo.py, distillation.py, dpo.py, rm.py) call `configure_generation_config()` before instantiating `Policy`
- All test files follow the same pattern
- `generate()` asserts generation config is not None but does not validate `_pad_token_id` presence

The pattern holds in current code, but the direct dict access (not `.get()`) means any future path that bypasses `configure_generation_config()` before calling `generate()` will raise `KeyError`. Consider adding explicit validation in `generate()` or documenting this as a required precondition.
36-39: dtensor_cfg.env_vars default LGTM

Explicit empty mapping is fine and self-documenting.

75-97: megatron_cfg.env_vars and defer_fp32_logits LGTM

These align with worker checks. Ensure docs briefly define `defer_fp32_logits`. Confirm other exemplar configs that set `logprob_chunk_size` also set `defer_fp32_logits: true`.

nemo_rl/models/generation/vllm/vllm_worker.py (2)
543-544: Use of `_pad_token_id` for input validation LGTM

Switching to the internal pad token keeps checks consistent.

576-578: Padding generated sequences with `_pad_token_id` LGTM

Correct pad value used when constructing full outputs.
nemo_rl/models/generation/vllm/vllm_generation.py (3)
121-129: Explicit `model_name` requirement LGTM

Runtime validation is appropriate since workers read `model_name` directly.

452-455: from_batches padding LGTM

Using `_pad_token_id` for aggregation is consistent with the new contract.

502-505: generate_text padding LGTM

Same note; consistent usage.
nemo_rl/models/policy/megatron_policy_worker.py (2)
291-294: FP32-cast gating via `defer_fp32_logits` LGTM

Clear, explicit wiring into model creation.
640-647: Runtime guard for `logprob_chunk_size` is properly maintained

Verification confirms all recipes that set `policy.logprob_chunk_size` also set `megatron_cfg.defer_fp32_logits: true`. Only one recipe uses this setting (grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml), and it correctly configures both parameters. The assertion is sound and prevents the inconsistent numerical mode.
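The guard being verified can be sketched roughly as follows (field access is simplified to plain dicts; the actual worker reads these values from its own config objects):

```python
def validate_logprob_settings(policy_cfg: dict) -> None:
    # Chunked logprob computation without deferred FP32 logits would mix
    # numerical modes, so the two settings must be enabled together.
    if policy_cfg.get("logprob_chunk_size") is not None:
        assert policy_cfg["megatron_cfg"]["defer_fp32_logits"], (
            "logprob_chunk_size requires megatron_cfg.defer_fp32_logits: true"
        )

validate_logprob_settings(
    {"logprob_chunk_size": 512, "megatron_cfg": {"defer_fp32_logits": True}}
)  # passes
```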
```python
split: NotRequired[str | None]
shuffle: bool
seed: NotRequired[int | None]
```
Keep shuffle optional or update every config
Making `shuffle` a required key here breaks config validation for existing configs. For instance, `examples/configs/sft_openmathinstruct2_megatron.yaml` in this same PR still omits `data.shuffle`, so the `TypeAdapter[DataConfig]` validation path will now raise "Field required: shuffle." Please either revert this to `NotRequired[bool]` or add the key (with documented purpose/default) to every data config shipped with the repo before merging. Based on learnings
🤖 Prompt for AI Agents
In nemo_rl/data/__init__.py around lines 33 to 35, the data config type made
shuffle required which breaks validation for existing configs; change the type
back to NotRequired[bool] or add data.shuffle to every shipped config file (with
a short comment/doc and a sensible default such as false) so
TypeAdapter[DataConfig] no longer raises "Field required: shuffle." Ensure
tests/config examples are updated accordingly if you opt to add the key across
configs.
```diff
 generation_outputs = BatchedDataDict.from_batches(
     ordered_batched_data_dicts,
-    pad_value_dict={"output_ids": pad_token_id, "logprobs": 0.0},
+    pad_value_dict={"output_ids": tokenizer.pad_token_id, "logprobs": 0.0},
 )
```
Reinstate config-derived pad token for async batching
BatchedDataDict.from_batches expects a real pad ID, and a lot of our policies feed one via policy_generation.cfg["pad_token_id"] (especially LLaMA-style tokenizers where tokenizer.pad_token_id is left as None). By dropping that override, async generation now either pads with None (raising a runtime error when torch builds the padded tensor) or with the tokenizer’s default instead of the policy’s configured _pad_token_id, so the sequences we push back into the rollout log are malformed. Please restore the config fallback:
```diff
- generation_outputs = BatchedDataDict.from_batches(
-     ordered_batched_data_dicts,
-     pad_value_dict={"output_ids": tokenizer.pad_token_id, "logprobs": 0.0},
- )
+ pad_token_id = policy_generation.cfg.get("pad_token_id", tokenizer.pad_token_id)
+ generation_outputs = BatchedDataDict.from_batches(
+     ordered_batched_data_dicts,
+     pad_value_dict={"output_ids": pad_token_id, "logprobs": 0.0},
+ )
```

🤖 Prompt for AI Agents
In nemo_rl/experience/rollouts.py around lines 161 to 164, the pad_value_dict
currently uses tokenizer.pad_token_id which can be None for some tokenizers;
restore the policy-config fallback by using
policy_generation.cfg["pad_token_id"] (or .get with tokenizer.pad_token_id as
fallback) as the pad id so BatchedDataDict.from_batches always receives a real
pad ID (ensure you coerce/validate it is not None and keep "logprobs": 0.0
as-is).
```diff
+class OptionalResourcesConfig(TypedDict):
+    # Same as ResourcesConfig, but fields can be null and are validated in grpo.py
+    gpus_per_node: int | None
+    num_nodes: int | None
+
 class ColocationConfig(TypedDict):
     enabled: bool
-    resources: NotRequired[ResourcesConfig]
+    resources: OptionalResourcesConfig
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Avoid breaking change: make ColocationConfig.resources optional
resources changed from NotRequired[...] to a required OptionalResourcesConfig. This is likely a silent breaking change for YAMLs that omit it. Either revert to NotRequired or update all exemplar configs.
```diff
 class ColocationConfig(TypedDict):
     enabled: bool
-    resources: OptionalResourcesConfig
+    resources: NotRequired[OptionalResourcesConfig]
```

🏁 Script executed:

```shell
#!/bin/bash
# Find configs that define 'colocated:' without nested 'resources:'
rg -nP -g 'examples/**/*.yaml' -C2 'colocated:\s*\n(?!.*\n\s*resources:)'
```

Length of output: 0
---

🏁 Script executed:

```shell
#!/bin/bash
# Find all YAML files in examples/ that contain 'colocated:'
echo "=== YAML files containing 'colocated:' ==="
fd -e yaml -e yml . examples/ | while read f; do
  if grep -q "colocated:" "$f" 2>/dev/null; then
    echo "File: $f"
    grep -n -A5 "colocated:" "$f"
  fi
done
echo -e "\n=== Search for GenerationConfig in code ==="
rg -n "class GenerationConfig" --type=py
echo -e "\n=== Search for resources field in YAML examples ==="
rg -n "resources:" --type-add 'yaml:*.{yaml,yml}' --type=yaml -C2 examples/ | head -50
```

Length of output: 370
🏁 Script executed:

```shell
#!/bin/bash
# Get GenerationConfig and surrounding context
echo "=== GenerationConfig definition ==="
sed -n '100,180p' git/nemo_rl/models/generation/interfaces.py
echo -e "\n=== Search for grpo.py ==="
find . -name "grpo.py" -o -name "*grpo*" 2>/dev/null | head -20
echo -e "\n=== Search for ColocationConfig usage ==="
rg -n "ColocationConfig" --type=py -B2 -A2
echo -e "\n=== Search for resources field access ==="
rg -n "\.resources" --type=py nemo_rl/ | head -30
```

Length of output: 3411
🏁 Script executed:

```shell
#!/bin/bash
# Check grpo.py validation logic around resource access
echo "=== grpo.py validation logic ==="
rg -n -B5 -A3 "policy.generation.colocated.resources" nemo_rl/algorithms/grpo.py | head -80
echo -e "\n=== Sample YAML configs with colocated ==="
# Check a few YAML configs for colocated structure
for f in ./examples/configs/grpo_math_8B.yaml ./examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml; do
  if [ -f "$f" ]; then
    echo "File: $f"
    grep -n -A10 "colocated:" "$f"
    echo "---"
  fi
done
```

Length of output: 2451
🏁 Script executed:

```shell
#!/bin/bash
# Find where inference_gpus_per_node and inference_nodes are assigned
echo "=== Extract grpo.py validation section (200-400 lines) ==="
rg -n -B40 "if colocated_inference:" nemo_rl/algorithms/grpo.py | head -100
echo -e "\n=== Check for .get() or safe access patterns ==="
rg -n "colocated\[" nemo_rl/algorithms/grpo.py
echo -e "\n=== Full context around colocated_inference check ==="
sed -n '270,400p' nemo_rl/algorithms/grpo.py
```

Length of output: 9678
🏁 Script executed:

```shell
#!/bin/bash
# Search for YAML files that might have colocated defined
echo "=== YAML files with 'colocated:' keyword ==="
fd -e yaml -e yml . examples/configs | xargs grep -l "colocated:" 2>/dev/null | head -10
echo -e "\n=== Check specific config files for colocated structure ==="
grep -n -A5 "colocated:" examples/configs/grpo_math_8B.yaml examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml 2>/dev/null | head -50
echo -e "\n=== Check non-colocated example specifically ==="
grep -n -B2 -A8 "colocated:" examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml 2>/dev/null
```

Length of output: 869
Make ColocationConfig.resources optional to preserve backward compatibility
The change making resources required in ColocationConfig is a breaking change. The code at line 351 in grpo.py directly accesses generation_config["colocated"]["resources"] without guards when colocated.enabled is false. If a YAML config has colocated: {enabled: false} without a resources field, it will fail with a KeyError.
Since GenerationConfig.colocated is NotRequired (users may provide colocated configs), any provided colocated config should have an optional resources field to maintain backward compatibility:
```diff
 class ColocationConfig(TypedDict):
     enabled: bool
-    resources: OptionalResourcesConfig
+    resources: NotRequired[OptionalResourcesConfig]
```

Additionally, consider adding validation guards in grpo.py around line 351 to handle the case where resources is missing, or document the implicit requirement in a comment.
🤖 Prompt for AI Agents
In nemo_rl/models/generation/interfaces.py around lines 107 to 116, the
ColocationConfig currently requires a resources key which breaks configs like
colocated: {enabled: false}; change the TypedDict so resources is optional (use
typing.NotRequired or mark the field as total=False) and type it as
OptionalResourcesConfig, and also update grpo.py around line 351 to guard access
(check for "resources" key or use dict.get before indexing) so missing resources
does not raise KeyError.
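A minimal sketch of the suggested fix, assuming standard `NotRequired` semantics. The field names mirror the snippet above, but the guarded accessor is hypothetical, not NeMo RL's actual code:

```python
from typing import Optional, TypedDict

try:
    from typing import NotRequired  # Python 3.11+
except ImportError:
    from typing_extensions import NotRequired

class OptionalResourcesConfig(TypedDict):
    gpus_per_node: Optional[int]
    num_nodes: Optional[int]

class ColocationConfig(TypedDict):
    enabled: bool
    resources: NotRequired[OptionalResourcesConfig]  # may be omitted in YAML

def inference_resources(colocated: ColocationConfig) -> OptionalResourcesConfig:
    # Guarded access: never index colocated["resources"] directly,
    # so colocated: {enabled: false} without resources cannot raise KeyError.
    return colocated.get("resources", {"gpus_per_node": None, "num_nodes": None})

legacy: ColocationConfig = {"enabled": False}  # a YAML that omits resources
```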
```diff
+class DTensorConfigDisabled(TypedDict):
+    enabled: Literal[False]
+
+
 class DTensorConfig(TypedDict):
-    enabled: bool
-    env_vars: NotRequired[dict[str, str]]
+    enabled: Literal[True]
+    env_vars: NotRequired[dict[str, str] | None]
     _v2: NotRequired[bool]
-    cpu_offload: NotRequired[bool]
-    sequence_parallel: NotRequired[bool]
-    activation_checkpointing: NotRequired[bool]
-    tensor_parallel_size: NotRequired[int]
-    context_parallel_size: NotRequired[int]
-    custom_parallel_plan: NotRequired[str]
-    clear_cache_every_n_steps: NotRequired[int]
+    cpu_offload: bool
+    sequence_parallel: bool
+    activation_checkpointing: bool
+    tensor_parallel_size: int
+    context_parallel_size: int
+    custom_parallel_plan: str | None
+    clear_cache_every_n_steps: NotRequired[int | None]
+
+
+class SequencePackingConfigDisabled(TypedDict):
+    enabled: Literal[False]
```
Disabled TypedDicts must allow legacy fields
DTensorConfigDisabled, SequencePackingConfigDisabled, and DynamicBatchingConfigDisabled now only declare enabled. Pydantic’s TypedDict validation forbids keys that aren’t declared, so any existing config that keeps the old knobs while toggling enabled: false (see the GRPO YAML in this PR) will now raise extra_forbidden during validation. Please reintroduce these fields as NotRequired[...] (or otherwise loosen the schema) so we don’t break every config that relied on the previous structure. (hugovk-typing.readthedocs.io)
🤖 Prompt for AI Agents
In nemo_rl/models/policy/__init__.py around lines 20 to 40, the Disabled
TypedDicts (DTensorConfigDisabled, SequencePackingConfigDisabled and the missing
DynamicBatchingConfigDisabled) only declare enabled which causes Pydantic to
forbid legacy keys; restore the previous optional keys on each Disabled
TypedDict by adding the same fields present in their Enabled counterparts as
NotRequired[...] with the same types (e.g., for DTensorConfigDisabled add
env_vars: NotRequired[dict[str,str]|None], _v2: NotRequired[bool], cpu_offload:
NotRequired[bool], sequence_parallel: NotRequired[bool],
activation_checkpointing: NotRequired[bool], tensor_parallel_size:
NotRequired[int], context_parallel_size: NotRequired[int], custom_parallel_plan:
NotRequired[str|None], clear_cache_every_n_steps: NotRequired[int|None] and
similarly make the SequencePackingConfigDisabled and
DynamicBatchingConfigDisabled include the formerly allowed optional knobs as
NotRequired) so existing configs with extra keys validate when enabled is false.
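A sketch of the loosened schema this prompt describes, using a Literal-discriminated union and a small, hypothetical subset of the DTensor knobs:

```python
from typing import Literal, TypedDict, Union

try:
    from typing import NotRequired  # Python 3.11+
except ImportError:
    from typing_extensions import NotRequired

class DTensorConfigDisabled(TypedDict):
    enabled: Literal[False]
    # Legacy knobs reintroduced as NotRequired so old YAMLs that keep them
    # while toggling enabled: false still validate (only two shown here).
    tensor_parallel_size: NotRequired[int]
    context_parallel_size: NotRequired[int]

class DTensorConfigEnabled(TypedDict):
    enabled: Literal[True]
    tensor_parallel_size: int
    context_parallel_size: int

DTensorConfig = Union[DTensorConfigDisabled, DTensorConfigEnabled]

# A legacy config: disabled, but with stale knobs left in place.
legacy: DTensorConfig = {"enabled": False, "tensor_parallel_size": 2}
```

With the stricter schema from the diff above, a validator that forbids undeclared keys would reject `legacy`; keeping the knobs as `NotRequired` lets both shapes pass.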
What does this PR do?
cp: append to hf_overrides rather than overwriting (#1413) into r0.4.0

Issues
List issues that this PR closes (syntax):
Usage
```python
# Add a code snippet demonstrating how to use this
```

Before your PR is "Ready for review"
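The PR title's append-versus-overwrite distinction reduces to a dict merge. A minimal sketch, with illustrative keys and a hypothetical `build_hf_overrides` helper (not vLLM's or NeMo RL's actual code):

```python
# Appending to hf_overrides: start from the existing defaults and layer
# user-supplied entries on top, instead of replacing the dict wholesale.

def build_hf_overrides(defaults, user_overrides):
    merged = dict(defaults)        # copy, so the defaults stay untouched
    merged.update(user_overrides)  # user keys win on conflict
    return merged

defaults = {"attn_implementation": "eager"}
user = {"rope_scaling": {"type": "linear", "factor": 2.0}}
merged = build_hf_overrides(defaults, user)
```

Overwriting (`hf_overrides = user`) would silently drop `attn_implementation`; the merge keeps both.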
Pre checks:
Additional Information
Summary by CodeRabbit
Release Notes
Chores
Documentation
Tests