cp: append to hf_overrides rather than overwriting (#1413) into r0.4.0 #1460
📝 Walkthrough

This PR extends configuration support for multiple training algorithms (distillation, DPO, SFT, GRPO, RM) and introduces an internal padding key migration from `pad_token_id` to `_pad_token_id`.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (2 warnings) | ✅ Passed checks (2 passed)
79269af to 110685e
ℹ️ File Consistency Check

Check based on commit: 110685e (PR #1460)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR. Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
110685e to a89b4d0
should go in after #1458
ℹ️ File Consistency Check

Check based on commit: a89b4d0 (PR #1460)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR. Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
a89b4d0 to 967c6d9
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
967c6d9 to 17c3e37
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
nemo_rl/models/generation/vllm/vllm_generation.py (1)
92-103: Avoid in-code defaults; require sampling params from YAML
`top_p` is fetched with a default (1.0). Per nemo_rl guidelines, avoid non-None defaults in code; YAML should be the single source of truth.

```diff
- top_p: float = self.cfg.get("top_p", 1.0)
+ top_p: float = self.cfg["top_p"]
```

Also consider adding a short assertion message mirroring the `top_k` check if missing.

examples/configs/vlm_grpo_3B_megatron.yaml (1)

59-77: Config will fail validation once disabled schema ships

The new disabled `TypedDict` variants only accept the `enabled` flag, so keeping `cpu_offload`, `sequence_parallel`, `tensor_parallel_size`, etc. while `enabled: false` will now trigger an `extra_forbidden` validation error when `PolicyConfig` is parsed. Please either strip these fields when the feature is disabled or relax the schema so the existing knobs remain optional, otherwise this config won't load. (hugovk-typing.readthedocs.io)
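To illustrate the failure mode, here is a stdlib-only mimic of the schema check (the real validation runs through pydantic; only the key names are taken from the config above):

```python
def check_disabled_section(cfg: dict) -> None:
    # Mimic of an extra-forbidden schema for a disabled feature section:
    # once 'enabled' is the only accepted key, leftover tuning knobs fail.
    allowed = {"enabled"}
    extras = sorted(set(cfg) - allowed)
    if extras:
        raise ValueError(f"extra_forbidden: {extras}")

check_disabled_section({"enabled": False})  # passes

try:
    check_disabled_section(
        {"enabled": False, "tensor_parallel_size": 8, "sequence_parallel": True}
    )
except ValueError as e:
    print(e)  # extra_forbidden: ['sequence_parallel', 'tensor_parallel_size']
```

Stripping the extra keys from the YAML when `enabled: false`, or keeping them `NotRequired` in the schema, avoids this error.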
🧹 Nitpick comments (2)
nemo_rl/models/generation/__init__.py (1)
30-36: Make the warning actionable with stacklevel

Add `stacklevel=2` so the warning points at the caller site.

```diff
  warnings.warn(
      "'_pad_token_id' found in generation config and will be overridden with tokenizer.pad_token_id. "
      "Note: '_pad_token_id' is intended for internal use and has no effect when set in user-provided configs.",
      UserWarning,
+     stacklevel=2,
  )
```

nemo_rl/models/generation/vllm/vllm_worker.py (1)
378-382: hf_overrides: merge instead of overwrite — good

Initializing to dict and updating preserves existing overrides. If nested sub-maps (e.g., rope_scaling) need merging, consider a shallow+deep update helper:

```python
def deep_update(dst, src):
    for k, v in src.items():
        if isinstance(v, dict) and isinstance(dst.get(k), dict):
            deep_update(dst[k], v)
        else:
            dst[k] = v
```

Use instead of plain `update` when required.
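As a concrete illustration (the override contents here are hypothetical, not actual vLLM config keys), a shallow `update` replaces nested sub-maps wholesale, while the deep variant preserves keys set only on the destination:

```python
def deep_update(dst, src):
    # Recursively merge src into dst, keeping nested keys present only in dst.
    for k, v in src.items():
        if isinstance(v, dict) and isinstance(dst.get(k), dict):
            deep_update(dst[k], v)
        else:
            dst[k] = v

hf_overrides = {"rope_scaling": {"factor": 2.0}}
incoming = {"rope_scaling": {"rope_type": "linear"}, "max_position_embeddings": 8192}

shallow = dict(hf_overrides)
shallow.update(incoming)  # "factor" is lost: rope_scaling replaced wholesale
assert shallow["rope_scaling"] == {"rope_type": "linear"}

deep_update(hf_overrides, incoming)  # "factor" survives alongside the new key
assert hf_overrides["rope_scaling"] == {"factor": 2.0, "rope_type": "linear"}
```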
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (42)
- .pre-commit-config.yaml (1 hunks)
- docs/design-docs/generation.md (1 hunks)
- examples/configs/distillation_math.yaml (1 hunks)
- examples/configs/distillation_math_megatron.yaml (1 hunks)
- examples/configs/dpo.yaml (2 hunks)
- examples/configs/grpo_math_1B.yaml (2 hunks)
- examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v3.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml (0 hunks)
- examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml (0 hunks)
- examples/configs/sft.yaml (3 hunks)
- examples/configs/sft_openmathinstruct2_megatron.yaml (1 hunks)
- examples/configs/vlm_grpo_3B.yaml (2 hunks)
- examples/configs/vlm_grpo_3B_megatron.yaml (2 hunks)
- nemo_rl/algorithms/loss_functions.py (1 hunks)
- nemo_rl/data/__init__.py (2 hunks)
- nemo_rl/environments/math_environment.py (2 hunks)
- nemo_rl/evals/eval.py (2 hunks)
- nemo_rl/experience/rollouts.py (1 hunks)
- nemo_rl/models/generation/__init__.py (2 hunks)
- nemo_rl/models/generation/interfaces.py (2 hunks)
- nemo_rl/models/generation/vllm/vllm_generation.py (4 hunks)
- nemo_rl/models/generation/vllm/vllm_worker.py (3 hunks)
- nemo_rl/models/generation/vllm/vllm_worker_async.py (2 hunks)
- nemo_rl/models/policy/__init__.py (5 hunks)
- nemo_rl/models/policy/lm_policy.py (2 hunks)
- nemo_rl/models/policy/megatron_policy_worker.py (3 hunks)
- nemo_rl/utils/checkpoint.py (2 hunks)
- pyproject.toml (1 hunks)
- tests/unit/models/generation/test_vllm_generation.py (1 hunks)
- tests/unit/models/generation/test_vllm_large_model.py (1 hunks)
- tests/unit/test_config_validation.py (1 hunks)
- tests/unit/test_recipes_and_test_suites.py (1 hunks)
- tools/config_cli.py (1 hunks)
💤 Files with no reviewable changes (12)
- examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml
- examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml
- examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v3.yaml
🧰 Additional context used
📓 Path-based instructions (4)
examples/configs/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml
Files:
- examples/configs/grpo_math_1B.yaml
- examples/configs/sft_openmathinstruct2_megatron.yaml
- examples/configs/vlm_grpo_3B_megatron.yaml
- examples/configs/distillation_math.yaml
- examples/configs/vlm_grpo_3B.yaml
- examples/configs/dpo.yaml
- examples/configs/distillation_math_megatron.yaml
- examples/configs/sft.yaml
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
- nemo_rl/models/policy/lm_policy.py
- tests/unit/test_recipes_and_test_suites.py
- nemo_rl/models/generation/vllm/vllm_worker.py
- nemo_rl/models/generation/vllm/vllm_worker_async.py
- tools/config_cli.py
- nemo_rl/evals/eval.py
- nemo_rl/utils/checkpoint.py
- nemo_rl/models/generation/__init__.py
- tests/unit/models/generation/test_vllm_generation.py
- tests/unit/models/generation/test_vllm_large_model.py
- nemo_rl/algorithms/loss_functions.py
- tests/unit/test_config_validation.py
- nemo_rl/environments/math_environment.py
- nemo_rl/experience/rollouts.py
- nemo_rl/models/generation/interfaces.py
- nemo_rl/data/__init__.py
- nemo_rl/models/policy/megatron_policy_worker.py
- nemo_rl/models/policy/__init__.py
- nemo_rl/models/generation/vllm/vllm_generation.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
- nemo_rl/models/policy/lm_policy.py
- nemo_rl/models/generation/vllm/vllm_worker.py
- nemo_rl/models/generation/vllm/vllm_worker_async.py
- nemo_rl/evals/eval.py
- nemo_rl/utils/checkpoint.py
- nemo_rl/models/generation/__init__.py
- nemo_rl/algorithms/loss_functions.py
- nemo_rl/environments/math_environment.py
- nemo_rl/experience/rollouts.py
- nemo_rl/models/generation/interfaces.py
- nemo_rl/data/__init__.py
- nemo_rl/models/policy/megatron_policy_worker.py
- nemo_rl/models/policy/__init__.py
- nemo_rl/models/generation/vllm/vllm_generation.py
docs/**/*.md
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
When a markdown doc under docs/**/*.md is added or renamed, update docs/index.md to include it in the appropriate section
Files:
docs/design-docs/generation.md
🧠 Learnings (15)
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.
Applied to files:
examples/configs/sft_openmathinstruct2_megatron.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : VLM recipe YAML filenames must follow: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml
Applied to files:
- tests/unit/test_recipes_and_test_suites.py
- tools/config_cli.py
- .pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml
Applied to files:
- tests/unit/test_recipes_and_test_suites.py
- tools/config_cli.py
- .pre-commit-config.yaml
📚 Learning: 2025-09-10T05:29:34.349Z
Learnt from: bxyu-nvidia
Repo: NVIDIA-NeMo/RL PR: 1110
File: nemo_rl/models/generation/vllm/vllm_worker_async.py:98-105
Timestamp: 2025-09-10T05:29:34.349Z
Learning: In the _maybe_correct_merged_tokens function in nemo_rl/models/generation/vllm/vllm_worker_async.py, the loop condition `len(candidate_token_ids) < len(actual_token_ids) - 1` is intentionally designed to prevent accessing the final token in actual_token_ids, likely to handle specific tokenization edge cases in the vLLM HTTP server integration.
Applied to files:
- nemo_rl/models/generation/vllm/vllm_worker_async.py
- tests/unit/models/generation/test_vllm_generation.py
- tests/unit/models/generation/test_vllm_large_model.py
📚 Learning: 2025-09-10T05:34:35.406Z
Learnt from: bxyu-nvidia
Repo: NVIDIA-NeMo/RL PR: 1110
File: nemo_rl/models/generation/vllm/vllm_worker_async.py:346-359
Timestamp: 2025-09-10T05:34:35.406Z
Learning: In nemo_rl/models/generation/vllm/vllm_worker_async.py, the HTTP server intentionally uses different path structures: `/v1/chat/completions` is under the `/v1` prefix while `/tokenize` is at the root level without the `/v1` prefix. This is the intended design.
Applied to files:
- nemo_rl/models/generation/vllm/vllm_worker_async.py
- tests/unit/models/generation/test_vllm_generation.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name
Applied to files:
- tools/config_cli.py
- .pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.{yaml,sh} : Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script
Applied to files:
- tools/config_cli.py
- .pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
Applied to files:
- nemo_rl/evals/eval.py
- nemo_rl/utils/checkpoint.py
- nemo_rl/algorithms/loss_functions.py
- nemo_rl/environments/math_environment.py
- nemo_rl/data/__init__.py
- nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-17T01:52:21.399Z
Learnt from: ffrujeri
Repo: NVIDIA-NeMo/RL PR: 1023
File: nemo_rl/utils/checkpoint.py:58-65
Timestamp: 2025-09-17T01:52:21.399Z
Learning: model_state_dict_keys is not intended to be part of the nemo-rl CheckpointingConfig TypedDict - it's handled at the automodel implementation layer, not as a general checkpointing configuration parameter.
Applied to files:
- nemo_rl/utils/checkpoint.py
- nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Express configuration optionality via TypedDict using typing.NotRequired
Applied to files:
- nemo_rl/utils/checkpoint.py
- nemo_rl/environments/math_environment.py
- nemo_rl/models/generation/interfaces.py
- nemo_rl/data/__init__.py
- nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.
Applied to files:
- nemo_rl/models/generation/__init__.py
- tests/unit/models/generation/test_vllm_large_model.py
- nemo_rl/models/generation/vllm/vllm_generation.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Applied to files:
- nemo_rl/data/__init__.py
- nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
Applied to files:
.pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/vlm/*.sh : VLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension
Applied to files:
.pre-commit-config.yaml
📚 Learning: 2025-09-19T02:44:38.451Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:73-84
Timestamp: 2025-09-19T02:44:38.451Z
Learning: The scheduler configuration format with a separate "milestones: [20]" entry (not wrapped under name/kwargs) is a valid and established pattern used across GRPO, DPO, and distillation configs in the NeMo RL codebase. This format specifies transition points between different schedulers (e.g., LinearLR for warmup steps, then ConstantLR).
Applied to files:
nemo_rl/models/policy/__init__.py
🧬 Code graph analysis (9)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
nemo_rl/models/generation/interfaces.py (1)
verify_right_padding(23-99)
nemo_rl/models/generation/vllm/vllm_worker_async.py (1)
nemo_rl/models/generation/interfaces.py (1)
verify_right_padding(23-99)
nemo_rl/evals/eval.py (3)
nemo_rl/environments/math_environment.py (1)
- MathEnvConfig (42-46)

nemo_rl/models/generation/interfaces.py (1)
- GenerationConfig (118-131)

nemo_rl/models/policy/__init__.py (1)
- TokenizerConfig (129-133)
nemo_rl/models/generation/__init__.py (2)
tests/unit/models/generation/test_vllm_generation.py (1)
- tokenizer (238-241)

tests/unit/models/generation/test_vllm_large_model.py (1)
- tokenizer (82-85)
tests/unit/models/generation/test_vllm_generation.py (2)
tests/unit/environments/test_retriever.py (1)
- tokenizer (84-93)

tests/unit/environments/test_code_environment.py (1)
- tokenizer (85-94)
tests/unit/models/generation/test_vllm_large_model.py (2)
tests/unit/environments/test_retriever.py (1)
- tokenizer (84-93)

tests/unit/environments/test_code_environment.py (1)
- tokenizer (85-94)
tests/unit/test_config_validation.py (5)
tests/unit/data/packing/test_algorithms.py (1)
- algorithms (97-104)

nemo_rl/evals/eval.py (1)
- MasterConfig (57-63)

nemo_rl/algorithms/distillation.py (1)
- MasterConfig (110-121)

nemo_rl/algorithms/grpo.py (1)
- MasterConfig (161-169)

tools/config_cli.py (1)
- load_config_with_inheritance (100-141)
nemo_rl/experience/rollouts.py (2)
tests/unit/environments/test_retriever.py (1)
- tokenizer (84-93)

tests/unit/environments/test_code_environment.py (1)
- tokenizer (85-94)
nemo_rl/models/policy/__init__.py (2)
nemo_rl/models/generation/interfaces.py (1)
- GenerationConfig (118-131)

nemo_rl/models/policy/megatron_policy_worker.py (1)
- freeze_moe_router (251-263)
🪛 Ruff (0.14.2)
nemo_rl/models/generation/__init__.py
31-31: No explicit stacklevel keyword argument found
Set stacklevel=2
(B028)
tests/unit/test_config_validation.py
49-49: Avoid specifying long messages outside the exception class
(TRY003)
104-104: Avoid specifying long messages outside the exception class
(TRY003)
127-127: Local variable config_type is assigned to but never used
Remove assignment to unused variable config_type
(F841)
129-131: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Docs_Tests
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (25)
tests/unit/models/generation/test_vllm_generation.py (1)
440-440: LGTM! Padding key migration aligned with PR objectives.

The change from `"pad_token_id"` to `"_pad_token_id"` correctly reflects the broader codebase migration to use an internal padding key. The fallback to `tokenizer.pad_token_id` ensures safety if the key is missing.

tests/unit/models/generation/test_vllm_large_model.py (1)

171-171: LGTM! Consistent padding key migration.

The update to use `"_pad_token_id"` with a fallback to `tokenizer.pad_token_id` is consistent with the parallel change in test_vllm_generation.py and aligns with the PR's internal padding key migration.

nemo_rl/models/generation/vllm/vllm_worker_async.py (2)
531-531: LGTM! Padding verification updated to use internal key.

The right-padding verification now correctly uses `self.cfg["_pad_token_id"]` instead of `"pad_token_id"`. This aligns with the PR's migration to the internal padding key, which is populated by `configure_generation_config`.

639-639: LGTM! Output tensor initialization uses internal padding key.

The change to use `self.cfg["_pad_token_id"]` for filling the output tensor is consistent with the broader padding key migration across the vLLM generation code.

examples/configs/vlm_grpo_3B.yaml (2)
96-96: LGTM! Explicit boolean default for defer_fp32_logits.

Changing from `null` to explicit `False` improves config clarity and aligns with the TypedDict typing updates across the codebase.

119-122: LGTM! New optimizer CPU offload configuration.

The addition of `optimizer_cpu_offload` and `optimizer_offload_fraction` with explicit defaults provides clear control over optimizer offloading behavior. The section comment "# optimizer cpu offload" helps document the purpose.

examples/configs/distillation_math_megatron.yaml (1)
60-60: LGTM! Explicit boolean default for defer_fp32_logits.

Changing from `null` to `False` provides an explicit default for the Megatron configuration, consistent with similar updates across distillation and GRPO configs.

examples/configs/distillation_math.yaml (1)

106-106: LGTM! Explicit boolean default for defer_fp32_logits.

The explicit `False` default is clearer than `null` and aligns with the parallel change in distillation_math_megatron.yaml and broader Megatron configuration standardization.

.pre-commit-config.yaml (1)

74-74: LGTM! Distillation workflow added to minimize-check.

The new distillation minimize-check workflow correctly follows the established pattern for dpo, grpo, and sft recipes. This ensures distillation recipes under `examples/configs/recipes/llm/distillation-*.yaml` are properly minimized before merge, consistent with the new distillation base config at `examples/configs/distillation_math.yaml`.

examples/configs/dpo.yaml (2)
116-116: LGTM! New defer_fp32_logits configuration field.

Adding `defer_fp32_logits: False` provides explicit control over FP32 logits behavior in the Megatron configuration, consistent with similar additions across DPO, SFT, GRPO, and distillation configs.

159-159: LGTM! New use_custom_fsdp configuration field.

The addition of `use_custom_fsdp: false` under `distributed_data_parallel_config` provides explicit control over custom FSDP usage, consistent with the parallel addition in `examples/configs/sft_openmathinstruct2_megatron.yaml`.

tools/config_cli.py (1)
49-55: Example loop: distillation inclusion LGTM

The added algo and base_config branch look correct and consistent with the other cases. No action needed.
nemo_rl/models/generation/__init__.py (1)
36-36: Internal key assignment LGTM

Overriding to tokenizer.pad_token_id is correct; keeps user config surface clean.
nemo_rl/models/generation/interfaces.py (1)
125-132: GenerationConfig shape updates LGTM

Nullable top_k/top_p fields and internal `_pad_token_id` are coherent with the new flow. Please ensure docs reflect these shapes. Confirm docs mention that `_pad_token_id` is internal-only and set by configure_generation_config.

nemo_rl/models/policy/lm_policy.py (2)
734-741: Checkpoint format guard LGTM

Condition correctly blocks `model_save_format` for DTensorPolicyWorker (`_v2=False`).
583-586: Verify that `_pad_token_id` is properly injected before `generate()` is called

The code at line 585 directly accesses `self.cfg["generation"]["_pad_token_id"]` without defensive fallback. While `configure_generation_config()` reliably sets this value and current usage patterns (across all algorithms and tests) consistently call it before `Policy` instantiation, the direct dict access creates a brittleness risk.

Verification confirmed:
- `configure_generation_config()` properly sets `config["_pad_token_id"] = tokenizer.pad_token_id`
- All algorithm paths (sft.py, grpo.py, distillation.py, dpo.py, rm.py) call `configure_generation_config()` before instantiating `Policy`
- All test files follow the same pattern
- `generate()` asserts generation config is not None but does not validate `_pad_token_id` presence

The pattern holds in current code, but the direct dict access (not `.get()`) means any future path that bypasses `configure_generation_config()` before calling `generate()` will raise `KeyError`. Consider adding explicit validation in `generate()` or documenting this as a required precondition.
36-39: dtensor_cfg.env_vars default LGTM

Explicit empty mapping is fine and self-documenting.

75-97: megatron_cfg.env_vars and defer_fp32_logits LGTM

These align with worker checks. Ensure docs briefly define `defer_fp32_logits`. Confirm other exemplar configs that set `logprob_chunk_size` also set `defer_fp32_logits: true`.

nemo_rl/models/generation/vllm/vllm_worker.py (2)
543-544: Use of `_pad_token_id` for input validation LGTM

Switching to the internal pad token keeps checks consistent.

576-578: Padding generated sequences with `_pad_token_id` LGTM

Correct pad value used when constructing full outputs.
nemo_rl/models/generation/vllm/vllm_generation.py (3)
121-129: Explicit `model_name` requirement LGTM

Runtime validation is appropriate since workers read `model_name` directly.

452-455: from_batches padding LGTM

Using `_pad_token_id` for aggregation is consistent with the new contract.

502-505: generate_text padding LGTM

Same note; consistent usage.
nemo_rl/models/policy/megatron_policy_worker.py (2)
291-294: FP32-cast gating via `defer_fp32_logits` LGTM

Clear, explicit wiring into model creation.
640-647: Runtime guard for `logprob_chunk_size` is properly maintained

Verification confirms all recipes that set `policy.logprob_chunk_size` also set `megatron_cfg.defer_fp32_logits: true`. Only one recipe uses this setting (grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml), and it correctly configures both parameters. The assertion is sound and prevents the inconsistent numerical mode.
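The guard being verified can be sketched roughly as follows (field access is simplified to plain dicts; the actual worker reads these values from its own config objects):

```python
def validate_logprob_settings(policy_cfg: dict) -> None:
    # Chunked logprob computation without deferred FP32 logits would mix
    # numerical modes, so the two settings must be enabled together.
    if policy_cfg.get("logprob_chunk_size") is not None:
        assert policy_cfg["megatron_cfg"]["defer_fp32_logits"], (
            "logprob_chunk_size requires megatron_cfg.defer_fp32_logits: true"
        )

validate_logprob_settings(
    {"logprob_chunk_size": 512, "megatron_cfg": {"defer_fp32_logits": True}}
)  # passes
```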
```python
split: NotRequired[str | None]
shuffle: bool
seed: NotRequired[int | None]
```
Keep shuffle optional or update every config
Making `shuffle` a required key here breaks config validation for existing configs. For instance, `examples/configs/sft_openmathinstruct2_megatron.yaml` in this same PR still omits `data.shuffle`, so the `TypeAdapter[DataConfig]` validation path will now raise "Field required: shuffle." Please either revert this to `NotRequired[bool]` or add the key (with documented purpose/default) to every data config shipped with the repo before merging. Based on learnings
🤖 Prompt for AI Agents
In nemo_rl/data/__init__.py around lines 33 to 35, the data config type made
shuffle required which breaks validation for existing configs; change the type
back to NotRequired[bool] or add data.shuffle to every shipped config file (with
a short comment/doc and a sensible default such as false) so
TypeAdapter[DataConfig] no longer raises "Field required: shuffle." Ensure
tests/config examples are updated accordingly if you opt to add the key across
configs.
```diff
 generation_outputs = BatchedDataDict.from_batches(
     ordered_batched_data_dicts,
-    pad_value_dict={"output_ids": pad_token_id, "logprobs": 0.0},
+    pad_value_dict={"output_ids": tokenizer.pad_token_id, "logprobs": 0.0},
 )
```
Reinstate config-derived pad token for async batching
BatchedDataDict.from_batches expects a real pad ID, and a lot of our policies feed one via policy_generation.cfg["pad_token_id"] (especially LLaMA-style tokenizers where tokenizer.pad_token_id is left as None). By dropping that override, async generation now either pads with None (raising a runtime error when torch builds the padded tensor) or with the tokenizer’s default instead of the policy’s configured _pad_token_id, so the sequences we push back into the rollout log are malformed. Please restore the config fallback:
```diff
- generation_outputs = BatchedDataDict.from_batches(
-     ordered_batched_data_dicts,
-     pad_value_dict={"output_ids": tokenizer.pad_token_id, "logprobs": 0.0},
- )
+ pad_token_id = policy_generation.cfg.get("pad_token_id", tokenizer.pad_token_id)
+ generation_outputs = BatchedDataDict.from_batches(
+     ordered_batched_data_dicts,
+     pad_value_dict={"output_ids": pad_token_id, "logprobs": 0.0},
+ )
```

🤖 Prompt for AI Agents
In nemo_rl/experience/rollouts.py around lines 161 to 164, the pad_value_dict
currently uses tokenizer.pad_token_id which can be None for some tokenizers;
restore the policy-config fallback by using
policy_generation.cfg["pad_token_id"] (or .get with tokenizer.pad_token_id as
fallback) as the pad id so BatchedDataDict.from_batches always receives a real
pad ID (ensure you coerce/validate it is not None and keep "logprobs": 0.0
as-is).
```diff
+class OptionalResourcesConfig(TypedDict):
+    # Same as ResourcesConfig, but fields can be null and are validated in grpo.py
+    gpus_per_node: int | None
+    num_nodes: int | None
+
 class ColocationConfig(TypedDict):
     enabled: bool
-    resources: NotRequired[ResourcesConfig]
+    resources: OptionalResourcesConfig
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Avoid breaking change: make ColocationConfig.resources optional
resources changed from NotRequired[...] to a required OptionalResourcesConfig. This is likely a silent breaking change for YAMLs that omit it. Either revert to NotRequired or update all exemplar configs.
```diff
 class ColocationConfig(TypedDict):
     enabled: bool
-    resources: OptionalResourcesConfig
+    resources: NotRequired[OptionalResourcesConfig]
```

🏁 Script executed:

```shell
#!/bin/bash
# Find configs that define 'colocated:' without nested 'resources:'
rg -nP -g 'examples/**/*.yaml' -C2 'colocated:\s*\n(?!.*\n\s*resources:)'
```

Length of output: 0
---

🏁 Script executed:

```shell
#!/bin/bash
# Find all YAML files in examples/ that contain 'colocated:'
echo "=== YAML files containing 'colocated:' ==="
fd -e yaml -e yml . examples/ | while read f; do
  if grep -q "colocated:" "$f" 2>/dev/null; then
    echo "File: $f"
    grep -n -A5 "colocated:" "$f"
  fi
done
echo -e "\n=== Search for GenerationConfig in code ==="
rg -n "class GenerationConfig" --type=py
echo -e "\n=== Search for resources field in YAML examples ==="
rg -n "resources:" --type-add 'yaml:*.{yaml,yml}' --type=yaml -C2 examples/ | head -50
```

Length of output: 370
🏁 Script executed:

```shell
#!/bin/bash
# Get GenerationConfig and surrounding context
echo "=== GenerationConfig definition ==="
sed -n '100,180p' git/nemo_rl/models/generation/interfaces.py
echo -e "\n=== Search for grpo.py ==="
find . -name "grpo.py" -o -name "*grpo*" 2>/dev/null | head -20
echo -e "\n=== Search for ColocationConfig usage ==="
rg -n "ColocationConfig" --type=py -B2 -A2
echo -e "\n=== Search for resources field access ==="
rg -n "\.resources" --type=py nemo_rl/ | head -30
```

Length of output: 3411
🏁 Script executed:

```shell
#!/bin/bash
# Check grpo.py validation logic around resource access
echo "=== grpo.py validation logic ==="
rg -n -B5 -A3 "policy.generation.colocated.resources" nemo_rl/algorithms/grpo.py | head -80
echo -e "\n=== Sample YAML configs with colocated ==="
# Check a few YAML configs for colocated structure
for f in ./examples/configs/grpo_math_8B.yaml ./examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml; do
  if [ -f "$f" ]; then
    echo "File: $f"
    grep -n -A10 "colocated:" "$f"
    echo "---"
  fi
done
```

Length of output: 2451
🏁 Script executed:

```shell
#!/bin/bash
# Find where inference_gpus_per_node and inference_nodes are assigned
echo "=== Extract grpo.py validation section (200-400 lines) ==="
rg -n -B40 "if colocated_inference:" nemo_rl/algorithms/grpo.py | head -100
echo -e "\n=== Check for .get() or safe access patterns ==="
rg -n "colocated\[" nemo_rl/algorithms/grpo.py
echo -e "\n=== Full context around colocated_inference check ==="
sed -n '270,400p' nemo_rl/algorithms/grpo.py
```

Length of output: 9678
🏁 Script executed:

```shell
#!/bin/bash
# Search for YAML files that might have colocated defined
echo "=== YAML files with 'colocated:' keyword ==="
fd -e yaml -e yml . examples/configs | xargs grep -l "colocated:" 2>/dev/null | head -10
echo -e "\n=== Check specific config files for colocated structure ==="
grep -n -A5 "colocated:" examples/configs/grpo_math_8B.yaml examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml 2>/dev/null | head -50
echo -e "\n=== Check non-colocated example specifically ==="
grep -n -B2 -A8 "colocated:" examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml 2>/dev/null
```

Length of output: 869
Make ColocationConfig.resources optional to preserve backward compatibility
The change making resources required in ColocationConfig is a breaking change. The code at line 351 in grpo.py directly accesses generation_config["colocated"]["resources"] without guards when colocated.enabled is false. If a YAML config has colocated: {enabled: false} without a resources field, it will fail with a KeyError.
Since GenerationConfig.colocated is NotRequired (users may provide colocated configs), any provided colocated config should have an optional resources field to maintain backward compatibility:
```diff
 class ColocationConfig(TypedDict):
     enabled: bool
-    resources: OptionalResourcesConfig
+    resources: NotRequired[OptionalResourcesConfig]
```

Additionally, consider adding validation guards in grpo.py around line 351 to handle the case where resources is missing, or document the implicit requirement in a comment.
🤖 Prompt for AI Agents
In nemo_rl/models/generation/interfaces.py around lines 107 to 116, the
ColocationConfig currently requires a resources key which breaks configs like
colocated: {enabled: false}; change the TypedDict so resources is optional (use
typing.NotRequired or mark the field as total=False) and type it as
OptionalResourcesConfig, and also update grpo.py around line 351 to guard access
(check for "resources" key or use dict.get before indexing) so missing resources
does not raise KeyError.
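A minimal sketch of the suggested fix, assuming standard `NotRequired` semantics. The field names mirror the snippet above, but the guarded accessor is hypothetical, not NeMo RL's actual code:

```python
from typing import Optional, TypedDict

try:
    from typing import NotRequired  # Python 3.11+
except ImportError:
    from typing_extensions import NotRequired

class OptionalResourcesConfig(TypedDict):
    gpus_per_node: Optional[int]
    num_nodes: Optional[int]

class ColocationConfig(TypedDict):
    enabled: bool
    resources: NotRequired[OptionalResourcesConfig]  # may be omitted in YAML

def inference_resources(colocated: ColocationConfig) -> OptionalResourcesConfig:
    # Guarded access: never index colocated["resources"] directly,
    # so colocated: {enabled: false} without resources cannot raise KeyError.
    return colocated.get("resources", {"gpus_per_node": None, "num_nodes": None})

legacy: ColocationConfig = {"enabled": False}  # a YAML that omits resources
```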
```diff
+class DTensorConfigDisabled(TypedDict):
+    enabled: Literal[False]
+
+
 class DTensorConfig(TypedDict):
-    enabled: bool
-    env_vars: NotRequired[dict[str, str]]
+    enabled: Literal[True]
+    env_vars: NotRequired[dict[str, str] | None]
     _v2: NotRequired[bool]
-    cpu_offload: NotRequired[bool]
-    sequence_parallel: NotRequired[bool]
-    activation_checkpointing: NotRequired[bool]
-    tensor_parallel_size: NotRequired[int]
-    context_parallel_size: NotRequired[int]
-    custom_parallel_plan: NotRequired[str]
-    clear_cache_every_n_steps: NotRequired[int]
+    cpu_offload: bool
+    sequence_parallel: bool
+    activation_checkpointing: bool
+    tensor_parallel_size: int
+    context_parallel_size: int
+    custom_parallel_plan: str | None
+    clear_cache_every_n_steps: NotRequired[int | None]
+
+
+class SequencePackingConfigDisabled(TypedDict):
+    enabled: Literal[False]
```
Disabled TypedDicts must allow legacy fields
DTensorConfigDisabled, SequencePackingConfigDisabled, and DynamicBatchingConfigDisabled now only declare enabled. Pydantic’s TypedDict validation forbids keys that aren’t declared, so any existing config that keeps the old knobs while toggling enabled: false (see the GRPO YAML in this PR) will now raise extra_forbidden during validation. Please reintroduce these fields as NotRequired[...] (or otherwise loosen the schema) so we don’t break every config that relied on the previous structure. (hugovk-typing.readthedocs.io)
🤖 Prompt for AI Agents
In nemo_rl/models/policy/__init__.py around lines 20 to 40, the Disabled
TypedDicts (DTensorConfigDisabled, SequencePackingConfigDisabled and the missing
DynamicBatchingConfigDisabled) only declare enabled which causes Pydantic to
forbid legacy keys; restore the previous optional keys on each Disabled
TypedDict by adding the same fields present in their Enabled counterparts as
NotRequired[...] with the same types (e.g., for DTensorConfigDisabled add
env_vars: NotRequired[dict[str,str]|None], _v2: NotRequired[bool], cpu_offload:
NotRequired[bool], sequence_parallel: NotRequired[bool],
activation_checkpointing: NotRequired[bool], tensor_parallel_size:
NotRequired[int], context_parallel_size: NotRequired[int], custom_parallel_plan:
NotRequired[str|None], clear_cache_every_n_steps: NotRequired[int|None] and
similarly make the SequencePackingConfigDisabled and
DynamicBatchingConfigDisabled include the formerly allowed optional knobs as
NotRequired) so existing configs with extra keys validate when enabled is false.
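A sketch of the loosened schema this prompt describes, using a Literal-discriminated union and a small, hypothetical subset of the DTensor knobs:

```python
from typing import Literal, TypedDict, Union

try:
    from typing import NotRequired  # Python 3.11+
except ImportError:
    from typing_extensions import NotRequired

class DTensorConfigDisabled(TypedDict):
    enabled: Literal[False]
    # Legacy knobs reintroduced as NotRequired so old YAMLs that keep them
    # while toggling enabled: false still validate (only two shown here).
    tensor_parallel_size: NotRequired[int]
    context_parallel_size: NotRequired[int]

class DTensorConfigEnabled(TypedDict):
    enabled: Literal[True]
    tensor_parallel_size: int
    context_parallel_size: int

DTensorConfig = Union[DTensorConfigDisabled, DTensorConfigEnabled]

# A legacy config: disabled, but with stale knobs left in place.
legacy: DTensorConfig = {"enabled": False, "tensor_parallel_size": 2}
```

With the stricter schema from the diff above, a validator that forbids undeclared keys would reject `legacy`; keeping the knobs as `NotRequired` lets both shapes pass.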
What does this PR do?
cp: append to hf_overrides rather than overwriting (#1413) into r0.4.0

Issues
List issues that this PR closes (syntax):
Usage
```python
# Add a code snippet demonstrating how to use this
```

Before your PR is "Ready for review"
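The PR title's append-versus-overwrite distinction reduces to a dict merge. A minimal sketch, with illustrative keys and a hypothetical `build_hf_overrides` helper (not vLLM's or NeMo RL's actual code):

```python
# Appending to hf_overrides: start from the existing defaults and layer
# user-supplied entries on top, instead of replacing the dict wholesale.

def build_hf_overrides(defaults, user_overrides):
    merged = dict(defaults)        # copy, so the defaults stay untouched
    merged.update(user_overrides)  # user keys win on conflict
    return merged

defaults = {"attn_implementation": "eager"}
user = {"rope_scaling": {"type": "linear", "factor": 2.0}}
merged = build_hf_overrides(defaults, user)
```

Overwriting (`hf_overrides = user`) would silently drop `attn_implementation`; the merge keeps both.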
Pre checks:
Additional Information
Summary by CodeRabbit
Release Notes
Chores
Documentation
Tests