cp: append to hf_overrides rather than overwriting (#1413) into r0.4.0#1460

Merged
terrykong merged 3 commits into r0.4.0 from
chtruong/cp-1413-r0.4.0
Nov 2, 2025

Conversation

Contributor

@chtruong814 chtruong814 commented Oct 31, 2025

What does this PR do ?

cp: append to hf_overrides rather than overwriting (#1413) into r0.4.0

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

Release Notes

  • Chores

    • Updated configuration files with refined settings for training recipes, including removal of generation stop token specifications and addition of new optimization and parallelization options.
    • Reorganized linting configuration structure.
  • Documentation

    • Enhanced configuration type definitions to support nullable and optional fields for greater flexibility in model setup and generation parameters.
  • Tests

    • Improved configuration validation with enhanced error reporting and Pydantic-based type checking across all configuration sections.

@chtruong814 chtruong814 requested review from a team as code owners October 31, 2025 23:15
@github-actions github-actions Bot added Documentation Improvements or additions to documentation CI Relating to CI labels Oct 31, 2025
Contributor

coderabbitai Bot commented Oct 31, 2025

📝 Walkthrough

Walkthrough

This PR extends configuration support for multiple training algorithms (distillation, DPO, SFT, GRPO, RM), introduces an internal padding key migration from pad_token_id to _pad_token_id, restructures policy configuration types to support disabled variants, expands eval dataset configurations, and refactors config validation to use Pydantic TypeAdapter.

Changes

Cohort / File(s) Summary
Configuration YAML Extensions
.pre-commit-config.yaml, examples/configs/distillation_math.yaml, examples/configs/distillation_math_megatron.yaml, examples/configs/dpo.yaml, examples/configs/sft.yaml, examples/configs/sft_openmathinstruct2_megatron.yaml, examples/configs/grpo_math_1B.yaml, examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml
Added new Megatron optimization fields (defer_fp32_logits, optimizer_cpu_offload, optimizer_offload_fraction, use_custom_fsdp), environment variable mappings (env_vars), and extended pre-commit distillation checks.
GRPO Recipe Cleanup
examples/configs/recipes/llm/grpo-*.yaml (8 files)
Removed stop_token_ids configuration entries from generation blocks across GRPO Llama 3.1, Llama 3.2, and Qwen recipe variants.
Distillation Recipe Updates
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml
Removed save_period: 10 from checkpointing configuration.
Padding Key Migration
nemo_rl/experience/rollouts.py, nemo_rl/models/generation/__init__.py, nemo_rl/models/generation/vllm/vllm_generation.py, nemo_rl/models/generation/vllm/vllm_worker.py, nemo_rl/models/generation/vllm/vllm_worker_async.py, nemo_rl/models/policy/lm_policy.py
Transitioned from public pad_token_id key to internal _pad_token_id for padding configuration, with warning safeguard in generation config initialization.
Generation Type System Updates
docs/design-docs/generation.md, nemo_rl/models/generation/interfaces.py
Made top_k nullable (int | None), added internal _pad_token_id field, made stop_token_ids and stop_strings nullable, and introduced OptionalResourcesConfig for resources.
Policy Configuration Restructuring
nemo_rl/models/policy/__init__.py
Introduced Disabled variant TypedDicts (DTensorConfigDisabled, SequencePackingConfigDisabled, MegatronConfigDisabled, DynamicBatchingConfigDisabled), refactored config unions to support enabled/disabled states, and broadened optional field typing.
Loss Function Type Updates
nemo_rl/algorithms/loss_functions.py
Changed ClippedPGLossConfig.ratio_clip_c from float to float | None to enable dual-clipping disabling.
Eval Data Configuration Expansion
nemo_rl/data/__init__.py
Introduced new eval-specific TypedDicts (MMLUEvalDataConfig, MMLUProEvalDataConfig, AIMEEvalDataConfig, GPQAEvalDataConfig, MathEvalDataConfig, LocalMathEvalDataConfig), unified via EvalDataConfigType alias, and relaxed DataConfig field typing to support nullable strings and optional integers.
Environment Configuration Updates
nemo_rl/environments/math_environment.py
Converted MathEnvConfig fields to use NotRequired with nullable union types (stop_strings: NotRequired[list[str] | None], verifier_type: NotRequired[str | None]).
Eval Integration
nemo_rl/evals/eval.py
Replaced MathDataConfig with EvalDataConfigType, introduced _PassThroughMathConfig wrapper, and updated MasterConfig typing.
Megatron Policy Worker
nemo_rl/models/policy/megatron_policy_worker.py
Flipped default wrap_cast_model_output_to_fp32 from None to False, added assertion enforcing defer_fp32_logits=True when logprob_chunk_size is set.
Checkpoint Configuration
nemo_rl/utils/checkpoint.py
Expanded model_save_format type from str to str | None with clarified documentation.
VllmGeneration Strictness
nemo_rl/models/generation/vllm/vllm_generation.py
Changed top_k retrieval from optional dict.get() to strict index access, added model_name validation as required config key.
VllmWorker Override Handling
nemo_rl/models/generation/vllm/vllm_worker.py
Introduced safe dict initialization for hf_overrides (ensures dict structure before merging), updated all padding lookups to use _pad_token_id.
Tooling & Test Infrastructure
.pre-commit-config.yaml, tools/config_cli.py, pyproject.toml
Added distillation algorithm to config minimization workflow, extended Ruff lint per-file-ignores, reorganized configuration under [tool.ruff.lint.per-file-ignores].
Config Validation Refactor
tests/unit/test_config_validation.py
Replaced recursive TypedDict validator with Pydantic-based validate_config_section() using TypeAdapter, centralizing validation and improving error formatting.
Test Suite Updates
tests/unit/test_recipes_and_test_suites.py
Added "rm": "examples/configs/rm.yaml" to ALGO_MAPPING_TO_BASE_YAML, removed test_all_recipes_can_merge_configs_with_base_config test function.
Generation Test Updates
tests/unit/models/generation/test_vllm_generation.py, tests/unit/models/generation/test_vllm_large_model.py
Migrated pad token key from "pad_token_id" to "_pad_token_id" in async generation path and fallback logic.
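The hf_overrides fix that gives this PR its title can be sketched as follows. `apply_hf_overrides` and the override keys are hypothetical stand-ins for the actual logic in nemo_rl/models/generation/vllm/vllm_worker.py; the point is the merge-instead-of-replace pattern:

```python
def apply_hf_overrides(vllm_kwargs: dict, extra_overrides: dict) -> dict:
    """Merge overrides into any existing hf_overrides instead of replacing them."""
    # Ensure a dict exists before merging (hf_overrides may be absent or None).
    existing = vllm_kwargs.get("hf_overrides") or {}
    existing.update(extra_overrides)
    vllm_kwargs["hf_overrides"] = existing
    return vllm_kwargs


# User-supplied overrides (e.g. rope_scaling) survive the internal merge,
# where a plain assignment would have discarded them.
cfg = {"hf_overrides": {"rope_scaling": {"factor": 2.0}}}
apply_hf_overrides(cfg, {"max_position_embeddings": 8192})
assert cfg["hf_overrides"] == {
    "rope_scaling": {"factor": 2.0},
    "max_position_embeddings": 8192,
}
```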

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • nemo_rl/models/policy/__init__.py: Substantial TypedDict restructuring with disabled variants and union-type handling; validate correctness of enabled: Literal[True] constraints and disabled config exclusivity.
  • nemo_rl/data/__init__.py: New eval config TypedDict hierarchy; verify field unions and optional/NotRequired consistency across all eval config classes.
  • Padding key migration (_pad_token_id vs pad_token_id): Ensure all lookups are consistent across generation, workers, and policy; verify no regression in fallback logic (e.g., in test_vllm_large_model.py).
  • tests/unit/test_config_validation.py: Pydantic validation replacement; confirm error messages are helpful and all algorithm-specific config paths are properly dispatched.
  • nemo_rl/models/policy/megatron_policy_worker.py: New assertion coupling logprob_chunk_size to defer_fp32_logits=True; verify this constraint is documented and all call sites respect it.
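The enabled/disabled TypedDict pattern flagged in the first bullet can be sketched like this. The class and field names below are illustrative (only `DTensorConfigDisabled` is named in the walkthrough); the sketch shows how a `Literal`-typed discriminator keeps enabled-only fields out of the disabled variant:

```python
from typing import Literal, TypedDict, Union


class DTensorConfigEnabled(TypedDict):
    enabled: Literal[True]
    tensor_parallel_size: int  # illustrative enabled-only field


class DTensorConfigDisabled(TypedDict):
    # The disabled variant accepts only the flag; extra keys are invalid,
    # which is why disabled configs must drop their leftover knobs.
    enabled: Literal[False]


DTensorConfig = Union[DTensorConfigEnabled, DTensorConfigDisabled]


def tp_size(cfg: DTensorConfig) -> int:
    # Narrow on the discriminator before touching enabled-only fields.
    if cfg["enabled"]:
        return cfg["tensor_parallel_size"]
    return 1


assert tp_size({"enabled": True, "tensor_parallel_size": 4}) == 4
assert tp_size({"enabled": False}) == 1
```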

Possibly related PRs

Suggested labels

cherry-pick, r0.4.0, Run CICD

Suggested reviewers

  • yuki-97

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 77.78%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Test Results For Major Changes ⚠️ Warning: This PR contains major structural and breaking changes affecting core configuration systems (PolicyConfig, GenerationConfig, DataConfig), generation backends, and training/inference pipelines across 40+ files. The review comments confirm multiple breaking changes and backward compatibility issues. However, the PR description lacks any test results, regression analysis, performance benchmarks, or documentation demonstrating that these changes do not introduce regressions in numerics, convergence, or performance. The PR explicitly marks pre-review items (tests, documentation) as incomplete and provides only an ellipsis in the additional information section, suggesting insufficient preparation for merge. Resolution: include (1) test results from unit/integration tests demonstrating the changes work correctly; (2) if affecting generation/inference, benchmarks showing no performance regression; (3) if affecting training, validation that convergence is not impacted; (4) evidence that all existing recipe configurations still validate and function correctly; (5) completion of the pre-review checklist items marked incomplete. Alternatively, if these are intentional breaking changes, clearly document the migration path and update all affected configurations accordingly.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title references "append to hf_overrides rather than overwriting," which is a real change present in the changeset, specifically visible in the nemo_rl/models/generation/vllm/vllm_worker.py modifications where hf_overrides is initialized and merged rather than replaced. However, the PR contains far more substantial changes than this single fix suggests. The changeset includes massive type system refactoring across multiple files (changing top_k, stop_token_ids, and other fields from concrete types to nullable variants), extensive config schema updates, new TypedDict definitions for policy configurations, data configuration enhancements, test infrastructure updates, pre-commit workflow additions, and numerous other modifications. The title accurately describes one specific change but significantly undersells the scope and nature of this large, multi-faceted PR.


@terrykong terrykong force-pushed the chtruong/cp-1413-r0.4.0 branch from 79269af to 110685e on November 1, 2025 06:28

github-actions Bot commented Nov 1, 2025

ℹ️ File Consistency Check

Check based on commit: 110685e (PR #1460 from chtruong/cp-1413-r0.4.0)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@terrykong terrykong force-pushed the chtruong/cp-1413-r0.4.0 branch from 110685e to a89b4d0 on November 1, 2025 06:28
@github-actions github-actions Bot removed the CI Relating to CI label Nov 1, 2025
@terrykong
Collaborator

should go in after #1458


github-actions Bot commented Nov 1, 2025

ℹ️ File Consistency Check

Check based on commit: a89b4d0 (PR #1460 from chtruong/cp-1413-r0.4.0)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@terrykong terrykong force-pushed the chtruong/cp-1413-r0.4.0 branch from a89b4d0 to 967c6d9 on November 2, 2025 03:44
@terrykong terrykong requested a review from a team as a code owner November 2, 2025 03:44
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Nov 2, 2025
terrykong and others added 3 commits November 2, 2025 03:45
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong force-pushed the chtruong/cp-1413-r0.4.0 branch from 967c6d9 to 17c3e37 on November 2, 2025 03:45
@terrykong terrykong removed the CI:L1 Run doctests, unit tests, and functional tests label Nov 2, 2025
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Nov 2, 2025
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
nemo_rl/models/generation/vllm/vllm_generation.py (1)

92-103: Avoid in-code defaults; require sampling params from YAML

top_p is fetched with a default (1.0). Per nemo_rl guidelines, avoid non-None defaults in code; YAML should be the single source of truth.

-        top_p: float = self.cfg.get("top_p", 1.0)
+        top_p: float = self.cfg["top_p"]

Also consider adding a short assertion message, mirroring the top_k check, for the case where the key is missing.

examples/configs/vlm_grpo_3B_megatron.yaml (1)

59-77: Config will fail validation once disabled schema ships

The new disabled TypedDict variants only accept the enabled flag, so keeping cpu_offload, sequence_parallel, tensor_parallel_size, etc. while enabled: false will now trigger an extra_forbidden validation error when PolicyConfig is parsed. Please either strip these fields when the feature is disabled or relax the schema so the existing knobs remain optional, otherwise this config won’t load. (hugovk-typing.readthedocs.io)

🧹 Nitpick comments (2)
nemo_rl/models/generation/__init__.py (1)

30-36: Make the warning actionable with stacklevel

Add stacklevel=2 so the warning points at the caller site.

-        warnings.warn(
+        warnings.warn(
             "'_pad_token_id' found in generation config and will be overridden with tokenizer.pad_token_id. "
             "Note: '_pad_token_id' is intended for internal use and has no effect when set in user-provided configs.",
-            UserWarning,
+            UserWarning,
+            stacklevel=2,
         )
nemo_rl/models/generation/vllm/vllm_worker.py (1)

378-382: hf_overrides: merge instead of overwrite — good

Initializing to dict and updating preserves existing overrides.

If nested sub-maps (e.g., rope_scaling) need merging, consider a shallow+deep update helper:

def deep_update(dst: dict, src: dict) -> None:
    """Recursively merge src into dst, descending into nested dict values."""
    for k, v in src.items():
        if isinstance(v, dict) and isinstance(dst.get(k), dict):
            deep_update(dst[k], v)
        else:
            dst[k] = v
Use instead of plain update when required.
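A self-contained check of that suggestion (the helper is restated here so the snippet runs on its own; the override keys are illustrative):

```python
def deep_update(dst: dict, src: dict) -> None:
    """Recursively merge src into dst, descending into nested dict values."""
    for k, v in src.items():
        if isinstance(v, dict) and isinstance(dst.get(k), dict):
            deep_update(dst[k], v)
        else:
            dst[k] = v


overrides = {"rope_scaling": {"factor": 2.0}, "dtype": "bfloat16"}
deep_update(overrides, {"rope_scaling": {"type": "linear"}})
# Sibling keys inside rope_scaling survive; a plain dict.update on the
# top level would have replaced the whole rope_scaling sub-map.
assert overrides["rope_scaling"] == {"factor": 2.0, "type": "linear"}
assert overrides["dtype"] == "bfloat16"
```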

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e941b4e and 17c3e37.

📒 Files selected for processing (42)
  • .pre-commit-config.yaml (1 hunks)
  • docs/design-docs/generation.md (1 hunks)
  • examples/configs/distillation_math.yaml (1 hunks)
  • examples/configs/distillation_math_megatron.yaml (1 hunks)
  • examples/configs/dpo.yaml (2 hunks)
  • examples/configs/grpo_math_1B.yaml (2 hunks)
  • examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml (0 hunks)
  • examples/configs/sft.yaml (3 hunks)
  • examples/configs/sft_openmathinstruct2_megatron.yaml (1 hunks)
  • examples/configs/vlm_grpo_3B.yaml (2 hunks)
  • examples/configs/vlm_grpo_3B_megatron.yaml (2 hunks)
  • nemo_rl/algorithms/loss_functions.py (1 hunks)
  • nemo_rl/data/__init__.py (2 hunks)
  • nemo_rl/environments/math_environment.py (2 hunks)
  • nemo_rl/evals/eval.py (2 hunks)
  • nemo_rl/experience/rollouts.py (1 hunks)
  • nemo_rl/models/generation/__init__.py (2 hunks)
  • nemo_rl/models/generation/interfaces.py (2 hunks)
  • nemo_rl/models/generation/vllm/vllm_generation.py (4 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker.py (3 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker_async.py (2 hunks)
  • nemo_rl/models/policy/__init__.py (5 hunks)
  • nemo_rl/models/policy/lm_policy.py (2 hunks)
  • nemo_rl/models/policy/megatron_policy_worker.py (3 hunks)
  • nemo_rl/utils/checkpoint.py (2 hunks)
  • pyproject.toml (1 hunks)
  • tests/unit/models/generation/test_vllm_generation.py (1 hunks)
  • tests/unit/models/generation/test_vllm_large_model.py (1 hunks)
  • tests/unit/test_config_validation.py (1 hunks)
  • tests/unit/test_recipes_and_test_suites.py (1 hunks)
  • tools/config_cli.py (1 hunks)
💤 Files with no reviewable changes (12)
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml
  • examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v3.yaml
🧰 Additional context used
📓 Path-based instructions (4)
examples/configs/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults.
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml.

Files:

  • examples/configs/grpo_math_1B.yaml
  • examples/configs/sft_openmathinstruct2_megatron.yaml
  • examples/configs/vlm_grpo_3B_megatron.yaml
  • examples/configs/distillation_math.yaml
  • examples/configs/vlm_grpo_3B.yaml
  • examples/configs/dpo.yaml
  • examples/configs/distillation_math_megatron.yaml
  • examples/configs/sft.yaml
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/models/policy/lm_policy.py
  • tests/unit/test_recipes_and_test_suites.py
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
  • tools/config_cli.py
  • nemo_rl/evals/eval.py
  • nemo_rl/utils/checkpoint.py
  • nemo_rl/models/generation/__init__.py
  • tests/unit/models/generation/test_vllm_generation.py
  • tests/unit/models/generation/test_vllm_large_model.py
  • nemo_rl/algorithms/loss_functions.py
  • tests/unit/test_config_validation.py
  • nemo_rl/environments/math_environment.py
  • nemo_rl/experience/rollouts.py
  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/data/__init__.py
  • nemo_rl/models/policy/megatron_policy_worker.py
  • nemo_rl/models/policy/__init__.py
  • nemo_rl/models/generation/vllm/vllm_generation.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
  • nemo_rl/evals/eval.py
  • nemo_rl/utils/checkpoint.py
  • nemo_rl/models/generation/__init__.py
  • nemo_rl/algorithms/loss_functions.py
  • nemo_rl/environments/math_environment.py
  • nemo_rl/experience/rollouts.py
  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/data/__init__.py
  • nemo_rl/models/policy/megatron_policy_worker.py
  • nemo_rl/models/policy/__init__.py
  • nemo_rl/models/generation/vllm/vllm_generation.py
docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When a markdown doc under docs/**/*.md is added or renamed, update docs/index.md to include it in the appropriate section

Files:

  • docs/design-docs/generation.md
🧠 Learnings (15)
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.

Applied to files:

  • examples/configs/sft_openmathinstruct2_megatron.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : VLM recipe YAML filenames must follow: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml

Applied to files:

  • tests/unit/test_recipes_and_test_suites.py
  • tools/config_cli.py
  • .pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml

Applied to files:

  • tests/unit/test_recipes_and_test_suites.py
  • tools/config_cli.py
  • .pre-commit-config.yaml
📚 Learning: 2025-09-10T05:29:34.349Z
Learnt from: bxyu-nvidia
Repo: NVIDIA-NeMo/RL PR: 1110
File: nemo_rl/models/generation/vllm/vllm_worker_async.py:98-105
Timestamp: 2025-09-10T05:29:34.349Z
Learning: In the _maybe_correct_merged_tokens function in nemo_rl/models/generation/vllm/vllm_worker_async.py, the loop condition `len(candidate_token_ids) < len(actual_token_ids) - 1` is intentionally designed to prevent accessing the final token in actual_token_ids, likely to handle specific tokenization edge cases in the vLLM HTTP server integration.

Applied to files:

  • nemo_rl/models/generation/vllm/vllm_worker_async.py
  • tests/unit/models/generation/test_vllm_generation.py
  • tests/unit/models/generation/test_vllm_large_model.py
📚 Learning: 2025-09-10T05:34:35.406Z
Learnt from: bxyu-nvidia
Repo: NVIDIA-NeMo/RL PR: 1110
File: nemo_rl/models/generation/vllm/vllm_worker_async.py:346-359
Timestamp: 2025-09-10T05:34:35.406Z
Learning: In nemo_rl/models/generation/vllm/vllm_worker_async.py, the HTTP server intentionally uses different path structures: `/v1/chat/completions` is under the `/v1` prefix while `/tokenize` is at the root level without the `/v1` prefix. This is the intended design.

Applied to files:

  • nemo_rl/models/generation/vllm/vllm_worker_async.py
  • tests/unit/models/generation/test_vllm_generation.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name

Applied to files:

  • tools/config_cli.py
  • .pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.{yaml,sh} : Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script

Applied to files:

  • tools/config_cli.py
  • .pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code

Applied to files:

  • nemo_rl/evals/eval.py
  • nemo_rl/utils/checkpoint.py
  • nemo_rl/algorithms/loss_functions.py
  • nemo_rl/environments/math_environment.py
  • nemo_rl/data/__init__.py
  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-17T01:52:21.399Z
Learnt from: ffrujeri
Repo: NVIDIA-NeMo/RL PR: 1023
File: nemo_rl/utils/checkpoint.py:58-65
Timestamp: 2025-09-17T01:52:21.399Z
Learning: model_state_dict_keys is not intended to be part of the nemo-rl CheckpointingConfig TypedDict - it's handled at the automodel implementation layer, not as a general checkpointing configuration parameter.

Applied to files:

  • nemo_rl/utils/checkpoint.py
  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Express configuration optionality via TypedDict using typing.NotRequired

Applied to files:

  • nemo_rl/utils/checkpoint.py
  • nemo_rl/environments/math_environment.py
  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/data/__init__.py
  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.

Applied to files:

  • nemo_rl/models/generation/__init__.py
  • tests/unit/models/generation/test_vllm_large_model.py
  • nemo_rl/models/generation/vllm/vllm_generation.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults

Applied to files:

  • nemo_rl/data/__init__.py
  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation

Applied to files:

  • .pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/vlm/*.sh : VLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension

Applied to files:

  • .pre-commit-config.yaml
📚 Learning: 2025-09-19T02:44:38.451Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:73-84
Timestamp: 2025-09-19T02:44:38.451Z
Learning: The scheduler configuration format with a separate "milestones: [20]" entry (not wrapped under name/kwargs) is a valid and established pattern used across GRPO, DPO, and distillation configs in the NeMo RL codebase. This format specifies transition points between different schedulers (e.g., LinearLR for warmup steps, then ConstantLR).

Applied to files:

  • nemo_rl/models/policy/__init__.py
🧬 Code graph analysis (9)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
nemo_rl/models/generation/interfaces.py (1)
  • verify_right_padding (23-99)
nemo_rl/models/generation/vllm/vllm_worker_async.py (1)
nemo_rl/models/generation/interfaces.py (1)
  • verify_right_padding (23-99)
nemo_rl/evals/eval.py (3)
nemo_rl/environments/math_environment.py (1)
  • MathEnvConfig (42-46)
nemo_rl/models/generation/interfaces.py (1)
  • GenerationConfig (118-131)
nemo_rl/models/policy/__init__.py (1)
  • TokenizerConfig (129-133)
nemo_rl/models/generation/__init__.py (2)
tests/unit/models/generation/test_vllm_generation.py (1)
  • tokenizer (238-241)
tests/unit/models/generation/test_vllm_large_model.py (1)
  • tokenizer (82-85)
tests/unit/models/generation/test_vllm_generation.py (2)
tests/unit/environments/test_retriever.py (1)
  • tokenizer (84-93)
tests/unit/environments/test_code_environment.py (1)
  • tokenizer (85-94)
tests/unit/models/generation/test_vllm_large_model.py (2)
tests/unit/environments/test_retriever.py (1)
  • tokenizer (84-93)
tests/unit/environments/test_code_environment.py (1)
  • tokenizer (85-94)
tests/unit/test_config_validation.py (5)
tests/unit/data/packing/test_algorithms.py (1)
  • algorithms (97-104)
nemo_rl/evals/eval.py (1)
  • MasterConfig (57-63)
nemo_rl/algorithms/distillation.py (1)
  • MasterConfig (110-121)
nemo_rl/algorithms/grpo.py (1)
  • MasterConfig (161-169)
tools/config_cli.py (1)
  • load_config_with_inheritance (100-141)
nemo_rl/experience/rollouts.py (2)
tests/unit/environments/test_retriever.py (1)
  • tokenizer (84-93)
tests/unit/environments/test_code_environment.py (1)
  • tokenizer (85-94)
nemo_rl/models/policy/__init__.py (2)
nemo_rl/models/generation/interfaces.py (1)
  • GenerationConfig (118-131)
nemo_rl/models/policy/megatron_policy_worker.py (1)
  • freeze_moe_router (251-263)
🪛 Ruff (0.14.2)
nemo_rl/models/generation/__init__.py

31-31: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)
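
The B028 fix is mechanical: pass an explicit `stacklevel` so the warning points at the caller's frame instead of the helper that emits it. A minimal sketch, assuming a simplified config shape (the helper name mirrors `configure_generation_config`, but the body here is illustrative, not the repo's actual implementation):

```python
import warnings


def configure_generation_config(config: dict, tokenizer) -> dict:
    # Illustrative sketch: a user-supplied pad_token_id is ignored in favor of
    # the tokenizer's value; warn at the caller's frame (stacklevel=2, per B028).
    if config.get("pad_token_id") is not None:
        warnings.warn(
            "pad_token_id is derived from the tokenizer; the configured value is ignored",
            stacklevel=2,
        )
    config["_pad_token_id"] = tokenizer.pad_token_id
    return config


class FakeTokenizer:  # stand-in tokenizer for illustration only
    pad_token_id = 0


cfg = configure_generation_config({"pad_token_id": 99}, FakeTokenizer())
assert cfg["_pad_token_id"] == 0
```

With `stacklevel=2`, the reported file/line in the warning is the call site, which is what Ruff's B028 rule is asking for.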

tests/unit/test_config_validation.py

49-49: Avoid specifying long messages outside the exception class

(TRY003)


104-104: Avoid specifying long messages outside the exception class

(TRY003)


127-127: Local variable config_type is assigned to but never used

Remove assignment to unused variable config_type

(F841)


129-131: Avoid specifying long messages outside the exception class

(TRY003)
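
Both remaining findings have standard fixes: move the long message into a dedicated exception class (TRY003) and drop the unused binding (F841). A hedged sketch, where `ConfigValidationError` is a hypothetical name rather than the test file's actual class:

```python
class ConfigValidationError(ValueError):
    """Owns the long message so raise sites stay short (addresses TRY003)."""

    def __init__(self, config_path: str, section: str) -> None:
        super().__init__(
            f"{config_path}: section '{section}' failed Pydantic validation"
        )


# For F841: instead of binding `config_type = ...` and never reading it,
# either use the value or don't assign it at all.
try:
    raise ConfigValidationError("examples/configs/sft.yaml", "policy")
except ConfigValidationError as err:
    message = str(err)

assert "policy" in message
```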

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Docs_Tests
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (25)
tests/unit/models/generation/test_vllm_generation.py (1)

440-440: LGTM! Padding key migration aligned with PR objectives.

The change from "pad_token_id" to "_pad_token_id" correctly reflects the broader codebase migration to use an internal padding key. The fallback to tokenizer.pad_token_id ensures safety if the key is missing.

tests/unit/models/generation/test_vllm_large_model.py (1)

171-171: LGTM! Consistent padding key migration.

The update to use "_pad_token_id" with a fallback to tokenizer.pad_token_id is consistent with the parallel change in test_vllm_generation.py and aligns with the PR's internal padding key migration.

nemo_rl/models/generation/vllm/vllm_worker_async.py (2)

531-531: LGTM! Padding verification updated to use internal key.

The right-padding verification now correctly uses self.cfg["_pad_token_id"] instead of "pad_token_id". This aligns with the PR's migration to the internal padding key, which is populated by configure_generation_config.


639-639: LGTM! Output tensor initialization uses internal padding key.

The change to use self.cfg["_pad_token_id"] for filling the output tensor is consistent with the broader padding key migration across the vLLM generation code.

examples/configs/vlm_grpo_3B.yaml (2)

96-96: LGTM! Explicit boolean default for defer_fp32_logits.

Changing from null to explicit False improves config clarity and aligns with the TypedDict typing updates across the codebase.


119-122: LGTM! New optimizer CPU offload configuration.

The addition of optimizer_cpu_offload and optimizer_offload_fraction with explicit defaults provides clear control over optimizer offloading behavior. The section comment "# optimizer cpu offload" helps document the purpose.

examples/configs/distillation_math_megatron.yaml (1)

60-60: LGTM! Explicit boolean default for defer_fp32_logits.

Changing from null to False provides an explicit default for the Megatron configuration, consistent with similar updates across distillation and GRPO configs.

examples/configs/distillation_math.yaml (1)

106-106: LGTM! Explicit boolean default for defer_fp32_logits.

The explicit False default is clearer than null and aligns with the parallel change in distillation_math_megatron.yaml and broader Megatron configuration standardization.

.pre-commit-config.yaml (1)

74-74: LGTM! Distillation workflow added to minimize-check.

The new distillation minimize-check workflow correctly follows the established pattern for dpo, grpo, and sft recipes. This ensures distillation recipes under examples/configs/recipes/llm/distillation-*.yaml are properly minimized before merge, consistent with the new distillation base config at examples/configs/distillation_math.yaml.

examples/configs/dpo.yaml (2)

116-116: LGTM! New defer_fp32_logits configuration field.

Adding defer_fp32_logits: False provides explicit control over FP32 logits behavior in the Megatron configuration, consistent with similar additions across DPO, SFT, GRPO, and distillation configs.


159-159: LGTM! New use_custom_fsdp configuration field.

The addition of use_custom_fsdp: false under distributed_data_parallel_config provides explicit control over custom FSDP usage, consistent with the parallel addition in examples/configs/sft_openmathinstruct2_megatron.yaml.

tools/config_cli.py (1)

49-55: Example loop: distillation inclusion LGTM

The added algo and base_config branch look correct and consistent with the other cases. No action needed.

nemo_rl/models/generation/__init__.py (1)

36-36: Internal key assignment LGTM

Overriding to tokenizer.pad_token_id is correct; keeps user config surface clean.

nemo_rl/models/generation/interfaces.py (1)

125-132: GenerationConfig shape updates LGTM

Nullable top_k/top_p fields and internal _pad_token_id are coherent with the new flow. Please ensure docs reflect these shapes.

Confirm docs mention _pad_token_id is internal-only and set by configure_generation_config.

nemo_rl/models/policy/lm_policy.py (2)

734-741: Checkpoint format guard LGTM

Condition correctly blocks model_save_format for DTensorPolicyWorker (_v2=False).


583-586: Verify that _pad_token_id is properly injected before generate() is called

The code at line 585 directly accesses self.cfg["generation"]["_pad_token_id"] without a defensive fallback. While configure_generation_config() reliably sets this value and current usage patterns (across all algorithms and tests) consistently call it before Policy instantiation, the direct dict access is a brittleness risk.

Verification confirmed:

  • configure_generation_config() properly sets config["_pad_token_id"] = tokenizer.pad_token_id
  • All algorithm paths (sft.py, grpo.py, distillation.py, dpo.py, rm.py) call configure_generation_config() before instantiating Policy
  • All test files follow the same pattern
  • generate() asserts generation config is not None but does not validate _pad_token_id presence

The pattern holds in current code, but the direct dict access (not .get()) means any future path that bypasses configure_generation_config() before calling generate() will raise KeyError. Consider adding explicit validation in generate() or documenting this as a required precondition.
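
One way to make that precondition explicit is a small guard at the top of generate(). This is a sketch with a hypothetical helper name, not the repo's actual code:

```python
from typing import Any


def require_pad_token_id(generation_cfg: dict[str, Any]) -> int:
    # Hypothetical guard: turn a bare KeyError into an actionable message when
    # configure_generation_config() was not called before generate().
    if "_pad_token_id" not in generation_cfg:
        raise ValueError(
            "generation config has no '_pad_token_id'; call "
            "configure_generation_config() before Policy.generate()"
        )
    return generation_cfg["_pad_token_id"]


assert require_pad_token_id({"_pad_token_id": 0}) == 0
```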

examples/configs/sft.yaml (2)

36-39: dtensor_cfg.env_vars default LGTM

Explicit empty mapping is fine and self-documenting.


75-97: megatron_cfg.env_vars and defer_fp32_logits LGTM

These align with worker checks. Ensure docs briefly define defer_fp32_logits.

Confirm other exemplar configs that set logprob_chunk_size also set defer_fp32_logits: true.

nemo_rl/models/generation/vllm/vllm_worker.py (2)

543-544: Use of _pad_token_id for input validation LGTM

Switching to the internal pad token keeps checks consistent.


576-578: Padding generated sequences with _pad_token_id LGTM

Correct pad value used when constructing full outputs.

nemo_rl/models/generation/vllm/vllm_generation.py (3)

121-129: Explicit model_name requirement LGTM

Runtime validation is appropriate since workers read model_name directly.


452-455: from_batches padding LGTM

Using _pad_token_id for aggregation is consistent with the new contract.


502-505: generate_text padding LGTM

Same note; consistent usage.

nemo_rl/models/policy/megatron_policy_worker.py (2)

291-294: FP32-cast gating via defer_fp32_logits LGTM

Clear, explicit wiring into model creation.


640-647: Runtime guard for logprob_chunk_size is properly maintained

Verification confirms all recipes that set policy.logprob_chunk_size also set megatron_cfg.defer_fp32_logits: true. Only one recipe uses this setting (grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml), and it correctly configures both parameters. The assertion is sound and prevents the inconsistent numerical mode.
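
The guard being verified can be sketched as a simple config assertion. Key names follow the YAML keys discussed above; the function itself is illustrative, not the worker's actual code:

```python
def assert_consistent_logprob_cfg(policy_cfg: dict) -> None:
    # Illustrative check: chunked logprob computation must defer the FP32 cast,
    # otherwise chunked and unchunked paths disagree numerically.
    if policy_cfg.get("logprob_chunk_size") is not None:
        assert policy_cfg["megatron_cfg"]["defer_fp32_logits"], (
            "policy.logprob_chunk_size requires megatron_cfg.defer_fp32_logits: true"
        )


# Mirrors the one recipe that sets both keys together.
assert_consistent_logprob_cfg(
    {"logprob_chunk_size": 1024, "megatron_cfg": {"defer_fp32_logits": True}}
)
```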

Comment thread nemo_rl/data/__init__.py
Comment on lines +33 to +35
```python
    split: NotRequired[str | None]
    shuffle: bool
    seed: NotRequired[int | None]
```

⚠️ Potential issue | 🟠 Major

Keep shuffle optional or update every config

Making shuffle a required key here breaks config validation for existing configs. For instance, examples/configs/sft_openmathinstruct2_megatron.yaml in this same PR still omits data.shuffle, so the TypeAdapter[DataConfig] validation path will now raise “Field required: shuffle.” Please either revert this to NotRequired[bool] or add the key (with documented purpose/default) to every data config shipped with the repo before merging. Based on learnings

🤖 Prompt for AI Agents
In nemo_rl/data/__init__.py around lines 33 to 35, the data config type made
shuffle required which breaks validation for existing configs; change the type
back to NotRequired[bool] or add data.shuffle to every shipped config file (with
a short comment/doc and a sensible default such as false) so
TypeAdapter[DataConfig] no longer raises "Field required: shuffle." Ensure
tests/config examples are updated accordingly if you opt to add the key across
configs.

Comment on lines 161 to 164
```diff
     generation_outputs = BatchedDataDict.from_batches(
         ordered_batched_data_dicts,
-        pad_value_dict={"output_ids": pad_token_id, "logprobs": 0.0},
+        pad_value_dict={"output_ids": tokenizer.pad_token_id, "logprobs": 0.0},
     )
```

⚠️ Potential issue | 🔴 Critical

Reinstate config-derived pad token for async batching

BatchedDataDict.from_batches expects a real pad ID, and a lot of our policies feed one via policy_generation.cfg["pad_token_id"] (especially LLaMA-style tokenizers where tokenizer.pad_token_id is left as None). By dropping that override, async generation now either pads with None (raising a runtime error when torch builds the padded tensor) or with the tokenizer’s default instead of the policy’s configured _pad_token_id, so the sequences we push back into the rollout log are malformed. Please restore the config fallback:

```diff
-    generation_outputs = BatchedDataDict.from_batches(
-        ordered_batched_data_dicts,
-        pad_value_dict={"output_ids": tokenizer.pad_token_id, "logprobs": 0.0},
-    )
+    pad_token_id = policy_generation.cfg.get("pad_token_id", tokenizer.pad_token_id)
+    generation_outputs = BatchedDataDict.from_batches(
+        ordered_batched_data_dicts,
+        pad_value_dict={"output_ids": pad_token_id, "logprobs": 0.0},
+    )
```
🤖 Prompt for AI Agents
In nemo_rl/experience/rollouts.py around lines 161 to 164, the pad_value_dict
currently uses tokenizer.pad_token_id which can be None for some tokenizers;
restore the policy-config fallback by using
policy_generation.cfg["pad_token_id"] (or .get with tokenizer.pad_token_id as
fallback) as the pad id so BatchedDataDict.from_batches always receives a real
pad ID (ensure you coerce/validate it is not None and keep "logprobs": 0.0
as-is).

Comment on lines +107 to 116
```diff
+class OptionalResourcesConfig(TypedDict):
+    # Same as ResourcesConfig, but fields can be null and are validated in grpo.py
+    gpus_per_node: int | None
+    num_nodes: int | None
+
+
 class ColocationConfig(TypedDict):
     enabled: bool
-    resources: NotRequired[ResourcesConfig]
+    resources: OptionalResourcesConfig
```


🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Avoid breaking change: make ColocationConfig.resources optional

resources changed from NotRequired[...] to a required OptionalResourcesConfig. This is likely a silent breaking change for YAMLs that omit it. Either revert to NotRequired or update all exemplar configs.

```diff
 class ColocationConfig(TypedDict):
     enabled: bool
-    resources: OptionalResourcesConfig
+    resources: NotRequired[OptionalResourcesConfig]
```

🏁 Script executed:

```shell
#!/bin/bash
# Find configs that define 'colocated:' without nested 'resources:'
rg -nP -g 'examples/**/*.yaml' -C2 'colocated:\s*\n(?!.*\n\s*resources:)'
```

Length of output: 0



---

🏁 Script executed:

```shell
#!/bin/bash
# Find all YAML files in examples/ that contain 'colocated:'
echo "=== YAML files containing 'colocated:' ==="
fd -e yaml -e yml . examples/ | while read f; do
  if grep -q "colocated:" "$f" 2>/dev/null; then
    echo "File: $f"
    grep -n -A5 "colocated:" "$f"
  fi
done

echo -e "\n=== Search for GenerationConfig in code ==="
rg -n "class GenerationConfig" --type=py

echo -e "\n=== Search for resources field in YAML examples ==="
rg -n "resources:" --type-add 'yaml:*.{yaml,yml}' --type=yaml -C2 examples/ | head -50
```

Length of output: 370


🏁 Script executed:

```shell
#!/bin/bash
# Get GenerationConfig and surrounding context
echo "=== GenerationConfig definition ==="
sed -n '100,180p' git/nemo_rl/models/generation/interfaces.py

echo -e "\n=== Search for grpo.py ==="
find . -name "grpo.py" -o -name "*grpo*" 2>/dev/null | head -20

echo -e "\n=== Search for ColocationConfig usage ==="
rg -n "ColocationConfig" --type=py -B2 -A2

echo -e "\n=== Search for resources field access ==="
rg -n "\.resources" --type=py nemo_rl/ | head -30
```

Length of output: 3411


🏁 Script executed:

```shell
#!/bin/bash
# Check grpo.py validation logic around resource access
echo "=== grpo.py validation logic ==="
rg -n -B5 -A3 "policy.generation.colocated.resources" nemo_rl/algorithms/grpo.py | head -80

echo -e "\n=== Sample YAML configs with colocated ==="
# Check a few YAML configs for colocated structure
for f in ./examples/configs/grpo_math_8B.yaml ./examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml; do
  if [ -f "$f" ]; then
    echo "File: $f"
    grep -n -A10 "colocated:" "$f"
    echo "---"
  fi
done
```

Length of output: 2451


🏁 Script executed:

```shell
#!/bin/bash
# Find where inference_gpus_per_node and inference_nodes are assigned
echo "=== Extract grpo.py validation section (200-400 lines) ==="
rg -n -B40 "if colocated_inference:" nemo_rl/algorithms/grpo.py | head -100

echo -e "\n=== Check for .get() or safe access patterns ==="
rg -n "colocated\[" nemo_rl/algorithms/grpo.py

echo -e "\n=== Full context around colocated_inference check ==="
sed -n '270,400p' nemo_rl/algorithms/grpo.py
```

Length of output: 9678


🏁 Script executed:

```shell
#!/bin/bash
# Search for YAML files that might have colocated defined
echo "=== YAML files with 'colocated:' keyword ==="
fd -e yaml -e yml . examples/configs | xargs grep -l "colocated:" 2>/dev/null | head -10

echo -e "\n=== Check specific config files for colocated structure ==="
grep -n -A5 "colocated:" examples/configs/grpo_math_8B.yaml examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml 2>/dev/null | head -50

echo -e "\n=== Check non-colocated example specifically ==="
grep -n -B2 -A8 "colocated:" examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml 2>/dev/null
```

Length of output: 869


Make ColocationConfig.resources optional to preserve backward compatibility

The change making resources required in ColocationConfig is a breaking change. The code at line 351 in grpo.py directly accesses generation_config["colocated"]["resources"] without guards when colocated.enabled is false. If a YAML config has colocated: {enabled: false} without a resources field, it will fail with a KeyError.

Since GenerationConfig.colocated is NotRequired (users may provide colocated configs), any provided colocated config should have an optional resources field to maintain backward compatibility:

```diff
 class ColocationConfig(TypedDict):
     enabled: bool
-    resources: OptionalResourcesConfig
+    resources: NotRequired[OptionalResourcesConfig]
```

Additionally, consider adding validation guards in grpo.py around line 351 to handle the case where resources is missing, or document the implicit requirement in a comment.

🤖 Prompt for AI Agents
In nemo_rl/models/generation/interfaces.py around lines 107 to 116, the
ColocationConfig currently requires a resources key which breaks configs like
colocated: {enabled: false}; change the TypedDict so resources is optional (use
typing.NotRequired or mark the field as total=False) and type it as
OptionalResourcesConfig, and also update grpo.py around line 351 to guard access
(check for "resources" key or use dict.get before indexing) so missing resources
does not raise KeyError.

Comment on lines +20 to 40
```diff
+class DTensorConfigDisabled(TypedDict):
+    enabled: Literal[False]
+
+
 class DTensorConfig(TypedDict):
-    enabled: bool
-    env_vars: NotRequired[dict[str, str]]
+    enabled: Literal[True]
+    env_vars: NotRequired[dict[str, str] | None]
     _v2: NotRequired[bool]
-    cpu_offload: NotRequired[bool]
-    sequence_parallel: NotRequired[bool]
-    activation_checkpointing: NotRequired[bool]
-    tensor_parallel_size: NotRequired[int]
-    context_parallel_size: NotRequired[int]
-    custom_parallel_plan: NotRequired[str]
-    clear_cache_every_n_steps: NotRequired[int]
+    cpu_offload: bool
+    sequence_parallel: bool
+    activation_checkpointing: bool
+    tensor_parallel_size: int
+    context_parallel_size: int
+    custom_parallel_plan: str | None
+    clear_cache_every_n_steps: NotRequired[int | None]
+
+
+class SequencePackingConfigDisabled(TypedDict):
+    enabled: Literal[False]
```



⚠️ Potential issue | 🔴 Critical

Disabled TypedDicts must allow legacy fields

DTensorConfigDisabled, SequencePackingConfigDisabled, and DynamicBatchingConfigDisabled now only declare enabled. Pydantic’s TypedDict validation forbids keys that aren’t declared, so any existing config that keeps the old knobs while toggling enabled: false (see the GRPO YAML in this PR) will now raise extra_forbidden during validation. Please reintroduce these fields as NotRequired[...] (or otherwise loosen the schema) so we don’t break every config that relied on the previous structure. (hugovk-typing.readthedocs.io)

🤖 Prompt for AI Agents
In nemo_rl/models/policy/__init__.py around lines 20 to 40, the Disabled
TypedDicts (DTensorConfigDisabled, SequencePackingConfigDisabled and the missing
DynamicBatchingConfigDisabled) only declare enabled which causes Pydantic to
forbid legacy keys; restore the previous optional keys on each Disabled
TypedDict by adding the same fields present in their Enabled counterparts as
NotRequired[...] with the same types (e.g., for DTensorConfigDisabled add
env_vars: NotRequired[dict[str,str]|None], _v2: NotRequired[bool], cpu_offload:
NotRequired[bool], sequence_parallel: NotRequired[bool],
activation_checkpointing: NotRequired[bool], tensor_parallel_size:
NotRequired[int], context_parallel_size: NotRequired[int], custom_parallel_plan:
NotRequired[str|None], clear_cache_every_n_steps: NotRequired[int|None] and
similarly make the SequencePackingConfigDisabled and
DynamicBatchingConfigDisabled include the formerly allowed optional knobs as
NotRequired) so existing configs with extra keys validate when enabled is false.

@terrykong terrykong merged commit 4f6ab6b into r0.4.0 Nov 2, 2025
43 of 44 checks passed
@terrykong terrykong deleted the chtruong/cp-1413-r0.4.0 branch November 2, 2025 17:15

Labels

CI:L1 Run doctests, unit tests, and functional tests Documentation Improvements or additions to documentation
