feat: add support for nemotron-nas with custom plan #1180
Conversation
📝 Walkthrough

Updates a third-party submodule reference. Adds a new GRPO training YAML recipe. Introduces a custom parallelization plan for LLaMA/Nemotron. Modifies DTensorPolicyWorkerV2 to detect attention-interface usage from the model config, conditionally strip flash-attention kwargs, and replace unshard context managers with torch.no_grad() in the train and logprob paths.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Trainer
    participant Worker as DTensorPolicyWorkerV2
    participant HF as HuggingFace Hub
    participant Model
    rect rgba(230,240,250,0.5)
        note over Worker: Initialization
        Trainer->>Worker: __init__(model_name, ...)
        Worker->>HF: load config (trust_remote_code=True)
        HF-->>Worker: config (_attn_implementation)
        Worker->>Worker: model_use_attention_interface = (_attn_implementation != "eager")
    end
    par Train step
        Trainer->>Worker: train(batch, model_args)
        rect rgba(240,230,250,0.4)
            alt model_use_attention_interface == False
                Worker->>Worker: drop flash_attn_kwargs from model_args
            else model_use_attention_interface == True
                Worker->>Worker: keep flash_attn_kwargs
            end
        end
        Worker->>Worker: torch.no_grad() context
        Worker->>Model: forward/generate(batch, model_args)
        Model-->>Worker: outputs
        Worker-->>Trainer: loss/metrics
    and Get logprobs
        Trainer->>Worker: get_logprobs(inputs, model_args)
        rect rgba(240,230,250,0.4)
            alt model_use_attention_interface == False
                Worker->>Worker: drop flash_attn_kwargs
            else
                Worker->>Worker: keep flash_attn_kwargs
            end
        end
        Worker->>Worker: torch.no_grad() context
        Worker->>Model: forward(inputs, model_args)
        Model-->>Worker: logprobs
        Worker-->>Trainer: logprobs
    end
    note over Worker,Model: unshard_fsdp2_model removed in favor of torch.no_grad()
```
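The initialization step in the diagram — deriving `model_use_attention_interface` from the config's `_attn_implementation` — can be sketched as plain Python. This is an illustrative sketch; the function name and dict-based config are assumptions, not the actual NeMo-RL source:

```python
def model_uses_attention_interface(config: dict) -> bool:
    """Mirror of the check in the diagram: anything other than "eager"
    attention is assumed to route through the attention interface."""
    return config.get("_attn_implementation", "eager") != "eager"
```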
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
Tip: 👮 Agentic pre-merge checks are now available in preview! Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs. Please see the documentation for more information. Example:

```yaml
reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags,
          environment variables, configuration keys, database schemas, or
          HTTP/GraphQL endpoints must be documented in the "Breaking Change"
          section of the PR description and in CHANGELOG.md. Exclude purely
          internal or private changes (e.g., code not exported from package
          entry points or explicitly marked as internal).
```
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)

- `3rdparty/Automodel-workspace/Automodel` (1 hunks)
- `examples/configs/recipes/llm/grpo-math-llama-nemotron-super-49b-v.5-4n8g-fsdp2tp8.yaml` (1 hunks)
- `examples/configs/recipes/llm/llama_nemotron_super_49b_custom_plan.py` (1 hunks)
- `nemo_rl/models/policy/dtensor_policy_worker_v2.py` (5 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
- examples/configs/recipes/llm/llama_nemotron_super_49b_custom_plan.py
- nemo_rl/models/policy/dtensor_policy_worker_v2.py
examples/configs/recipes/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place recipe YAMLs under examples/configs/recipes//
Files:
- examples/configs/recipes/llm/llama_nemotron_super_49b_custom_plan.py
- examples/configs/recipes/llm/grpo-math-llama-nemotron-super-49b-v.5-4n8g-fsdp2tp8.yaml
examples/configs/recipes/**/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name
Files:
examples/configs/recipes/llm/grpo-math-llama-nemotron-super-49b-v.5-4n8g-fsdp2tp8.yaml
examples/configs/recipes/llm/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
LLM recipe YAML filenames must follow: --ng-[-modifiers][-long][.vN].yaml
Files:
examples/configs/recipes/llm/grpo-math-llama-nemotron-super-49b-v.5-4n8g-fsdp2tp8.yaml
examples/configs/recipes/**/*.{yaml,sh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script
Files:
examples/configs/recipes/llm/grpo-math-llama-nemotron-super-49b-v.5-4n8g-fsdp2tp8.yaml
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
nemo_rl/models/policy/dtensor_policy_worker_v2.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Lint check
🔇 Additional comments (8)
3rdparty/Automodel-workspace/Automodel (1)
1-1: Submodule bump: document rationale, risks, and verify upstream delta before merge.

Script output: Old SHA 7b55cabc0a3b1d8b03b6c1f680c030ea2c8eaa77 → New SHA 277a8a8d951f6d8bf030d34915cfa61b88eebffd. Decoding .gitmodules failed in the execution environment (base64 not available), so upstream commits could not be listed.
- Add a brief note linking to the upstream repo and the compare/commit subjects between these SHAs; confirm no breaking API/config changes, license updates, or security advisories were introduced.
- Ensure CI runs git submodule init/update (or equivalent) so builds are reproducible.
- If the upstream repo is private or you cannot add the note, run the provided script locally and paste the upstream commit subjects or a compare URL into the PR.
examples/configs/recipes/llm/grpo-math-llama-nemotron-super-49b-v.5-4n8g-fsdp2tp8.yaml (2)

1-177: LGTM! Comprehensive GRPO configuration follows recipe YAML conventions.

This new recipe YAML correctly follows the naming convention for LLM recipes and provides a complete configuration for GRPO training with the Nemotron Super 49B model. The configuration appears well-structured, with appropriate hyperparameters, resource allocation, and integration with the custom parallelization plan.
58-58: Confirmed: module path and symbol exist.

examples/configs/recipes/llm/llama_nemotron_super_49b_custom_plan.py defines custom_parallel_plan (line 24).

examples/configs/recipes/llm/llama_nemotron_super_49b_custom_plan.py (2)
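Wiring the plan into the recipe presumably looks something like the following fragment; the exact key path is an assumption inferred from the review comments, not copied from the YAML:

```yaml
policy:
  dtensor_cfg:
    tensor_parallel_size: 8
    # Dotted path resolved at runtime to the custom_parallel_plan symbol
    custom_parallel_plan: examples.configs.recipes.llm.llama_nemotron_super_49b_custom_plan.custom_parallel_plan
```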
1-23: LGTM! Proper copyright header and imports.

The file follows the NVIDIA copyright header requirements and imports the necessary parallelization components from PyTorch's distributed tensor module.
24-49: Well-designed custom parallelization plan for Nemotron's NAS architecture.

The parallelization plan appropriately handles the unique characteristics of the Nemotron model's Neural Architecture Search (NAS) structure. Based on the web search results, Nemotron uses skip attention where "in some blocks, the attention is skipped entirely, or replaced with a single linear layer", making this custom plan necessary for proper tensor parallelization.

Key design decisions that look correct:

- PrepareModuleInput for attention modules to handle attention mask layouts
- ColwiseParallel for attention projections (q_proj, k_proj, v_proj) with use_local_output=False
- RowwiseParallel for output projections (o_proj, down_proj) with replicated outputs
- PrepareModuleOutput for rotary embeddings to ensure proper layout handling
- Shard(-1) placement for lm_head to distribute vocabulary across tensor parallel ranks
nemo_rl/models/policy/dtensor_policy_worker_v2.py (3)

182-184: Good addition of attention interface detection.

The initialization of model_use_attention_interface by calling the new method provides proper runtime detection of model capabilities. This aligns with the unique architecture of Nemotron models.
897-897: Improved context management by replacing unshard_fsdp2_model with torch.no_grad().

This change simplifies the context management and removes the dependency on unshard_fsdp2_model, which appears to be the correct approach for the logprob computation path.
699-704: Correct conditional removal of flash_attn_kwargs.

The logic correctly removes flash_attn_kwargs when the model doesn't use the attention interface, preventing potential argument errors. This is consistent with the handling for VLM models and reward models in the same code paths.

Also applies to: 1015-1020
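The conditional stripping described in this comment amounts to something like the sketch below. The function name is illustrative, and the real worker operates on its own kwargs structure rather than a generic dict:

```python
def strip_flash_attn_kwargs(model_args: dict, use_attention_interface: bool) -> dict:
    """Drop flash_attn_kwargs for models whose forward() would reject
    them (e.g. "eager" attention); pass everything through otherwise."""
    if use_attention_interface:
        return model_args
    return {k: v for k, v in model_args.items() if k != "flash_attn_kwargs"}
```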
166e979 to 02f4575 (force-pushed)
Signed-off-by: Jonas Yang <joyang@nvidia.com>
65f96eb to af2a100 (force-pushed)
af2a100 to 2037827 (force-pushed)
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
What does this PR do?

Adds support for Nemotron-NAS models with a custom parallelization plan: a new GRPO recipe YAML for Llama Nemotron Super 49B, a custom tensor-parallel plan module, and DTensorPolicyWorkerV2 changes that detect the model's attention interface and conditionally strip flash-attention kwargs.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
New Features
Bug Fixes
Chores