feat: DTensorPolicyV2 GPT-OSS SFT support #1470
Conversation
❌ Submodule Fast-Forward Check Failed (check based on commit e936ebf, PR #1470). Submodules that need attention: Automodel: commits have DIVERGED from a common ancestor. Please ensure all submodule commits are fast-forwards of the main branch before merging.
📝 Walkthrough

The pull request updates the Automodel submodule pointer, integrates the Automodel Checkpointer into DTensorPolicyWorkerV2 for improved checkpoint handling, adds a comprehensive SFT training configuration example, removes the legacy automodel_checkpoint utility module, updates package dependencies (grouped_gemm, transformer-engine, deep_ep), and enhances venv setup with CUDA architecture configuration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant DTensorPolicyWorkerV2
    participant Checkpointer
    participant FSDP2Manager
    participant HFModel
    participant Cache
    User->>DTensorPolicyWorkerV2: load_checkpoint(path, ...)
    DTensorPolicyWorkerV2->>DTensorPolicyWorkerV2: detect_checkpoint_format(path)
    DTensorPolicyWorkerV2->>DTensorPolicyWorkerV2: _infer_checkpoint_root(path)
    DTensorPolicyWorkerV2->>HFModel: initialize empty model
    DTensorPolicyWorkerV2->>FSDP2Manager: parallelize model (FSDP2)
    DTensorPolicyWorkerV2->>DTensorPolicyWorkerV2: _ensure_checkpointer(config)
    DTensorPolicyWorkerV2->>Checkpointer: load via Automodel API
    Checkpointer->>Cache: fetch or download weights
    Checkpointer->>HFModel: apply loaded state dict
    HFModel-->>DTensorPolicyWorkerV2: model ready
    DTensorPolicyWorkerV2-->>User: checkpoint loaded
```
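The call flow in the diagram can be sketched in plain Python. Every class and helper name below is a hypothetical stand-in mirroring the diagram's labels, not the actual NeMo-RL or Automodel implementation:

```python
import os

class Checkpointer:
    """Stand-in for the Automodel Checkpointer in the diagram."""

    def load(self, path, model):
        # In the real flow this fetches (or reuses cached) weights and
        # applies the state dict; here we only record where we loaded from.
        model.loaded_from = path
        return model

class DTensorPolicyWorkerV2Sketch:
    """Hypothetical mirror of the worker's load path; not the real class."""

    def __init__(self):
        self.checkpointer = None

    def detect_checkpoint_format(self, path):
        # Crude stand-in: treat safetensors files as HF-format checkpoints.
        return "hf" if path.endswith(".safetensors") else "dcp"

    def _infer_checkpoint_root(self, path):
        # The real helper inspects the tree; here we take the parent directory.
        return os.path.dirname(path) or "."

    def _ensure_checkpointer(self):
        # Lazily create the checkpointer on first use, then reuse it.
        if self.checkpointer is None:
            self.checkpointer = Checkpointer()
        return self.checkpointer

    def load_checkpoint(self, path, model):
        fmt = self.detect_checkpoint_format(path)
        root = self._infer_checkpoint_root(path)
        ckpt = self._ensure_checkpointer()
        model = ckpt.load(path, model)
        return model, fmt, root
```

The point of the sketch is the ordering: format detection and root inference happen before the checkpointer is created, and the checkpointer is the only component that touches weights.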
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (7)
- 3rdparty/Automodel-workspace/Automodel (1 hunks)
- examples/configs/sft_automodel.yaml (1 hunks)
- nemo_rl/models/policy/dtensor_policy_worker_v2.py (20 hunks)
- nemo_rl/models/policy/utils.py (1 hunks)
- nemo_rl/utils/automodel_checkpoint.py (0 hunks)
- nemo_rl/utils/venvs.py (2 hunks)
- pyproject.toml (4 hunks)
💤 Files with no reviewable changes (1)
- nemo_rl/utils/automodel_checkpoint.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
nemo_rl/models/policy/utils.py
nemo_rl/models/policy/dtensor_policy_worker_v2.py
nemo_rl/utils/venvs.py
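The duck-typing guideline above (minimal try body, main logic in `else`, narrowest possible exceptions) can be illustrated with a small standalone sketch:

```python
def first_item_length(value):
    """Return the length of the first item, or 0 if there is none."""
    try:
        first = value[0]  # only the duck-typed access lives in the try body
    except (TypeError, IndexError):
        # Narrow, specific exceptions: not indexable, or empty.
        return 0
    else:
        # Main logic goes in else, so errors raised here are not
        # accidentally swallowed by the except clause above.
        return len(first)
```

Keeping `len(first)` in `else` rather than in `try` means a `TypeError` from an unsized first item propagates instead of being silently mapped to 0.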
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
nemo_rl/models/policy/utils.py
nemo_rl/models/policy/dtensor_policy_worker_v2.py
nemo_rl/utils/venvs.py
examples/configs/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/*.yaml: Exemplar configs under examples/configs/.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/.yaml
Files:
examples/configs/sft_automodel.yaml
🧠 Learnings (4)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Applied to files:
nemo_rl/models/policy/utils.py
📚 Learning: 2025-09-17T01:52:21.399Z
Learnt from: ffrujeri
Repo: NVIDIA-NeMo/RL PR: 1023
File: nemo_rl/utils/checkpoint.py:58-65
Timestamp: 2025-09-17T01:52:21.399Z
Learning: model_state_dict_keys is not intended to be part of the nemo-rl CheckpointingConfig TypedDict - it's handled at the automodel implementation layer, not as a general checkpointing configuration parameter.
Applied to files:
nemo_rl/models/policy/dtensor_policy_worker_v2.py
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.
Applied to files:
nemo_rl/models/policy/dtensor_policy_worker_v2.py
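Under the sizing rule stated in this learning, a rough sketch of the GPU requirement for an MoE model takes the maximum of the two mesh sizes rather than their product. This is an illustration of the note above under that stated assumption, not NeMo-RL code:

```python
def required_gpus(expert_parallel_size: int, data_parallel_size: int) -> int:
    """GPUs needed when EP covers only MoE layers and DP/FSDP covers the rest.

    Each GPU participates in both meshes, so the sizes are not multiplied:
    EP shards the expert layers across the same GPUs that DP uses for the
    non-expert layers.
    """
    return max(expert_parallel_size, data_parallel_size)
```

This matches the example in the learning: `expert_parallel_size: 8` with `data_parallel_size: 8` fits on an 8-GPU cluster.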
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to **/*.sh : Use `uv run` to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling `python` directly
Applied to files:
nemo_rl/utils/venvs.py
🧬 Code graph analysis (1)
nemo_rl/models/policy/dtensor_policy_worker_v2.py (3)
nemo_rl/utils/checkpoint.py (1)
CheckpointingConfig (36-67)
nemo_rl/models/policy/dtensor_policy_worker.py (3)
create_context_parallel_ctx (456-480)
get_cpu_state_dict (104-134)
load_checkpoint (1920-1930)
nemo_rl/models/dtensor/parallelize.py (1)
to_local_if_dtensor(709-715)
🪛 GitHub Actions: Automodel Integration and Submodule Checks
3rdparty/Automodel-workspace/Automodel
[error] 1-1: One or more submodules are not fast-forwarded. Automodel: Commits have DIVERGED from a common ancestor. Please ensure submodule commits are fast-forwards of the main branch.
🪛 Ruff (0.14.3)
nemo_rl/models/policy/dtensor_policy_worker_v2.py
2105-2105: Loop control variable root not used within loop body
Rename unused root to _root
(B007)
2105-2105: Loop control variable dirs not used within loop body
Rename unused dirs to _dirs
(B007)
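The flagged B007 fix looks like this in isolation (a standalone illustration, not the actual `dtensor_policy_worker_v2.py` code): loop variables the body never uses get a leading underscore.

```python
import os

def count_files(path):
    """Count files under path, walking the directory tree."""
    total = 0
    # B007 fix: `root` and `dirs` are unused in the loop body, so they are
    # renamed `_root` and `_dirs` to mark them as intentionally unused.
    for _root, _dirs, files in os.walk(path):
        total += len(files)
    return total
```

The rename silences the lint warning without changing behavior, and signals to readers that only `files` matters here.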
🔇 Additional comments (1)
nemo_rl/models/policy/utils.py (1)
32-32: Verify the new import path is valid in the updated nemo_automodel submodule.

The import change appears scoped and intentional: only `_transformers` is being reorganized out of `.components`, while other submodules remain unchanged. The try/except block provides appropriate fallback handling, and the imported classes are actively used in `AUTOMODEL_FACTORY` and `resolve_model_class()`. However, I could not locate test files confirming the new import path works. Please verify:

- The new path (`nemo_automodel._transformers.auto_model`) exists and is importable in the updated nemo_automodel submodule version
- Tests pass with this change, or if untested, manually confirm the import succeeds at runtime
- `NEMO_AUTOMODEL_AVAILABLE` resolves to `True` in your environment after this change
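The guarded-import pattern referenced in this comment can be sketched as follows. The module path mirrors the one named in the review; the imported name is a placeholder, not a confirmed nemo_automodel API:

```python
# Guarded import: fall back gracefully when the optional dependency or the
# reorganized module path is unavailable.
try:
    # Placeholder class name; the real code imports the Automodel factory classes.
    from nemo_automodel._transformers.auto_model import NeMoAutoModelForCausalLM
except ImportError:
    NeMoAutoModelForCausalLM = None

# Downstream code (e.g. a model factory) branches on this availability flag
# instead of re-attempting the import at every call site.
NEMO_AUTOMODEL_AVAILABLE = NeMoAutoModelForCausalLM is not None
```

Catching only `ImportError` (rather than a bare `except`) keeps the clause as narrow as the coding guidelines above require.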
What does this PR do?
Adds GPT-OSS SFT using AutoModel custom models + DeepEP.
To run, launch the nightly container and run
GPT OSS SFT nightly on Squad
GRPO Qwen 2.5 7b nightly
Llama 3.1 8b lora nightly
DPO Llama 3.1 8b nightly