feat: DTensorPolicyV2 GPT-OSS SFT support #1470
Conversation
❌ Submodule Fast-Forward Check Failed (check based on commit e936ebf, PR #1470). Submodules that need attention: Automodel: commits have DIVERGED from a common ancestor. Please ensure all submodule commits are fast-forwards of the main branch before merging.
📝 Walkthrough

The pull request updates the Automodel submodule pointer, integrates the Automodel Checkpointer into DTensorPolicyWorkerV2 for improved checkpoint handling, adds a comprehensive SFT training configuration example, removes the legacy automodel_checkpoint utility module, updates package dependencies (grouped_gemm, transformer-engine, deep_ep), and enhances venv setup with CUDA architecture configuration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant DTensorPolicyWorkerV2
    participant Checkpointer
    participant FSDP2Manager
    participant HFModel
    participant Cache
    User->>DTensorPolicyWorkerV2: load_checkpoint(path, ...)
    DTensorPolicyWorkerV2->>DTensorPolicyWorkerV2: detect_checkpoint_format(path)
    DTensorPolicyWorkerV2->>DTensorPolicyWorkerV2: _infer_checkpoint_root(path)
    DTensorPolicyWorkerV2->>HFModel: initialize empty model
    DTensorPolicyWorkerV2->>FSDP2Manager: parallelize model (FSDP2)
    DTensorPolicyWorkerV2->>DTensorPolicyWorkerV2: _ensure_checkpointer(config)
    DTensorPolicyWorkerV2->>Checkpointer: load via Automodel API
    Checkpointer->>Cache: fetch or download weights
    Checkpointer->>HFModel: apply loaded state dict
    HFModel-->>DTensorPolicyWorkerV2: model ready
    DTensorPolicyWorkerV2-->>User: checkpoint loaded
```
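The call flow in the diagram can be sketched in plain Python. Every class and helper name below is a hypothetical stand-in mirroring the diagram's labels, not the actual NeMo-RL or Automodel implementation:

```python
import os

class Checkpointer:
    """Stand-in for the Automodel Checkpointer in the diagram."""

    def load(self, path, model):
        # In the real flow this fetches (or reuses cached) weights and
        # applies the state dict; here we only record where we loaded from.
        model.loaded_from = path
        return model

class DTensorPolicyWorkerV2Sketch:
    """Hypothetical mirror of the worker's load path; not the real class."""

    def __init__(self):
        self.checkpointer = None

    def detect_checkpoint_format(self, path):
        # Crude stand-in: treat safetensors files as HF-format checkpoints.
        return "hf" if path.endswith(".safetensors") else "dcp"

    def _infer_checkpoint_root(self, path):
        # The real helper inspects the tree; here we take the parent directory.
        return os.path.dirname(path) or "."

    def _ensure_checkpointer(self):
        # Lazily create the checkpointer on first use, then reuse it.
        if self.checkpointer is None:
            self.checkpointer = Checkpointer()
        return self.checkpointer

    def load_checkpoint(self, path, model):
        fmt = self.detect_checkpoint_format(path)
        root = self._infer_checkpoint_root(path)
        ckpt = self._ensure_checkpointer()
        model = ckpt.load(path, model)
        return model, fmt, root
```

The point of the sketch is the ordering: format detection and root inference happen before the checkpointer is created, and the checkpointer is the only component that touches weights.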
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (7)
- 3rdparty/Automodel-workspace/Automodel (1 hunks)
- examples/configs/sft_automodel.yaml (1 hunks)
- nemo_rl/models/policy/dtensor_policy_worker_v2.py (20 hunks)
- nemo_rl/models/policy/utils.py (1 hunks)
- nemo_rl/utils/automodel_checkpoint.py (0 hunks)
- nemo_rl/utils/venvs.py (2 hunks)
- pyproject.toml (4 hunks)
💤 Files with no reviewable changes (1)
- nemo_rl/utils/automodel_checkpoint.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
nemo_rl/models/policy/utils.py
nemo_rl/models/policy/dtensor_policy_worker_v2.py
nemo_rl/utils/venvs.py
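The duck-typing guideline above (minimal try body, main logic in `else`, narrowest possible exceptions) can be illustrated with a small standalone sketch:

```python
def first_item_length(value):
    """Return the length of the first item, or 0 if there is none."""
    try:
        first = value[0]  # only the duck-typed access lives in the try body
    except (TypeError, IndexError):
        # Narrow, specific exceptions: not indexable, or empty.
        return 0
    else:
        # Main logic goes in else, so errors raised here are not
        # accidentally swallowed by the except clause above.
        return len(first)
```

Keeping `len(first)` in `else` rather than in `try` means a `TypeError` from an unsized first item propagates instead of being silently mapped to 0.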
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
nemo_rl/models/policy/utils.py
nemo_rl/models/policy/dtensor_policy_worker_v2.py
nemo_rl/utils/venvs.py
examples/configs/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/*.yaml: Exemplar configs under examples/configs/.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/.yaml
Files:
examples/configs/sft_automodel.yaml
🧠 Learnings (4)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Applied to files:
nemo_rl/models/policy/utils.py
📚 Learning: 2025-09-17T01:52:21.399Z
Learnt from: ffrujeri
Repo: NVIDIA-NeMo/RL PR: 1023
File: nemo_rl/utils/checkpoint.py:58-65
Timestamp: 2025-09-17T01:52:21.399Z
Learning: model_state_dict_keys is not intended to be part of the nemo-rl CheckpointingConfig TypedDict - it's handled at the automodel implementation layer, not as a general checkpointing configuration parameter.
Applied to files:
nemo_rl/models/policy/dtensor_policy_worker_v2.py
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.
Applied to files:
nemo_rl/models/policy/dtensor_policy_worker_v2.py
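Under the sizing rule stated in this learning, a rough sketch of the GPU requirement for an MoE model takes the maximum of the two mesh sizes rather than their product. This is an illustration of the note above under that stated assumption, not NeMo-RL code:

```python
def required_gpus(expert_parallel_size: int, data_parallel_size: int) -> int:
    """GPUs needed when EP covers only MoE layers and DP/FSDP covers the rest.

    Each GPU participates in both meshes, so the sizes are not multiplied:
    EP shards the expert layers across the same GPUs that DP uses for the
    non-expert layers.
    """
    return max(expert_parallel_size, data_parallel_size)
```

This matches the example in the learning: `expert_parallel_size: 8` with `data_parallel_size: 8` fits on an 8-GPU cluster.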
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to **/*.sh : Use `uv run` to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling `python` directly
Applied to files:
nemo_rl/utils/venvs.py
🧬 Code graph analysis (1)
nemo_rl/models/policy/dtensor_policy_worker_v2.py (3)
nemo_rl/utils/checkpoint.py (1)
CheckpointingConfig (36-67)
nemo_rl/models/policy/dtensor_policy_worker.py (3)
create_context_parallel_ctx (456-480)
get_cpu_state_dict (104-134)
load_checkpoint (1920-1930)
nemo_rl/models/dtensor/parallelize.py (1)
to_local_if_dtensor(709-715)
🪛 GitHub Actions: Automodel Integration and Submodule Checks
3rdparty/Automodel-workspace/Automodel
[error] 1-1: One or more submodules are not fast-forwarded. Automodel: Commits have DIVERGED from a common ancestor. Please ensure submodule commits are fast-forwards of the main branch.
🪛 Ruff (0.14.3)
nemo_rl/models/policy/dtensor_policy_worker_v2.py
2105-2105: Loop control variable root not used within loop body
Rename unused root to _root
(B007)
2105-2105: Loop control variable dirs not used within loop body
Rename unused dirs to _dirs
(B007)
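The flagged B007 fix looks like this in isolation (a standalone illustration, not the actual `dtensor_policy_worker_v2.py` code): loop variables the body never uses get a leading underscore.

```python
import os

def count_files(path):
    """Count files under path, walking the directory tree."""
    total = 0
    # B007 fix: `root` and `dirs` are unused in the loop body, so they are
    # renamed `_root` and `_dirs` to mark them as intentionally unused.
    for _root, _dirs, files in os.walk(path):
        total += len(files)
    return total
```

The rename silences the lint warning without changing behavior, and signals to readers that only `files` matters here.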
🔇 Additional comments (1)
nemo_rl/models/policy/utils.py (1)
32-32: Verify the new import path is valid in the updated nemo_automodel submodule.

The import change appears scoped and intentional: only `_transformers` is being reorganized out of `.components`, while other submodules remain unchanged. The try/except block provides appropriate fallback handling, and the imported classes are actively used in `AUTOMODEL_FACTORY` and `resolve_model_class()`. However, I could not locate test files confirming the new import path works. Please verify:

- The new path (`nemo_automodel._transformers.auto_model`) exists and is importable in the updated nemo_automodel submodule version
- Tests pass with this change, or if untested, manually confirm the import succeeds at runtime
- `NEMO_AUTOMODEL_AVAILABLE` resolves to `True` in your environment after this change
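The guarded-import pattern referenced in this comment can be sketched as follows. The module path mirrors the one named in the review; the imported name is a placeholder, not a confirmed nemo_automodel API:

```python
# Guarded import: fall back gracefully when the optional dependency or the
# reorganized module path is unavailable.
try:
    # Placeholder class name; the real code imports the Automodel factory classes.
    from nemo_automodel._transformers.auto_model import NeMoAutoModelForCausalLM
except ImportError:
    NeMoAutoModelForCausalLM = None

# Downstream code (e.g. a model factory) branches on this availability flag
# instead of re-attempting the import at every call site.
NEMO_AUTOMODEL_AVAILABLE = NeMoAutoModelForCausalLM is not None
```

Catching only `ImportError` (rather than a bare `except`) keeps the clause as narrow as the coding guidelines above require.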
What does this PR do?
Adds GPT-OSS SFT using AutoModel custom models + DeepEP.
To run, launch the nightly container and run
GPT OSS SFT nightly on Squad
GRPO Qwen 2.5 7b nightly
Llama 3.1 8b lora nightly
DPO Llama 3.1 8b nightly