fix: Fix DCP-to-HF conversion for model-wrapped checkpoints by RayenTian · Pull Request #1881 · NVIDIA-NeMo/RL

RayenTian · 2026-02-05T03:31:57Z

Summary

The file layout differs between dtensor v1 and v2. Using policy/weights as the ckpt path for v2 cannot find data, and vice versa.
DCP exports wrap weights under {"model": ...} in dtensor v1, while dtensor v2 saves a flat state_dict.

In this PR, we:

Fix DCP-to-HF conversion to handle both wrapped ({"model": ...}) and flat state dict formats.
Align dtensor v1 and v2 to the same layout: model weights in .../weights/model and optimizer in .../optimizer/optim.
Update documentation examples to use the correct DCP weights path (.../optimizer/optim).
Align checkpoint conversion guidance across README and guides.

Related issue

The file structure of V1

results/grpo_tp2_v1/step_1/
├── config.yaml
├── policy
│   ├── optimizer
│   │   ├── __0_0.distcp
│   │   ├── __1_0.distcp
│   │   ├── __2_0.distcp
│   │   ├── __3_0.distcp
│   │   ├── __4_0.distcp
│   │   ├── __5_0.distcp
│   │   ├── __6_0.distcp
│   │   └── __7_0.distcp
│   ├── tokenizer
│   │   ├── added_tokens.json
│   │   ├── chat_template.jinja
│   │   ├── merges.txt
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer_config.json
│   │   ├── tokenizer.json
│   │   └── vocab.json
│   └── weights
│       ├── __0_0.distcp
│       ├── __1_0.distcp
│       ├── __2_0.distcp
│       ├── __3_0.distcp
│       ├── __4_0.distcp
│       ├── __5_0.distcp
│       ├── __6_0.distcp
│       └── __7_0.distcp
├── train_dataloader.pt
└── training_info.json

The structure of V2

results/grpo_tp2_v2/step_1
├── config.yaml
├── policy
│   ├── optimizer
│   │   └── optim
│   │       ├── __0_0.distcp
│   │       ├── __1_0.distcp
│   │       ├── __2_0.distcp
│   │       ├── __3_0.distcp
│   │       ├── __4_0.distcp
│   │       ├── __5_0.distcp
│   │       ├── __6_0.distcp
│   │       └── __7_0.distcp
│   ├── tokenizer
│   │   ├── added_tokens.json
│   │   ├── chat_template.jinja
│   │   ├── merges.txt
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer_config.json
│   │   ├── tokenizer.json
│   │   └── vocab.json
│   └── weights
│       └── model
│           ├── __0_0.distcp
│           ├── __1_0.distcp
│           ├── __2_0.distcp
│           ├── __3_0.distcp
│           ├── __4_0.distcp
│           ├── __5_0.distcp
│           ├── __6_0.distcp
│           └── __7_0.distcp
├── train_dataloader.pt
└── training_info.json

Test

uv run examples/converters/convert_dcp_to_hf.py \
--config=results/grpo_tp2_v2/step_1/config.yaml \
--dcp-ckpt-path=results/grpo_tp2_v2/step_1/policy/weights\
--hf-ckpt-path=results/grpo_tp2_v2/step_1-hf


uv run examples/converters/convert_dcp_to_hf.py \
--config=results/grpo_tp2_v1/step_1/config.yaml \
--dcp-ckpt-path=results/grpo_tp2_v1/step_1/policy/weights\
--hf-ckpt-path=results/grpo_tp2_v1/step_1-hf

Summary by CodeRabbit

Documentation
- Updated DCP-to-HF conversion examples across multiple guides with corrected checkpoint file paths for model conversion.
Improvements
- Enhanced state dictionary handling during model conversion to support flexible checkpoint file structures without requiring directory-based inputs.

coderabbitai · 2026-02-05T03:35:42Z

📝 Walkthrough

Walkthrough

Updates DCP-to-HF checkpoint conversion documentation across multiple guide files to reference a specific model file path instead of a directory. Modifies the conversion logic to conditionally handle state dict structures, supporting both nested {"model": ...} and flat formats without forced assertions or unnecessary rewrites.

Changes

Cohort / File(s)	Summary
Documentation Path Updates `README.md`, `docs/about/evaluation.md`, `docs/design-docs/checkpointing.md`, `docs/guides/eval.md`, `docs/guides/grpo-deepscaler.md`, `docs/guides/sft-openmathinstruct2.md`	Updated `--dcp-ckpt-path` argument in DCP-to-HF conversion examples from `results/grpo/step_170/policy/weights/` to `results/grpo/step_170/policy/weights/model`, pointing to a specific model file instead of a directory.
Conversion Logic `nemo_rl/utils/native_checkpoint.py`	Modified `convert_dcp_to_hf` to reload saved weights and conditionally rewrite the file only if state dict is exactly `{"model"}`. Removes unconditional assertion and rewrite, now handles both nested and non-nested state dict formats.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: fixing DCP-to-HF conversion for model-wrapped checkpoints, which is the core objective of the PR.
Linked Issues check	✅ Passed	The PR addresses the core issue `#1427`: updating conversion logic to handle wrapped state dicts and correcting DCP weights path to resolve FileNotFoundError in checkpoint conversion.
Out of Scope Changes check	✅ Passed	All changes are directly related to fixing DCP-to-HF conversion: logic updates in native_checkpoint.py and consistent documentation updates across multiple guides reflecting the corrected path.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Test Results For Major Changes	✅ Passed	Minor bug fix with targeted code changes (+4/-5 lines) and documentation path corrections, qualifying for pass per policy.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch ruit/ckpt_bug_issue_1427

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

terrykong

is this a fix for dtensor v1? or v2?

github-actions · 2026-02-05T07:33:26Z

⚠️ File Consistency Check

Check based on commit: 0fed948 (PR #1881 from ruit/ckpt_bug_issue_1427)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
Update nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py
Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

_{This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.}

RayenTian · 2026-02-05T07:38:57Z

is this a fix for dtensor v1? or v2?

The issue is because of:

the different file layouts between dtensor v1 and v2. Using policy/weights as the ckpt path for v2 cannot find data, and vice versa.
DCP exports wrap weights under {"model": ...} in dtensor v1, while dtensor v2 saves a flat state_dict.

This PR, we unify the path and weights state_dict. Now it works well for both dtensor v1 and v2 ckpt.

yuki-97 · 2026-02-05T14:48:37Z

thanks for the fix!

can you help adding an unit test for dtensor v2 like test_convert_dcp_to_hf in tests/unit/utils/test_native_checkpoint.py to guard it?

yuki-97

Also just curious from the PR description that seems v1 will save the tokenizer but v2 won't?

github-actions · 2026-02-09T06:33:46Z

⚠️ File Consistency Check

Check based on commit: 2a5f5a2 (PR #1881 from ruit/ckpt_bug_issue_1427)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

_{This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.}

github-actions · 2026-02-09T06:38:06Z

⚠️ File Consistency Check

Check based on commit: 453a291 (PR #1881 from ruit/ckpt_bug_issue_1427)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

_{This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.}

github-actions · 2026-02-10T01:53:32Z

⚠️ File Consistency Check

Check based on commit: 9627a78 (PR #1881 from ruit/ckpt_bug_issue_1427)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

_{This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.}

Signed-off-by: ruit <ruit@nvidia.com>

…ror if no metadata file is found for Dtensor V1 or V2 checkpoint paths. Signed-off-by: ruit <ruit@nvidia.com>

github-actions · 2026-02-10T02:09:15Z

⚠️ File Consistency Check

Check based on commit: e572ba5 (PR #1881 from ruit/ckpt_bug_issue_1427)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

_{This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.}

…eMo#1881) Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

Signed-off-by: ruit <ruit@nvidia.com>

…eMo#1881) Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Aniket Singh Yadav <singhyadavaniket43@gmail.com>

RayenTian requested review from a team as code owners February 5, 2026 03:31

github-actions Bot added the Documentation Improvements or additions to documentation label Feb 5, 2026

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Feb 5, 2026

RayenTian had a problem deploying to nemo-ci February 5, 2026 03:34 — with GitHub Actions Error

RayenTian removed the CI:L1 Run doctests, unit tests, and functional tests label Feb 5, 2026

RayenTian force-pushed the ruit/ckpt_bug_issue_1427 branch from 38c628f to b7ec34f Compare February 5, 2026 03:35

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Feb 5, 2026

RayenTian had a problem deploying to nemo-ci February 5, 2026 03:36 — with GitHub Actions Error

RayenTian requested review from guyueh1 and yuki-97 February 5, 2026 05:40

terrykong reviewed Feb 5, 2026

View reviewed changes

RayenTian requested review from a team as code owners February 5, 2026 07:32

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 5, 2026

RayenTian temporarily deployed to nemo-ci February 5, 2026 07:39 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci February 5, 2026 08:59 — with GitHub Actions Inactive

yuki-97 mentioned this pull request Feb 5, 2026

fix: add missing functional test #1883

Merged

yuki-97 reviewed Feb 9, 2026

View reviewed changes

Comment thread nemo_rl/models/policy/workers/dtensor_policy_worker.py Outdated

RayenTian force-pushed the ruit/ckpt_bug_issue_1427 branch from 0fed948 to 2a5f5a2 Compare February 9, 2026 06:33

RayenTian requested a review from a team as a code owner February 9, 2026 06:33

RayenTian force-pushed the ruit/ckpt_bug_issue_1427 branch from 453a291 to 54f7d1f Compare February 9, 2026 06:40

github-actions Bot removed the Documentation Improvements or additions to documentation label Feb 9, 2026

RayenTian temporarily deployed to nemo-ci February 9, 2026 07:09 — with GitHub Actions Inactive

yuki-97 reviewed Feb 9, 2026

View reviewed changes

Comment thread nemo_rl/utils/native_checkpoint.py

RayenTian removed the CI:L1 Run doctests, unit tests, and functional tests label Feb 10, 2026

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Feb 10, 2026

RayenTian had a problem deploying to nemo-ci February 10, 2026 01:54 — with GitHub Actions Error

RayenTian added 6 commits February 9, 2026 18:08

update condition judge

1ba02e2

Signed-off-by: ruit <ruit@nvidia.com>

fix path in markdown

ed17d6d

Signed-off-by: ruit <ruit@nvidia.com>

unify the ckpt path of dtesnor v1 and v2

d19f140

Signed-off-by: ruit <ruit@nvidia.com>

add unit test

a3a4545

Signed-off-by: ruit <ruit@nvidia.com>

revert ckpt path in docs

4f972a9

Signed-off-by: ruit <ruit@nvidia.com>

Enhance error handling in convert_dcp_to_hf by raising FileNotFoundEr…

e572ba5

…ror if no metadata file is found for Dtensor V1 or V2 checkpoint paths. Signed-off-by: ruit <ruit@nvidia.com>

RayenTian force-pushed the ruit/ckpt_bug_issue_1427 branch from 9627a78 to e572ba5 Compare February 10, 2026 02:08

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 10, 2026

RayenTian temporarily deployed to nemo-ci February 10, 2026 02:09 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci February 10, 2026 02:43 — with GitHub Actions Inactive

yuki-97 approved these changes Feb 10, 2026

View reviewed changes

RayenTian temporarily deployed to nemo-ci February 10, 2026 09:36 — with GitHub Actions Inactive

yuki-97 enabled auto-merge (squash) February 10, 2026 12:59

yuki-97 merged commit 314c272 into main Feb 10, 2026
44 of 45 checks passed

yuki-97 deleted the ruit/ckpt_bug_issue_1427 branch February 10, 2026 15:08

yuki-97 mentioned this pull request Feb 10, 2026

Convert dcp checkpoint to HF doesn't work now. #1391

Closed

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

fix: Fix DCP-to-HF conversion for model-wrapped checkpoints (#1881)

e0bb3d2

Signed-off-by: ruit <ruit@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

fix: Fix DCP-to-HF conversion for model-wrapped checkpoints (#1881)

cde8378

Signed-off-by: ruit <ruit@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 9, 2026

fix: Fix DCP-to-HF conversion for model-wrapped checkpoints (#1881)

10068c4

Signed-off-by: ruit <ruit@nvidia.com>

Aniketsy pushed a commit to Aniketsy/RL that referenced this pull request Mar 29, 2026

fix: Fix DCP-to-HF conversion for model-wrapped checkpoints (NVIDIA-N…

ca0873e

…eMo#1881) Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Aniket Singh Yadav <singhyadavaniket43@gmail.com>

Conversation

RayenTian commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related issue

The file structure of V1

The structure of V2

Test

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

terrykong left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Feb 5, 2026

⚠️ File Consistency Check

⚠️ DTensor Policy Worker Synchronization Warning

Uh oh!

RayenTian commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yuki-97 commented Feb 5, 2026

Uh oh!

yuki-97 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Feb 9, 2026

⚠️ File Consistency Check

⚠️ DTensor Policy Worker Synchronization Warning

Uh oh!

github-actions Bot commented Feb 9, 2026

⚠️ File Consistency Check

⚠️ DTensor Policy Worker Synchronization Warning

Uh oh!

Uh oh!

github-actions Bot commented Feb 10, 2026

⚠️ File Consistency Check

⚠️ DTensor Policy Worker Synchronization Warning

Uh oh!

github-actions Bot commented Feb 10, 2026

⚠️ File Consistency Check

⚠️ DTensor Policy Worker Synchronization Warning

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

RayenTian commented Feb 5, 2026 •

edited

Loading

coderabbitai Bot commented Feb 5, 2026 •

edited

Loading

RayenTian commented Feb 5, 2026 •

edited

Loading