Skip to content

fix: Fix DCP-to-HF conversion for model-wrapped checkpoints#1881

Merged
yuki-97 merged 6 commits intomainfrom
ruit/ckpt_bug_issue_1427
Feb 10, 2026
Merged

fix: Fix DCP-to-HF conversion for model-wrapped checkpoints#1881
yuki-97 merged 6 commits intomainfrom
ruit/ckpt_bug_issue_1427

Conversation

@RayenTian
Copy link
Copy Markdown
Contributor

@RayenTian RayenTian commented Feb 5, 2026

Summary

  1. The file layout differs between dtensor v1 and v2. Using policy/weights as the ckpt path for v2 cannot find data, and vice versa.
  2. DCP exports wrap weights under {"model": ...} in dtensor v1, while dtensor v2 saves a flat state_dict.

In this PR, we:

  • Fix DCP-to-HF conversion to handle both wrapped ({"model": ...}) and flat state dict formats.
  • Align dtensor v1 and v2 to the same layout: model weights in .../weights/model and optimizer in .../optimizer/optim.
  • Update documentation examples to use the correct DCP weights path (.../optimizer/optim).
  • Align checkpoint conversion guidance across README and guides.

Related issue

The file structure of V1

results/grpo_tp2_v1/step_1/
├── config.yaml
├── policy
│   ├── optimizer
│   │   ├── __0_0.distcp
│   │   ├── __1_0.distcp
│   │   ├── __2_0.distcp
│   │   ├── __3_0.distcp
│   │   ├── __4_0.distcp
│   │   ├── __5_0.distcp
│   │   ├── __6_0.distcp
│   │   └── __7_0.distcp
│   ├── tokenizer
│   │   ├── added_tokens.json
│   │   ├── chat_template.jinja
│   │   ├── merges.txt
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer_config.json
│   │   ├── tokenizer.json
│   │   └── vocab.json
│   └── weights
│       ├── __0_0.distcp
│       ├── __1_0.distcp
│       ├── __2_0.distcp
│       ├── __3_0.distcp
│       ├── __4_0.distcp
│       ├── __5_0.distcp
│       ├── __6_0.distcp
│       └── __7_0.distcp
├── train_dataloader.pt
└── training_info.json

The structure of V2

results/grpo_tp2_v2/step_1
├── config.yaml
├── policy
│   ├── optimizer
│   │   └── optim
│   │       ├── __0_0.distcp
│   │       ├── __1_0.distcp
│   │       ├── __2_0.distcp
│   │       ├── __3_0.distcp
│   │       ├── __4_0.distcp
│   │       ├── __5_0.distcp
│   │       ├── __6_0.distcp
│   │       └── __7_0.distcp
│   ├── tokenizer
│   │   ├── added_tokens.json
│   │   ├── chat_template.jinja
│   │   ├── merges.txt
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer_config.json
│   │   ├── tokenizer.json
│   │   └── vocab.json
│   └── weights
│       └── model
│           ├── __0_0.distcp
│           ├── __1_0.distcp
│           ├── __2_0.distcp
│           ├── __3_0.distcp
│           ├── __4_0.distcp
│           ├── __5_0.distcp
│           ├── __6_0.distcp
│           └── __7_0.distcp
├── train_dataloader.pt
└── training_info.json

Test

uv run examples/converters/convert_dcp_to_hf.py \
--config=results/grpo_tp2_v2/step_1/config.yaml \
--dcp-ckpt-path=results/grpo_tp2_v2/step_1/policy/weights\
--hf-ckpt-path=results/grpo_tp2_v2/step_1-hf


uv run examples/converters/convert_dcp_to_hf.py \
--config=results/grpo_tp2_v1/step_1/config.yaml \
--dcp-ckpt-path=results/grpo_tp2_v1/step_1/policy/weights\
--hf-ckpt-path=results/grpo_tp2_v1/step_1-hf

Summary by CodeRabbit

  • Documentation

    • Updated DCP-to-HF conversion examples across multiple guides with corrected checkpoint file paths for model conversion.
  • Improvements

    • Enhanced state dictionary handling during model conversion to support flexible checkpoint file structures without requiring directory-based inputs.

@RayenTian RayenTian requested review from a team as code owners February 5, 2026 03:31
@github-actions github-actions Bot added the Documentation Improvements or additions to documentation label Feb 5, 2026
@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Feb 5, 2026
@RayenTian RayenTian removed the CI:L1 Run doctests, unit tests, and functional tests label Feb 5, 2026
@RayenTian RayenTian force-pushed the ruit/ckpt_bug_issue_1427 branch from 38c628f to b7ec34f Compare February 5, 2026 03:35
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 5, 2026

📝 Walkthrough

Walkthrough

Updates DCP-to-HF checkpoint conversion documentation across multiple guide files to reference a specific model file path instead of a directory. Modifies the conversion logic to conditionally handle state dict structures, supporting both nested {"model": ...} and flat formats without forced assertions or unnecessary rewrites.

Changes

Cohort / File(s) Summary
Documentation Path Updates
README.md, docs/about/evaluation.md, docs/design-docs/checkpointing.md, docs/guides/eval.md, docs/guides/grpo-deepscaler.md, docs/guides/sft-openmathinstruct2.md
Updated --dcp-ckpt-path argument in DCP-to-HF conversion examples from results/grpo/step_170/policy/weights/ to results/grpo/step_170/policy/weights/model, pointing to a specific model file instead of a directory.
Conversion Logic
nemo_rl/utils/native_checkpoint.py
Modified convert_dcp_to_hf to reload saved weights and conditionally rewrite the file only if state dict is exactly {"model"}. Removes unconditional assertion and rewrite, now handles both nested and non-nested state dict formats.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: fixing DCP-to-HF conversion for model-wrapped checkpoints, which is the core objective of the PR.
Linked Issues check ✅ Passed The PR addresses the core issue #1427: updating conversion logic to handle wrapped state dicts and correcting DCP weights path to resolve FileNotFoundError in checkpoint conversion.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing DCP-to-HF conversion: logic updates in native_checkpoint.py and consistent documentation updates across multiple guides reflecting the corrected path.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Test Results For Major Changes ✅ Passed Minor bug fix with targeted code changes (+4/-5 lines) and documentation path corrections, qualifying for pass per policy.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ruit/ckpt_bug_issue_1427

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Feb 5, 2026
@RayenTian RayenTian requested review from guyueh1 and yuki-97 February 5, 2026 05:40
Copy link
Copy Markdown
Collaborator

@terrykong terrykong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a fix for dtensor v1? or v2?

@RayenTian RayenTian requested review from a team as code owners February 5, 2026 07:32
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 5, 2026

⚠️ File Consistency Check

Check based on commit: 0fed948 (PR #1881 from ruit/ckpt_bug_issue_1427)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@RayenTian
Copy link
Copy Markdown
Contributor Author

RayenTian commented Feb 5, 2026

is this a fix for dtensor v1? or v2?

The issue is because of:

  1. the different file layouts between dtensor v1 and v2. Using policy/weights as the ckpt path for v2 cannot find data, and vice versa.
  2. DCP exports wrap weights under {"model": ...} in dtensor v1, while dtensor v2 saves a flat state_dict.

This PR, we unify the path and weights state_dict. Now it works well for both dtensor v1 and v2 ckpt.

@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 5, 2026
@yuki-97
Copy link
Copy Markdown
Contributor

yuki-97 commented Feb 5, 2026

thanks for the fix!

can you help adding an unit test for dtensor v2 like test_convert_dcp_to_hf in tests/unit/utils/test_native_checkpoint.py to guard it?

Copy link
Copy Markdown
Contributor

@yuki-97 yuki-97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also just curious from the PR description that seems v1 will save the tokenizer but v2 won't?

Comment thread nemo_rl/models/policy/workers/dtensor_policy_worker.py Outdated
@RayenTian RayenTian force-pushed the ruit/ckpt_bug_issue_1427 branch from 0fed948 to 2a5f5a2 Compare February 9, 2026 06:33
@RayenTian RayenTian requested a review from a team as a code owner February 9, 2026 06:33
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 9, 2026

⚠️ File Consistency Check

Check based on commit: 2a5f5a2 (PR #1881 from ruit/ckpt_bug_issue_1427)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 9, 2026

⚠️ File Consistency Check

Check based on commit: 453a291 (PR #1881 from ruit/ckpt_bug_issue_1427)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@RayenTian RayenTian force-pushed the ruit/ckpt_bug_issue_1427 branch from 453a291 to 54f7d1f Compare February 9, 2026 06:40
@github-actions github-actions Bot removed the Documentation Improvements or additions to documentation label Feb 9, 2026
Comment thread nemo_rl/utils/native_checkpoint.py
@RayenTian RayenTian removed the CI:L1 Run doctests, unit tests, and functional tests label Feb 10, 2026
@github-actions
Copy link
Copy Markdown

⚠️ File Consistency Check

Check based on commit: 9627a78 (PR #1881 from ruit/ckpt_bug_issue_1427)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Feb 10, 2026
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
…ror if no metadata file is found for Dtensor V1 or V2 checkpoint paths.

Signed-off-by: ruit <ruit@nvidia.com>
@RayenTian RayenTian force-pushed the ruit/ckpt_bug_issue_1427 branch from 9627a78 to e572ba5 Compare February 10, 2026 02:08
@github-actions
Copy link
Copy Markdown

⚠️ File Consistency Check

Check based on commit: e572ba5 (PR #1881 from ruit/ckpt_bug_issue_1427)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 10, 2026
@yuki-97 yuki-97 enabled auto-merge (squash) February 10, 2026 12:59
@yuki-97 yuki-97 merged commit 314c272 into main Feb 10, 2026
44 of 45 checks passed
@yuki-97 yuki-97 deleted the ruit/ckpt_bug_issue_1427 branch February 10, 2026 15:08
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026
…eMo#1881)

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
…eMo#1881)

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
Aniketsy pushed a commit to Aniketsy/RL that referenced this pull request Mar 29, 2026
…eMo#1881)

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: Aniket Singh Yadav <singhyadavaniket43@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L1 Run doctests, unit tests, and functional tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Checkpoint conversion dcp to hf format doesnot work Add HF offline conversion script for models trained with DTensorPolicyWorkerV2

3 participants