fix: Fix DCP-to-HF conversion for model-wrapped checkpoints #1881
Walkthrough
Updates the DCP-to-HF checkpoint conversion documentation across multiple guide files to reference a specific model file path instead of a directory. Modifies the conversion logic to conditionally handle state dict structures, supporting both nested ({"model": ...}) and flat formats.
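The conditional state dict handling can be illustrated with a minimal sketch. It is not the code from this PR: the helper name `unwrap_model_state_dict` is hypothetical, and the check assumes a wrapped checkpoint keeps all weights under a top-level "model" key whose value is itself a mapping, while a flat checkpoint maps parameter names directly to values.

```python
from typing import Any, Dict, Mapping


def unwrap_model_state_dict(state_dict: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a flat {name: value} dict from a wrapped or flat state dict."""
    inner = state_dict.get("model")
    # Assumption: a wrapped checkpoint stores its weights under a top-level
    # "model" key whose value is itself a mapping.
    if isinstance(inner, Mapping):
        return dict(inner)
    # Otherwise treat the input as already flat.
    return dict(state_dict)


# Plain floats stand in for tensors to keep the sketch dependency-free.
wrapped = {"model": {"layer.weight": 1.0, "layer.bias": 0.0}}
flat = {"layer.weight": 1.0, "layer.bias": 0.0}
assert unwrap_model_state_dict(wrapped) == unwrap_model_state_dict(flat)
```

A heuristic like this would misfire on a flat checkpoint that has a parameter group literally named "model", so a real converter may need a stricter test.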
terrykong left a comment
Is this a fix for dtensor v1 or v2?
The issue is because of:
In this PR, we unify the checkpoint path and the weights state_dict. It now works for both dtensor v1 and v2 checkpoints.
Thanks for the fix! Can you help add a unit test for dtensor v2 like
yuki-97 left a comment
Also, just curious: from the PR description, it seems v1 saves the tokenizer but v2 doesn't?
Signed-off-by: ruit <ruit@nvidia.com>
…ror if no metadata file is found for Dtensor V1 or V2 checkpoint paths. Signed-off-by: ruit <ruit@nvidia.com>
…eMo#1881) Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
…eMo#1881) Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Aniket Singh Yadav <singhyadavaniket43@gmail.com>
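One of the commit messages above mentions raising an error when no metadata file is found for a Dtensor V1 or V2 checkpoint path. A minimal sketch of such a guard follows. The helper name `require_dcp_metadata` is hypothetical, and the sketch assumes the checkpoint directory holds a file named `.metadata`, as written by the filesystem writer in `torch.distributed.checkpoint`; the check in this PR may differ.

```python
import os
import tempfile


def require_dcp_metadata(ckpt_dir: str) -> str:
    """Return the path to the checkpoint's .metadata file, or raise."""
    # Assumption: a DCP checkpoint directory contains a ".metadata" file.
    metadata_path = os.path.join(ckpt_dir, ".metadata")
    if not os.path.isfile(metadata_path):
        raise FileNotFoundError(
            f"No .metadata file found in {ckpt_dir!r}; "
            "check that the path points at the directory the checkpoint was saved to."
        )
    return metadata_path


with tempfile.TemporaryDirectory() as tmp:
    try:
        require_dcp_metadata(tmp)  # empty directory: raises
    except FileNotFoundError as err:
        print(err)
    # Create an empty placeholder so the second call succeeds.
    open(os.path.join(tmp, ".metadata"), "w").close()
    print(require_dcp_metadata(tmp))
```

Failing early with the offending path in the message makes a v1-versus-v2 path mix-up easier to diagnose than a later error from the loader.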
Summary
… policy/weights as the ckpt path for v2 cannot find data, and vice versa.
… {"model": ...} in dtensor v1, while dtensor v2 saves a flat state_dict.
In this PR, we:
… nested ({"model": ...}) and flat state dict formats.
… .../weights/model and optimizer in .../optimizer/optim.
… .../optimizer/optim).
Related issue
The file structure of V1
The structure of V2
Test