
Refactor core_model_loading to support FSDP shard-on-read loading #44974

Open

3outeille wants to merge 3 commits into fsdp-vs-ddp from fsdp-core-model-loading

Conversation

@3outeille (Member) commented Mar 24, 2026

TODO:

  • Saving seems to take a bit of time, though. Needs investigation.
  • Need to check that it works in 1D (FSDP or TP) and 2D (FSDP + TP). (A generic 2D setup sketch follows the log below.)

Running the script from #44996:

```
(env_pr-44974-fsdp-core-model-loading) ➜  pr-44974-fsdp-core-model-loading git:(pr-44974-fsdp-core-model-loading) ✗ torchrun --nproc_per_node=4 train_fsdp_tp.py 2>&1 | tee ref.txt
W0326 17:05:52.336000 1498148 torch/distributed/run.py:803] 
W0326 17:05:52.336000 1498148 torch/distributed/run.py:803] *****************************************
W0326 17:05:52.336000 1498148 torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0326 17:05:52.336000 1498148 torch/distributed/run.py:803] *****************************************
`torch_dtype` is deprecated! Use `dtype` instead!
`torch_dtype` is deprecated! Use `dtype` instead!
`torch_dtype` is deprecated! Use `dtype` instead!
`torch_dtype` is deprecated! Use `dtype` instead!
Loading weights: 100%|██████████| 146/146 [00:00<00:00, 1015.33it/s]
Loading weights: 100%|██████████| 146/146 [00:00<00:00, 947.54it/s]
Loading weights: 100%|██████████| 146/146 [00:00<00:00, 888.82it/s]
Loading weights: 100%|██████████| 146/146 [00:00<00:00, 967.20it/s]
Step    0 | Loss: 12.9297
Step   10 | Loss: 6.8154
Step   20 | Loss: 6.2856
Step   30 | Loss: 6.5783
Step   40 | Loss: 6.1821
```
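For context, a 2D (FSDP + TP) setup of the kind the TODO refers to typically looks like the sketch below. This is a generic illustration built on public PyTorch DTensor APIs, not the actual train_fsdp_tp.py from #44996; the model and mesh sizes are made up.

```python
# Generic 2D (FSDP + TP) sketch -- illustrative only, not the script from
# #44996. Run under `torchrun --nproc_per_node=4 ...`.
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard  # public path in torch >= 2.6
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

# 4 ranks arranged as a 2x2 mesh: outer dim for FSDP, inner dim for TP.
mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("fsdp", "tp"))

# Hypothetical two-layer MLP standing in for the real model.
model = nn.Sequential(nn.Linear(1024, 4096), nn.Linear(4096, 1024)).cuda()

# Tensor-parallelize along the "tp" mesh dimension...
parallelize_module(model, mesh["tp"], {"0": ColwiseParallel(), "1": RowwiseParallel()})

# ...then shard the resulting DTensor parameters along "fsdp" (FSDP2 API).
fully_shard(model, mesh=mesh["fsdp"])
```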

@3outeille force-pushed the fsdp-core-model-loading branch from e5fc7eb to 4b2a921 on March 24, 2026 16:14
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

```python
    return alternation, src_group_to_glob, tgt_group_to_glob


def resolve_target_wildcards(source_pattern: str, target_pattern: str, source_key: str) -> str:
```
@3outeille (Member, Author) commented:
review this part
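For reviewers skimming: based only on the signature above, wildcard resolution presumably captures the "*" fragments that source_pattern matches in source_key and substitutes them into target_pattern. A hypothetical sketch of that behavior (not the code under review):

```python
# Hypothetical sketch of source -> target wildcard resolution; NOT the actual
# implementation under review, just the behavior the signature suggests.
import re

def resolve_target_wildcards(source_pattern: str, target_pattern: str, source_key: str) -> str:
    # Turn "model.layers.*.q_proj.weight" into a regex capturing each "*".
    regex = re.escape(source_pattern).replace(r"\*", "(.+?)") + "$"
    match = re.match(regex, source_key)
    if match is None:
        raise ValueError(f"{source_key!r} does not match {source_pattern!r}")
    # Substitute the captured fragments into the target's "*" slots in order.
    resolved = target_pattern
    for fragment in match.groups():
        resolved = resolved.replace("*", fragment, 1)
    return resolved

# resolve_target_wildcards("model.layers.*.self_attn.q_proj.weight",
#                          "decoder.blocks.*.attention.q.weight",
#                          "model.layers.3.self_attn.q_proj.weight")
# -> "decoder.blocks.3.attention.q.weight"
```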

```python
    model=model,
    missing_keys=loading_info.missing_keys if loading_info else None,
)
if len(collected_tensors) > 1 and model is not None:
```
@3outeille (Member, Author) commented:
review this part
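A toy illustration of the multi-tensor case this branch guards: when one target weight was stored as several checkpoint shards, the collected pieces have to be stitched back together. The shapes and the concat dim below are made-up assumptions, not taken from the PR:

```python
# Toy example of why len(collected_tensors) > 1 can occur: one logical weight
# split across checkpoint shards, re-joined along its shard dimension.
# Shapes and the concat dim are illustrative assumptions.
import torch

collected_tensors = [torch.randn(2, 8), torch.randn(2, 8)]  # two row shards
if len(collected_tensors) > 1:
    full_weight = torch.cat(collected_tensors, dim=0)  # (4, 8) global tensor
else:
    full_weight = collected_tensors[0]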

```python
# ref.shape is the DTensor global shape. For DTensor-based TP+FSDP,
# parallelize_module + fully_shard compose correctly, so ref.shape
# and ref.placements are already correct for the 2D DTensor.
fsdp_param = DTensor.from_local(
```
@3outeille (Member, Author) commented:
review this part
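For reference, DTensor.from_local composes a local shard with a device mesh and placements. Below is a standalone sketch of the pattern the excerpt uses, with made-up shapes and placements (public torch.distributed.tensor API from torch >= 2.5; run under torchrun with 4 ranks):

```python
# Standalone sketch of building a 2D DTensor from a locally materialized
# shard. `local_shard` and the placements are illustrative; in the PR they
# come from the reference parameter (`ref.shape` / `ref.placements`).
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Shard  # public in torch >= 2.5

mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("fsdp", "tp"))

local_shard = torch.randn(64, 128, device="cuda")  # this rank's slice
fsdp_param = DTensor.from_local(
    local_shard,
    device_mesh=mesh,
    # e.g. FSDP shards dim 0, TP shards dim 1 of the global weight.
    placements=(Shard(0), Shard(1)),
)
# fsdp_param.shape is the global shape: (128, 256) on this 2x2 mesh.
```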

Commits:

- DtensorShardOperation for range-math shard-on-read
- spawn_materialize() enhancements
- from_pretrained wiring for distributed config
- Shard operation helpers in tensor_parallel
- Shard-on-read and LoadStateDictConfig tests

@3outeille force-pushed the fsdp-core-model-loading branch from c567240 to c1dab9e on April 14, 2026 13:44
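The "range-math shard-on-read" in the first commit presumably means computing the index range a rank owns and reading only that slice from the checkpoint, instead of materializing the full tensor and slicing it afterwards. A minimal sketch of such range math (balanced, np.array_split-style chunking; the PR's DtensorShardOperation may use different semantics):

```python
# Minimal range-math sketch: which rows of a dim-0-sharded weight does this
# rank own? Balanced chunking with the remainder on the first ranks; the
# actual DtensorShardOperation semantics may differ.
def shard_range(global_len: int, world_size: int, rank: int) -> tuple[int, int]:
    base, rem = divmod(global_len, world_size)
    start = rank * base + min(rank, rem)
    stop = start + base + (1 if rank < rem else 0)
    return start, stop

start, stop = shard_range(4096, world_size=4, rank=1)  # -> (1024, 2048)
# Shard-on-read then loads only that slice, e.g. with safetensors:
#   with safe_open(ckpt, framework="pt") as f:
#       local = f.get_slice(key)[start:stop]
```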
@github-actions (Contributor) commented:

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44974&sha=21f056
