
fix FSDP loading with meta devices #44473

Merged
Cyrilvallez merged 9 commits into huggingface:main from winglian:fix-meta on Mar 9, 2026

Conversation

@winglian (Collaborator) commented on Mar 5, 2026

What does this PR do?

Supersedes #44446.

On main, when loading to CPU and using meta devices for the non-rank0 processes, weights get re-initialized on those processes and more CPU memory is used. Tested by loading Llama-3-8B:

  • main: both ranks on CPU, uses 16 GB system RAM, slow to load, re-initializes weights on rank1
  • #44446: rank0 on CPU, rank1 on meta, uses 1.5 GB system RAM
  • v4.57.6: both ranks on CPU, uses 1.5 GB system RAM
  • this PR: both ranks on CPU, uses 1.5 GB system RAM, same behavior and training loss as main and v4.57.6
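For context, a minimal sketch of the loading pattern benchmarked above, using the standard torch.distributed/FSDP and accelerate APIs; the model name and the wiring are illustrative, not the code path this PR touches inside transformers:

    import torch
    import torch.distributed as dist
    from accelerate import init_empty_weights
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from transformers import AutoConfig, AutoModelForCausalLM

    dist.init_process_group("nccl")
    name = "meta-llama/Meta-Llama-3-8B"  # assumed model, for illustration

    if dist.get_rank() == 0:
        # rank0 materializes the real weights on CPU (~16 GB for an 8B model in fp16)
        model = AutoModelForCausalLM.from_pretrained(name)
    else:
        # the other ranks build an empty shell on the meta device (near-zero RAM)
        with init_empty_weights():
            model = AutoModelForCausalLM.from_config(AutoConfig.from_pretrained(name))

    model = FSDP(
        model,
        device_id=torch.cuda.current_device(),
        sync_module_states=True,  # broadcast rank0's weights to the other ranks
        # meta tensors need real (empty) storage before the broadcast can land
        param_init_fn=lambda m: m.to_empty(device=torch.device("cuda"), recurse=False),
    )

The regression on main corresponds to the else branch going wrong: weights get re-initialized on the non-rank0 processes instead of staying empty until the broadcast, which costs both load time and CPU RAM.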

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Review thread on src/transformers/modeling_utils.py (Outdated)

@Cyrilvallez (Member) left a comment:


I believe we should just completely skip the init in this case, rather than marking everything as initialized and then trying to initialize?

Comment on lines +4552 to +4560
    # Handle FSDP edge case when using cpu ram efficient loading to ensure it is marked as initialized
    # since it will get its weights broadcasted from rank0
    for key in self.state_dict():
        try:
            param_or_buffer = self.get_parameter_or_buffer(key)
            param_or_buffer._is_hf_initialized = True
        except AttributeError:
            pass  # may happen when handling pre-quantized weights
    self._is_hf_initialized = True
@Cyrilvallez (Member) commented:

Should we simply return here instead, to completely avoid calling initialize_weights later in the function? That would be easier than setting all weights as initialized before calling initialize, which will be skipped anyway since the params are marked as already initialized.

@winglian (Collaborator, Author) replied:

[Screenshot: training metrics, 2026-03-09] The early return leads to unstable training, with NaN grad_norm on the first step and 0.0 loss on the second step.
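For context, a hedged sketch of the difference between the two options; init_unmarked is a hypothetical stand-in for the init pass, not the actual transformers implementation:

    # Illustrative only: a flag-aware init pass in the spirit of this PR.
    def init_unmarked(module, init_fn):
        tensors = list(module.named_parameters()) + list(module.named_buffers())
        for name, tensor in tensors:
            if getattr(tensor, "_is_hf_initialized", False):
                continue  # value will arrive via the rank0 broadcast
            init_fn(name, tensor)  # e.g. recompute a rotary inv_freq buffer

    # Early return: init_unmarked never runs, so tensors absent from the
    # broadcast state_dict stay uninitialized -- consistent with the NaN
    # grad_norm observed above.
    # This PR: state_dict entries are marked first, then the init pass runs
    # and fills only what the broadcast cannot provide.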



@Cyrilvallez (Member) left a comment:


LGTM! Thanks a lot for this!
For posterity: the issue is only with the non-persistent buffers, as they are NOT broadcast from rank0 (only the state_dict is), so we still need to go through all the inits for them (while skipping everything that is in the state_dict)!
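A minimal standalone PyTorch illustration of that point (a toy module, not the transformers code path): non-persistent buffers never appear in the state_dict, so broadcasting the state_dict from rank0 cannot fill them.

    import torch
    import torch.nn as nn

    class Toy(nn.Module):
        def __init__(self):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(4))
            self.register_buffer("running_mean", torch.zeros(4))               # persistent
            self.register_buffer("inv_freq", torch.ones(4), persistent=False)  # non-persistent

    print(list(Toy().state_dict().keys()))
    # ['weight', 'running_mean'] -- 'inv_freq' is absent, so each rank must
    # recompute it in the init pass rather than receive it from rank0.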
