
fix train deepseek V4 with fsdp2: AttributeError: 'Tensor' object has no attribute 'device_mesh' #4023

Open
frozenleaves wants to merge 1 commit into huggingface:main from frozenleaves:main

Conversation

@frozenleaves

What does this PR do?

Fix a bug that occurs when training DeepSeek-V4 with FSDP2:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/frozen/LlamaFactory/src/train.py", line 28, in <module>
[rank0]:     main()
[rank0]:   File "/home/frozen/LlamaFactory/src/train.py", line 19, in main
[rank0]:     run_exp()
[rank0]:   File "/home/frozen/LlamaFactory/src/llamafactory/train/tuner.py", line 139, in run_exp
[rank0]:     _training_function(config={"args": args, "callbacks": callbacks})
[rank0]:   File "/home/frozen/LlamaFactory/src/llamafactory/train/tuner.py", line 107, in _training_function
[rank0]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]:   File "/home/frozen/LlamaFactory/src/llamafactory/train/sft/workflow.py", line 140, in run_sft
[rank0]:     train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/frozen/transformers/src/transformers/trainer.py", line 1427, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/frozen/transformers/src/transformers/trainer.py", line 1466, in _inner_training_loop
[rank0]:     model, train_dataloader = self._prepare_for_training(max_steps, train_dataloader, resume_from_checkpoint)
[rank0]:                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/frozen/transformers/src/transformers/trainer.py", line 1602, in _prepare_for_training
[rank0]:     model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
[rank0]:                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/frozen/accelerate/src/accelerate/accelerator.py", line 1553, in prepare
[rank0]:     result = self._prepare_fsdp2(*args)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/frozen/accelerate/src/accelerate/accelerator.py", line 1727, in _prepare_fsdp2
[rank0]:     model = fsdp2_prepare_model(self, model)
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/frozen/accelerate/src/accelerate/utils/fsdp_utils.py", line 782, in fsdp2_prepare_model
[rank0]:     fsdp2_load_full_state_dict(
[rank0]:   File "/home/frozen/accelerate/src/accelerate/utils/fsdp_utils.py", line 521, in fsdp2_load_full_state_dict
[rank0]:     device_mesh = sharded_param.device_mesh
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: AttributeError: 'Tensor' object has no attribute 'device_mesh'

The fsdp2_load_full_state_dict function iterates over model.state_dict() and assumes that every item is a DTensor, directly accessing the .device_mesh attribute.

However, when using FSDP2 (via fully_shard), only model parameters are converted to DTensors, while persistent buffers remain standard torch.Tensors.

In the DeepSeek-V4 model, the MoE router registers persistent buffers (specifically bias and tid2eid). When fsdp2_load_full_state_dict iterates over these buffers, it raises AttributeError: 'Tensor' object has no attribute 'device_mesh' because they lack the DTensor-specific attributes, which aborts model loading.
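For illustration, this is roughly how such buffers arise; the Router module below is a hypothetical stand-in, not the actual DeepSeek-V4 code. After fully_shard(model), only the nn.Parameter weights become DTensors, while the registered buffers still appear in model.state_dict() as plain torch.Tensors:

```python
import torch
import torch.nn as nn

class Router(nn.Module):
    """Hypothetical stand-in for the DeepSeek-V4 MoE router."""

    def __init__(self, hidden_size: int, n_experts: int):
        super().__init__()
        # nn.Parameter weights are converted to DTensors by fully_shard().
        self.gate = nn.Linear(hidden_size, n_experts, bias=False)
        # Persistent buffers appear in state_dict() but are left as plain
        # torch.Tensors by fully_shard(), so they have no .device_mesh.
        self.register_buffer("bias", torch.zeros(n_experts), persistent=True)
        self.register_buffer("tid2eid", torch.arange(n_experts), persistent=True)
```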

This PR modifies the fsdp2_load_full_state_dict function in accelerate/utils/fsdp_utils.py.

In both the chief (primary) and non-chief process branches, an explicit type check has been added for items in the state_dict:

  1. If an item is a DTensor, the original loading and attribute-access logic is retained.
  2. If an item is not a DTensor (i.e., a regular Tensor such as a persistent buffer), the DTensor-specific attribute accesses are bypassed; the tensor is broadcast directly and kept as a standard Tensor (see the sketch after this list).
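A minimal sketch of the added guard, not the exact patch: the helper name _load_param and the explicit torch.distributed broadcast are assumptions made here for illustration.

```python
import torch
import torch.distributed as dist
from torch.distributed.tensor import DTensor, distribute_tensor

def _load_param(sharded_param: torch.Tensor, full_param: torch.Tensor) -> torch.Tensor:
    """Load one state_dict entry; `sharded_param` comes from the FSDP2-wrapped
    model, `full_param` from the full (rank-0) state dict."""
    if isinstance(sharded_param, DTensor):
        # Original logic: shard the full tensor onto the parameter's mesh.
        mesh = sharded_param.device_mesh
        full_param = full_param.detach().to(mesh.device_type)
        return distribute_tensor(full_param, mesh, sharded_param.placements)
    else:
        # New branch: a regular tensor (e.g. a persistent buffer) has no
        # device_mesh; broadcast it from rank 0 and keep it unsharded.
        full_param = full_param.detach().to(sharded_param.device)
        dist.broadcast(full_param, src=0)
        return full_param
```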

The test is based on huggingface/transformers#45643. The issue reproduces with both the main branch and the latest release of accelerate.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
