
fix Dtensor and tensor mismatch #42906

Merged

ArthurZucker merged 14 commits into main from fix_dtensor_tensor_moe_mismatch on Dec 16, 2025
Conversation

Member

@3outeille commented Dec 16, 2025

Bug

local_rowwise or local_colwise calls RowwiseParallel(use_dtensor=False) (resp. ColwiseParallel(use_dtensor=False)). The issue was first noticed in #42356, quoting:

> we would like to not have DTensor logic in the modeling. For example, sinks are supposed to use local_rowwise (cf. main/src/transformers/models/gpt_oss/configuration_gpt_oss.py#L41), which is supposed to not return a DTensor (cf. main/src/transformers/integrations/tensor_parallel.py#L1171), but somehow doesn't work

Fix

def convert_and_load_state_dict_in_model(
    # ...
    tp_layer = ALL_PARALLEL_STYLES[model.tp_plan[matched_tp_pattern]].__class__

was creating the bug: re-instantiating from .__class__ falls back to the class default use_dtensor value, overwriting the value we specified in local_rowwise/local_colwise.

The fix makes sure the configured use_dtensor value is actually used, so there is no longer a DTensor and tensor mismatch.
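For illustration, here is a minimal, self-contained sketch of the pitfall (the class below is a simplified stand-in, not the actual transformers implementation): rebuilding a style from .__class__ yields a fresh object carrying the class default, so any per-instance override of use_dtensor is lost.

```python
# Hypothetical stand-in for a parallel style; not the real transformers class.
class RowwiseParallel:
    def __init__(self, use_dtensor=True):
        self.use_dtensor = use_dtensor

# The tp_plan entry "local_rowwise" is configured with use_dtensor=False ...
style_instance = RowwiseParallel(use_dtensor=False)

# ... but rebuilding from __class__ silently falls back to the class default.
rebuilt = style_instance.__class__()
print(style_instance.use_dtensor)  # False
print(rebuilt.use_dtensor)         # True -> the override is gone, DTensors leak back in
```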

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread: src/transformers/core_model_loading.py (Outdated)
Comment on lines +898 to +903
tp_layer_instance = ALL_PARALLEL_STYLES[model.tp_plan[matched_tp_pattern]]
tp_layer = tp_layer_instance.__class__
mapping.distributed_operation = tp_layer(
    device_mesh=device_mesh, rank=device_map[""].index, empty_param=empty_param.clone()
)
mapping.distributed_operation.use_dtensor = tp_layer_instance.use_dtensor
Collaborator


A nice hack, though if we can come up with a better fix let's try to avoid that please!
The kwargs should only be

device_mesh=device_mesh, rank=device_map[""].index, empty_param=empty_param.clone()

for init; the rest should not be kwargs of init, more like hardcoded for that "type".
If you see what I mean: here we should only get the class and init it -> local_colwise should get its stuff

Member Author


done
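A minimal sketch of what this suggestion (and the later commit "avoid hack and create class instead") amounts to; the class names are hypothetical stand-ins, not the code merged in this PR. Instead of copying use_dtensor onto the new instance after init, a dedicated subclass carries the right value as a class-level default, so going through .__class__ stays lossless.

```python
class ColwiseParallel:
    use_dtensor = True  # class-level default

    def __init__(self, device_mesh=None, rank=None, empty_param=None):
        self.device_mesh = device_mesh
        self.rank = rank
        self.empty_param = empty_param


class LocalColwiseParallel(ColwiseParallel):
    use_dtensor = False  # hardcoded for the "local_colwise" style


# The plan maps each style name to a template instance; re-instantiating its
# class for a given device mesh/rank now preserves the intended use_dtensor.
ALL_PARALLEL_STYLES = {
    "colwise": ColwiseParallel(),
    "local_colwise": LocalColwiseParallel(),
}

tp_layer = ALL_PARALLEL_STYLES["local_colwise"].__class__
op = tp_layer(device_mesh=None, rank=0, empty_param=None)
assert op.use_dtensor is False
```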

Collaborator

@ArthurZucker left a comment


Looks good to me otherwise, the test will come later AFAIK with your PR on fast distributed tests

router_indices == -1, num_local_experts
) # masking class for one hot
return router_scores, router_indices
return router_logits, router_scores, router_indices
Collaborator


unsure this works for all models but let's see!

Member Author


yeah I need to fix the Expert parallel anyway so we will see
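For context on the "# masking class for one hot" comment in the hunk above, here is a hedged sketch (assumed shapes and names) of the usual trick: slots where router_indices == -1 are mapped to an extra "masking" class so one_hot can run without negative indices, and that extra column is dropped afterwards.

```python
import torch

router_indices = torch.tensor([[0, 2], [1, -1]])  # -1 = no expert assigned to this slot
num_local_experts = 4

# Map the -1 sentinel to an extra class index so one_hot accepts it ...
masked = router_indices.masked_fill(router_indices == -1, num_local_experts)
one_hot = torch.nn.functional.one_hot(masked, num_classes=num_local_experts + 1)

# ... then drop the masking class so invalid slots encode to all zeros.
one_hot = one_hot[..., :num_local_experts]
print(one_hot)
```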

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42906&sha=12ff9a

@3outeille enabled auto-merge (squash) on December 16, 2025 at 17:24
@ArthurZucker merged commit b1a2fba into main on Dec 16, 2025
24 of 26 checks passed
@ArthurZucker deleted the fix_dtensor_tensor_moe_mismatch branch on December 16, 2025 at 17:36
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* begin Moe test tensor parallel

* create tiny moe model + fix test tensor parallel Moe

eaeaae

fix tensor parallel MoE test

* fix backward pass test in tensor parallel for Dense model (huggingface#42811)

* fix

* linting

* use mixtral instead for testing

* fix dtensor and tensor mismatch

* linting

* checkout test tensor parallel to be like main

* avoid hack and create class instead
