
fix: ensure dtype consistency in grouped_mm under autocast#43833

Open
nulone wants to merge 2 commits into huggingface:main from nulone:fix/43828-phimoe-dtype-autocast

Conversation

@nulone

@nulone nulone commented Feb 8, 2026

Fixes #43828

What does this PR do?

torch._grouped_mm is not registered for autocast. Under torch.autocast, LayerNorm outputs float32 while the model weights stay in bfloat16, so the grouped matmul fails with RuntimeError: "expected mat1 and mat2 to have same dtype".
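A minimal sketch of the divergence (illustrative shapes, a CUDA device assumed; not taken from the model code):

```python
import torch

# Illustrative shapes only; a stack of per-expert weight matrices kept in bfloat16.
ln = torch.nn.LayerNorm(16).cuda()
expert_weight = torch.randn(4, 16, 32, dtype=torch.bfloat16, device="cuda")

x = torch.randn(8, 16, device="cuda")
with torch.autocast("cuda", dtype=torch.bfloat16):
    normed = ln(x)
    print(normed.dtype, expert_weight.dtype)  # torch.float32 torch.bfloat16
    # torch._grouped_mm has no autocast registration, so feeding `normed`
    # against `expert_weight` raises:
    # RuntimeError: expected mat1 and mat2 to have same dtype
```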

Fix

Cast input to weight.dtype before calling _grouped_mm in src/transformers/integrations/moe.py.
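Roughly, the change amounts to the following sketch (hypothetical helper name, not the exact diff; the torch._grouped_mm keyword arguments follow recent PyTorch and may differ between versions):

```python
import torch

def autocast_safe_grouped_mm(hidden_states, weight, offs):
    # Under autocast the activations can arrive as float32 while the stacked
    # expert weights stay bfloat16; align the input with the weight dtype
    # before calling the grouped matmul, which is not autocast-aware.
    return torch._grouped_mm(hidden_states.to(weight.dtype), weight, offs=offs)
```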

Impact

Affects all MoE models using grouped_mm under autocast (Mixtral, Qwen3 MoE, DeepSeek, PhiMoE, etc.)

Before submitting

Note: No local GPU access — relying on CI for verification.

torch._grouped_mm is not registered for autocast, causing dtype mismatch
when LayerNorm outputs float32 but weights are bfloat16.

Fixes huggingface#43828
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


Hey! does not look too bad, just unsure, should we cast back post op?

@nulone
Author

nulone commented Feb 11, 2026

No need — the result is already cast back to hidden_states.dtype at line 273:

https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/moe.py#L273

So the dtype flow is: input (float32) → cast to weight.dtype (bf16) → _grouped_mm → cast back to hidden_states.dtype (float32) at the end.
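In sketch form (hypothetical names, not the actual moe.py code):

```python
import torch

def grouped_expert_mm(hidden_states, weight, offs):
    out_dtype = hidden_states.dtype              # e.g. float32 from LayerNorm under autocast
    x = hidden_states.to(weight.dtype)           # cast to the weight dtype (bfloat16)
    x = torch._grouped_mm(x, weight, offs=offs)  # grouped matmul runs entirely in bfloat16
    return x.to(out_dtype)                       # cast back to the caller's dtype (the cast at moe.py line 273)
```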



Development

Successfully merging this pull request may close these issues.

With torch.autocast, Phi-tiny-MoE-instruct raises a dtype mismatch error
