Fix qwen3_vl mixed precision dtype #41701
Conversation
[For maintainers] Suggested jobs to run (before merge): run-slow: qwen3_omni_moe, qwen3_vl, qwen3_vl_moe
  pos_embeds = self.fast_pos_embed_interpolate(grid_thw)
- hidden_states = hidden_states + pos_embeds
+ hidden_states = (hidden_states + pos_embeds).to(input_dtype)
I think we should cast only pos_embeds to the input dtype here.
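A minimal sketch of that variant with toy tensors (not the actual Qwen3-VL module), assuming hidden_states runs in BF16 while pos_embeds comes back in FP32:

```python
import torch

# Toy sketch of "cast only pos_embeds" (illustrative shapes, not the real module).
input_dtype = torch.bfloat16
hidden_states = torch.randn(4, 8, dtype=input_dtype)
pos_embeds = torch.randn(4, 8, dtype=torch.float32)  # FP32, like the master weights

hidden_states = hidden_states + pos_embeds.to(input_dtype)
print(hidden_states.dtype)  # torch.bfloat16 -- the activations are not upcast
```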
Agreed. Further, couldn't we fix this in fast_pos_embed_interpolate instead of recasting? To avoid too many conversions, we could for instance pass the input_dtype.
In h_idxs = torch.linspace(0, self.num_grid_per_side - 1, h), passing the wanted dtype should be enough, no?
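A quick sketch of what passing the dtype through could look like; num_grid_per_side and h are illustrative values here, not the model's config:

```python
import torch

# Sketch only: torch.linspace takes a dtype keyword, so the index grids inside
# fast_pos_embed_interpolate could be built in the activation dtype directly.
num_grid_per_side, h = 16, 24
h_idxs = torch.linspace(0, num_grid_per_side - 1, h)                             # FP32 by default
h_idxs_bf16 = torch.linspace(0, num_grid_per_side - 1, h, dtype=torch.bfloat16)  # built in BF16
print(h_idxs.dtype, h_idxs_bf16.dtype)  # torch.float32 torch.bfloat16
```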
Thought the same thing, but I am not sure if positional embedding was intentionally done in full precision for better performance 🤔
Yeah, this change intentionally keeps the addition running in FP32 (the same dtype as the master weights) for now, without changing the numerical dynamics.
Casting only pos_embeds or fixing this inside fast_pos_embed_interpolate would have numerical implications, which would require ablations with training results if we want to be careful.
Happy to discuss; I'm leaning towards not changing model behavior for now.
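For illustration only, a toy comparison of the two orderings (random tensors, not the real model): rounding pos_embeds to BF16 before the add versus adding in FP32 and rounding the sum afterwards, as this PR does, generally does not give bit-identical results.

```python
import torch

torch.manual_seed(0)
hidden_states = torch.randn(1024, dtype=torch.bfloat16)
pos_embeds = torch.randn(1024, dtype=torch.float32)

sum_then_cast = (hidden_states + pos_embeds).to(torch.bfloat16)  # add in FP32, then round the sum
cast_then_add = hidden_states + pos_embeds.to(torch.bfloat16)    # round pos_embeds first, add in BF16
print((sum_then_cast != cast_then_add).sum().item())  # typically nonzero, i.e. not bit-identical
```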
fast_pos_embed_interpolate returns pos_embeds in the same dtype as the master weights. Therefore, when the master weights are in FP32 but the forward pass runs in BF16,
hidden_states will be upcast to FP32, causing dtype mismatches with other activations. CC @yonigozlan @molbap @zucchini-nlp
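A minimal reproduction of that failure mode with toy tensors (not the actual Qwen3-VL vision tower): the FP32 pos_embeds silently promote the BF16 hidden states, and a downstream BF16 layer then rejects the FP32 input.

```python
import torch

hidden_states = torch.randn(2, 8, dtype=torch.bfloat16)
pos_embeds = torch.randn(2, 8, dtype=torch.float32)   # same dtype as the FP32 master weights
proj = torch.nn.Linear(8, 8).to(torch.bfloat16)       # downstream layer running in BF16

mixed = hidden_states + pos_embeds                    # silently promoted to torch.float32
try:
    proj(mixed)
except RuntimeError as err:
    print(err)  # dtype mismatch between the FP32 input and the BF16 weights

fixed = (hidden_states + pos_embeds).to(hidden_states.dtype)  # cast the sum back, as in this PR
print(proj(fixed).dtype)  # torch.bfloat16
```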