
Fix unexpected TF32 being enabled in testing#45252

Merged
ydshieh merged 2 commits into main from fix_strange
Apr 5, 2026

Conversation

@ydshieh
Collaborator

@ydshieh ydshieh commented Apr 5, 2026

What does this PR do?

#43166 used `torch.set_float32_matmul_precision("high")`, which (likely) causes TF32 to be used:

> “high”, float32 matrix multiplications either use the TensorFloat32 datatype (10 mantissa bits explicitly stored) or treat each float32 number as the sum of two bfloat16 numbers (approximately 16 mantissa bits with 14 bits explicitly stored)

(see the PyTorch docs for `torch.set_float32_matmul_precision`)

In #45248, we fixed an issue introduced in #42428, which brings us back to the original setting: no TF32 used. That change works well and fixes all failing `test_batching_equivalence` tests if we apply the change from #45248 on top of #42428.

However, even with the fix from #45248, the `torch.set_float32_matmul_precision("high")` call in `tests/models/youtu/test_modeling_youtu.py` added by the later PR #43166 still causes some `test_batching_equivalence` failures (see this comment), because TF32 is used (or because treating each float32 number as the sum of two bfloat16 numbers is not precise enough for us).

This PR sets it to `highest`, i.e.

`torch.set_float32_matmul_precision("highest")`

which fixes all `test_batching_equivalence` tests again while still achieving the goal mentioned by the PR author.

The expected output values in the integration tests are updated because the PR Prepare and keep track of position ids in generate (#43734) changed the actual outputs. The new outputs look sensible and are very similar to the previous ones, so I think this is safe.
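For intuition on why `"high"` can break tight test tolerances, here is a small stdlib-only sketch (not part of this PR, and not PyTorch code) that rounds a float32 value down to a TF32-style 10-bit mantissa; the helper is my own illustrative assumption:

```python
import struct

def round_to_mantissa_bits(x: float, bits: int) -> float:
    """Round a float32 value to `bits` explicitly stored mantissa bits.

    float32 stores 23 mantissa bits; TF32 stores only 10.
    """
    # Reinterpret the float32 as its 32-bit pattern.
    u = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - bits
    if drop <= 0:
        return struct.unpack("<f", struct.pack("<I", u))[0]
    # Round-to-nearest on the dropped low mantissa bits, then zero them.
    u = ((u + (1 << (drop - 1))) >> drop) << drop
    return struct.unpack("<f", struct.pack("<I", u & 0xFFFFFFFF))[0]

x = 0.1
tf32_like = round_to_mantissa_bits(x, 10)
err = abs(tf32_like - x)
print(f"0.1 at 10 mantissa bits: {tf32_like!r}, abs error ~ {err:.1e}")
```

With only 10 mantissa bits the per-element rounding error is around 1e-5, which accumulates in matmuls and can exceed the tolerances used by `test_batching_equivalence`; `"highest"` keeps the full 23-bit float32 mantissa.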

@github-actions
Contributor

github-actions Bot commented Apr 5, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: youtu

@ydshieh ydshieh mentioned this pull request Apr 5, 2026
5 tasks
@ydshieh
Collaborator Author

ydshieh commented Apr 5, 2026

run-slow: vit, clip, youtu

@github-actions
Contributor

github-actions Bot commented Apr 5, 2026

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/clip", "models/vit", "models/youtu"]
quantizations: []

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

github-actions Bot commented Apr 5, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45252&sha=49e459

@github-actions
Contributor

github-actions Bot commented Apr 5, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | ddcaf64c | workflow commit (merge commit) |
| PR | 49e4596c | branch commit (from PR) |
| main | 794d65f4 | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

@ydshieh
Collaborator Author

ydshieh commented Apr 5, 2026

@zucchini-nlp

I merged this PR to move fast, but if you think the output changes for this youtu model since your earlier PR #43734 should not have happened, we can investigate further.

> The expected output values in the integration tests are updated because the PR Prepare and keep track of position ids in generate (#43734) changed the actual outputs. The new outputs look sensible and are very similar to the previous ones, so I think this is safe.

@ydshieh ydshieh merged commit 374d44d into main Apr 5, 2026
22 of 24 checks passed
@ydshieh ydshieh deleted the fix_strange branch April 5, 2026 17:32
louzongzhi pushed a commit to louzongzhi/transformers that referenced this pull request Apr 6, 2026
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
@zucchini-nlp
Member

Not sure about the batching, but the integration test changes look weird to me. Position ids should have remained the same after that PR, so I am going to check it (a bit later). We also had a few other big changes in generation/cache, so there might have been a few PRs causing the diff 🤔

@zucchini-nlp
Member

Ah, the model was using padding_side="right", which indeed is not going to work well after that PR. We don't support right padding in any case, so I guess the diff is fine. But it would be better to pass padding_side="left" when tokenizing, imo.
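A minimal, framework-free sketch (not this model's actual tokenizer code; the token ids and pad id 0 are made up for illustration) of why left padding is preferred for generation: with left padding, the last position of every row holds a real token, which is the position generation continues from.

```python
# Two token-id sequences of unequal length and a hypothetical pad id.
seqs = [[5, 6, 7], [8, 9]]
pad_id = 0
maxlen = max(len(s) for s in seqs)

# Left padding prepends pads; right padding appends them.
left_padded = [[pad_id] * (maxlen - len(s)) + s for s in seqs]
right_padded = [s + [pad_id] * (maxlen - len(s)) for s in seqs]

print(left_padded)   # [[5, 6, 7], [0, 8, 9]]
print(right_padded)  # [[5, 6, 7], [8, 9, 0]]
```

With right padding, the shorter row ends in a pad token, so the next generated token would be predicted from a pad position (and naive position ids get misaligned), which matches the diff observed after #43734.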

sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>