
Fix tf32 issue: set torch.backends.cudnn.conv.fp32_precision explicitly.#45248

Merged
ydshieh merged 4 commits into main from fix_tf32_issue on Apr 5, 2026

Conversation

@ydshieh (Collaborator) commented Apr 5, 2026

What does this PR do?

PR #42428 changed the way we enable/disable torch's TF32, using torch's new API. It turns out that setting

torch.backends.fp32_precision = "ieee"

would still have

torch.backends.cudnn.conv.fp32_precision = "tf32"
torch.backends.cudnn.rnn.fp32_precision = "tf32"
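For reference, a minimal way to observe this (a sketch, assuming a torch build that exposes the new string-based fp32_precision API; the printed values are what this PR reports, not guaranteed behavior):

```python
import torch

# Sketch of the reported inconsistency (assumes the new string-based API,
# available in recent torch releases).
torch.backends.fp32_precision = "ieee"  # request full-precision fp32 globally

print(torch.backends.fp32_precision)             # "ieee"
print(torch.backends.cudnn.conv.fp32_precision)  # still "tf32" per this report
print(torch.backends.cudnn.rnn.fp32_precision)   # still "tf32" per this report
```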

It's not clear whether this is a bug or by design in torch; I will talk to people at the torch conference next week.

For now, this issue causes ~60 test_batching_equivalence failures. Setting torch.backends.cudnn.conv.fp32_precision = "ieee" explicitly leaves no such failing tests (on the commit of the linked PR).

I will merge this PR directly to move fast. If the torch team says it's by design rather than a bug, we could move the logic into our enable_tf32.

Keep in mind that even with this fix there are still 37 failing test_batching_equivalence tests, caused by other issues introduced after #42428; those should be fixed in separate PR(s).

Note: this PR brings the vit and clip CI back to ✅

ydshieh changed the title from "Fix tf32 issue" to "Fix tf32 issue: set torch.backends.cudnn.conv.fp32_precision explicitly." Apr 5, 2026
@ydshieh (Collaborator Author) commented Apr 5, 2026

run-slow: vit

@github-actions (Bot) commented Apr 5, 2026

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/vit"]
quantizations: []

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions (Bot) commented Apr 5, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 1c42ae2e | workflow commit (merge commit) |
| PR | 8fd7c7f7 | branch commit (from PR) |
| main | 499ef1d7 | base commit (on main) |

Model CI Report

1 new failed test from this PR 😭

  • vit:
    tests/models/vit/test_modeling_vit.py::ViTModelTest::test_torch_export (✅ ⟹ ❌)

@ydshieh (Collaborator Author) commented Apr 5, 2026

run-slow: vit, clip

@github-actions (Bot) commented Apr 5, 2026

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/clip", "models/vit"]
quantizations: []

@github-actions (Bot) commented Apr 5, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | d5cafdce | workflow commit (merge commit) |
| PR | e70c3db5 | branch commit (from PR) |
| main | 499ef1d7 | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

ydshieh merged commit 794d65f into main Apr 5, 2026
17 of 18 checks passed
ydshieh deleted the fix_tf32_issue branch April 5, 2026 09:42
@ydshieh (Collaborator Author) commented Apr 5, 2026

Well, the remaining 37 failing tests only fail when we run the whole set of tests (from all models), like

python3 -m pytest -v tests/models/ -k "test_batching_equivalence"

If we run only the set of those 37 tests, like

python3 -m pytest -v @failed.txt

all of them pass. So some tests might change the TF32 settings and never reset them.
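If that's confirmed, one possible remedy would be an autouse fixture in conftest.py that snapshots and restores the precision flags around every test. This is a hypothetical sketch, not part of this PR:

```python
import pytest
import torch

@pytest.fixture(autouse=True)
def _restore_tf32_settings():
    # Snapshot the cuDNN conv precision before each test and restore it after,
    # so a test that flips TF32 settings cannot leak into later tests.
    # Assumes the new string-based API; falls back to the legacy boolean flag.
    if hasattr(torch.backends.cudnn.conv, "fp32_precision"):
        saved = torch.backends.cudnn.conv.fp32_precision
        yield
        torch.backends.cudnn.conv.fp32_precision = saved
    else:
        saved = torch.backends.cudnn.allow_tf32
        yield
        torch.backends.cudnn.allow_tf32 = saved
```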

For the record, the list of those 37 tests is:

tests/models/aimv2/test_modeling_aimv2.py::Aimv2ModelTest::test_batching_equivalence
tests/models/altclip/test_modeling_altclip.py::AltCLIPVisionModelTest::test_batching_equivalence
tests/models/altclip/test_modeling_altclip.py::AltCLIPModelTest::test_batching_equivalence
tests/models/blip_2/test_modeling_blip_2.py::Blip2VisionModelWithProjectionTest::test_batching_equivalence
tests/models/blip_2/test_modeling_blip_2.py::Blip2TextRetrievalModelTest::test_batching_equivalence
tests/models/chinese_clip/test_modeling_chinese_clip.py::ChineseCLIPVisionModelTest::test_batching_equivalence
tests/models/chinese_clip/test_modeling_chinese_clip.py::ChineseCLIPModelTest::test_batching_equivalence
tests/models/clap/test_modeling_clap.py::ClapAudioModelTest::test_batching_equivalence
tests/models/clap/test_modeling_clap.py::ClapModelTest::test_batching_equivalence
tests/models/clip/test_modeling_clip.py::CLIPVisionModelTest::test_batching_equivalence
tests/models/clip/test_modeling_clip.py::CLIPModelTest::test_batching_equivalence
tests/models/clipseg/test_modeling_clipseg.py::CLIPSegVisionModelTest::test_batching_equivalence
tests/models/clipseg/test_modeling_clipseg.py::CLIPSegModelTest::test_batching_equivalence
tests/models/convbert/test_modeling_convbert.py::ConvBertModelTest::test_batching_equivalence
tests/models/deformable_detr/test_modeling_deformable_detr.py::DeformableDetrModelTest::test_batching_equivalence
tests/models/flava/test_modeling_flava.py::FlavaForPreTrainingTest::test_batching_equivalence
tests/models/fuyu/test_modeling_fuyu.py::FuyuModelTest::test_batching_equivalence
tests/models/groupvit/test_modeling_groupvit.py::GroupViTModelTest::test_batching_equivalence
tests/models/lasr/test_modeling_lasr.py::LasrEncoderModelTest::test_batching_equivalence
tests/models/metaclip_2/test_modeling_metaclip_2.py::MetaClip2VisionModelTest::test_batching_equivalence
tests/models/metaclip_2/test_modeling_metaclip_2.py::MetaClip2ModelTest::test_batching_equivalence
tests/models/metaclip_2/test_modeling_metaclip_2.py::MetaClip2ForImageClassificationModelTest::test_batching_equivalence
tests/models/mlcd/test_modeling_mlcd.py::MLCDVisionModelTest::test_batching_equivalence
tests/models/musicgen_melody/test_modeling_musicgen_melody.py::MusicgenMelodyDecoderTest::test_batching_equivalence
tests/models/omdet_turbo/test_modeling_omdet_turbo.py::OmDetTurboModelTest::test_batching_equivalence
tests/models/owlv2/test_modeling_owlv2.py::Owlv2VisionModelTest::test_batching_equivalence
tests/models/owlv2/test_modeling_owlv2.py::Owlv2TextModelTest::test_batching_equivalence
tests/models/owlv2/test_modeling_owlv2.py::Owlv2ModelTest::test_batching_equivalence
tests/models/owlv2/test_modeling_owlv2.py::Owlv2ForObjectDetectionTest::test_batching_equivalence
tests/models/owlvit/test_modeling_owlvit.py::OwlViTVisionModelTest::test_batching_equivalence
tests/models/owlvit/test_modeling_owlvit.py::OwlViTTextModelTest::test_batching_equivalence
tests/models/owlvit/test_modeling_owlvit.py::OwlViTModelTest::test_batching_equivalence
tests/models/owlvit/test_modeling_owlvit.py::OwlViTForObjectDetectionTest::test_batching_equivalence
tests/models/wav2vec2/test_modeling_wav2vec2.py::Wav2Vec2ModelTest::test_batching_equivalence
tests/models/wav2vec2_conformer/test_modeling_wav2vec2_conformer.py::Wav2Vec2ConformerModelTest::test_batching_equivalence
tests/models/x_clip/test_modeling_x_clip.py::XCLIPModelTest::test_batching_equivalence

louzongzhi pushed a commit to louzongzhi/transformers that referenced this pull request Apr 6, 2026
Fix tf32 issue: set torch.backends.cudnn.conv.fp32_precision explicitly. (huggingface#45248)

* empty

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Comment thread: conftest.py
# …note that cuDNN conv and cuDNN RNN have different TF32 flags. This combination indicates that you have used a mix of the legacy and new APIs
# to set the TF32 flags. We suggest only using the new API to set the TF32 flag(s).`
# TODO: report a bug to `torch`
if hasattr(torch.backends.cudnn, "allow_tf32"):
Contributor

Basically I feel like we are setting fp32_precision in enable_tf32 and then we are checking for allow_tf32, and that is giving a runtime error.

This whole PR change is what happens inside enable_tf32(False). I don't think the allow_tf32 block is required. I do understand the need for the torch.backends.cudnn.conv block given issue-1 being reported, for which I have a PR open; I need to get traction on it.

but if you still want to keep it we can do if-elif:

if hasattr(torch.backends.cudnn.conv, "fp32_precision"):
    torch.backends.cudnn.conv.fp32_precision = "ieee"
elif hasattr(torch.backends.cudnn, "allow_tf32"):
    torch.backends.cudnn.allow_tf32 = False

For issue-2 I have an idea that I will add to the same PR.

These are just suggestions before the PR on pytorch is merged.

Collaborator Author

Hi @khushali9

I don't want to keep this allow_tf32 block, but currently without it some (torch export) tests fail.
I prefer to wait for your PR to be merged, and then I will remove this block.

Thank you!

Comment thread: conftest.py

# This is necessary to make several `test_batching_equivalence` pass (within the tolerance `1e-5`)
if hasattr(torch.backends.cudnn.conv, "fp32_precision"):
    torch.backends.cudnn.conv.fp32_precision = "ieee"
Contributor

Another easy option is to just move this line, torch.backends.cudnn.conv.fp32_precision = "ieee", into enable_tf32.
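For illustration, a hypothetical sketch of what that could look like. The enable_tf32 name follows the discussion above; the body is illustrative, not the actual transformers implementation:

```python
import torch

def enable_tf32(enable: bool) -> None:
    # Hypothetical helper following the suggestion above; not the actual
    # transformers implementation.
    precision = "tf32" if enable else "ieee"
    if hasattr(torch.backends, "fp32_precision"):
        # New string-based API.
        torch.backends.fp32_precision = precision
        # Set the cuDNN conv flag explicitly too, since (per this PR) it does
        # not always follow the global setting.
        if hasattr(torch.backends.cudnn.conv, "fp32_precision"):
            torch.backends.cudnn.conv.fp32_precision = precision
    else:
        # Legacy boolean API for older torch versions.
        torch.backends.cuda.matmul.allow_tf32 = enable
        torch.backends.cudnn.allow_tf32 = enable
```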

sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026
Fix tf32 issue: set torch.backends.cudnn.conv.fp32_precision explicitly. (huggingface#45248)

* empty

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>