Make TF32 tests hardware-aware for PyTorch 2.9+#43151

Open
Shraman123 wants to merge 4 commits into huggingface:main from Shraman123:fix-tf32-tests-hardware-aware
Conversation

@Shraman123

Fixes #42371

TF32 tests assumed fp32_precision == "tf32" after enabling, which is not true on
CPU-only or unsupported hardware. PyTorch reports "none" in those cases.

This change:

  • Makes TF32 tests hardware-aware via is_torch_tf32_available
  • Supports PyTorch >= 2.9 fp32_precision API
  • Preserves legacy allow_tf32 behavior
  • Restores backend state after tests
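
The version-gated set-and-restore pattern described above can be sketched roughly as follows. `SimpleNamespace` objects stand in for `torch.backends.cuda.matmul` so the sketch runs without CUDA; the attribute names and the `"tf32"`/`"ieee"` values follow the PR description and my reading of the PyTorch 2.9 API, and may not match it exactly:

```python
from types import SimpleNamespace


def set_tf32(backend, enable):
    """Enable or disable TF32 on ``backend``, returning a restore callable.

    Prefers the PyTorch >= 2.9 ``fp32_precision`` attribute when present,
    and falls back to the legacy ``allow_tf32`` flag otherwise. Note that
    on CPU-only or unsupported hardware, PyTorch may still report
    ``fp32_precision == "none"`` after enabling, which is why the tests
    must be hardware-aware rather than asserting ``"tf32"`` directly.
    """
    if hasattr(backend, "fp32_precision"):
        old = backend.fp32_precision
        backend.fp32_precision = "tf32" if enable else "ieee"

        def restore():
            backend.fp32_precision = old
    else:
        old = backend.allow_tf32
        backend.allow_tf32 = enable

        def restore():
            backend.allow_tf32 = old

    return restore


# Stand-in for torch.backends.cuda.matmul on a PyTorch >= 2.9 install.
new_backend = SimpleNamespace(fp32_precision="none")
restore = set_tf32(new_backend, True)
assert new_backend.fp32_precision == "tf32"
restore()  # backend state restored after the test
assert new_backend.fp32_precision == "none"

# Stand-in for the legacy (pre-2.9) allow_tf32 API.
old_backend = SimpleNamespace(allow_tf32=False)
restore = set_tf32(old_backend, True)
assert old_backend.allow_tf32 is True
restore()
assert old_backend.allow_tf32 is False
```

Returning a restore closure (rather than hard-coding the previous value) is what keeps the tests from leaking backend state into later tests, regardless of which API was in effect.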

@khushali9
Contributor

@Shraman123 I don't think we can write torch-version-specific tests, because CI does not have multiple versions of torch.

@khushali9
Contributor

@Shraman123 Looking into this further, this seems like a PyTorch issue. Can you try filing a bug with PyTorch and linking it from the Hugging Face issue? I just checked another comment on the issue, and in the traceback I see PyTorch Inductor still using the old API. Thanks.

Contributor

@khushali9 khushali9 left a comment

Thank you for removing the tests, but now I do not see any other files changed. Are you sure all your changes are in? Thanks

@Shraman123
Author

Thanks for pointing this out.

After removing the TF32 test, there are no additional functional changes left in this PR. This is intentional: the behavior was determined to be PyTorch-owned rather than a Transformers bug.

The purpose of this PR is therefore limited to removing a fragile test that relies on PyTorch internal precision semantics, which cannot be reliably validated in CI.

Please let me know if you’d prefer closing this PR instead, or if removing the test alone is acceptable.

Contributor

@khushali9 khushali9 left a comment

@Shraman123 Ah, cool. Since you tested this and we determined it to be a PyTorch issue, we can just close this, as this PR does not have any changes to merge. cc @Rocketknight1

Development

Successfully merging this pull request may close these issues.

Please use the new API settings to control TF32 behavior, ...

2 participants