Fix flash_attention_3 detection and import for hopper wheel installs #45387
albertorkive wants to merge 2 commits into huggingface:main from
Conversation
is_flash_attn_3_available() checked package distribution metadata for "flash-attn-3", but the hopper wheel (built from flash-attention/hopper/) doesn't register under that distribution name. Replace metadata check with actual import probing of both known module paths:

1. flash_attn_interface (standalone flash-attn-3 wheel)
2. hopper.flash_attn_interface (built from flash-attention/hopper/)

Also fix _lazy_imports() to try both paths, and correct the compatibility matrix minimum CUDA version from 8 (Ampere) to 9 (Hopper) — FA3 kernels require sm90+.
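A minimal sketch of what this import probing could look like, assuming importlib-based checks on the two module names listed above (the helper name is hypothetical, not the actual transformers code):

```python
import importlib.util


def _find_fa3_module():
    """Hypothetical helper: return the first importable FA3 interface module name, or None."""
    for name in ("flash_attn_interface", "hopper.flash_attn_interface"):
        try:
            # find_spec() checks whether the module could be imported without
            # actually importing the FA3 kernels themselves.
            if importlib.util.find_spec(name) is not None:
                return name
        except ModuleNotFoundError:
            # For the dotted name, find_spec() first imports the parent package
            # ("hopper"); this exception is raised if that package is absent.
            continue
    return None
```

Detection would then reduce to `_find_fa3_module() is not None`.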
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45387&sha=d4c3f2
cc @ArthurZucker @Cyrilvallez @vasqu but not sure if it's real or code agent hallucination |
vasqu
left a comment
This is not the most common way to install FA3; you should use python setup.py install, e.g. see the recommended way in https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#flashattention-3-beta-release
This seems more like a broken install tbh and we shouldn't support it: the KV cache function is missing, and the official functions aren't exposed the way the docs (readme) show.
I tend to say it might be some agent tbh
# 1. "flash_attn_interface" — standalone flash-attn-3 wheel
# 2. "hopper.flash_attn_interface" — built from flash-attention/hopper/
https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#flashattention-3-beta-release shows the usage to be
import flash_attn_interface
flash_attn_interface.flash_attn_func()
So I don't see why hopper.flash_attn_interface should be valid - that implies some weird structure inherited from pip which shouldn't be used
| "general_availability_check": is_flash_attn_3_available, | ||
| "pkg_availability_check": lambda *args, **kwargs: importlib.util.find_spec("flash_attn_interface") is not None | ||
| and "flash-attn-3" in [pkg.replace("_", "-") for pkg in PACKAGE_DISTRIBUTION_MAPPING["flash_attn_interface"]], | ||
| "pkg_availability_check": is_flash_attn_3_available, # defers to actual import probing |
This would also run CUDA checks; we only want to check for metadata here.
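For contrast, a metadata-only probe (no imports, no CUDA) could look roughly like this; the distribution name comes from the removed check above, and the function name is purely illustrative:

```python
from importlib.metadata import PackageNotFoundError, version


def _fa3_metadata_present(dist_name: str = "flash-attn-3") -> bool:
    """Illustrative metadata-only check: consults installed distribution
    metadata without importing any FA3 module or touching CUDA."""
    try:
        version(dist_name)
        return True
    except PackageNotFoundError:
        return False
```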
| "pkg_availability_check": is_flash_attn_3_available, # defers to actual import probing | ||
| "supported_devices": ((is_torch_cuda_available, "cuda"),), | ||
| "cuda_min_major_version": 8, # Ampere | ||
| "cuda_min_major_version": 9, # Hopper (sm90+) — FA3 does NOT run on Ampere |
Plain wrong, and we've checked for ages that Ampere works
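For reference, a minimum-major gate like cuda_min_major_version is typically evaluated against the device's compute capability, roughly as sketched below (8 = Ampere, 9 = Hopper; this is an illustration, not the transformers implementation):

```python
import torch


def _meets_min_major(cuda_min_major_version: int = 8) -> bool:
    # get_device_capability() returns (major, minor), e.g. (8, 0) for A100
    # or (9, 0) for H100; the gate only compares the major version.
    if not torch.cuda.is_available():
        return False
    major, _ = torch.cuda.get_device_capability()
    return major >= cuda_min_major_version
```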
except ImportError:
    from hopper.flash_attn_interface import flash_attn_func, flash_attn_varlen_func  # type: ignore[no-redef]

    flash_attn_with_kvcache = None  # hopper wheel may not expose this
Honestly, given that this weird install doesn't have the kv cache function, it implies something is seriously wrong because the existing functions are not properly exposed.
Fair, closing this. The problem I ran into was with the metadata check in is_flash_attn_3_available(): I installed FA3 correctly from the hopper wheel, but PACKAGE_DISTRIBUTION_MAPPING didn't pick it up, so detection returned False. I went too broad with the fix. I'll verify and open an issue if it still reproduces.
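For anyone hitting the same mismatch, a quick way to see which distribution name an installed wheel registers for the flash_attn_interface import package is the stdlib packages_distributions() mapping (Python 3.10+); this is just a diagnostic sketch, not part of the PR:

```python
from importlib.metadata import packages_distributions

# Maps top-level import packages to the distribution(s) that provide them,
# e.g. {"flash_attn_interface": ["flash-attn-3"]} for the standalone wheel.
dist_map = packages_distributions()
print(dist_map.get("flash_attn_interface"))  # None -> the wheel registers under a different layout
```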
What does this PR do?
Fixes attn_implementation="flash_attention_3", which is currently broken for the most common FA3 install method — the hopper wheel built from flash-attention/hopper/.

Three issues fixed:

1. is_flash_attn_3_available() returns False even when FA3 is installed. The check looks for "flash-attn-3" in PACKAGE_DISTRIBUTION_MAPPING["flash_attn_interface"], but the hopper wheel doesn't register under that distribution name. Fix: try the actual imports (flash_attn_interface and hopper.flash_attn_interface) instead of relying on package metadata.
2. _lazy_imports() fails at runtime even if detection passes. It only tries from flash_attn_interface import ..., which fails for the hopper wheel (which exposes hopper.flash_attn_interface). Fix: try both import paths with a fallback.
3. Compatibility matrix lists the wrong minimum CUDA version. cuda_min_major_version is set to 8 (Ampere) for FA3, but FA3 kernels require Hopper (sm90+). Fix: set to 9.

Context: FA3 support was added in #38972, but the detection relies on package distribution metadata that doesn't match how most users actually install FA3 (building from the hopper/ subdirectory of the flash-attention repo, or using pre-built hopper wheels). The flash-dispatch package (a Flash Attention ecosystem tool) solves this same problem by probing both import paths — this PR applies the same approach directly in transformers.

Files changed

- src/transformers/utils/import_utils.py — rewrite is_flash_attn_3_available() to probe actual imports
- src/transformers/modeling_flash_attention_utils.py — try both FA3 import paths in _lazy_imports(), fix compat matrix

How to reproduce the bug
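The original reproduction snippet isn't preserved here; a plausible minimal check based on the description above would be something like the following (the checkpoint name is only an example):

```python
from transformers import AutoModelForCausalLM
from transformers.utils.import_utils import is_flash_attn_3_available

# With FA3 installed from the hopper wheel, this returned False before the patch.
print(is_flash_attn_3_available())

# Requesting FA3 explicitly then fails at load time for the same reason.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # example checkpoint; any FA3-capable model works
    torch_dtype="bfloat16",
    attn_implementation="flash_attention_3",
)
```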
After this PR, both work correctly.