[refactor] set attention implementation #38974
zucchini-nlp merged 20 commits into huggingface:main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Failing tests are unrelated, ready for review.

run-slow: bark, blip_2, instructblipvideo, modernbert, qwen2_5_vl, qwen2_vl, zamba

This comment contains run-slow, running the specified jobs: models: ['models/bark', 'models/blip_2', 'models/instructblipvideo', 'models/modernbert', 'models/qwen2_5_vl', 'models/qwen2_vl', 'models/zamba']
Cyrilvallez
left a comment
Extremely nice and welcome PR! Super glad to see this become a public API, and to simplify how it's done overall! 🤗🚀
```python
# package `flash-attn` can not be installed on Ascend NPU, ignore the related validation logic
if importlib.util.find_spec("flash_attn") is None and not is_torch_npu_available():
    raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
else:
    # Check FA2 installed version compatibility
    flash_attention_version = version.parse(importlib.metadata.version("flash_attn"))
    if torch.version.cuda:
        if flash_attention_version < version.parse("2.1.0"):
            raise ImportError(
                f"{preface} you need flash_attn package version to be greater or equal than 2.1.0. Detected version {flash_attention_version}. {install_message}"
            )
        elif not torch.cuda.is_available():
            raise ValueError(
                f"{preface} Flash Attention 2 is not available on CPU. Please make sure torch can access a CUDA device."
            )
    elif torch.version.hip:
        if flash_attention_version < version.parse("2.0.4"):
            raise ImportError(
                f"{preface} you need flash_attn package version to be greater or equal than 2.0.4. Detected version {flash_attention_version}. {install_message}"
            )
    else:
        raise ImportError(f"{preface} Flash Attention 2 is not available. {install_message}")
```
Don't we have a simple `is_fa2_installed` somewhere for all this? (same comment for fa3)
Huh, indeed. We are checking here the exact same issue that `is_flash_attn_2_available()` checks, except we raise a proper and informative error. The import-checker utils like `is_flash_attn_2_available()` usually return a simple boolean and raise no errors.
We try to raise informative errors in modeling code only, so I don't think we need to move this helper into `import_utils`. WDYT?
Alright, we can keep as-is, at least for now! This is already a big PR!
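For reference, a minimal sketch of the split discussed in this thread: `is_flash_attn_2_available()` is the existing boolean helper in `transformers.utils`, while `require_flash_attn_2` below is a hypothetical wrapper illustrating how modeling code turns that boolean into an informative error.

```python
from transformers.utils import is_flash_attn_2_available


def require_flash_attn_2(preface: str, install_message: str) -> None:
    # The import-checker util only reports availability as a bool and never raises...
    if not is_flash_attn_2_available():
        # ...so modeling code wraps it to produce an informative error instead.
        raise ImportError(f"{preface} Flash Attention 2 is not available. {install_message}")
```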
```diff
-attention_mask_tensor = attention_mask_tensor / torch.finfo(attention_mask_tensor.dtype).min
-attention_mask_tensor = (1.0 - attention_mask_tensor).int()
+# Invert if floating, some attention interfaces pass already a boolean 4D mask
+if attention_mask_tensor.is_floating_point():
+    attention_mask_tensor = attention_mask_tensor / torch.finfo(attention_mask_tensor.dtype).min
+    attention_mask_tensor = (1.0 - attention_mask_tensor).int()
```
Looks nasty indeed, but unrelated right?
It is related, unfortunately. With SDPA the attention tensor at some point comes in as a boolean 4D mask, while in eager mode it is a floating point mask. AFAIK we support both types of masks from users.
This part tries to revert the ops and recover a 2D boolean mask if a 4D mask is found. The 2D mask is later used by Qwen's special 3D position ids constructor. Actually it is the same as #39333, just realized.
I see, but what I meant is that it's still an issue independently of this refactor! Fine to fix it here though, no worries!
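A small standalone demo of the inversion discussed above (toy values chosen for illustration): an additive float mask encodes "masked" as `finfo.min` and "attend" as `0.0`, so dividing by `finfo.min` and flipping recovers a boolean-style mask.

```python
import torch

# Additive float mask: 0.0 means "attend", finfo.min means "masked out".
dtype = torch.float32
mask = torch.tensor([[0.0, torch.finfo(dtype).min, 0.0]], dtype=dtype)

if mask.is_floating_point():
    # finfo.min / finfo.min -> 1 at masked positions, 0 / finfo.min -> 0 elsewhere.
    mask = mask / torch.finfo(mask.dtype).min
    # Flip so 1 means "attend" and 0 means "masked", like a boolean mask.
    mask = (1.0 - mask).int()

print(mask)  # tensor([[1, 0, 1]], dtype=torch.int32)
```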
```diff
-model = model_class(config).to(torch_device).to(dtype).eval()
+model = model_class(copy.deepcopy(config)).to(torch_device).to(dtype).eval()
```
Can you clarify why we need to start deepcopying all the configs? 🤗
Oh yeah, it was needed because when we do `Model(config)`, the config changes its attention implementation in-place. It was always like that tbh, and I added the deepcopy at some point when trying to remove the `config._attn_implementation_autoset` attribute.
Right now I reverted that back, as it caused more problems than I thought. We could revert the deepcopy as well, but I think it's more robust to keep it and ensure no in-place changes to the original config. Another option is to use only `Model.from_config()` in tests, which deepcopies the config internally.
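A toy repro of the in-place mutation being guarded against, assuming GPT-2 as a stand-in for the test's `model_class` (the printed values are illustrative):

```python
import copy

from transformers import GPT2Config, GPT2Model

config = GPT2Config()

# Instantiating the model resolves the attention backend during __init__ and
# can set config._attn_implementation in-place as a side effect, so the tests
# hand the constructor a deepcopy and keep the original config pristine.
model = GPT2Model(copy.deepcopy(config))

print(getattr(config, "_attn_implementation", None))        # left untouched
print(getattr(model.config, "_attn_implementation", None))  # e.g. "sdpa", set during init
```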
We also need to write proper documentation about attention implementations; I am stuck on finding the right place. Existing docs are a bit dispersed and don't explain much about how the API works internally :( Let's leave it for the next PR, might need to change the doc structure slightly.
[For maintainers] Suggested jobs to run (before merge) run-slow: aimv2, arcee, aria, audio_spectrogram_transformer, aya_vision, bamba, bark, bart, biogpt, bitnet, blenderbot, blenderbot_small, blip_2, chameleon, clip, cohere
Cyrilvallez
left a comment
Alright, let's merge it!! 🔥🔥
(But indeed, nice documentation about this seems super important for users, let's keep it in mind somewhere!)
Yep, doc coming in the next PR |
* update
* fix some tests
* init from config, changes it in-place, add deepcopy in tests
* fix modernbert
* don't delete this config attr
* update
* style and copies
* skip tests in generation
* fix style
* accidentally removed flash-attn-3, revert
* docs
* forgot about flags set to False
* fix copies
* address a few comments
* fix copies
* custom code BC
This PR does not seem to work as expected. It breaks the default attention mechanism of ModernBERT (it should be fa2 when the user does not specify it, but now it is sdpa). Moreover, this function is not actually called when the model is constructed through class initialization.
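For what it's worth, a hedged repro of this report (checkpoint name assumed; `_attn_implementation` is the private attribute discussed in this PR):

```python
from transformers import AutoModel

# ModernBERT is reported to have defaulted to flash_attention_2 when the user
# did not specify attn_implementation; after this PR it reportedly ends up as sdpa.
model = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
print(model.config._attn_implementation)
```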
What does this PR do?
As per title, refactors attention implementation setting and makes it a public API. We should encourage users to call `model.set_attn_implementation()` whenever they want to change it after loading the model, instead of setting the config's private attribute `model.config._attn_implementation = "sdpa"`.

After the clean-up, we will be calling the attention implementation setter only once per pretrained model class, when initializing the module. Since `from_pretrained`/`from_config` call `__init__` at the end, we don't need to keep it as a `classmethod`. Also, setting attention after init lets us know which backbones support a given attention implementation and which do not, which might be useful if we want to raise errors early in future versions.

Also, removed the redundant flags for FA2/FA3. Realized that we can use one flag for both versions :)
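A quick usage sketch of the new public API (checkpoint and backend picked for illustration):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Public setter introduced by this PR: switch the attention backend after loading...
model.set_attn_implementation("sdpa")

# ...instead of mutating the private config attribute:
# model.config._attn_implementation = "sdpa"
```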