Fix the FA2 logic in the longcat_flash model #42549
Conversation
@vasqu, please help review. Thanks!
```python
if config.qk_head_dim != config.v_head_dim:
    self.skipTest(
        reason="Flash Attention 2 requires qk_head_dim == v_head_dim, but got "
        f"qk_head_dim={config.qk_head_dim}, v_head_dim={config.v_head_dim}"
    )
```
Will this not skip all the tests here? I doubt that the test classes will have different head dims.
I'd rather we properly adjust the sizes than skip - skipping should always be the last resort.
Yes, this test only involves this single model_class. After digging deeper into LongcatFlashForCausalLM, I found that it already implements the padding pre-processing for FA2 internally. However, the fallback FA2 path (kernels) failed to correctly match the parameter naming. This PR updates that part with the fix. Please review it again!
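For reference, here is a minimal sketch of the padding pre-processing described above. This is an illustration, not the actual LongcatFlash code, and `flash_attention_forward` is a hypothetical stand-in for whichever FA2 entry point (native or kernels-based) gets dispatched:

```python
import torch.nn.functional as F

def fa2_with_mla_padding(q, k, v, qk_head_dim, v_head_dim, flash_attention_forward):
    # MLA: Q/K use qk_head_dim, V uses the smaller v_head_dim, but the fused FA2
    # kernel expects a single head dim, so V is zero-padded before the call.
    if qk_head_dim != v_head_dim:
        v = F.pad(v, [0, qk_head_dim - v_head_dim])
    attn_output = flash_attention_forward(q, k, v)
    # The padded columns only ever receive zeros, so slicing restores the true output.
    if qk_head_dim != v_head_dim:
        attn_output = attn_output[..., :v_head_dim]
    return attn_output
```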
The branch was force-pushed from 13038ae to c492337, and then from 13038ae to 11e4ac9.
vasqu left a comment
Thx for iterating! I think we can generalize this some more to include all flash attention variants? They will probably face similar issues.
```python
uses_flash_attention_2 = (
    "flash" in self.config._attn_implementation and self.config._attn_implementation.endswith("2")
)
if uses_flash_attention_2 and self.qk_head_dim != self.v_head_dim:
```
Suggested change:
```diff
-uses_flash_attention_2 = (
-    "flash" in self.config._attn_implementation and self.config._attn_implementation.endswith("2")
-)
-if uses_flash_attention_2 and self.qk_head_dim != self.v_head_dim:
+if "flash" in self.config._attn_implementation and self.qk_head_dim != self.v_head_dim:
```
I think we should generalize this here to check for all flavors. FA3 etc. would face the same issue.
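A quick illustration of why the substring check covers more flavors than the `endswith("2")` variant; the implementation strings below are just examples, not an exhaustive list of what transformers accepts:

```python
for impl in ("flash_attention_2", "flash_attention_3", "kernels-community/flash-attn"):
    print(impl, "flash" in impl, impl.endswith("2"))
# flash_attention_2             True  True
# flash_attention_3             True  False  <- missed by the endswith("2") check
# kernels-community/flash-attn  True  False  <- missed as well
```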
Great! Done.
```diff
 )
-if self.config._attn_implementation == "flash_attention_2" and self.qk_head_dim != self.v_head_dim:
+if uses_flash_attention_2 and self.qk_head_dim != self.v_head_dim:
```
Suggested change:
```diff
-if uses_flash_attention_2 and self.qk_head_dim != self.v_head_dim:
+if "flash" in self.config._attn_implementation and self.qk_head_dim != self.v_head_dim:
```
Same here then
Done.
[For maintainers] Suggested jobs to run (before merge): run-slow: longcat_flash
vasqu left a comment
Perfect, let's merge
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* Matching FA2 naming under kernels
* make style
* convert model
* Follow the comments
What does this PR do?
FA2 does not support MLA (i.e., cases where the Q/K head dimension differs from the V head dimension), so this test is skipped.
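To make the head-dim mismatch concrete, here is a small illustration with made-up MLA-style shapes (not LongcatFlash's real sizes): eager/SDPA-style attention only needs Q and K to share a head dim, and the output simply follows V's head dim, whereas the fused FA2 kernel assumes a single head dim for Q, K and V.

```python
import torch

bsz, heads, seq, qk_head_dim, v_head_dim = 1, 4, 8, 192, 128  # illustrative sizes only
q = torch.randn(bsz, heads, seq, qk_head_dim)
k = torch.randn(bsz, heads, seq, qk_head_dim)
v = torch.randn(bsz, heads, seq, v_head_dim)

# Eager attention works fine with qk_head_dim != v_head_dim:
scores = torch.matmul(q, k.transpose(-1, -2)) / qk_head_dim**0.5   # (1, 4, 8, 8)
attn_output = torch.matmul(torch.softmax(scores, dim=-1), v)       # (1, 4, 8, 128)
print(attn_output.shape)  # the output head dim follows v_head_dim
```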