Conversation

@YangKai0616
Contributor

What does this PR do?

FA2 does not support MLA (Multi-head Latent Attention, i.e., cases where the Q/K head dimension differs from the V head dimension), so this test is skipped.

@YangKai0616
Contributor Author

@vasqu, please help review. Thanks!

Comment on lines 301 to 305
if config.qk_head_dim != config.v_head_dim:
    self.skipTest(
        reason="Flash Attention 2 requires qk_head_dim == v_head_dim, but got "
        f"qk_head_dim={config.qk_head_dim}, v_head_dim={config.v_head_dim}"
    )
Contributor


Will this not skip all tests here? I doubt that the classes will have different head dims.

I'd rather we properly adjust the sizes than skip - skipping should always be the last resort.

Contributor Author


Yes, this test only involves this single model_class. After digging deeper into LongcatFlashForCausalLM, I found that it already implements the padding pre-processing for FA2 internally. However, the fallback FA2 path (kernels) failed to match the parameter naming correctly. This PR now fixes that part instead. Please review it again!
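
For context, the padding pre-processing follows this general pattern (a minimal sketch, not LongcatFlash's exact code; `flash_fn` is a stand-in for whichever flash kernel gets resolved, and the parameter-naming fix on the kernels fallback is not reproduced here):

import torch.nn.functional as F

def flash_attn_with_mla_heads(q, k, v, qk_head_dim, v_head_dim, flash_fn):
    # Flash attention kernels expect Q, K and V to share one head dim, so when
    # v_head_dim < qk_head_dim the value states are zero-padded on the last axis...
    if qk_head_dim != v_head_dim:
        v = F.pad(v, (0, qk_head_dim - v_head_dim))
    attn_output = flash_fn(q, k, v)
    # ...and the padded columns are sliced off the output afterwards.
    if qk_head_dim != v_head_dim:
        attn_output = attn_output[..., :v_head_dim]
    return attn_output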

@YangKai0616 force-pushed the fix-FA2-UT branch 2 times, most recently from 13038ae to 11e4ac9 on December 3, 2025 at 02:28
@YangKai0616 changed the title from "Fixed FA2-MLA UT" to "Fix the FA2 logic in the longcat_flash model" on Dec 3, 2025
Contributor

@vasqu left a comment


Thx for iterating! I think we can generalize this some more to cover all flash attention flavors? They will probably face similar issues.

Comment on lines 218 to 221
uses_flash_attention_2 = (
    "flash" in self.config._attn_implementation and self.config._attn_implementation.endswith("2")
)
if uses_flash_attention_2 and self.qk_head_dim != self.v_head_dim:
Contributor


Suggested change:
- uses_flash_attention_2 = (
-     "flash" in self.config._attn_implementation and self.config._attn_implementation.endswith("2")
- )
- if uses_flash_attention_2 and self.qk_head_dim != self.v_head_dim:
+ if "flash" in self.config._attn_implementation and self.qk_head_dim != self.v_head_dim:

I think we should generalize the check here to cover all flavors; FA3 etc. would face the same issue.
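
To make the flavor point concrete, here is a hedged illustration of what the substring check catches compared with an exact flash_attention_2 match (the implementation strings and head dims below are just examples):

def needs_mla_padding(attn_implementation: str, qk_head_dim: int, v_head_dim: int) -> bool:
    # Substring check instead of matching "flash_attention_2" exactly, so FA2, FA3
    # and kernel-provided flash backends all take the padded path.
    return "flash" in attn_implementation and qk_head_dim != v_head_dim

for impl in ("flash_attention_2", "flash_attention_3", "sdpa", "eager"):
    print(impl, needs_mla_padding(impl, qk_head_dim=192, v_head_dim=128))
# prints True for the two flash variants, False for sdpa and eager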

Contributor Author


Great! Done.

)

- if self.config._attn_implementation == "flash_attention_2" and self.qk_head_dim != self.v_head_dim:
+ if uses_flash_attention_2 and self.qk_head_dim != self.v_head_dim:
Contributor


Suggested change:
- if uses_flash_attention_2 and self.qk_head_dim != self.v_head_dim:
+ if "flash" in self.config._attn_implementation and self.qk_head_dim != self.v_head_dim:

Same here then

Contributor Author


Done.

@github-actions
Contributor

github-actions bot commented Dec 3, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: longcat_flash
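
(For reference, the slow tests for this model can usually be run locally with `RUN_SLOW=yes python -m pytest tests/models/longcat_flash/`; the exact path is assumed from the standard transformers test layout.)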

Contributor

@vasqu left a comment


Perfect, let's merge

@vasqu enabled auto-merge (squash) on December 3, 2025 at 14:13
@vasqu merged commit c0328af into huggingface:main on Dec 3, 2025
17 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sarathc-cerebras pushed a commit to sarathc-cerebras/transformers that referenced this pull request Dec 7, 2025
* Matching FA2 naming under kernels

* make style

* convert model

* Follow the comments