[Bugfix] Fix flash-attention func param mismatch and softmax_scale default value mistake on Ascend NPU #37575
Merged
MekkCyber merged 2 commits into huggingface:main on Apr 18, 2025
Conversation
Contributor
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
Force-pushed from 98b4473 to 8bc2bfb
Contributor
Author
@ArthurZucker please help me review and merge it, thanks~
Force-pushed from 8bc2bfb to b99b85b
SunMarc approved these changes on Apr 17, 2025
Comment on lines +197 to +198:
    max_seqlen_q=None,
    max_seqlen_k=None,
Member
maybe leave a msg on why we put those here
Contributor
Author
@SunMarc Thanks for your suggestion, I have added code comments on both params.
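For readers of the thread, here is a rough idea of the kind of explanatory comment being described; the signature and the comment wording below are a simplified guess, not the exact diff that was merged:

```python
def npu_flash_attn_varlen_func(
    q,
    k,
    v,
    cu_seqlens_q=None,
    cu_seqlens_k=None,
    # Keep max_seqlen_q / max_seqlen_k in the same positions as in
    # flash_attn.flash_attn_varlen_func so that positional calls written
    # against that signature still line up with this NPU wrapper.
    max_seqlen_q=None,
    max_seqlen_k=None,
    dropout_p=0.0,
    softmax_scale=None,
    causal=False,
):
    """Simplified placeholder; the real wrapper dispatches to the torch_npu kernels."""
```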
[Bugfix] fix flash-attention func param mismatch and softmax_scale default value mistake on Ascend NPU
Force-pushed from b99b85b to 2d7c8e1
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request on May 14, 2025:
[Bugfix] fix flash-attention func param mismatch and softmax_scale default value mistake on Ascend NPU (huggingface#37575)
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
What does this PR do?
After we added support for Flash Attention on Ascend NPU (PR), we found two bugs in the Ascend NPU implementation of Flash Attention. This PR fixes both.
Bugfix 1:
We found that in the `main` branch the function `flash_attn_varlen_func` is sometimes called without keyword-style argument passing (i.e., positionally), for example:
transformers/src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py, line 203 in 3bc44ea
In that case, the parameter order of `flash_attn_varlen_func` in the `flash-attn` package and of `npu_flash_attn_varlen_func` in `transformers` is not aligned, which may cause unexpected errors 😞. Therefore, we solve this problem by aligning the mismatched parameters `max_seqlen_q` and `max_seqlen_k`. We have also checked `npu_flash_attn_func`; it does not have the parameter-order mismatch problem.
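To make the failure mode concrete, here is a minimal sketch using simplified, hypothetical signatures (the real functions accept more parameters) of how a positional call written against `flash-attn` can go wrong when the NPU wrapper lacks the two `max_seqlen_*` slots:

```python
def flash_attn_varlen_func(q, k, v, cu_seqlens_q, cu_seqlens_k,
                           max_seqlen_q, max_seqlen_k,
                           dropout_p=0.0, softmax_scale=None, causal=False):
    """Stand-in for the CUDA flash-attn parameter order (simplified)."""


def npu_flash_attn_varlen_func_before_fix(q, k, v, cu_seqlens_q, cu_seqlens_k,
                                          dropout_p=0.0, softmax_scale=None,
                                          causal=False):
    """Stand-in for an NPU wrapper that lacks the max_seqlen_* parameters."""


# A positional call such as
#     attn_func(q, k, v, cu_q, cu_k, max_q, max_k)
# is valid against flash_attn_varlen_func, but against the pre-fix wrapper it
# silently binds max_q to dropout_p and max_k to softmax_scale. Adding
# max_seqlen_q=None and max_seqlen_k=None at the matching positions in the NPU
# wrapper makes both call styles resolve to the same parameters.
```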
Bugfix 2:
For `npu_flash_attn_func` and `npu_flash_attn_varlen_func`, when the parameter `softmax_scale` is set to `None`, it should default to `1.0 / sqrt(q.shape[-1])`.
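A minimal sketch of that defaulting logic, assuming a PyTorch-style query tensor whose last dimension is the head size (`_default_softmax_scale` is an illustrative helper name, not the actual code in `transformers`):

```python
import math

import torch


def _default_softmax_scale(q: torch.Tensor, softmax_scale=None):
    # Fall back to 1 / sqrt(head_dim) when the caller does not pass a scale,
    # matching the documented default of the CUDA flash-attn kernels.
    if softmax_scale is None:
        softmax_scale = 1.0 / math.sqrt(q.shape[-1])
    return softmax_scale


# Example: a query with head_dim 128 gets a default scale of 1 / sqrt(128).
scale = _default_softmax_scale(torch.empty(4, 8, 128))  # ≈ 0.0884
```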
Fixes # (issue)
Not related.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.