
[Bugfix] Fix flash-attention func param mismatch and softmax_scale default value mistake on Ascend NPU #37575

Merged
MekkCyber merged 2 commits into huggingface:main from FightingZhen:bugfix-npu-fa
Apr 18, 2025

Conversation

@FightingZhen (Contributor) commented Apr 17, 2025

What does this PR do?

After adding support for Flash Attention on Ascend NPU (PR), we found two bugs in the Ascend NPU Flash Attention implementation. This PR fixes both.

Bugfix 1:
We found call sites on the main branch where flash_attn_varlen_func is called with positional arguments rather than keyword arguments, for example:

attn_output = flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens, max_seqlen, max_seqlen).reshape(

In that case, the positional parameter order of flash_attn_varlen_func in the flash-attn package and npu_flash_attn_varlen_func in transformers is not aligned, which can cause unexpected errors 😞

We solve this by aligning the mismatched parameters max_seqlen_q and max_seqlen_k.

We also checked npu_flash_attn_func; it does not have this parameter-order mismatch.
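
For illustration, here is a minimal sketch of the aligned wrapper signature, assuming flash-attn's positional order (q, k, v, cu_seqlens_q, cu_seqlens_k, max_seqlen_q, max_seqlen_k, ...); the body is elided, since the real wrapper dispatches to the NPU fused-attention kernel:

    # Sketch only: parameter order mirrors flash_attn_varlen_func from flash-attn,
    # so positional call sites like the one above bind to the intended arguments.
    def npu_flash_attn_varlen_func(
        q,
        k,
        v,
        cu_seqlens_q=None,
        cu_seqlens_k=None,
        max_seqlen_q=None,  # accepted here purely for positional compatibility with flash-attn
        max_seqlen_k=None,  # same reason
        dropout_p=0.0,
        softmax_scale=None,
        causal=False,
        **kwargs,
    ):
        ...  # dispatch to the NPU kernel (omitted)

With this order, the positional call above binds cu_seqlens to cu_seqlens_q/cu_seqlens_k and max_seqlen to max_seqlen_q/max_seqlen_k, exactly as it would with the flash-attn implementation.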

Bugfix 2:
For npu_flash_attn_func and npu_flash_attn_varlen_func, when softmax_scale is None it should default to 1.0 / sqrt(q.shape[-1]).
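
As a minimal sketch of that default (the helper name resolve_softmax_scale is hypothetical, not from the PR; flash-attn's convention is that the scale is the inverse square root of the head dimension, i.e. q's last dimension):

    import math

    def resolve_softmax_scale(q, softmax_scale=None):
        # Hypothetical helper: apply flash-attn's default scaling when none is given.
        if softmax_scale is None:
            softmax_scale = 1.0 / math.sqrt(q.shape[-1])  # 1/sqrt(head_dim)
        return softmax_scale

For example, with a head dimension of 64 this yields softmax_scale = 0.125.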

Fixes # (issue)
Not related.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

github-actions (bot) marked this pull request as draft April 17, 2025 09:07
@github-actions (bot) commented:

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@FightingZhen force-pushed the bugfix-npu-fa branch 2 times, most recently from 98b4473 to 8bc2bfb April 17, 2025 12:02
@FightingZhen marked this pull request as ready for review April 17, 2025 12:05
github-actions (bot) requested review from MekkCyber and SunMarc April 17, 2025 12:05
@FightingZhen (Contributor, Author) commented:

@ArthurZucker please help me review and merge it, thanks~

@FightingZhen changed the title from "[Bugfix] Fix the parameter order mismatch between npu_flash_attn_varlen_func and flash_attn_varlen_func in flash-attn library" to "[Bugfix] Fix flash-attention func param mismatch and softmax_scale default value mistake on Ascend NPU" Apr 17, 2025
@MekkCyber (Contributor) left a comment:


LGTM thanks !

@SunMarc (Member) left a comment:


Thanks !

Comment on lines +197 to +198:

    max_seqlen_q=None,
    max_seqlen_k=None,
@SunMarc (Member):

maybe leave a msg on why we put those here

@FightingZhen (Contributor, Author) replied Apr 18, 2025:

@SunMarc Thanks for your suggestion, I have added code comments on both params.

@MekkCyber merged commit aa17cfb into huggingface:main Apr 18, 2025
18 checks passed
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
[Bugfix] Fix flash-attention func param mismatch and softmax_scale default value mistake on Ascend NPU (huggingface#37575)

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
@FightingZhen deleted the bugfix-npu-fa branch August 14, 2025 01:52