Fix cuda memory access violation in GQA FlashAttention#24447

Merged
RyanUnderhill merged 1 commit into main from ryanunderhill/flashattention_crash_fix on Apr 17, 2025
Conversation

@RyanUnderhill
Contributor

Description

The zeros_ memory buffer was left uninitialized, but it must be initialized to zero.

Motivation and Context

A memory allocator change in GenAI started triggering crashes in FlashAttention, and this uninitialized buffer was eventually tracked down as the cause; the allocator change itself was innocent. I'm not sure how this didn't fail previously, or, if it was failing, why we weren't getting reports about it.

@tianleiwu tianleiwu changed the title Fix cuda memory access violation in FlashAttention Fix cuda memory access violation in GQA FlashAttention Apr 16, 2025
@RyanUnderhill RyanUnderhill merged commit 99f2b80 into main Apr 17, 2025
84 of 89 checks passed
@RyanUnderhill RyanUnderhill deleted the ryanunderhill/flashattention_crash_fix branch April 17, 2025 00:36
ashrit-ms pushed a commit that referenced this pull request Apr 24, 2025
Co-authored-by: Ryan Hill <{ID}+{username}@users.noreply.github.com>