Pull requests: Dao-AILab/flash-attention
[ROCM] Add support with Infinity Cache (LLC) awareness for improved performance
#2147 opened Jan 7, 2026 by tianwyan
[CUTE][SM100] Fix backward gqa on sm100 post mask-mod semantic change
#2146 opened Jan 7, 2026 by drisspg
Add FLASH_ATTENTION_FORCE_NON_STABLE_API option to allow building on NVidia Pytorch 25.09 image
#2140 opened Jan 5, 2026 by jp-gr
[Cute] API update to include FlexAttention parameters
#2138 opened Jan 5, 2026 by reubenconducts
Fix AMD Triton backend crash when dropout != 0 and return_attn_probs …
#2111 opened Dec 30, 2025 by Logiquo
[Cute,Fwd,Sm100] distributed offset calculation for paged KV
#2104 opened Dec 28, 2025 by timmy-feng
[Cute,Fwd,Sm80] Fix fwd + add pack_gqa/local/learnable_sink
#2103 opened Dec 28, 2025 by GWS0428
[Cute,Fwd,SM100] Add softmax precision control parameters: rescale_threshold and disable_e2e
#2067 opened Dec 15, 2025 by ssxuwinter