
Performance Update (2025.04.22) #71

Merged
beginlner merged 10 commits into deepseek-ai:main from interestingLSY:main on Apr 22, 2025
Conversation

@interestingLSY
Collaborator

  • This new release of Flash MLA delivers a 5% ~ 15% performance improvement on compute-bound workloads, achieving up to 660 TFlops on NVIDIA H800 SXM5 GPUs.
  • The interface of the new version is fully compatible with the old one.
  • A deep-dive blog post is provided.

@beginlner beginlner merged commit c2067be into deepseek-ai:main Apr 22, 2025
LucasWilkinson pushed a commit to vllm-project/FlashMLA that referenced this pull request Aug 1, 2025
* Fix benchmark script

* Performance optimization for compute-bound cases

* Add new testcase (s_k = 16384)

* Update README.md

* Update comment

* Update README.md

* Add the deep-dive blog

* Add background color for MLA Kernel Sched.drawio.svg

* Use relative path for the schedule image

* Move flash_mla.h to kernels/params.h

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
