Multiple updates and refactorings#150

Merged
interestingLSY merged 2 commits into main from misc
Jan 16, 2026
Conversation

@interestingLSY
Collaborator

No description provided.

interestingLSY merged commit ca58fed into main on Jan 16, 2026
interestingLSY added a commit that referenced this pull request Jan 16, 2026
* Multiple updates and refactorings

* Remove dead code
interestingLSY deleted the misc branch on January 16, 2026 10:02
LucasWilkinson added a commit to vllm-project/FlashMLA that referenced this pull request Jan 21, 2026
* Update blog and README

* Fix error message

* Code format

* Update README

* Multiple updates and refactorings (deepseek-ai#150)

* Multiple updates and refactorings

* Remove dead code

* migrate to new structure

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

* add includes

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

* Add missing include<span>

Co-authored-by: baowending.bwd <baowending.bwd@alibaba-inc.com>

* clean

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

* Fix SM100 sparse FP8 decode metadata calculation

The num_sm_parts formula for sparse FP8 decode was using the SM90
formula for all architectures. On SM100, the kernel dispatch uses
different formulas (num_sms/s_q for head64/head64x2 vs num_sms/s_q/2
for head128), causing a shape mismatch error.

Fix by using architecture-specific formulas:
- SM100: num_sms / s_q (covers both head64x2 and head128)
- SM90: num_sms / s_q / (h_q/64)
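
The two formulas above can be sketched as a small dispatch on architecture. This is an illustrative sketch only, not the code from the commit: the function name `sparse_fp8_decode_num_sm_parts`, the string-valued `arch` argument, the use of integer (floor) division, and the check that `h_q` is a multiple of 64 are all assumptions made for the example.

```python
def sparse_fp8_decode_num_sm_parts(arch: str, num_sms: int, s_q: int, h_q: int) -> int:
    """Hypothetical helper mirroring the formulas in the commit message.

    arch    -- "sm90" or "sm100" (assumed string encoding of the architecture)
    num_sms -- number of streaming multiprocessors on the device
    s_q     -- query sequence length per request
    h_q     -- number of query heads
    """
    if s_q <= 0:
        raise ValueError("s_q must be positive")

    if arch == "sm100":
        # SM100: num_sms / s_q, independent of the head count.
        return num_sms // s_q

    if arch == "sm90":
        # SM90: num_sms / s_q / (h_q / 64).
        # Assumption for this sketch: h_q is a positive multiple of 64.
        if h_q <= 0 or h_q % 64 != 0:
            raise ValueError("h_q must be a positive multiple of 64")
        return num_sms // s_q // (h_q // 64)

    raise ValueError(f"unsupported architecture: {arch}")
```

With these assumptions, an SM100 device with 148 SMs and `s_q = 2` yields 74 parts regardless of `h_q`, while an SM90 device with 132 SMs, `s_q = 2`, and `h_q = 128` yields 33.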

---------

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Shengyu Liu <shengyuliu@deepseek.com>
Co-authored-by: Jiashi Li <js.li@high-flyer.cn>
Co-authored-by: baowending.bwd <baowending.bwd@alibaba-inc.com>

1 participant