Upstream sync #12

Merged
LucasWilkinson merged 12 commits into main from lwilkinson/upstream-sync-1
Jan 21, 2026

Conversation

@LucasWilkinson
Collaborator

No description provided.

interestingLSY and others added 7 commits September 30, 2025 18:21
* Multiple updates and refactorings

* Remove dead code
…-sync-1

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
LucasWilkinson and others added 5 commits January 16, 2026 16:52
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: baowending.bwd <baowending.bwd@alibaba-inc.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
The num_sm_parts formula for sparse FP8 decode was using the SM90
formula for all architectures. On SM100, the kernel dispatch uses
different formulas (num_sms/s_q for head64/head64x2 vs num_sms/s_q/2
for head128), causing a shape mismatch error.

Fix by using architecture-specific formulas:
- SM100: num_sms / s_q (covers both head64x2 and head128)
- SM90: num_sms / s_q / (h_q/64)
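
The architecture-specific selection described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: the function name `num_sm_parts`, the `arch` string argument, and the use of integer (floor) division are assumptions, and `h_q` is assumed to be a multiple of 64.

```python
def num_sm_parts(arch: str, num_sms: int, s_q: int, h_q: int) -> int:
    """Hypothetical helper mirroring the two formulas in the commit message.

    arch    -- "sm90" or "sm100" (assumed string tags, not a real API)
    num_sms -- number of SMs on the device
    s_q     -- query sequence length per request
    h_q     -- number of query heads (assumed to be a multiple of 64)
    """
    if arch == "sm100":
        # SM100: num_sms / s_q, independent of the head count
        return num_sms // s_q
    if arch == "sm90":
        # SM90: num_sms / s_q / (h_q / 64)
        return num_sms // s_q // (h_q // 64)
    raise ValueError(f"unsupported architecture: {arch}")
```

With the same inputs the two branches can return different values whenever `h_q` is greater than 64, which is consistent with the shape mismatch the commit message reports when the SM90 formula was applied on SM100.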
@LucasWilkinson LucasWilkinson marked this pull request as ready for review January 21, 2026 06:35
@LucasWilkinson LucasWilkinson changed the title from "[WIP] Upstream sync" to "Upstream sync" Jan 21, 2026
@LucasWilkinson LucasWilkinson merged commit 16a272d into main Jan 21, 2026
1 check passed

2 participants