Replace QH16 bf16 kernel with a new one that does not use ptr_RP #2999

Open
JohnNikolay84 wants to merge 3 commits into main from mla_nheads32_fault_fix

@JohnNikolay84
Contributor

Motivation

#2729 introduced a new QH64 kernel that does not write directly to ptr_RP; instead, it writes split data to ptr_R/logits.

As #2983 states, other kernels such as MLA_A16W16_1TG_4W_32mx1_16nx1_Coex0_Msk1_QH16.co do not follow the same logic and write to a null pointer instead.

Technical Details

This change introduces a new kernel for nhead=32 bf16 that uses the same convention as the QH64 kernel. However, I have not been able to find a kernel with the 32x32x16 MFMA layout, so I am using the one with the 16x16x32 layout instead.
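For reviewers less familiar with the split-output convention, here is a minimal pure-Python sketch of the idea. It assumes that the "split data" written to ptr_R/logits are per-split partial attention outputs plus their log-sum-exp values, which a later reduce step merges into the final result; the PR does not spell this out, so treat it as an illustration rather than the kernel's actual layout. All function names (`attend_split`, `combine_splits`, `attend_full`) are hypothetical.

```python
import math

def attend_split(q, keys, values):
    """Softmax attention for one query over one KV split.

    Returns the normalized partial output and the log-sum-exp of the scores
    (assumed here to play the role of the per-split data and logits).
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)

def combine_splits(partials):
    """Reduce step: merge per-split (output, lse) pairs into the final output."""
    m = max(lse for _, lse in partials)
    scales = [math.exp(lse - m) for _, lse in partials]
    total = sum(scales)
    dim = len(partials[0][0])
    return [sum(s * out[d] for s, (out, _) in zip(scales, partials)) / total
            for d in range(dim)]

def attend_full(q, keys, values):
    """Unsplit reference: attention over the whole KV range at once."""
    out, _ = attend_split(q, keys, values)
    return out

# Toy data: one query, four KV entries, processed as two splits of two.
q = [0.5, -1.0, 0.25]
keys = [[0.1, 0.2, 0.3], [1.0, -0.5, 0.0], [-0.3, 0.8, 0.6], [0.4, 0.4, -0.9]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [-1.0, 3.0]]

partials = [attend_split(q, keys[:2], values[:2]),
            attend_split(q, keys[2:], values[2:])]
combined = combine_splits(partials)
reference = attend_full(q, keys, values)
```

In this sketch `combined` matches `reference` up to floating-point rounding. A kernel that skips the split buffers and writes straight to a final-output pointer only works if the caller actually provides that pointer, which is the failure mode described in #2983.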

Test Plan

Run the new test in aiter and make sure it passes against the torch reference.
Run DeepSeek with TP4 and make sure it does not crash.

Test Result

image

Submission Checklist

@JohnNikolay84 JohnNikolay84 self-assigned this May 1, 2026
@JohnNikolay84 JohnNikolay84 requested review from a team and fangche123 May 1, 2026 11:37
@github-actions
Contributor

github-actions Bot commented May 1, 2026

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-300x | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| ci:sglang | SGLang integration tests |
| ci:atom | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| ci:vllm | vLLM benchmark |
| ci:all | All of the above |

Add labels via the sidebar or `gh pr edit 2999 --add-label <label>`

@JohnNikolay84 JohnNikolay84 requested review from Zzz9990 and valarLip May 1, 2026 11:48
@JohnNikolay84 JohnNikolay84 force-pushed the mla_nheads32_fault_fix branch from ce19134 to 46d6983 May 4, 2026 13:38
Co-authored-by: Cursor <cursoragent@cursor.com>