Replace QH16 bf16 kernel with a new one that does not use ptr_RP #2999
JohnNikolay84 wants to merge 3 commits into main from
Conversation
Force-pushed from ce19134 to 46d6983. Co-authored-by: Cursor <cursoragent@cursor.com>
Motivation
#2729 introduced a new QH64 kernel that does not write directly to ptr_RP; instead, it writes the split data into ptr_R/logits.
As #2983 points out, other kernels such as MLA_A16W16_1TG_4W_32mx1_16nx1_Coex0_Msk1_QH16.co do not follow the same convention and instead write into a null pointer.
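For orientation, a minimal sketch of the two output conventions. The shapes and buffer layout below are illustrative assumptions, not the kernels' actual layouts:

```python
import torch

# Illustrative shapes only (assumptions, not the real kernel parameters).
num_splits, batch, nhead, v_head_dim = 4, 2, 16, 512

# Packed convention: a single workspace buffer (ptr_RP) carries both the
# per-split partial outputs and the per-split logits/LSE values.
ptr_RP = torch.empty(num_splits, batch, nhead, v_head_dim + 1, dtype=torch.float32)

# Split convention used by the QH64 kernel from #2729: partial outputs go to
# ptr_R and the per-split logits go to a separate buffer, which the combine
# step then reduces across splits.
ptr_R = torch.empty(num_splits, batch, nhead, v_head_dim, dtype=torch.float32)
logits = torch.empty(num_splits, batch, nhead, dtype=torch.float32)
```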
Technical Details
This change introduces a new kernel for nhead=32 bf16 that follows the same convention as the QH64 kernel. However, I have not been able to find a kernel with a 32x32x16 MFMA layout, so I am using the 16x16x32 one instead.
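Conceptually, the new kernel slots into the bf16 decode path alongside the QH64 one. The dispatch sketch below is an assumption for illustration: the table, keys, placeholder kernel names, and helper are not aiter's actual API.

```python
import torch

# Hypothetical dispatch sketch (illustrative assumptions, not aiter's real lookup).
MLA_DECODE_BF16_KERNELS = {
    # nhead=64: QH64 kernel from #2729, writes split data into ptr_R/logits.
    64: "<QH64 kernel>.co",
    # nhead=32: kernel added by this PR, same ptr_R/logits convention, built on
    # the 16x16x32 MFMA layout because no 32x32x16 variant was found.
    32: "<new kernel replacing MLA_A16W16_1TG_4W_32mx1_16nx1_Coex0_Msk1_QH16.co>",
}

def pick_mla_decode_kernel(nhead: int, dtype: torch.dtype) -> str:
    """Return the kernel object name for the given head count and dtype."""
    if dtype is not torch.bfloat16 or nhead not in MLA_DECODE_BF16_KERNELS:
        raise ValueError(f"no MLA decode kernel for nhead={nhead}, dtype={dtype}")
    return MLA_DECODE_BF16_KERNELS[nhead]
```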
Test Plan
Run a new test in aiter and make sure it passes the torch reference (see the sketch after this list).
Run DeepSeek in TP4 and make sure it does not crash.
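A minimal sketch of the first step, assuming a hypothetical Python entry point mla_decode_bf16 for the new kernel; the real aiter test uses its own harness, shapes, and tolerances.

```python
import torch

# Hypothetical test sketch: `mla_decode_bf16` stands in for whatever entry
# point aiter exposes for the new kernel and is an assumption, not the real API.
def torch_reference(q, kv, scale):
    # Plain fp32 softmax(Q @ K^T * scale) @ V as the reference.
    attn = torch.softmax((q.float() @ kv.float().transpose(-1, -2)) * scale, dim=-1)
    return attn @ kv.float()

def check_new_kernel(mla_decode_bf16):
    torch.manual_seed(0)
    q = torch.randn(2, 32, 512, dtype=torch.bfloat16, device="cuda")    # batch, nhead, head_dim
    kv = torch.randn(2, 1024, 512, dtype=torch.bfloat16, device="cuda")  # batch, seqlen, head_dim
    scale = 512 ** -0.5
    out = mla_decode_bf16(q, kv, softmax_scale=scale)
    torch.testing.assert_close(out.float(), torch_reference(q, kv, scale), rtol=2e-2, atol=2e-2)
```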
Test Result
Submission Checklist