[kernel] add fused_qk_rmsnorm_per_token_quant kernel by gbyu-amd · Pull Request #2958 · ROCm/aiter

gbyu-amd · 2026-04-29T09:39:27Z

Motivation

Some Quark-quantized models, e.g., amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 and amd/Kimi-K2-Thinking-MXFP4-AttnFP8, have FP8-weight linear layers in attention and adopt the PTPC (per-token activation, per-channel weight) quantization recipe. This PR therefore adds a fused_qk_rmsnorm_per_token_quant kernel, which will be used in ATOM/vLLM-ATOM.
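For readers unfamiliar with the operation, the sketch below is a plain-Python reference of what the kernel name suggests: RMSNorm applied to the q and k inputs, followed by dynamic per-token (per-row) quantization scaling. It is not the kernel's actual interface. The function names, argument order, `eps` default, and the FP8 maximum of 448.0 (float8 e4m3fn) are all assumptions made for illustration, and the final cast to an FP8 dtype is omitted.

```python
import math

# Assumption: max finite value of float8 e4m3fn. Other FP8 variants use a different max.
FP8_MAX = 448.0


def rmsnorm(row, weight, eps=1e-6):
    """RMS-normalize one token (row) and apply the elementwise weight."""
    mean_sq = sum(v * v for v in row) / len(row)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(row, weight)]


def per_token_quant(row, fp8_max=FP8_MAX):
    """Compute one dynamic scale per token and rescale the row into the FP8 range.

    The rounding/cast to an actual FP8 dtype is intentionally left out.
    """
    amax = max(abs(v) for v in row)
    scale = amax / fp8_max if amax > 0.0 else 1.0
    scaled = [max(-fp8_max, min(fp8_max, v / scale)) for v in row]
    return scaled, scale


def qk_rmsnorm_per_token_quant_ref(q, k, q_weight, k_weight, eps=1e-6):
    """Hypothetical reference: normalize then quantize q and k, token by token.

    q and k are lists of rows (tokens); they may have different hidden sizes.
    Returns ((q_scaled, q_scales), (k_scaled, k_scales)).
    """
    results = []
    for rows, weight in ((q, q_weight), (k, k_weight)):
        scaled_rows, scales = [], []
        for row in rows:
            scaled, scale = per_token_quant(rmsnorm(row, weight, eps))
            scaled_rows.append(scaled)
            scales.append(scale)
        results.append((scaled_rows, scales))
    return tuple(results)
```

Multiplying a scaled row by its per-token scale recovers the normalized activations (up to the omitted FP8 rounding), which is the kind of check a test against an unfused baseline would likely perform.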

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

github-actions · 2026-04-29T09:40:17Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label	Tests
`ci:triton-300x`	Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X
`ci:sglang`	SGLang integration tests
`ci:atom`	ATOM benchmark (DeepSeek-R1 + GPT-OSS)
`ci:vllm`	vLLM benchmark
`ci:all`	All of the above

Add labels via the sidebar or `gh pr edit 2958 --add-label <label>`

add fused_qk_rmsnorm_per_token_quant kernel

582efef

gbyu-amd requested a review from a team April 29, 2026 09:39

Merge branch 'main' into guanbao/fuse_qknorm_per_token_quant

5759862

gbyu-amd mentioned this pull request Apr 29, 2026

[fix][acc] fix accuracy of fp8 attn weights model using ptpc quant recipe ROCm/ATOM#670

Open

1 task

make format happy

ce78e34

gbyu-amd marked this pull request as draft April 29, 2026 11:46

Merge branch 'main' into guanbao/fuse_qknorm_per_token_quant

9937e3a

gbyu-amd marked this pull request as ready for review April 29, 2026 13:24

Merge branch 'main' into guanbao/fuse_qknorm_per_token_quant

e77d654

gbyu-amd wants to merge 5 commits into main from guanbao/fuse_qknorm_per_token_quant
