
[feat][plugin] make ATOM mla attention work for vllm #265

Merged
XiaobingSuper merged 15 commits into ROCm:main from XiaobingSuper:xiaobing/oot_kimi on Mar 10, 2026

Conversation

@XiaobingSuper (Contributor) commented Mar 4, 2026

Motivation

Following #126, this PR makes ATOM MLA attention work in vLLM plugin mode. Note: sparse MLA is not supported yet and will be implemented in a follow-up.

Technical Details

The design details can be found in #126.

Test Plan

This PR tests the Kimi-K2-Thinking-MXFP4 model with TP4 on MI355:

export SAFETENSORS_FAST_GPU=1
export VLLM_ROCM_USE_AITER=1
export VLLM_RPC_TIMEOUT=1800000

export VLLM_CACHE_ROOT=/root/.cache/vllm
export TORCHINDUCTOR_CACHE_DIR=/root/.cache/inductor
export HIP_VISIBLE_DEVICES=0,1,2,3
# quick allreduce
export AITER_QUICK_REDUCE_QUANTIZATION=INT4
export ATOM_PROFILER_MORE=1

export VLLM_TORCH_PROFILER_RECORD_SHAPES=1

model_path=Kimi-K2-Thinking-MXFP4
vllm serve $model_path \
    --host localhost \
    --port 8001 \
    --tensor-parallel-size 4 \
    --enable-expert-parallel \
    --trust-remote-code \
    --disable-log-requests \
    --gpu_memory_utilization 0.9 \
    --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
    --kv-cache-dtype fp8 \
    --max-num-batched-tokens 18432 \
    --max-model-len 16384 \
    --no-enable-prefix-caching
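
Once the server is up, a quick sanity check can be made against the OpenAI-compatible endpoint. This is only a sketch: the port matches the serve command above, and the model name and prompt are illustrative.

# assumed sanity check against the server started above
curl http://localhost:8001/v1/models
curl http://localhost:8001/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Kimi-K2-Thinking-MXFP4", "prompt": "1+1=", "max_tokens": 8}'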

Test Result

gsm8k result:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.9371|±  |0.0067|
|     |       |strict-match    |     3|exact_match|↑  |0.9363|±  |0.0067|
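
A score like the one above would typically be collected with lm-evaluation-harness pointed at the served endpoint. The exact harness invocation is not recorded in this PR, so the command below is only an assumed sketch (flag names and model_args may differ slightly across lm-eval versions):

lm_eval --model local-completions \
    --tasks gsm8k \
    --num_fewshot 3 \
    --model_args model=Kimi-K2-Thinking-MXFP4,base_url=http://localhost:8001/v1/completions,num_concurrent=16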
  

Submission Checklist

Copilot AI review requested due to automatic review settings March 4, 2026 11:49
Copilot AI (Contributor) left a comment

Pull request overview

Adds vLLM plugin-mode support for ATOM’s MLA attention path (non-sparse), including backend selection, metadata plumbing, and DeepSeek V3 model registration/loading so MLA can run end-to-end under vLLM.

Changes:

  • Route vLLM’s use_mla attention selection to an ATOM MLA backend and add MLA-specific plugin-mode metadata builders.
  • Implement plugin-mode MLA forward/prefill/decode logic (including positions capture for graph mode).
  • Register DeepSeek V3 as a supported vLLM plugin model and add a plugin-mode load_weights implementation.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.

|File|Description|
|----|-----------|
|atom/utils/backends.py|Extends compilation-cache hashing to ignore <frozen os> traced “files”.|
|atom/plugin/vllm/register.py|Patches vLLM process_weights_after_loading for Attention/MLAAttention.|
|atom/plugin/vllm/platform.py|Selects ATOM MLA backend when attn_selector_config.use_mla is true.|
|atom/plugin/vllm/model_wrapper.py|Copies positions into a static buffer for graph-mode MLA correctness.|
|atom/plugin/attention_mla.py|New: plugin-mode MLAAttention implementation helpers (prefill/decode/DCP).|
|atom/plugin/attention.py|Adds MLA plugin-mode metadata builders + backend wiring; renames plugin metadata class.|
|atom/models/deepseek_v2.py|Adds DeepSeek V3 support + plugin-mode load_weights.|
|atom/model_ops/utils.py|Removes duplicate per_tensor_dequantize implementation (keeps the canonical one).|
|atom/model_ops/paged_attention.py|Integrates vLLM MLAAttention usage and allocates a shared positions buffer.|
|atom/model_ops/linear.py|Ensures the activation tensor is contiguous before quantizer .view() calls.|
|atom/model_ops/base_attention.py|Adjusts the MLA unified-attn path to apply o_proj outside the MLA impl.|
|atom/model_ops/attentions/aiter_mla.py|Decorates the MLA backend/builder for plugin mode; builder init adjustments.|
|atom/model_ops/attentions/aiter_attention.py|Removes an unused import.|
|atom/model_ops/attention_mla.py|Adds plugin-mode hooks/decorator and splits v_up and o_proj responsibilities.|


Inline comment threads: atom/plugin/attention.py, atom/model_ops/paged_attention.py, atom/plugin/attention_mla.py, atom/plugin/vllm/register.py, atom/model_ops/linear.py (several marked outdated).
@XiaobingSuper (Contributor, Author) commented:

DeepSeek-R1-0528 with TP=8 has also been tested:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.9424|±  |0.0064|
|     |       |strict-match    |     3|exact_match|↑  |0.9363|±  |0.0067|
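
The launch command reuses the Test Plan setup above with the tensor-parallel size raised to 8; the exact invocation is not recorded here, so the following is only an assumed variant:

model_path=DeepSeek-R1-0528   # assumed local checkpoint path
export HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
vllm serve $model_path \
    --host localhost \
    --port 8001 \
    --tensor-parallel-size 8 \
    --enable-expert-parallel \
    --trust-remote-code \
    --gpu_memory_utilization 0.9 \
    --kv-cache-dtype fp8 \
    --max-model-len 16384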

Inline comment thread: atom/model_ops/attention_mla.py (outdated).
Copilot AI review requested due to automatic review settings March 4, 2026 13:00
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.



Inline comment threads: atom/plugin/vllm/register.py, atom/plugin/attention_mla.py, atom/plugin/attention.py, atom/model_ops/paged_attention.py, atom/model_ops/attention_mla.py (several marked outdated).
@ChuanLi1101 (Collaborator) left a comment

Left my comment FYI.

Inline comment threads: atom/model_ops/attention_mla.py, atom/plugin/attention_mla.py, atom/model_ops/paged_attention.py (several marked outdated).
Copilot AI review requested due to automatic review settings March 5, 2026 05:49
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.



Inline comment threads: atom/plugin/attention_mla.py (outdated), atom/plugin/vllm/register.py, atom/plugin/attention.py.
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.



Inline comment threads: atom/model_ops/paged_attention.py (outdated), atom/plugin/attention_mla.py, atom/plugin/attention.py.
ChuanLi1101 previously approved these changes Mar 5, 2026
@ChuanLi1101 (Collaborator) left a comment

LGTM, thanks for the quick turnaround.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.



zejunchen-zejun previously approved these changes Mar 9, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.



Inline comment threads: atom/model_ops/base_attention.py, atom/plugin/attention.py.
Copilot AI review requested due to automatic review settings March 10, 2026 03:20
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.



Inline comment threads: atom/plugin/attention.py, atom/plugin/attention_mla.py.
@XiaobingSuper XiaobingSuper merged commit 78b1a4d into ROCm:main Mar 10, 2026
14 of 16 checks passed
Jasen2201 pushed a commit to Jasen2201/ATOM that referenced this pull request Apr 10, 2026
* [feat][plugin] make ATOM mla attention work for vllm

Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com>

* recover unrelated code

* simplify attention.py code

Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com>

* update positions init

* clear code v1

* update scale use

* fix typo

* fix ruff issue

* update base_attention

* clear mla init

* clear code

* avoid copy for quant_func

* simple code

* reduce atom change

* wrap mla attention head_dim arg

---------

Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com>


7 participants