
[plugin][OOT] make OOT MLA work with vLLM 0.17 #304

Merged
XiaobingSuper merged 10 commits into ROCm:main from XiaobingSuper:xiaobing/vllm_0.17 on Mar 11, 2026


Conversation

@XiaobingSuper
Contributor

This PR makes plugin mode work with vLLM 0.17 and also adds the MLA persistent kernel.

I tested this PR with kimi-2 and deepseek; the accuracy results are:

  1. kimi-2 fp4 with TP4: (accuracy results screenshot)
  2. deepseek-r1-fp8 with TP8: (accuracy results screenshot)
  3. deepseek-r1-fp4 with TP8: (accuracy results screenshot)

Copilot AI review requested due to automatic review settings March 11, 2026 07:32
@XiaobingSuper changed the title to plugin: works for vllm 0.17 on Mar 11, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR updates ATOM’s vLLM plugin-mode integration for vLLM 0.17 compatibility and adds MLA-related patches/buffer handling to support a persistent-kernel style decode path.

Changes:

  • Update the vLLM platform integration to match the attention-backend selection signature changes in vLLM 0.17.
  • Introduce a dedicated MLA patch module that hooks vLLM's MLAAttention initialization, weight-loading, and forward behavior in plugin mode (a patching sketch follows this list).
  • Extend the MLA plugin-mode attention/metadata flow to provide persistent decode metadata buffers and output-buffer writing for the v_up projection.
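For reference, here is a minimal sketch of the plugin-mode monkey-patching pattern described above. The MLAAttention import path, the apply_mla_patches entry point, and the use_persistent_decode flag are illustrative assumptions only; the actual hooks live in atom/plugin/vllm/mla_patch.py and may differ.

```python
# Sketch only: plugin-mode hooks over vLLM's MLAAttention.
# The import path and the flag below are assumptions for illustration,
# not the actual ATOM implementation.
from vllm.attention.layer import MLAAttention  # assumed class location

_orig_init = MLAAttention.__init__
_orig_forward = MLAAttention.forward


def _patched_init(self, *args, **kwargs):
    _orig_init(self, *args, **kwargs)
    # Tag the layer so the patched forward knows the plugin decode path applies.
    self.use_persistent_decode = True  # hypothetical flag


def _patched_forward(self, *args, **kwargs):
    if getattr(self, "use_persistent_decode", False):
        # A real patch would dispatch the persistent MLA decode kernel here;
        # this sketch simply falls through to the original implementation.
        pass
    return _orig_forward(self, *args, **kwargs)


def apply_mla_patches() -> None:
    """Install the plugin-mode hooks; meant to run once during model registration."""
    MLAAttention.__init__ = _patched_init
    MLAAttention.forward = _patched_forward
```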

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Summary per file:
  • atom/plugin/vllm/register.py: Wires in MLA patching during model registration and keeps attention weight-loading patching.
  • atom/plugin/vllm/platform.py: Updates the get_attn_backend_cls override signature for vLLM 0.17 (a tolerant-override sketch follows this list).
  • atom/plugin/vllm/mla_patch.py: New module implementing vLLM MLAAttention monkey patches for plugin mode.
  • atom/plugin/attention_mla.py: Adjusts the MLA plugin-mode v_up projection to write into a provided output buffer; adds a do_kv_cache_update stub.
  • atom/plugin/attention.py: Adds persistent MLA metadata buffers and passes them through AttentionMetaData; updates env var access.
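To illustrate the platform.py change, the following is a hedged sketch of an out-of-tree get_attn_backend_cls override written to tolerate upstream signature changes such as the one in vLLM 0.17. The Platform import path, the AtomPlatform class name, and the backend class paths are assumptions, not the actual ATOM code.

```python
# Sketch only: an out-of-tree override of get_attn_backend_cls that tolerates
# upstream signature changes. Names and paths here are hypothetical.
from vllm.platforms.interface import Platform  # assumed base-class location


class AtomPlatform(Platform):  # hypothetical plugin platform class
    @classmethod
    def get_attn_backend_cls(cls, selected_backend, *args, **kwargs) -> str:
        # Accepting *args/**kwargs keeps the override callable when upstream
        # adds or reorders parameters; this sketch assumes the use_mla flag
        # arrives as a keyword argument.
        if kwargs.get("use_mla", False):
            # Fully qualified path to the plugin's MLA backend (hypothetical).
            return "atom.plugin.attention_mla.AtomMLABackend"
        return "atom.plugin.attention.AtomAttentionBackend"
```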


Copilot AI review requested due to automatic review settings March 11, 2026 07:41
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.



@wuhuikx requested review from ChuanLi1101 and removed the review request for Copilot on March 11, 2026 09:02
Copilot AI review requested due to automatic review settings March 11, 2026 09:13
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



@zejunchen-zejun changed the title from plugin: works for vllm 0.17 to [plugin][OOT] make OOT MLA work with vLLM 0.17 on Mar 11, 2026
Collaborator

@zejunchen-zejun left a comment


LGTM for now

@XiaobingSuper merged commit 3a3e7ef into ROCm:main on Mar 11, 2026
29 of 30 checks passed
Jasen2201 pushed a commit to Jasen2201/ATOM that referenced this pull request Apr 10, 2026

Labels: None yet
Projects: None yet
5 participants