
[plugin][OOT] make OOT MLA work with vLLM 0.17 #304

Merged
XiaobingSuper merged 10 commits into ROCm:main from XiaobingSuper:xiaobing/vllm_0.17 on Mar 11, 2026


Conversation

@XiaobingSuper
Contributor

This PR makes plugin mode work with vLLM 0.17 and also adds the MLA persistent kernel.

I tested this PR with kimi-2 and deepseek; the accuracy results are:

  1. kimi-2 fp4 with TP4: (accuracy results screenshot)
  2. deepseek-r1-fp8 with TP8: (accuracy results screenshot)
  3. deepseek-r1-fp4 with TP8: (accuracy results screenshot)

Copilot AI review requested due to automatic review settings March 11, 2026 07:32
@XiaobingSuper changed the title to plugin: works for vllm 0.17 on Mar 11, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR updates ATOM’s vLLM plugin-mode integration for vLLM 0.17 compatibility and adds MLA-related patches/buffer handling to support a persistent-kernel style decode path.

Changes:

  • Update the vLLM platform integration to match the attention-backend selection signature changes in vLLM 0.17.
  • Introduce a dedicated MLA patch module that hooks vLLM's MLAAttention initialization, weight-loading, and forward behavior in plugin mode (a patching sketch follows this list).
  • Extend the MLA plugin-mode attention/metadata flow to provide persistent decode metadata buffers and output-buffer writing for the v_up projection.
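For reference, here is a minimal sketch of the plugin-mode monkey-patching pattern described above. The MLAAttention import path, the apply_mla_patches entry point, and the use_persistent_decode flag are illustrative assumptions only; the actual hooks live in atom/plugin/vllm/mla_patch.py and may differ.

```python
# Sketch only: plugin-mode hooks over vLLM's MLAAttention.
# The import path and the flag below are assumptions for illustration,
# not the actual ATOM implementation.
from vllm.attention.layer import MLAAttention  # assumed class location

_orig_init = MLAAttention.__init__
_orig_forward = MLAAttention.forward


def _patched_init(self, *args, **kwargs):
    _orig_init(self, *args, **kwargs)
    # Tag the layer so the patched forward knows the plugin decode path applies.
    self.use_persistent_decode = True  # hypothetical flag


def _patched_forward(self, *args, **kwargs):
    if getattr(self, "use_persistent_decode", False):
        # A real patch would dispatch the persistent MLA decode kernel here;
        # this sketch simply falls through to the original implementation.
        pass
    return _orig_forward(self, *args, **kwargs)


def apply_mla_patches() -> None:
    """Install the plugin-mode hooks; meant to run once during model registration."""
    MLAAttention.__init__ = _patched_init
    MLAAttention.forward = _patched_forward
```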

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Summary per file:
  • atom/plugin/vllm/register.py: Wires in MLA patching during model registration and keeps attention weight-loading patching.
  • atom/plugin/vllm/platform.py: Updates the get_attn_backend_cls override signature for vLLM 0.17 (a tolerant-override sketch follows this list).
  • atom/plugin/vllm/mla_patch.py: New module implementing vLLM MLAAttention monkey patches for plugin mode.
  • atom/plugin/attention_mla.py: Adjusts the MLA plugin-mode v_up projection to write into a provided output buffer; adds a do_kv_cache_update stub.
  • atom/plugin/attention.py: Adds persistent MLA metadata buffers and passes them through AttentionMetaData; updates env var access.
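To illustrate the platform.py change, the following is a hedged sketch of an out-of-tree get_attn_backend_cls override written to tolerate upstream signature changes such as the one in vLLM 0.17. The Platform import path, the AtomPlatform class name, and the backend class paths are assumptions, not the actual ATOM code.

```python
# Sketch only: an out-of-tree override of get_attn_backend_cls that tolerates
# upstream signature changes. Names and paths here are hypothetical.
from vllm.platforms.interface import Platform  # assumed base-class location


class AtomPlatform(Platform):  # hypothetical plugin platform class
    @classmethod
    def get_attn_backend_cls(cls, selected_backend, *args, **kwargs) -> str:
        # Accepting *args/**kwargs keeps the override callable when upstream
        # adds or reorders parameters; this sketch assumes the use_mla flag
        # arrives as a keyword argument.
        if kwargs.get("use_mla", False):
            # Fully qualified path to the plugin's MLA backend (hypothetical).
            return "atom.plugin.attention_mla.AtomMLABackend"
        return "atom.plugin.attention.AtomAttentionBackend"
```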


Copilot AI review requested due to automatic review settings March 11, 2026 07:41
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.



@wuhuikx requested review from ChuanLi1101 and removed the review request for Copilot on March 11, 2026 09:02
Copilot AI review requested due to automatic review settings March 11, 2026 09:13
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



@zejunchen-zejun changed the title from plugin: works for vllm 0.17 to [plugin][OOT] make OOT MLA work with vLLM 0.17 on Mar 11, 2026
Collaborator

@zejunchen-zejun left a comment


LGTM for now

@XiaobingSuper merged commit 3a3e7ef into ROCm:main on Mar 11, 2026
29 of 30 checks passed
Jasen2201 pushed a commit to Jasen2201/ATOM that referenced this pull request Apr 10, 2026

Labels: None yet
Projects: None yet
5 participants