[plugin][OOT] make OOT MLA work with vLLM 0.17 #304
Conversation
Pull request overview
This PR updates ATOM’s vLLM plugin-mode integration for vLLM 0.17 compatibility and adds MLA-related patches/buffer handling to support a persistent-kernel style decode path.
Changes:
- Update vLLM platform integration to match vLLM 0.17 attention-backend selection signature changes.
- Introduce a dedicated MLA patch module to hook vLLM's `MLAAttention` initialization / weight-loading / forward behavior in plugin mode.
- Extend the MLA plugin-mode attention/metadata flow to provide persistent decode metadata buffers and output-buffer writing for the v_up projection.
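The patch-module bullet above follows the usual plugin-mode monkey-patching idiom: capture the original methods, wrap them, and reassign on the class. A minimal sketch of that pattern is below; the `MLAAttention` stand-in, the `plugin_mode` flag, and the return values are illustrative assumptions, not ATOM's actual code.

```python
class MLAAttention:
    """Stand-in for vLLM's MLAAttention (illustrative, not the real class)."""

    def __init__(self, num_heads: int):
        self.num_heads = num_heads

    def forward(self, q, kv):
        return "original"


# Capture the originals before patching so the wrappers can delegate to them.
_orig_init = MLAAttention.__init__
_orig_forward = MLAAttention.forward


def _patched_init(self, *args, **kwargs):
    _orig_init(self, *args, **kwargs)
    self.plugin_mode = True  # hypothetical extra state the plugin path needs


def _patched_forward(self, q, kv):
    # Divert to the plugin-mode path when the flag is set; otherwise
    # fall back to vLLM's original implementation.
    if getattr(self, "plugin_mode", False):
        return "plugin"
    return _orig_forward(self, q, kv)


def apply_mla_patches():
    """Install the wrappers; called once during model registration."""
    MLAAttention.__init__ = _patched_init
    MLAAttention.forward = _patched_forward
```

Keeping references to the originals means the patches stay reversible and can delegate, which is why this pattern survives upstream refactors better than copy-pasting vLLM internals.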
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| atom/plugin/vllm/register.py | Wires in MLA patching during model registration and keeps attention weight-loading patching. |
| atom/plugin/vllm/platform.py | Updates get_attn_backend_cls override signature for vLLM 0.17. |
| atom/plugin/vllm/mla_patch.py | New module implementing vLLM MLAAttention monkey patches for plugin mode. |
| atom/plugin/attention_mla.py | Adjusts MLA plugin-mode v_up projection to write into a provided output buffer; adds do_kv_cache_update stub. |
| atom/plugin/attention.py | Adds persistent MLA metadata buffers and passes them through AttentionMetaData; updates env var access. |
This PR makes plugin mode work with vLLM 0.17 and also adds the MLA persistent kernel.
I tested this PR with kimi-2 and deepseek; the accuracy results are: