
[Feat][Plugin] Enable MTP for vLLM Plugin #557

Draft

whx-sjtu wants to merge 10 commits into main from whx-sjtu/atom-support-vllm-glm5-mtp

Conversation

@whx-sjtu
Contributor

whx-sjtu commented Apr 14, 2026

Motivation

This PR enables the MTP (multi-token prediction) feature for running DeepSeek V3 and GLM5 with vLLM + atom.

Technical Details

  1. Fix atom_config-related bugs.
  2. Fix the wrong full_cls_name of the different MLA sparse attention backends.
  3. Register the model architecture and model class for the DeepSeek V3 and GLM5 MTP draft models (a rough registration sketch follows this list).
  4. Add an index_buffer for DeepseekMTP.
  5. Adapt the full-graph mode of the main model to work with MTP enabled.
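As a rough sketch of what the registration in item 3 amounts to in a vLLM plugin (illustrative only: the architecture strings and module paths below are placeholders, not necessarily the names used in this PR):

```python
# Illustrative sketch of out-of-tree MTP model registration with vLLM.
# Architecture names and module paths here are assumptions for this example.
from vllm import ModelRegistry


def register_mtp_models() -> None:
    # Lazy "module:Class" registration avoids importing the model classes
    # at plugin load time.
    ModelRegistry.register_model(
        "DeepSeekMTPModel",
        "atom.plugin.vllm.models.deepseek_mtp:DeepseekMTP",
    )
    ModelRegistry.register_model(
        "Glm5MTPModel",
        "atom.plugin.vllm.models.glm5_mtp:Glm5MTP",
    )
```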

Test Plan

Coming soon.

Test Result

  1. zai-org/GLM-5.1-FP8

Accuracy test commands:

lm_eval --model local-completions \
        --model_args model=/home/models/GLM-5.1-FP8,base_url=http://localhost:8000/v1/completions,num_concurrent=64,max_retries=3 \
        --tasks gsm8k \
        --num_fewshot 20

Accuracy test result with mtp=3:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|    20|exact_match|↑  |0.9454|±  |0.0063|
|     |       |strict-match    |    20|exact_match|↑  |0.9462|±  |0.0062|

  2. deepseek-ai/DeepSeek-R1-0528

Accuracy test commands:

lm_eval --model local-completions \
        --model_args model=/home/models/DeepSeek-R1-0528,base_url=http://localhost:8000/v1/completions,num_concurrent=16,max_retries=3,tokenized_requests=False \
        --tasks gsm8k \
        --num_fewshot 3

Accuracy test result with mtp=3:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.9492|±  |0.0060|
|     |       |strict-match    |     3|exact_match|↑  |0.9469|±  |0.0062|

Submission Checklist

whx-sjtu marked this pull request as ready for review on April 14, 2026 14:52
whx-sjtu changed the title from "[Feat][Plugin] Enable spec decoding for GLM5 in atom (vLLM Plugin)" to "[Feat][Plugin] Enable spec decoding for GLM5 (vLLM Plugin)" on April 14, 2026
whx-sjtu force-pushed the whx-sjtu/atom-support-vllm-glm5-mtp branch from 17446a6 to 9015568 on April 15, 2026 03:37
@wuhuikx
Collaborator

wuhuikx commented Apr 15, 2026

Could you please help attach the accuracy test results on gsm8k? Do we support MTP=1 or MTP=1/2/3? How about the acceptance ratio?

wuhuikx marked this pull request as draft on April 15, 2026 09:16
@wuhuikx
Collaborator

wuhuikx commented Apr 15, 2026

I will turn this PR into a draft and run it through CI after the code review is done.

@whx-sjtu
Contributor Author

> Could you please help attach the accuracy test results on gsm8k? Do we support MTP=1 or MTP=1/2/3? How about the acceptance ratio?

Sure, I will attach the accuracy results later. We currently support MTP=1/2/3, but the acceptance rate is low (about 20% for the first draft token and 0 for the other tokens) and I'm working on it.

whx-sjtu changed the title from "[Feat][Plugin] Enable spec decoding for GLM5 (vLLM Plugin)" to "[Feat][Plugin] Enable spec decoding for vLLM Plugin" on April 17, 2026
whx-sjtu changed the title from "[Feat][Plugin] Enable spec decoding for vLLM Plugin" to "[Feat][Plugin] Enable MTP for vLLM Plugin" on April 21, 2026
Comment thread atom/plugin/vllm/model_wrapper.py Outdated
self.model_arch = model_arch
# if self.forced_model_arch is not None:
#     model_arch = self.forced_model_arch
#     logger.info(f"Using forced model arch: {model_arch} for vLLM plugin mode")
Collaborator

should be removed?

Contributor Author

done

Comment thread atom/config.py Outdated
# can coexist in one process. Resolve per-forward config first to avoid
# reading a stale global singleton.
if not is_vllm():
    return None
Collaborator

There should be an assertion here to prevent a non-vllm backend from calling this method; it should only be called by the atom-vllm backend.
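Something along these lines, for example (just a sketch; I'm assuming the check sits at the top of the helper that reads the forward context):

```python
# Fail loudly if a non-vllm backend reaches this code path instead of
# silently returning None.
assert is_vllm(), (
    "atom config lookup via the vLLM forward context is only supported "
    "by the atom-vllm backend"
)
```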

Contributor Author

done

Comment thread atom/config.py


def get_current_atom_config() -> Config:
    forward_atom_config = _get_current_atom_config_from_vllm_forward_context()
Collaborator

This may be a little risky. If forward_atom_config is None and there is no assertion, it silently falls back to the global singleton _current_atom_config. Can we add some logging here, or make it safer?
Ideally, the lifecycle and ownership of forward_atom_config belong to the model itself: the main model gets its own atom config and the draft model gets its own. But if the draft model cannot get its config, it falls back to _current_atom_config, which may not be correct.

Contributor Author

Do you mean that we should always obtain forward_atom_config from vllm_forward_context? Are there scenarios where we need to return the default global _current_atom_config?

Collaborator

Let's add a warning here for the case where there is no atom config in the forward context and the default global config is returned instead. With this warning, we can mitigate consistency issues between the local value and its global twin.
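For example, something like this (a sketch only; the helper names are the ones visible in the diff above, everything beyond them is assumed):

```python
import logging

logger = logging.getLogger(__name__)


def get_current_atom_config() -> Config:
    # Prefer the per-forward config from vLLM's forward context so that the
    # main model and the MTP draft model each resolve the config they own.
    forward_atom_config = _get_current_atom_config_from_vllm_forward_context()
    if forward_atom_config is not None:
        return forward_atom_config
    # No atom config was found in the forward context; fall back to the
    # global singleton and warn, since for the draft model this may not be
    # the config it actually owns.
    logger.warning(
        "No atom config in the vLLM forward context; falling back to the "
        "global _current_atom_config."
    )
    return _current_atom_config
```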

Comment thread atom/plugin/vllm/model_wrapper.py Outdated
position_offset = getattr(self.model, "vllm_draft_position_offset", 0)
if position_offset == 0:
    return positions
return positions + position_offset
Collaborator

Can we leave some comments here explaining the position offset?

Contributor Author

not needed anymore. removed.

wuhuikx marked this pull request as ready for review on April 22, 2026 12:39
wuhuikx marked this pull request as draft on April 22, 2026 12:40
ganyi1996ppo previously approved these changes on April 23, 2026
whx-sjtu force-pushed the whx-sjtu/atom-support-vllm-glm5-mtp branch from 2c1db99 to 90aa06b on April 23, 2026 10:49
