[Speculative Decoding]Fix multistep MTP in splitewise-prefill mode #5723
Conversation
Thanks for your contribution!
Pull request overview
This PR addresses an issue with multistep Multi-Token Prediction (MTP) speculative decoding when running in splitwise-prefill mode. The fix ensures that when MTP is enabled with splitwise-prefill configuration, both num_speculative_tokens and num_model_steps are reset to 1, effectively disabling multistep behavior in prefill mode.
Key Changes:
- Added configuration adjustment logic to disable multistep MTP in splitwise-prefill mode
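To make the described adjustment concrete, here is a minimal standalone sketch of the postprocess logic the PR adds. The `SpeculativeConfig`/`SchedulerConfig` dataclasses below are illustrative stand-ins, not the real `fastdeploy` classes, and the default values are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-ins for fastdeploy's config objects (not the real classes).
@dataclass
class SpeculativeConfig:
    method: str = "mtp"
    num_speculative_tokens: int = 3
    num_model_steps: int = 3

@dataclass
class SchedulerConfig:
    splitwise_role: str = "prefill"

def adjust_speculative_config(
    spec: Optional[SpeculativeConfig], sched: SchedulerConfig
) -> Optional[SpeculativeConfig]:
    # In splitwise-prefill mode, multistep MTP is disabled by forcing
    # both values to 1, as described in the PR overview.
    if spec is not None and spec.method == "mtp":
        if sched.splitwise_role == "prefill":
            spec.num_speculative_tokens = 1
            spec.num_model_steps = 1
    return spec

spec = adjust_speculative_config(SpeculativeConfig(), SchedulerConfig())
print(spec.num_speculative_tokens, spec.num_model_steps)  # 1 1
```

For any other `splitwise_role` (e.g. a decode or mixed node), the speculative settings are left untouched.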
fastdeploy/config.py (Outdated)

    # adjust speculative config
    if self.speculative_config is not None and self.speculative_config.method == "mtp":
        if self.scheduler_config.splitwise_role == "prefill:
Copilot AI (Dec 23, 2025)
Missing closing quote in the string literal. The condition should compare against `"prefill"` (with a closing quote) instead of `"prefill:`; as written, this is a syntax error.
    -        if self.scheduler_config.splitwise_role == "prefill:
    +        if self.scheduler_config.splitwise_role == "prefill":
fastdeploy/config.py (Outdated)

    # adjust speculative config
    if self.speculative_config is not None and self.speculative_config.method == "mtp":
        if self.scheduler_config.splitwise_role == "prefill:
            self.speculative_config.num_speculative_tokens = 1
            self.speculative_config.num_model_steps = 1
Copilot AI (Dec 23, 2025)
The PR description is incomplete and does not adequately explain why these modifications are being made. According to the custom coding guidelines, the description should at minimum explain why these modifications are being made and what problem is being solved. Please provide more details about:
- What issue occurs with multistep MTP in splitwise-prefill mode
- Why setting both num_speculative_tokens and num_model_steps to 1 fixes the issue
- What testing has been done to verify this fix
fastdeploy/config.py (Outdated)

    else:
        # It will hang when real batch_size < tp_size
        self.graph_opt_config.filter_capture_size(tp_size=self.parallel_config.tensor_parallel_size)
Copilot AI (Dec 23, 2025)
Trailing whitespace detected. Please remove the trailing whitespace at the end of this line to maintain code consistency.
    if self.speculative_config is not None and self.speculative_config.method == "mtp":
        if self.scheduler_config.splitwise_role == "prefill:
            self.speculative_config.num_speculative_tokens = 1
            self.speculative_config.num_model_steps = 1
Copilot AI (Dec 23, 2025)
The new configuration adjustment for MTP in splitwise-prefill mode lacks test coverage. The test file tests/utils/test_config.py already contains tests for other postprocess behaviors. Consider adding a test case to verify that when speculative_config.method is "mtp" and scheduler_config.splitwise_role is "prefill", both num_speculative_tokens and num_model_steps are correctly set to 1.
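A test along the lines suggested might look like the sketch below. Note this is an assumption-laden illustration: the config objects are mocked with `SimpleNamespace`, and `postprocess_speculative` is a hypothetical mirror of the PR's adjustment, not the actual entry point used in `tests/utils/test_config.py`:

```python
from types import SimpleNamespace

def postprocess_speculative(config):
    # Hypothetical mirror of the PR's adjustment, for illustration only.
    sc = config.speculative_config
    if sc is not None and sc.method == "mtp":
        if config.scheduler_config.splitwise_role == "prefill":
            sc.num_speculative_tokens = 1
            sc.num_model_steps = 1

def test_mtp_disabled_in_splitwise_prefill():
    config = SimpleNamespace(
        speculative_config=SimpleNamespace(
            method="mtp", num_speculative_tokens=3, num_model_steps=3
        ),
        scheduler_config=SimpleNamespace(splitwise_role="prefill"),
    )
    postprocess_speculative(config)
    assert config.speculative_config.num_speculative_tokens == 1
    assert config.speculative_config.num_model_steps == 1

test_mtp_disabled_in_splitwise_prefill()
```

A real test would construct an `FDConfig` the same way the existing postprocess tests in that file do, rather than a `SimpleNamespace` mock.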
fastdeploy/config.py (Outdated)

    # adjust speculative config
    if self.speculative_config is not None and self.speculative_config.method == "mtp":
        if self.scheduler_config.splitwise_role == "prefill:
Copilot AI (Dec 23, 2025)
The new configuration adjustment lacks an explanatory comment. Consider adding a comment explaining why MTP multistep is disabled in splitwise-prefill mode, similar to other configuration adjustments in this method (e.g., line 1785's comment about hanging when batch_size < tp_size). This would help future maintainers understand the reasoning behind this constraint.
    -        if self.scheduler_config.splitwise_role == "prefill:
    +        # In splitwise prefill mode, only the prefill node runs here and decode runs elsewhere.
    +        # Multi-step MTP across splitwise prefill/decode is currently unsupported and can lead
    +        # to state divergence between nodes, so we effectively disable MTP multistep by forcing
    +        # both num_speculative_tokens and num_model_steps to 1 on the prefill side.
    +        if self.scheduler_config.splitwise_role == "prefill":
Force-pushed from d3eb96b to b09d08a (Compare)
Codecov Report ❌ Patch coverage is
Additional details and impacted files

    @@ Coverage Diff @@
    ##           develop   #5723   +/- ##
    ==========================================
      Coverage         ?   65.47%
    ==========================================
      Files            ?      329
      Lines            ?    41803
      Branches         ?     6403
    ==========================================
      Hits             ?    27371
      Misses           ?    12398
      Partials         ?     2034
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- Add at least one tag from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.