[Speculative Decoding]Fix multistep MTP in splitewise-prefill mode #5723
Conversation
Thanks for your contribution!
Pull request overview
This PR addresses an issue with multistep Multi-Token Prediction (MTP) speculative decoding when running in splitwise-prefill mode. The fix ensures that when MTP is enabled with splitwise-prefill configuration, both num_speculative_tokens and num_model_steps are reset to 1, effectively disabling multistep behavior in prefill mode.
Key Changes:
- Added configuration adjustment logic to disable multistep MTP in splitwise-prefill mode
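To make the described adjustment concrete, here is a minimal standalone sketch of the postprocess logic the PR adds. The `SpeculativeConfig`/`SchedulerConfig` dataclasses below are illustrative stand-ins, not the real `fastdeploy` classes, and the default values are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-ins for fastdeploy's config objects (not the real classes).
@dataclass
class SpeculativeConfig:
    method: str = "mtp"
    num_speculative_tokens: int = 3
    num_model_steps: int = 3

@dataclass
class SchedulerConfig:
    splitwise_role: str = "prefill"

def adjust_speculative_config(
    spec: Optional[SpeculativeConfig], sched: SchedulerConfig
) -> Optional[SpeculativeConfig]:
    # In splitwise-prefill mode, multistep MTP is disabled by forcing
    # both values to 1, as described in the PR overview.
    if spec is not None and spec.method == "mtp":
        if sched.splitwise_role == "prefill":
            spec.num_speculative_tokens = 1
            spec.num_model_steps = 1
    return spec

spec = adjust_speculative_config(SpeculativeConfig(), SchedulerConfig())
print(spec.num_speculative_tokens, spec.num_model_steps)  # 1 1
```

For any other `splitwise_role` (e.g. a decode or mixed node), the speculative settings are left untouched.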
fastdeploy/config.py (Outdated)

    # adjust speculative config
    if self.speculative_config is not None and self.speculative_config.method == "mtp":
        if self.scheduler_config.splitwise_role == "prefill:
Copilot AI (Dec 23, 2025)
Missing closing quote in the string literal. The condition should compare against `"prefill"` (with a closing quote) instead of `"prefill:`; as written, this is a syntax error.
    -        if self.scheduler_config.splitwise_role == "prefill:
    +        if self.scheduler_config.splitwise_role == "prefill":
fastdeploy/config.py (Outdated)

    # adjust speculative config
    if self.speculative_config is not None and self.speculative_config.method == "mtp":
        if self.scheduler_config.splitwise_role == "prefill:
            self.speculative_config.num_speculative_tokens = 1
            self.speculative_config.num_model_steps = 1
Copilot AI (Dec 23, 2025)
The PR description is incomplete and does not adequately explain why these modifications are being made. According to the custom coding guidelines, the description should at minimum explain why these modifications are being made and what problem is being solved. Please provide more details about:
- What issue occurs with multistep MTP in splitwise-prefill mode
- Why setting both num_speculative_tokens and num_model_steps to 1 fixes the issue
- What testing has been done to verify this fix
fastdeploy/config.py (Outdated)

    else:
        # It will hang when real batch_size < tp_size
        self.graph_opt_config.filter_capture_size(tp_size=self.parallel_config.tensor_parallel_size)
Copilot AI (Dec 23, 2025)
Trailing whitespace detected. Please remove the trailing whitespace at the end of this line to maintain code consistency.
    if self.speculative_config is not None and self.speculative_config.method == "mtp":
        if self.scheduler_config.splitwise_role == "prefill:
            self.speculative_config.num_speculative_tokens = 1
            self.speculative_config.num_model_steps = 1
Copilot AI (Dec 23, 2025)
The new configuration adjustment for MTP in splitwise-prefill mode lacks test coverage. The test file tests/utils/test_config.py already contains tests for other postprocess behaviors. Consider adding a test case to verify that when speculative_config.method is "mtp" and scheduler_config.splitwise_role is "prefill", both num_speculative_tokens and num_model_steps are correctly set to 1.
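A test along the lines suggested might look like the sketch below. Note this is an assumption-laden illustration: the config objects are mocked with `SimpleNamespace`, and `postprocess_speculative` is a hypothetical mirror of the PR's adjustment, not the actual entry point used in `tests/utils/test_config.py`:

```python
from types import SimpleNamespace

def postprocess_speculative(config):
    # Hypothetical mirror of the PR's adjustment, for illustration only.
    sc = config.speculative_config
    if sc is not None and sc.method == "mtp":
        if config.scheduler_config.splitwise_role == "prefill":
            sc.num_speculative_tokens = 1
            sc.num_model_steps = 1

def test_mtp_disabled_in_splitwise_prefill():
    config = SimpleNamespace(
        speculative_config=SimpleNamespace(
            method="mtp", num_speculative_tokens=3, num_model_steps=3
        ),
        scheduler_config=SimpleNamespace(splitwise_role="prefill"),
    )
    postprocess_speculative(config)
    assert config.speculative_config.num_speculative_tokens == 1
    assert config.speculative_config.num_model_steps == 1

test_mtp_disabled_in_splitwise_prefill()
```

A real test would construct an `FDConfig` the same way the existing postprocess tests in that file do, rather than a `SimpleNamespace` mock.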
fastdeploy/config.py (Outdated)

    # adjust speculative config
    if self.speculative_config is not None and self.speculative_config.method == "mtp":
        if self.scheduler_config.splitwise_role == "prefill:
Copilot AI (Dec 23, 2025)
The new configuration adjustment lacks an explanatory comment. Consider adding a comment explaining why MTP multistep is disabled in splitwise-prefill mode, similar to other configuration adjustments in this method (e.g., line 1785's comment about hanging when batch_size < tp_size). This would help future maintainers understand the reasoning behind this constraint.
    -        if self.scheduler_config.splitwise_role == "prefill:
    +        # In splitwise prefill mode, only the prefill node runs here and decode runs elsewhere.
    +        # Multi-step MTP across splitwise prefill/decode is currently unsupported and can lead
    +        # to state divergence between nodes, so we effectively disable MTP multistep by forcing
    +        # both num_speculative_tokens and num_model_steps to 1 on the prefill side.
    +        if self.scheduler_config.splitwise_role == "prefill":
Force-pushed from d3eb96b to b09d08a (Compare)
Codecov Report ❌ Patch coverage is
Additional details and impacted files

    @@ Coverage Diff @@
    ##           develop   #5723   +/- ##
    ==========================================
      Coverage         ?   65.47%
    ==========================================
      Files            ?      329
      Lines            ?    41803
      Branches         ?     6403
    ==========================================
      Hits             ?    27371
      Misses           ?    12398
      Partials         ?     2034
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- Add at least one tag from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.