
Conversation


@freeliuzc freeliuzc commented Dec 23, 2025

Motivation

  1. In PD (prefill/decode disaggregation) mode, the P (prefill) node only needs to run prefill once and never decodes, so P's num_model_step is forcibly limited to 1.

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You may add new tags based on the PR content, but their meaning must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, explain why in this PR.
  • Provide accuracy results.
  • If this PR targets a release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] tag in the PR title.

Copilot AI review requested due to automatic review settings December 23, 2025 12:24

paddle-bot bot commented Dec 23, 2025

Thanks for your contribution!

Contributor

Copilot AI left a comment


Pull request overview

This PR addresses an issue with multistep Multi-Token Prediction (MTP) speculative decoding when running in splitwise-prefill mode. The fix ensures that when MTP is enabled with splitwise-prefill configuration, both num_speculative_tokens and num_model_steps are reset to 1, effectively disabling multistep behavior in prefill mode.

Key Changes:

  • Added configuration adjustment logic to disable multistep MTP in splitwise-prefill mode
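The adjustment described above can be sketched as follows. This is a simplified, self-contained illustration of the postprocess logic the PR adds (with the missing closing quote fixed); the dataclasses here are hypothetical stand-ins for FastDeploy's real SpeculativeConfig and SchedulerConfig objects, not the actual FDConfig API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeculativeConfig:
    method: str = "mtp"
    num_speculative_tokens: int = 3
    num_model_steps: int = 3


@dataclass
class SchedulerConfig:
    splitwise_role: str = "prefill"


def adjust_speculative_config(
    speculative_config: Optional[SpeculativeConfig],
    scheduler_config: SchedulerConfig,
) -> None:
    # In splitwise-prefill mode the prefill node runs prefill exactly once
    # and never decodes, so multistep MTP is disabled by forcing both
    # values to 1 on the prefill side.
    if speculative_config is not None and speculative_config.method == "mtp":
        if scheduler_config.splitwise_role == "prefill":
            speculative_config.num_speculative_tokens = 1
            speculative_config.num_model_steps = 1


spec = SpeculativeConfig()
adjust_speculative_config(spec, SchedulerConfig())
print(spec.num_speculative_tokens, spec.num_model_steps)  # 1 1
```

A decode-role node would pass through this function unchanged, since only the "prefill" role triggers the reset.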


# adjust speculative config
if self.speculative_config is not None and self.speculative_config.method == "mtp":
if self.scheduler_config.splitwise_role == "prefill:

Copilot AI Dec 23, 2025


Missing closing quote in the string literal. The condition should be "prefill" (with a closing quote) instead of "prefill:. This will cause a syntax error.

Suggested change
if self.scheduler_config.splitwise_role == "prefill:
if self.scheduler_config.splitwise_role == "prefill":

Comment on lines 1787 to 1792

# adjust speculative config
if self.speculative_config is not None and self.speculative_config.method == "mtp":
if self.scheduler_config.splitwise_role == "prefill:
self.speculative_config.num_speculative_tokens = 1
self.speculative_config.num_model_steps = 1

Copilot AI Dec 23, 2025


The PR description is incomplete and does not adequately explain why these modifications are being made. According to the custom coding guidelines, the description should at minimum explain why these modifications are being made and what problem is being solved. Please provide more details about:

  1. What issue occurs with multistep MTP in splitwise-prefill mode
  2. Why setting both num_speculative_tokens and num_model_steps to 1 fixes the issue
  3. What testing has been done to verify this fix

Copilot generated this review using guidance from repository custom instructions.
else:
# It will hang when real batch_size < tp_size
self.graph_opt_config.filter_capture_size(tp_size=self.parallel_config.tensor_parallel_size)


Copilot AI Dec 23, 2025


Trailing whitespace detected. Please remove the trailing whitespace at the end of this line to maintain code consistency.

Suggested change

Comment on lines 1789 to 1792
if self.speculative_config is not None and self.speculative_config.method == "mtp":
if self.scheduler_config.splitwise_role == "prefill:
self.speculative_config.num_speculative_tokens = 1
self.speculative_config.num_model_steps = 1

Copilot AI Dec 23, 2025


The new configuration adjustment for MTP in splitwise-prefill mode lacks test coverage. The test file tests/utils/test_config.py already contains tests for other postprocess behaviors. Consider adding a test case to verify that when speculative_config.method is "mtp" and scheduler_config.splitwise_role is "prefill", both num_speculative_tokens and num_model_steps are correctly set to 1.
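A test along the lines Copilot suggests might look like the sketch below. The helper and the SimpleNamespace configs are simplified stand-ins for illustration; a real test in tests/utils/test_config.py would build an actual FDConfig and call its postprocess method instead.

```python
import unittest
from types import SimpleNamespace


def postprocess_speculative(speculative_config, scheduler_config):
    # Simplified stand-in for the adjustment added in this PR.
    if speculative_config is not None and speculative_config.method == "mtp":
        if scheduler_config.splitwise_role == "prefill":
            speculative_config.num_speculative_tokens = 1
            speculative_config.num_model_steps = 1


class TestMTPPrefillAdjustment(unittest.TestCase):
    def test_mtp_prefill_forces_single_step(self):
        spec = SimpleNamespace(method="mtp", num_speculative_tokens=3, num_model_steps=3)
        sched = SimpleNamespace(splitwise_role="prefill")
        postprocess_speculative(spec, sched)
        self.assertEqual(spec.num_speculative_tokens, 1)
        self.assertEqual(spec.num_model_steps, 1)

    def test_decode_role_left_untouched(self):
        spec = SimpleNamespace(method="mtp", num_speculative_tokens=3, num_model_steps=3)
        sched = SimpleNamespace(splitwise_role="decode")
        postprocess_speculative(spec, sched)
        self.assertEqual(spec.num_speculative_tokens, 3)
        self.assertEqual(spec.num_model_steps, 3)
```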


# adjust speculative config
if self.speculative_config is not None and self.speculative_config.method == "mtp":
if self.scheduler_config.splitwise_role == "prefill:

Copilot AI Dec 23, 2025


The new configuration adjustment lacks an explanatory comment. Consider adding a comment explaining why MTP multistep is disabled in splitwise-prefill mode, similar to other configuration adjustments in this method (e.g., line 1785's comment about hanging when batch_size < tp_size). This would help future maintainers understand the reasoning behind this constraint.

Suggested change
if self.scheduler_config.splitwise_role == "prefill:
# In splitwise prefill mode, only the prefill node runs here and decode runs elsewhere.
# Multi-step MTP across splitwise prefill/decode is currently unsupported and can lead
# to state divergence between nodes, so we effectively disable MTP multistep by forcing
# both num_speculative_tokens and num_model_steps to 1 on the prefill side.
if self.scheduler_config.splitwise_role == "prefill":

@codecov-commenter

Codecov Report

❌ Patch coverage is 25.00000% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@3aee5c4). Learn more about missing BASE report.

Files with missing lines Patch % Lines
fastdeploy/config.py 25.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5723   +/-   ##
==========================================
  Coverage           ?   65.47%           
==========================================
  Files              ?      329           
  Lines              ?    41803           
  Branches           ?     6403           
==========================================
  Hits               ?    27371           
  Misses             ?    12398           
  Partials           ?     2034           
Flag Coverage Δ
GPU 65.47% <25.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@yuanlehome yuanlehome merged commit 2dc2ba4 into PaddlePaddle:develop Dec 24, 2025
14 of 18 checks passed
fxyfxy777 pushed a commit to fxyfxy777/FastDeploy that referenced this pull request Dec 24, 2025
ckl117 pushed a commit to fxyfxy777/FastDeploy that referenced this pull request Dec 29, 2025