[BugFix] reschedule_preempt_task append waiting & PREEMPTED blocksize#5506
Jiang-Jia-Jun merged 6 commits into `PaddlePaddle:develop`
Thanks for your contribution!
Codecov Report: ❌ Patch coverage is

Additional details and impacted files

| Coverage Diff | develop | #5506 | +/- |
|---|---:|---:|---:|
| Coverage | ? | 60.74% | |
| Files | ? | 329 | |
| Lines | ? | 41142 | |
| Branches | ? | 6271 | |
| Hits | ? | 24992 | |
| Misses | ? | 14259 | |
| Partials | ? | 1891 | |
Flags with carried forward coverage won't be shown.
Pull request overview
This PR fixes a bug in the request rescheduling logic where preempted requests could be rescheduled over and over because they were allocated too few blocks. The fix records how many blocks a request held when it was preempted and ensures that, when rescheduled, it receives at least that many blocks again.
Key Changes:
- Records `last_preempted_blocksize` when a request is preempted, to remember its previous block allocation.
- Adds logic to ensure rescheduled requests receive at least as many blocks as before, preventing repeated preemption cycles.
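The preemption side of the change can be sketched as below. This is a minimal, hypothetical model, not FastDeploy's scheduler: the `Request` and `RequestStatus` classes, the `preempt` helper, and the `free_blocks`/`waiting` lists are assumptions made for illustration, loosely mirroring the PR title (mark the task PREEMPTED and append it to the waiting queue). Only the field name `last_preempted_blocksize` comes from the PR. Note that it holds a *count* of blocks, not tokens per block, which is how the snippet's comparison against `num_new_block` uses it.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class RequestStatus(Enum):
    # Hypothetical status enum; only the states this sketch needs.
    RUNNING = auto()
    PREEMPTED = auto()


@dataclass
class Request:
    # Hypothetical request record; only last_preempted_blocksize is from the PR.
    request_id: str
    block_tables: list = field(default_factory=list)  # ids of KV-cache blocks currently held
    status: RequestStatus = RequestStatus.RUNNING
    last_preempted_blocksize: int = 0  # number of blocks held at the last preemption


def preempt(request: Request, free_blocks: list, waiting: list) -> None:
    """Release a request's blocks while remembering how many it held."""
    # Record the allocation size *before* freeing, so rescheduling can restore it.
    request.last_preempted_blocksize = len(request.block_tables)
    free_blocks.extend(request.block_tables)
    request.block_tables = []
    request.status = RequestStatus.PREEMPTED
    # Hypothetical: requeue the request so the scheduler can pick it up again.
    waiting.append(request)
```

The key design point is ordering: the block count must be captured before the blocks are returned to the free pool, because afterwards the request's own block table is empty.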
```python
# If num_new_block is less than the last preempted block size, use the last preempted block size instead.
# For normal requests, when allocating blocks, we reserve two extra blocks for decoding.
# In the request rescheduling scenario, we currently only consider the number of tokens already generated,
# which might lead to allocating fewer blocks than the previous allocation, causing repeated rescheduling.
# This adjustment ensures we at least allocate as many blocks as before to avoid this issue.
if num_new_block < request.last_preempted_blocksize:
    num_new_block = request.last_preempted_blocksize
```
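The effect of this adjustment can be worked through with concrete numbers. The sketch below assumes a block size of 64 tokens (hypothetical), that a normal allocation is `ceil(tokens / block_size)` plus the two reserved decode blocks mentioned in the code comment, and that the unpatched reschedule path sizes the allocation from the tokens already present without that reserve. The helper names are illustrative only and are not FastDeploy APIs.

```python
BLOCK_SIZE = 64              # hypothetical tokens per KV-cache block
RESERVED_DECODE_BLOCKS = 2   # extra decode blocks, per the code comment above


def blocks_for_tokens(num_tokens: int, block_size: int = BLOCK_SIZE) -> int:
    # Integer ceiling division: blocks needed to hold num_tokens.
    return (num_tokens + block_size - 1) // block_size


def initial_allocation(prompt_len: int) -> int:
    # Normal path: enough blocks for the prompt plus the decode reserve.
    return blocks_for_tokens(prompt_len) + RESERVED_DECODE_BLOCKS


def reschedule_allocation(num_tokens: int, last_preempted_blocksize: int,
                          apply_fix: bool = True) -> int:
    # Assumed reschedule path: size only from tokens already present.
    num_new_block = blocks_for_tokens(num_tokens)
    # The fix from this PR: never allocate fewer blocks than before preemption.
    if apply_fix and num_new_block < last_preempted_blocksize:
        num_new_block = last_preempted_blocksize
    return num_new_block


held = initial_allocation(1000)                              # 16 + 2 = 18 blocks
print(reschedule_allocation(1010, held, apply_fix=False))    # unpatched
print(reschedule_allocation(1010, held))                     # patched
```

Under these assumptions, a 1000-token prompt initially holds 18 blocks. If the request is preempted at 1010 tokens while still holding those 18 blocks, the unpatched path hands back only 16 (room for 14 more tokens before it needs another block, so it is likely to be preempted again), while the patched path restores all 18.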
The PR description lacks essential information about the motivation and the specific problem being solved. While there is an image reference in the description, there is no text explanation of:
- What bug is being fixed
- Why these modifications are necessary
- What problem scenario triggers the repeated rescheduling issue
Please provide a detailed description explaining the bug scenario, reproduction steps, and how these changes resolve the issue.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- … `pre-commit` before commit.
- … `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.