[BugFix] fix speculate_limit_thinking_content_length #5590
Conversation
Thanks for your contribution!
Pull request overview
This PR removes an unused `seq_lens_decoder` parameter from the `speculate_limit_thinking_content_length` family of functions, which are used to limit thinking content length during speculative decoding.
Key Changes:
- Removed the `seq_lens_decoder` parameter from the v1 and v2 CUDA kernel functions and their wrappers
- Updated all test cases to remove the unused parameter
- Removed redundant sequence length adjustment logic (now only `step_idx` is adjusted)
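The idea behind the limiter can be pictured with a toy, pure-Python sketch. This is an illustrative assumption about the semantics, not the real CUDA kernel: once the decode-step budget for the thinking section is spent, the sampled token is overwritten with the think-end token.

```python
# Toy sketch of a thinking-length limiter (assumed semantics for illustration;
# the actual FastDeploy implementation is a CUDA kernel operating on batches).
def limit_thinking_token(next_token: int, step_idx: int,
                         max_think_len: int, think_end_id: int) -> int:
    """Force the thinking section to close once the step budget is exhausted.

    A negative max_think_len is treated here as "no limit".
    """
    if max_think_len >= 0 and step_idx >= max_think_len:
        return think_end_id  # overwrite the sampled token with the end marker
    return next_token
```

Note that in this sketch only `step_idx` drives the decision, which matches the PR's point that the kernel never needed `seq_lens_decoder`.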
Critical Issue Found: The function call in `post_process_specualate` is missing required parameters after the removal.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `tests/operators/test_speculate_limit_thinking_content_length.py` | Removed `seq_lens_decoder` parameter from all test cases for both v1 and v2 functions; updated comments to reflect the removal |
| `fastdeploy/model_executor/pre_and_post_process.py` | Removed `seq_lens_decoder` parameter from function signatures and internal calls; bug introduced in the call at line 444 |
| `custom_ops/gpu_ops/speculate_decoding/speculate_limit_thinking_content_length_v1.cu` | Removed `seq_lens_decoder` from the kernel, wrapper function, and operator input declaration; removed redundant adjustment logic |
| `custom_ops/gpu_ops/speculate_decoding/speculate_limit_thinking_content_length_v2.cu` | Removed `seq_lens_decoder` from the kernel, wrapper function, and operator input declaration; removed redundant adjustment logic |
Note on PR Description: The PR description is incomplete. The "Motivation" and "Modifications" sections are empty. Please add:
- Motivation: Explain that `seq_lens_decoder` was redundant because it was being updated in the same way as `step_idx`, and the actual sequence length update happens in the `speculate_update` function that is called after this function.
- Modifications: Detail that this PR removes the unused `seq_lens_decoder` parameter from the `speculate_limit_thinking_content_length_v1` and `v2` functions across CUDA kernels, Python wrappers, and test files.
```python
accept_num=share_inputs["accept_num"],
seq_lens_decoder=share_inputs["seq_lens_decoder"],
think_end_id=think_end_id,
line_break_id=line_break_id,
```
Copilot AI (Dec 16, 2025):
Missing required parameters in function call. The function `speculate_limit_thinking_content_length` requires the `stop_flags` and `eos_token_ids` parameters (as seen in the function signature on lines 147-148), but they are not provided in this call. After removing `seq_lens_decoder`, you should add:
- `stop_flags=share_inputs["stop_flags"],` after line 450
- `eos_token_ids=share_inputs["eos_token_ids"],` after `stop_flags` (only needed for the "" strategy, but the function signature requires it)
This will cause a runtime error when this function is called.
Suggested change:
```python
line_break_id=line_break_id,
stop_flags=share_inputs["stop_flags"],
eos_token_ids=share_inputs["eos_token_ids"],
```
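The runtime error Copilot predicts is easy to reproduce in isolation. The sketch below uses a stand-in function whose signature is hypothetical, modeled only on the parameter names quoted in the review comment; the `think_end_id`/`line_break_id` values are placeholders.

```python
# Stand-in with required parameters mirroring those named in the review
# comment (hypothetical signature, not the real FastDeploy function).
def speculate_limit_thinking_content_length(accept_num, seq_lens_decoder,
                                            think_end_id, line_break_id,
                                            stop_flags, eos_token_ids):
    return "ok"

share_inputs = {"accept_num": [2], "seq_lens_decoder": [7],
                "stop_flags": [False], "eos_token_ids": [0]}

# Omitting stop_flags/eos_token_ids fails at call time, not deep in the kernel:
try:
    speculate_limit_thinking_content_length(
        accept_num=share_inputs["accept_num"],
        seq_lens_decoder=share_inputs["seq_lens_decoder"],
        think_end_id=1,    # placeholder token id
        line_break_id=2,   # placeholder token id
    )
except TypeError as err:
    print(type(err).__name__)  # TypeError: missing required arguments
```

Python raises `TypeError` for any call that omits required parameters, which is why the bug would surface immediately on the first invocation rather than producing silently wrong output.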
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##           develop    #5590   +/-   ##
==========================================
  Coverage         ?   61.93%
==========================================
  Files            ?      328
  Lines            ?    41134
  Branches         ?     6270
==========================================
  Hits             ?    25476
  Misses           ?    13743
  Partials         ?     1915
```
… (#5591)
* fix bug
* fix bug
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.