perf: add is_reasoning_end_streaming() override to GptOssReasoningParser#4
Open
fergusfinn wants to merge 111 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 99ff69837d
76ac447 to f374ce9
f374ce9 to 01b79fe
Override is_reasoning_end_streaming() in GptOssReasoningParser to window the backward scan to the last ~23 + len(delta_ids) tokens instead of scanning the entire sequence. This reduces the per-step cost from O(n) in the sequence length to O(1), eliminating the O(n²) total cost over a generation. Including len(delta_ids) in the window keeps the check correct under speculative decoding, where a single step can accept many tokens.
Signed-off-by: Fergus <fergus.barratt00@gmail.com>
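To illustrate the windowing idea, here is a minimal, self-contained sketch. It is not the code from this PR: the token IDs, the `MAX_GAP` value, and both function names are hypothetical, picked only so that prefix + gap + suffix adds up to 23. The windowed variant detects only a pattern that completes inside the window, so it assumes the caller checks every step and stops once the end has been seen.

```python
from collections.abc import Iterable, Sequence

# Hypothetical values for illustration only; real gpt-oss token IDs differ.
PREFIX = (900001, 900002)  # stands in for the tokens of "<|channel|>final"
SUFFIX = (900003,)         # stands in for "<|message|>"
MAX_GAP = 20               # assumed upper bound on tokens between prefix and suffix


def find_end_pattern(ids: Sequence[int]) -> bool:
    """Backward scan: PREFIX, then at most MAX_GAP tokens, then SUFFIX."""
    n, p, s = len(ids), len(PREFIX), len(SUFFIX)
    for i in range(n - p, -1, -1):  # candidate prefix positions, newest first
        if tuple(ids[i:i + p]) != PREFIX:
            continue
        start = i + p
        # Try every allowed gap length between the prefix and the suffix.
        for j in range(start, min(start + MAX_GAP, n - s) + 1):
            if tuple(ids[j:j + s]) == SUFFIX:
                return True
    return False


def find_end_pattern_windowed(ids: Sequence[int], delta_ids: Iterable[int]) -> bool:
    """Same scan, restricted to a tail whose size is independent of len(ids)."""
    delta = tuple(delta_ids)  # delta_ids may be a lazy iterable
    # Longest possible pattern plus every token accepted in this step.
    window = len(PREFIX) + MAX_GAP + len(SUFFIX) + len(delta)
    return find_end_pattern(ids[-window:])


# A step that accepts 40 tokens at once, with the pattern 30 tokens from the end.
burst = [1] * 100 + list(PREFIX) + [7] * 5 + list(SUFFIX) + [2] * 30
found_with_full_delta = find_end_pattern_windowed(burst, range(40))
found_with_one_token = find_end_pattern_windowed(burst, [2])
```

In the `burst` example the pattern sits outside a fixed 23-token tail, so only the call that widens the window by the full delta length can see it. That is the speculative-decoding case the commit message refers to.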
06cacc8 to 3ae6795
The base class broadened delta_ids from Sequence to Iterable in vllm-project#33593, and the call site now passes itertools.islice. Materialize to tuple before calling len().
Signed-off-by: fergus barratt <fergus.barratt00@gmail.com>
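The reason for materializing first can be shown with a short sketch. The helper name `materialize_delta` is hypothetical and not part of vLLM. It relies only on the standard-library fact that `itertools.islice` objects do not implement `__len__`.

```python
import itertools
from collections.abc import Iterable


def materialize_delta(delta_ids: Iterable[int]) -> tuple[tuple[int, ...], int]:
    """Turn a possibly lazy iterable into a tuple so its length can be taken."""
    delta = tuple(delta_ids)  # consumes the iterable exactly once
    return delta, len(delta)


# A lazy slice, similar in spirit to what the call site is described as passing.
lazy = itertools.islice([10, 11, 12, 13], 1, 3)

try:
    len(lazy)  # islice has no __len__, so this raises TypeError
    len_worked = True
except TypeError:
    len_worked = False

delta, n = materialize_delta(lazy)
```

Because the tuple is built once and reused, the iterable is not consumed twice, which matters for one-shot iterators such as `islice`.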
3ae6795 to 293e353
…duling (vllm-project#38726) Signed-off-by: Jing Wang <jingwang96@qq.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…nd named function (vllm-project#39870) Signed-off-by: JaredforReal <w13431838023@gmail.com> Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: sfeng33 <4florafeng@gmail.com>
…ct#39291) Signed-off-by: allgather <all2allops@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
…mdhip64 (vllm-project#39978) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
…t64 overflow (vllm-project#39953) Signed-off-by: aditi <aditi.rana@amd.com>
vllm-project#40171) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
…quence_group (vllm-project#40175) Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>
…m-project#39844) Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Signed-off-by: milesial <milesial@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…-project#39845) Signed-off-by: Ziying Tao <tzzying@outlook.com>
…llm-project#38405) Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com> Signed-off-by: Nithin Chalapathi <nithinc@berkeley.edu> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…llm-project#40189) Signed-off-by: z1ying <tzzying@outlook.com>
…ject#40167) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Theresa Shan <Theresa.Shan@amd.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… on-disk model_type differs (vllm-project#39554) Signed-off-by: Misa <misaAle@users.noreply.github.com> Signed-off-by: Misael Casarez <misacasa@amazon.com> Co-authored-by: Misael Casarez <misacasa@amazon.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
…ence (vllm-project#40411) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
…project#40355) Signed-off-by: shen-shanshan <467638484@qq.com>
…orkserver prewarm (vllm-project#40331) Signed-off-by: simon-mo <simon@inferact.ai>
…9100) Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: lesj0610 <lesj0610@gmail.com>
…ples (vllm-project#40335) Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Hang Yang <hangy@amd.com>
…ompt` (vllm-project#40339) Signed-off-by: Alchuang22-dev <2584829494@qq.com>
…etch + forkserver prewarm" (vllm-project#40438) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
…llm-project#39391) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
…njection (vllm-project#39502) Signed-off-by: Krish Hung <krishung5@gmail.com> Signed-off-by: krishung5 <krish@nvidia.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…37861) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Signed-off-by: Artem Spector <artems@il.ibm.com> Signed-off-by: artemspector <artems@il.ibm.com> Co-authored-by: artemspector <artems@il.ibm.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…nch serve (vllm-project#40288) Signed-off-by: talora <talora@nvidia.com> Signed-off-by: Talor Abramovich <talor19@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Yusuf <yusufmohammad@live.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
vllm-project#37114) Signed-off-by: Hollow Man <hollowman@opensuse.org>
…oject#39887) Signed-off-by: zengxian <xiangdong.zeng@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
…UDA graph video inference (vllm-project#40445) Signed-off-by: shen-shanshan <467638484@qq.com>
…lative decoding is enabled (vllm-project#40454) Signed-off-by: Roi Koren <roik@nvidia.com>
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>
…oject#40467) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…llm-project#40276) Co-authored-by: Roger Wang <hey@rogerw.io>
…_sampled_tokens and draft_tokens (vllm-project#39833) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Summary
- Override `is_reasoning_end_streaming()` in `GptOssReasoningParser` to window the backward scan to the last ~23 tokens instead of scanning the entire sequence

Approach
The base class `is_reasoning_end_streaming(input_ids, delta_ids)` defaults to calling `is_reasoning_end(input_ids)`, which scans backward through the full token sequence. Other parsers (Step3, BaseThinking) override this to check only `delta_ids` (O(1)), but GptOss can't do a simple delta check because its end pattern (`<|channel|>final ... <|message|>`) spans multiple tokens with a variable gap.

The override windows the search to the last `prefix_len + max_gap + suffix_len` tokens (~23 for gpt-oss). The reasoning end pattern is always at the tail of the sequence (we just generated those tokens), so looking further back is unnecessary. The `eom_token_id` early-exit in `is_reasoning_end()` still works within the window for multi-turn safety.

Test plan
- `TEST_CASES` run through `is_reasoning_end_streaming()` to confirm parity with `is_reasoning_end()`
- `pytest tests/reasoning/test_gptoss_reasoning_parser.py -v`