
perf: add is_reasoning_end_streaming() override to GptOssReasoningParser #4

Open
fergusfinn wants to merge 111 commits into main from perf/gptoss-streaming-reasoning-end

Conversation


fergusfinn commented Mar 2, 2026

Summary

  • Override is_reasoning_end_streaming() in GptOssReasoningParser to window the backward scan to the last ~23 tokens instead of scanning the entire sequence
  • Reduces per-step cost from O(n) to O(1), eliminating the O(n²) total cost over a generation

Approach

The base class is_reasoning_end_streaming(input_ids, delta_ids) defaults to calling is_reasoning_end(input_ids), which scans backward through the full token sequence. Other parsers (Step3, BaseThinking) override this to only check delta_ids (O(1)), but GptOss can't do a simple delta check because its end pattern (<|channel|>final ... <|message|>) spans multiple tokens with a variable gap.

The override windows the search to the last prefix_len + max_gap + suffix_len tokens (~23 for gpt-oss). The reasoning end pattern is always at the tail of the sequence (we just generated those tokens), so looking further back is unnecessary. The eom_token_id early-exit in is_reasoning_end() still works within the window for multi-turn safety.
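The windowed override can be sketched as follows. This is a minimal, self-contained illustration: the class name, the prefix_len/max_gap/suffix_len attributes, and the toy end pattern (token 1 followed by token 2 within max_gap tokens) are assumptions for this sketch, not the actual vLLM code or the gpt-oss token ids.

```python
from collections.abc import Iterable, Sequence


class GptOssReasoningParserSketch:
    # Illustrative constants: the window is prefix_len + max_gap +
    # suffix_len (~23 for gpt-oss).
    prefix_len = 2   # stand-in for the "<|channel|>final" tokens
    max_gap = 20     # max tokens between the channel header and <|message|>
    suffix_len = 1   # stand-in for the "<|message|>" token

    def is_reasoning_end(self, input_ids: Sequence[int]) -> bool:
        # Stand-in for the base backward scan, O(len(input_ids)):
        # here the "end pattern" is token 1 followed by token 2
        # within max_gap tokens.
        ids = list(input_ids)
        return any(
            t == 1 and 2 in ids[i + 1 : i + 2 + self.max_gap]
            for i, t in enumerate(ids)
        )

    def is_reasoning_end_streaming(
        self, input_ids: Sequence[int], delta_ids: Iterable[int]
    ) -> bool:
        # delta_ids may be a lazy iterator; materialize before len().
        delta = tuple(delta_ids)
        # The end pattern can only sit at the tail of the sequence, so a
        # window of prefix_len + max_gap + suffix_len (+ len(delta) for
        # speculative decoding) tokens suffices: O(1) per step, not O(n).
        window = self.prefix_len + self.max_gap + self.suffix_len + len(delta)
        return self.is_reasoning_end(list(input_ids)[-window:])
```

The key design point is that the window grows with len(delta): without it, a speculative-decoding step that accepts more tokens than the fixed window could push the pattern out of view.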

Test plan

  • All existing TEST_CASES run through is_reasoning_end_streaming() to confirm parity with is_reasoning_end()
  • Same cases with 10k dummy tokens prepended to verify windowing correctness
  • Signature smoke test with empty inputs
  • Run: pytest tests/reasoning/test_gptoss_reasoning_parser.py -v
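The "10k dummy tokens prepended" check can be sketched in isolation with a toy pattern (the real tests run TEST_CASES through GptOssReasoningParser; full_scan/windowed_scan and the adjacent-pair pattern here are illustrative assumptions):

```python
SPAN = 23  # prefix_len + max_gap + suffix_len for gpt-oss (assumed)


def full_scan(ids):
    # Toy stand-in for is_reasoning_end(): adjacent pair (1, 2) anywhere.
    return any(a == 1 and b == 2 for a, b in zip(ids, ids[1:]))


def windowed_scan(ids, delta):
    # Streaming variant: same check over the last SPAN + len(delta) tokens.
    return full_scan(ids[-(SPAN + len(delta)):])


for tail, expected in [([9, 1, 2], True), ([9, 3, 4], False)]:
    padded = [0] * 10_000 + tail
    # Windowing must agree with the full scan when the pattern is at the tail.
    assert windowed_scan(padded, tail[-1:]) == full_scan(padded) == expected
```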


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99ff69837d


Comment thread vllm/reasoning/gptoss_reasoning_parser.py Outdated
fergusfinn force-pushed the perf/gptoss-streaming-reasoning-end branch from 76ac447 to f374ce9 on March 2, 2026 at 13:08
fergusfinn force-pushed the perf/gptoss-streaming-reasoning-end branch from f374ce9 to 01b79fe on April 10, 2026 at 06:35
Override is_reasoning_end_streaming() in GptOssReasoningParser to window
the backward scan to the last ~23 + len(delta_ids) tokens instead of
scanning the entire sequence. This reduces per-step cost from O(n) to
O(1), eliminating the O(n²) total cost over a generation.

Including len(delta_ids) in the window ensures correctness under
speculative decoding where a single step can accept many tokens.

Signed-off-by: Fergus <fergus.barratt00@gmail.com>
fergusfinn force-pushed the perf/gptoss-streaming-reasoning-end branch 2 times, most recently from 06cacc8 to 3ae6795 on April 14, 2026 at 06:57
The base class broadened delta_ids from Sequence to Iterable in vllm-project#33593,
and the call site now passes itertools.islice. Materialize to tuple
before calling len().

Signed-off-by: fergus barratt <fergus.barratt00@gmail.com>
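The issue this commit fixes is easy to reproduce in isolation: itertools.islice returns a lazy iterator that has no __len__, so len() must come after materializing it (a minimal illustration, not the parser code itself):

```python
from itertools import islice

# islice is lazy: it has no __len__, so len() on it raises TypeError.
delta_iter = islice(range(100), 5)
try:
    len(delta_iter)
except TypeError:
    pass  # expected: object of type 'itertools.islice' has no len()

# Materializing to a tuple makes the length (and repeated iteration)
# available, which is what the override does before computing the window.
delta = tuple(islice(range(100), 5))
assert len(delta) == 5
```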
fergusfinn force-pushed the perf/gptoss-streaming-reasoning-end branch from 3ae6795 to 293e353 on April 15, 2026 at 07:00
fergusfinn and others added 21 commits April 15, 2026 08:01
gmagogsfm and others added 30 commits April 21, 2026 00:23