
fix(parsers): recover Kimi K2 tool calls when section_end is missing #8208

Merged
rmccorm4 merged 3 commits into main from keivenchang/fix-kimi-k2-tool-call-boundary
Apr 20, 2026

Conversation

@keivenchang
Contributor

@keivenchang keivenchang commented Apr 15, 2026

Overview:

Kimi K2 tool calls were getting silently dropped whenever the model hit max_tokens before emitting the section_end marker. The parser just threw away everything if that closing tag was missing -- same bug exists in Moonshot's reference Python parser and got copied into vLLM and SGLang too.

Details:

  • the root cause is in extract_tool_calls (kimi_k2_parser.rs) -- the old code discarded the entire tool call section when section_end wasn't there. now it treats a missing section_end as "section goes to EOF" and pulls out whatever complete individual calls it can find
  • the streaming jail (jail.rs) had a related issue: find_tool_call_end_position_kimi_k2 couldn't tell "found section_end" apart from "section_end is missing", so it would early-exit and swallow parallel calls. changed the return type to Option<usize> so None skips the early-exit path, and finalize() recovers all the calls at stream end
  • 4 new parser unit tests, 3 new streaming integration tests that reproduce the customer-reported scenario (DIS-1765)
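
A minimal sketch of the lenient extraction described above, not the actual code in kimi_k2_parser.rs. The function name `extract_complete_calls`, the `ParsedCall` tuple type, and the exact marker strings are assumptions made for illustration; the real parser returns richer structures and validates the arguments.

```rust
// Marker strings assumed for this sketch; the real constants live in the parser config.
const SECTION_BEGIN: &str = "<|tool_calls_section_begin|>";
const SECTION_END: &str = "<|tool_calls_section_end|>";
const CALL_BEGIN: &str = "<|tool_call_begin|>";
const ARG_BEGIN: &str = "<|tool_call_argument_begin|>";
const CALL_END: &str = "<|tool_call_end|>";

/// Hypothetical simplified result: (call id, raw argument text).
pub type ParsedCall = (String, String);

/// Extracts every complete call from the section. A missing section_end
/// is treated as "section goes to EOF" instead of discarding the section.
pub fn extract_complete_calls(text: &str) -> Vec<ParsedCall> {
    let Some(start) = text.find(SECTION_BEGIN) else {
        return Vec::new();
    };
    let body_start = start + SECTION_BEGIN.len();
    // Lenient boundary: fall back to the end of the text when section_end is absent.
    let body_end = text[body_start..]
        .find(SECTION_END)
        .map(|i| body_start + i)
        .unwrap_or(text.len());

    let mut body = &text[body_start..body_end];
    let mut calls = Vec::new();
    while let Some(begin) = body.find(CALL_BEGIN) {
        let after_begin = &body[begin + CALL_BEGIN.len()..];
        // A call without its own tool_call_end is incomplete (truncated), so stop here.
        let Some(end) = after_begin.find(CALL_END) else {
            break;
        };
        let inner = &after_begin[..end];
        if let Some(arg) = inner.find(ARG_BEGIN) {
            let id = inner[..arg].trim().to_string();
            let args = inner[arg + ARG_BEGIN.len()..].trim().to_string();
            calls.push((id, args));
        }
        body = &after_begin[end + CALL_END.len()..];
    }
    calls
}
```

With output truncated in the middle of a second call, this sketch keeps the first (complete) call and drops only the partial one.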

Where should the reviewer start?

docs/agents/kimi_k2.md, then kimi_k2_parser.rs

Related Issues:

Relates to DIS-1765

/coderabbit profile chill

@keivenchang keivenchang requested a review from a team as a code owner April 15, 2026 02:13
@github-actions github-actions Bot added the fix, documentation, and frontend labels Apr 15, 2026
@keivenchang keivenchang self-assigned this Apr 15, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Apr 15, 2026

Walkthrough

Refactors the Kimi K2 tool-call parser to handle truncated output gracefully by making the section-end marker optional. Updates the parser API to return Option<usize> instead of usize, modifying underlying implementations and callers to treat missing section-end markers as incomplete rather than errors.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Documentation**<br>docs/agents/kimi_k2.md | New documentation page describing the tool-call truncation bug and documenting the Python regex fix and Rust-side implementation status, including TODO items to upstream changes. |
| **Kimi K2 Parser**<br>lib/parsers/src/tool_calling/xml/kimi_k2_parser.rs | Changed find_tool_call_end_position_kimi_k2 return type from usize to Option<usize>; returns None when section_end marker is missing. Updated extract_tool_calls to extract complete tool calls from truncated output lacking section_end. |
| **Parser API**<br>lib/parsers/src/tool_calling/parsers.rs | Updated find_tool_call_end_position return type from usize to Option<usize>. All parser branches now wrap results in Some(...). Added doc comment clarifying that None indicates unclosed/incomplete tool-call sections. |
| **Stream Handling**<br>lib/llm/src/protocols/openai/chat_completions/jail.rs | Modified JailedStream::should_end_jail early-exit logic to treat find_tool_call_end_position as optional; returns (false, accumulated_content.len()) when None is returned instead of unconditionally splitting. |
| **Test Coverage**<br>lib/llm/tests/test_streaming_tool_parsers.rs | Added make_chunk test helper and three integration-style tests: baseline case with complete markers, DIS-1765 reproduction with truncated output (no section_end), and multi-tool-call variant with truncation. |
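
The Option-based contract summarized above can be sketched roughly as follows. This is a simplified stand-in, not the code in jail.rs or kimi_k2_parser.rs: `find_section_end` and the free function `should_end_jail` are hypothetical names, the marker string is assumed, and the real early-exit logic also attempts a parse before deciding.

```rust
// Marker string assumed for this sketch.
const SECTION_END: &str = "<|tool_calls_section_end|>";

/// Returns the byte offset just past section_end, or None when the
/// marker is absent (the section is still open or was truncated).
pub fn find_section_end(chunk: &str) -> Option<usize> {
    chunk.find(SECTION_END).map(|pos| pos + SECTION_END.len())
}

/// Hypothetical early-exit decision: (should_end, split_position).
/// On None, nothing is split off and accumulation continues, so calls
/// are recovered later when the stream is finalized.
pub fn should_end_jail(accumulated: &str) -> (bool, usize) {
    match find_section_end(accumulated) {
        Some(end) => (true, end),
        None => (false, accumulated.len()),
    }
}
```

Returning `Option<usize>` rather than a sentinel such as `0` or `len()` lets the caller distinguish "marker found at this offset" from "marker never arrived", which a plain `usize` could not express.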

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: fixing Kimi K2 tool call recovery when the section_end marker is missing. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | PR description follows the template with all required sections: Overview explains the bug clearly, Details describe specific changes and affected files, Where should the reviewer start identifies key files, and Related Issues references DIS-1765. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/llm/src/protocols/openai/chat_completions/jail.rs (1)

803-824: ⚠️ Potential issue | 🟠 Major

Preserve the terminal finish reason on the new finalize() recovery path.

These Kimi None cases now intentionally defer emission to finalize(), but the parsed tool-call branch in create_tool_call_choice() always builds the emitted chunk with finish_reason: None. For a truncated stream ending with FinishReason::Length, the recovered tool call comes out without any terminal reason, and fix_finish_reason() cannot restore it because it only rewrites existing Some(Stop) values. Please thread base_choice.finish_reason through the successful parsed-tool-call path so recovered Kimi responses still terminate with Length (or get rewritten from Stop to ToolCalls).

Suggested fix
-                    let choice = create_choice_stream(
-                        choice_index,
-                        Some(Role::Assistant),
-                        normal_text.as_deref().unwrap_or(""),
-                        Some(tool_call_chunks),
-                        None,
-                        None,
-                        None,
-                    );
+                    let choice = create_choice_stream(
+                        choice_index,
+                        Some(Role::Assistant),
+                        normal_text.as_deref().unwrap_or(""),
+                        Some(tool_call_chunks),
+                        base_choice.finish_reason,
+                        base_choice.stop_reason.clone(),
+                        base_choice.logprobs.clone(),
+                    );
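
To illustrate why the recovered chunk loses its terminal reason, here is a small model of the behavior described in this comment. It is an assumption-laden sketch, not the code in jail.rs: the `FinishReason` enum is reduced to three variants, and `recovered_finish_reason` is a hypothetical helper that stands in for the parsed-tool-call branch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FinishReason {
    Stop,
    Length,
    ToolCalls,
}

/// Models the described rewrite rule: only an existing Some(Stop) is
/// changed to ToolCalls; None and every other value pass through untouched.
pub fn fix_finish_reason(
    reason: Option<FinishReason>,
    has_tool_calls: bool,
) -> Option<FinishReason> {
    match reason {
        Some(FinishReason::Stop) if has_tool_calls => Some(FinishReason::ToolCalls),
        other => other,
    }
}

/// Hypothetical stand-in for the parsed-tool-call branch. With
/// `propagate == false` the emitted chunk carries None (the behavior
/// flagged above); with `propagate == true` it carries the base reason.
pub fn recovered_finish_reason(
    base: Option<FinishReason>,
    propagate: bool,
) -> Option<FinishReason> {
    let emitted = if propagate { base } else { None };
    fix_finish_reason(emitted, true)
}
```

In this model, a stream truncated with `Length` ends with no reason at all unless the base reason is threaded through, because the rewrite step has nothing to act on when it receives `None`.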
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/chat_completions/jail.rs` around lines 803 -
824, create_tool_call_choice() currently constructs the emitted tool-call chunk
with finish_reason: None causing recovered Kimi tool-call responses (from
finalize()) to lose terminal reasons like FinishReason::Length; update the
parsed-tool-call branch in create_tool_call_choice() to propagate the original
base_choice.finish_reason into the emitted chunk (use base_choice.finish_reason
instead of None) so finalize() recovers tool calls with the correct terminal
reason and fix_finish_reason() can continue to rewrite Stop→ToolCalls as before;
ensure any helper paths that build the emitted choice (the parsed tool-call
success branch where try_tool_call_parse_aggregate(...) and
find_tool_call_end_position(...) return) use base_choice.finish_reason and keep
existing behavior for non-parsed branches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b62536c8-0684-494a-b37f-c0a1eaf514d8

📥 Commits

Reviewing files that changed from the base of the PR and between 45364b5 and c0f12a8.

📒 Files selected for processing (5)
  • docs/agents/kimi_k2.md
  • lib/llm/src/protocols/openai/chat_completions/jail.rs
  • lib/llm/tests/test_streaming_tool_parsers.rs
  • lib/parsers/src/tool_calling/parsers.rs
  • lib/parsers/src/tool_calling/xml/kimi_k2_parser.rs

Comment thread docs/agents/kimi_k2.md Outdated
When the model hits max_tokens or EOS before emitting
<|tool_calls_section_end|>, the parser now treats the rest of the string
as the section body and extracts any complete individual tool calls
(those with <|tool_call_begin|> + args + <|tool_call_end|>).

This matches the one-line regex fix proposed upstream for Moonshot's
reference Python parser, where `re.findall` silently returns [] when
section_end is absent.

Parser change (kimi_k2_parser.rs):
- extract_tool_calls: missing section_end → section extends to EOF
- find_tool_call_end_position_kimi_k2: returns Option<usize>, None when
  section_end absent (signals incomplete section to the streaming jail)

Streaming jail change (jail.rs):
- should_end_jail: when find_tool_call_end_position returns None, skip
  early-exit so parallel tool calls keep accumulating until stream end;
  finalize() recovers them via the lenient parser

Tests:
- 4 new parser unit tests for truncation scenarios
- 3 new streaming integration tests (complete section, single call
  truncated, multiple calls truncated)

Docs:
- docs/agents/kimi_k2.md: reference parser bug analysis with actual and
  proposed Python code, upstream TODO checklist

DIS-1765
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Reviewer feedback: this file captures troubleshooting and repro steps
rather than user-facing docs. Moved to ~/notes/tool-calling/kimi_k2.md.

Signed-off-by: Keiven Chang <keivenc@nvidia.com>
@keivenchang keivenchang force-pushed the keivenchang/fix-kimi-k2-tool-call-boundary branch from c78fac3 to 118a390 Compare April 20, 2026 21:40
@rmccorm4 rmccorm4 enabled auto-merge (squash) April 20, 2026 22:28
@rmccorm4 rmccorm4 merged commit 47cfbd4 into main Apr 20, 2026
90 checks passed
@rmccorm4 rmccorm4 deleted the keivenchang/fix-kimi-k2-tool-call-boundary branch April 20, 2026 22:28
keivenchang added a commit that referenced this pull request Apr 25, 2026
Add universal test cases that apply to every tool-call parser, filling
in the gaps in the per-parser coverage by surfacing the matrix as live
cargo-test output instead of a static doc.

This first revision covers two cases for all 18 registered parsers:
- case_1_single_call           --  single tool-call happy path  (18/18 pass)
- case_5_missing_end_token_recovery (PR #8208 generalized)
                               --  2 pass (kimi_k2, mistral),
                                   5 N/A,
                                   11 KnownBroken (each tagged for
                                   follow-up work to generalize the
                                   PR #8208 fix)

The framework is a four-state FixtureCase enum: Sample (parser handles
correctly), KnownBroken (parser drops the call today; the test asserts
the broken state and fails when a future fix actually adds recovery,
forcing the fixture to be upgraded), NotApplicable (format genuinely
lacks the concept; reason printed), Unimplemented (CI hard-fail). The
four states distinguish honest gaps from silent ones, which is the
matrix-sparseness problem this scaffold is meant to solve.
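
A rough sketch of how such a four-state enum could drive a check, under stated assumptions: the variant payloads, the `check_case` helper, and the use of a call count as the parser result are all hypothetical, chosen only to show the "KnownBroken fails once the parser is fixed" behavior described above.

```rust
/// Four fixture states; the payload fields are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FixtureCase {
    /// Parser handles the input correctly.
    Sample { input: &'static str, expected_calls: usize },
    /// Parser drops the call today; the check asserts the broken state.
    KnownBroken { input: &'static str, reason: &'static str },
    /// The format genuinely lacks the concept.
    NotApplicable { reason: &'static str },
    /// No fixture written yet; treated as a hard failure.
    Unimplemented,
}

/// Hypothetical checker. `parse` returns the number of recovered calls.
pub fn check_case(
    case: &FixtureCase,
    parse: impl Fn(&str) -> usize,
) -> Result<String, String> {
    match case {
        FixtureCase::Sample { input, expected_calls } => {
            let got = parse(*input);
            if got == *expected_calls {
                Ok("pass".to_string())
            } else {
                Err(format!("expected {expected_calls} calls, got {got}"))
            }
        }
        FixtureCase::KnownBroken { input, reason } => {
            if parse(*input) == 0 {
                Ok(format!("known broken: {reason}"))
            } else {
                // A fix landed: force the fixture to be upgraded to Sample.
                Err("parser now recovers the call; upgrade this fixture".to_string())
            }
        }
        FixtureCase::NotApplicable { reason } => Ok(format!("n/a: {reason}")),
        FixtureCase::Unimplemented => Err("fixture not implemented".to_string()),
    }
}
```

Making `KnownBroken` an asserted state rather than a skipped test means a later fix cannot land silently: the check starts failing until the fixture is promoted.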

Layout:
- lib/parsers/src/tool_calling/test_cases/{mod,normalize,tests}.rs
- lib/parsers/src/tool_calling/test_cases/fixtures/<parser>.rs (x18)

Existing 134 inline parser tests are untouched; each test module gets
a TODO breadcrumb pointing at the contract suite for future trimming
once the suite stabilizes.

Future revisions: more universal cases (parallel calls, malformed
args, finish_reason, etc.), adversarial streaming chunkings, and a
CI matrix-report block.

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
keivenchang added a commit that referenced this pull request Apr 30, 2026
CASE.16 was misleading — it sat next to CASE.1–CASE.15 generic test
categories but is actually a per-incident bookkeeping marker, not a
contract every parser must cover. Renames in this commit:

- TEST_CASES.md: CASE.16 retired; REPORT.<n> introduced as a separate
  taxonomy with explicit semantics.
- xml/kimi_k2_parser.rs: CASE.16 (PR #8208) → REPORT.8208.
- test_streaming_tool_parsers.rs: CASE.16 → REPORT.8208 (same incident).
- reasoning/base_parser.rs: drops the CASE.16 marker from two tests
  that had no incident reference; only CASE.8 (the real category)
  remains.
- dsml/parser.rs chart: notes that DSv4 has no REPORT.<n> tests yet
  because no customer bugs have been filed against V4.

Linear DIS-1842 description updated with the same rename so the chart
and the in-code labels stay consistent.

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
keivenchang added a commit that referenced this pull request Apr 30, 2026
…omments

REPORT.<n> was duplicating information already conveyed by the (PR #N)
parenthetical inside the same #[test] line. Reverts the taxonomy:

- TEST_CASES.md: drops REPORT.<n> bullet and section; documents the
  inline (PR #N) / (DIS-NNNN) convention instead.
- xml/kimi_k2_parser.rs: REPORT.8208 → just (PR #8208) in CASE.5 comment.
- test_streaming_tool_parsers.rs: same.
- dsml/parser.rs chart: drops REPORT.<n> reference; notes V4 has no
  customer incidents yet.

CASE.16 stays retired (no longer referenced anywhere).

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

Labels

documentation, fix, frontend, size/L


4 participants