fix(parsers): recover Kimi K2 tool calls when section_end is missing#8208
Conversation
WalkthroughRefactors the Kimi K2 tool-call parser to handle truncated output gracefully by making the section-end marker optional. Updates the parser API to return Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/llm/src/protocols/openai/chat_completions/jail.rs (1)
803-824:⚠️ Potential issue | 🟠 MajorPreserve the terminal finish reason on the new
finalize()recovery path.These Kimi
Nonecases now intentionally defer emission tofinalize(), but the parsed tool-call branch increate_tool_call_choice()always builds the emitted chunk withfinish_reason: None. For a truncated stream ending withFinishReason::Length, the recovered tool call comes out without any terminal reason, andfix_finish_reason()cannot restore it because it only rewrites existingSome(Stop)values. Please threadbase_choice.finish_reasonthrough the successful parsed-tool-call path so recovered Kimi responses still terminate withLength(or get rewritten fromStoptoToolCalls).Suggested fix
- let choice = create_choice_stream( - choice_index, - Some(Role::Assistant), - normal_text.as_deref().unwrap_or(""), - Some(tool_call_chunks), - None, - None, - None, - ); + let choice = create_choice_stream( + choice_index, + Some(Role::Assistant), + normal_text.as_deref().unwrap_or(""), + Some(tool_call_chunks), + base_choice.finish_reason, + base_choice.stop_reason.clone(), + base_choice.logprobs.clone(), + );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/llm/src/protocols/openai/chat_completions/jail.rs` around lines 803 - 824, create_tool_call_choice() currently constructs the emitted tool-call chunk with finish_reason: None causing recovered Kimi tool-call responses (from finalize()) to lose terminal reasons like FinishReason::Length; update the parsed-tool-call branch in create_tool_call_choice() to propagate the original base_choice.finish_reason into the emitted chunk (use base_choice.finish_reason instead of None) so finalize() recovers tool calls with the correct terminal reason and fix_finish_reason() can continue to rewrite Stop→ToolCalls as before; ensure any helper paths that build the emitted choice (the parsed tool-call success branch where try_tool_call_parse_aggregate(...) and find_tool_call_end_position(...) return) use base_choice.finish_reason and keep existing behavior for non-parsed branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@lib/llm/src/protocols/openai/chat_completions/jail.rs`:
- Around line 803-824: create_tool_call_choice() currently constructs the
emitted tool-call chunk with finish_reason: None causing recovered Kimi
tool-call responses (from finalize()) to lose terminal reasons like
FinishReason::Length; update the parsed-tool-call branch in
create_tool_call_choice() to propagate the original base_choice.finish_reason
into the emitted chunk (use base_choice.finish_reason instead of None) so
finalize() recovers tool calls with the correct terminal reason and
fix_finish_reason() can continue to rewrite Stop→ToolCalls as before; ensure any
helper paths that build the emitted choice (the parsed tool-call success branch
where try_tool_call_parse_aggregate(...) and find_tool_call_end_position(...)
return) use base_choice.finish_reason and keep existing behavior for non-parsed
branches.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b62536c8-0684-494a-b37f-c0a1eaf514d8
📒 Files selected for processing (5)
docs/agents/kimi_k2.mdlib/llm/src/protocols/openai/chat_completions/jail.rslib/llm/tests/test_streaming_tool_parsers.rslib/parsers/src/tool_calling/parsers.rslib/parsers/src/tool_calling/xml/kimi_k2_parser.rs
d87bb6f to
c78fac3
Compare
When the model hits max_tokens or EOS before emitting <|tool_calls_section_end|>, the parser now treats the rest of the string as the section body and extracts any complete individual tool calls (those with <|tool_call_begin|> + args + <|tool_call_end|>). This matches the one-line regex fix proposed upstream for Moonshot's reference Python parser, where `re.findall` silently returns [] when section_end is absent. Parser change (kimi_k2_parser.rs): - extract_tool_calls: missing section_end → section extends to EOF - find_tool_call_end_position_kimi_k2: returns Option<usize>, None when section_end absent (signals incomplete section to the streaming jail) Streaming jail change (jail.rs): - should_end_jail: when find_tool_call_end_position returns None, skip early-exit so parallel tool calls keep accumulating until stream end; finalize() recovers them via the lenient parser Tests: - 4 new parser unit tests for truncation scenarios - 3 new streaming integration tests (complete section, single call truncated, multiple calls truncated) Docs: - docs/agents/kimi_k2.md: reference parser bug analysis with actual and proposed Python code, upstream TODO checklist DIS-1765
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Reviewer feedback: this file captures troubleshooting and repro steps rather than user-facing docs. Moved to ~/notes/tool-calling/kimi_k2.md. Signed-off-by: Keiven Chang <keivenc@nvidia.com>
c78fac3 to
118a390
Compare
Add universal test cases that apply to every tool-call parser, filling in the gaps in the per-parser coverage by surfacing the matrix as live cargo-test output instead of a static doc. This first revision covers two cases for all 18 registered parsers: - case_1_single_call -- single tool-call happy path (18/18 pass) - case_5_missing_end_token_recovery (PR #8208 generalized) -- 2 pass (kimi_k2, mistral), 5 N/A, 11 KnownBroken (each tagged for follow-up work to generalize the PR #8208 fix) The framework is a four-state FixtureCase enum: Sample (parser handles correctly), KnownBroken (parser drops the call today; the test asserts the broken state and fails when a future fix actually adds recovery, forcing the fixture to be upgraded), NotApplicable (format genuinely lacks the concept; reason printed), Unimplemented (CI hard-fail). The four states distinguish honest gaps from silent ones, which is the matrix-sparseness problem this scaffold is meant to solve. Layout: - lib/parsers/src/tool_calling/test_cases/{mod,normalize,tests}.rs - lib/parsers/src/tool_calling/test_cases/fixtures/<parser>.rs (x18) Existing 134 inline parser tests are untouched; each test module gets a TODO breadcrumb pointing at the contract suite for future trimming once the suite stabilizes. Future revisions: more universal cases (parallel calls, malformed args, finish_reason, etc.), adversarial streaming chunkings, and a CI matrix-report block. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
CASE.16 was misleading — it sat next to CASE.1–CASE.15 generic test categories but is actually a per-incident bookkeeping marker, not a contract every parser must cover. Renames in this commit: - TEST_CASES.md: CASE.16 retired; REPORT.<n> introduced as a separate taxonomy with explicit semantics. - xml/kimi_k2_parser.rs: CASE.16 (PR #8208) → REPORT.8208. - test_streaming_tool_parsers.rs: CASE.16 → REPORT.8208 (same incident). - reasoning/base_parser.rs: drops the CASE.16 marker from two tests that had no incident reference; only CASE.8 (the real category) remains. - dsml/parser.rs chart: notes that DSv4 has no REPORT.<n> tests yet because no customer bugs have been filed against V4. Linear DIS-1842 description updated with the same rename so the chart and the in-code labels stay consistent. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
…omments REPORT.<n> was duplicating information already conveyed by the (PR #N) parenthetical inside the same #[test] line. Reverts the taxonomy: - TEST_CASES.md: drops REPORT.<n> bullet and section; documents the inline (PR #N) / (DIS-NNNN) convention instead. - xml/kimi_k2_parser.rs: REPORT.8208 → just (PR #8208) in CASE.5 comment. - test_streaming_tool_parsers.rs: same. - dsml/parser.rs chart: drops REPORT.<n> reference; notes V4 has no customer incidents yet. CASE.16 stays retired (no longer referenced anywhere). Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Overview:
Kimi K2 tool calls were getting silently dropped whenever the model hit max_tokens before emitting the section_end marker. The parser just threw away everything if that closing tag was missing -- same bug exists in Moonshot's reference Python parser and got copied into vLLM and SGLang too.
Details:
extract_tool_calls(kimi_k2_parser.rs) -- the old code discarded the entire tool call section whensection_endwasnt there. now it treats missingsection_endas "section goes to EOF" and pulls out whatever complete individual calls it can findjail.rs) had a related issue:find_tool_call_end_position_kimi_k2couldn't tell "found section_end" apart from "section_end is missing", so it would early-exit and swallow parallel calls. changed the return type toOption<usize>soNoneskips the early-exit path, andfinalize()recovers all the calls at stream endWhere should the reviewer start?
docs/agents/kimi_k2.md, thenkimi_k2_parser.rsRelated Issues:
Relates to DIS-1765
/coderabbit profile chill