test(parsers): harden top-7 parsers with 5 more corner-case tests#8846
Walkthrough
The pull request adds regression tests across five tool-calling parser implementations (Harmony, JSON, GLM 4.7, Kimi K2, and XML) to validate "silent drop" behavior when tool calls are incomplete due to truncation or missing closing delimiters. No production code is modified.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/parsers/src/tool_calling/harmony/harmony_parser.rs`:
- Around line 273-279: The comment block referencing internal Linear tickets
"DIS-1842" and "DIS-1832" should be updated: remove or replace those internal
references with public GitHub issue links or neutral wording. Edit the comment
around Harmony's `<|call|>` and the surrounding test prose (the lines mentioning
"DIS-1842 / DIS-1832") to either cite corresponding GitHub issues (e.g.,
#<issue>) or rephrase to "see issue tracker" / "see discussion" so no internal
Linear IDs remain, keeping the rest of the explanatory text intact.
In `@lib/parsers/src/tool_calling/json/mod.rs`:
- Around line 153-157: The source comment in
lib/parsers/src/tool_calling/json/mod.rs contains internal Linear ticket IDs
"DIS-1842" and "DIS-1832"; remove or replace those IDs with either the
corresponding public GitHub issue numbers or a generic description (e.g.,
"internal tracker references removed") so no internal ticket identifiers remain
in the checked-in comment that begins "Pin current Nemotron behavior..." and
references recovery/parser change behavior.
In `@lib/parsers/src/tool_calling/xml/glm47_parser.rs`:
- Around line 450-454: The comment in glm47_parser.rs references internal Linear
IDs "DIS-1842 / DIS-1832"; update that comment to remove Linear ticket
identifiers and replace them with GitHub-style issue references (e.g., "#1842"
and "#1832" or "GH-1842/GH-1832") or neutral text. Locate the comment near the
GLM 4.7 parser block mentioning `</tool_call>`, `TEST_CASES.md`, and the
`kimi_k2_parser::test_parse_malformed_no_section_end` test, then edit the text
to drop the internal IDs and use the approved GitHub-format references or a
generic description.
In `@lib/parsers/src/tool_calling/xml/parser.rs`:
- Around line 905-910: The source comment in parser.rs mentions internal Linear
ticket IDs (DIS-1842 / DIS-1832) which must be removed; edit the comment around
the XmlParserConfig::default() / qwen3_coder explanation (the block describing
absent OUTER </tool_call> fence and behavior) to delete or replace the Linear
IDs with a generic reference (e.g., "internal bug reports" or "internal
tickets") or describe the observed failure mode/test case instead, keeping the
rest of the explanatory text intact and preserving references to
XmlParserConfig::default(), qwen3_coder, and the related
test_parse_missing_*_closing_tag note.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9f92d884-6255-44be-a086-1556ae348ed8
📒 Files selected for processing (5)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs
lib/parsers/src/tool_calling/json/mod.rs
lib/parsers/src/tool_calling/xml/glm47_parser.rs
lib/parsers/src/tool_calling/xml/kimi_k2_parser.rs
lib/parsers/src/tool_calling/xml/parser.rs
@coderabbitai please see if comments are addressed. If so, Resolve Comment.
🧩 Analysis chain
🏁 Script executed: Length of output: 18795
🏁 Script executed: Length of output: 82683
3 of the 4 threads are already resolved. Let me resolve the remaining one for
🏁 Script executed: Length of output: 92
…op-N parsers

DIS-1842 / DIS-1832: add per-parser unit tests that document the current customer-facing failure modes when models hit max_tokens or emit EOS mid-call. No parser code changes in this PR; assertions pin observed behavior so a future parser change has to update the test intentionally. Recovery (matching Kimi K2 post-DIS-1765) is a follow-up parser PR.

Tests added (10 new + 1 label correction):

CASE.5, missing outer end-token (silent drop):
- glm47_parser: test_parse_no_end_tag_complete_args_silent_drop
- glm47_parser: test_parse_no_end_tag_multiple_calls_silent_drop
- xml/parser: test_parse_qwen3_no_outer_close_silent_drop
- xml/parser: test_parse_minimax_m2_no_outer_close_silent_drop
- json/mod: test_parse_nemotron_deci_no_outer_close_silent_drop
- harmony_parser: relabeled test_parse_tool_calls_harmony_without_call_token to also carry the CASE.5 marker (it was already exercising missing-call-token recovery).

CASE.2, multiple calls:
- json/mod: test_parse_nemotron_deci_multiple_calls (Nemotron parser already handles this correctly; test makes the per-parser surface explicit.)
- harmony_parser: test_parse_harmony_multiple_calls_silent_drop (gpt-oss drops both calls when two commentary blocks appear back-to-back.)

CASE.4, truncated JSON args:
- kimi_k2_parser: test_parse_truncated_json_inside_complete_fences_silent_drop (Kimi recovers the missing-fence case but drops truncated payload inside complete fences.)
- json/mod: test_parse_nemotron_deci_truncated_json_silent_drop
- harmony_parser: test_parse_harmony_truncated_json_silent_drop

Customer symptom for the silent-drop cases: HTTP 200 with empty tool_calls and no error, so there is no signal that work was lost.

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
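The "pin observed behavior" pattern described in this commit can be illustrated with a minimal sketch. Everything here is hypothetical: `ToolCall`, `parse_strict`, and the `name|json` body format are invented for the example and are not the dynamo-parsers API. The toy parser only emits a call when both fences are present, and the test asserts the resulting silent drop so that a later recovery change has to flip the assertion deliberately.

```rust
/// Hypothetical stand-in for a parsed tool call; not the dynamo-parsers type.
#[derive(Debug, PartialEq)]
pub struct ToolCall {
    pub name: String,
    pub arguments: String,
}

const START: &str = "<tool_call>";
const END: &str = "</tool_call>";

/// Toy strict parser: emits a call only when both fences are present.
/// Body format (invented for this sketch): `name|json-arguments`.
pub fn parse_strict(input: &str) -> Vec<ToolCall> {
    let mut calls = Vec::new();
    let mut rest = input;
    while let Some(start) = rest.find(START) {
        let body_start = start + START.len();
        // A missing closing fence ends parsing: the call is silently dropped.
        let Some(len) = rest[body_start..].find(END) else {
            break;
        };
        let body = &rest[body_start..body_start + len];
        if let Some((name, args)) = body.split_once('|') {
            calls.push(ToolCall {
                name: name.trim().to_string(),
                arguments: args.trim().to_string(),
            });
        }
        rest = &rest[body_start + len + END.len()..];
    }
    calls
}

#[cfg(test)]
mod tests {
    use super::*;

    // Pins today's behavior: truncation before the closing fence yields no calls.
    // A recovery change must flip this assertion to `1` on purpose.
    #[test]
    fn test_parse_no_end_tag_silent_drop() {
        let truncated = r#"<tool_call>get_weather|{"city":"Paris"}"#;
        assert_eq!(parse_strict(truncated).len(), 0);
    }
}
```

Because the test name carries the `_silent_drop` suffix and asserts `0`, a parser fix cannot land without someone consciously renaming the test and changing the expected count.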
Address CodeRabbit feedback. The new test comments referenced DIS-1842 / DIS-1832; per repo policy internal Linear IDs do not belong in checked-in source. Comments still convey the intent (this pins broken behavior, follow-up parser change flips the assertion) without naming the tracker. No test logic changes. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
e77b713 to c765450
ayushag-nv
left a comment
Looks really great. Thanks for working on this.
…s top-7 parsers

Part 2 of the work tracked by #8846. That PR added 9 _silent_drop tests pinning scenarios where parsers saw a real tool call but dropped it because max_tokens / EOS truncated the closing fence or cut JSON mid-value. This PR makes those 9 cases recover instead. The _silent_drop tests are flipped to _recovers and assert the extracted call.

Cells flipped from x to check in the #8846 coverage chart: CASE.2 gpt-oss; CASE.4 Kimi K2 / Nemotron / gpt-oss; CASE.5 GLM / MiniMax / Qwen / Nemotron.

- base_json_parser: EOF-as-end-token recovery + try_repair_truncated_json helper for unclosed strings/braces.
- glm47_parser: recover when </tool_call> missing, gated on <arg_key>.
- xml/parser.rs (qwen3_coder, minimax_m2): same recovery, gated on function_start_token.
- kimi_k2_parser: relax args regex to .*? so truncated JSON hits the existing raw-string fallback.
- harmony_parser: regex fallback when openai_harmony tokenizer rejects parallel commentary blocks or truncated args.

Validation: cargo test -p dynamo-parsers --lib (409 passed, 0 failed), cargo clippy --tests -- -D warnings (clean).

Relates to DIS-1842

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
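As a rough illustration of what repairing "unclosed strings/braces" might involve, here is a minimal sketch. It is not the PR's `try_repair_truncated_json`; the function name `repair_truncated_json` and its exact behavior are assumptions made for this example. It scans the input once, tracks whether it is inside a string, and appends whatever closers are still pending.

```rust
/// Hypothetical helper: best-effort repair of JSON cut off mid-value.
/// Closes an unterminated string, then closes open `{` / `[` in reverse order.
/// This only illustrates the idea; it is not the helper from the PR.
pub fn repair_truncated_json(input: &str) -> String {
    let mut closers: Vec<char> = Vec::new();
    let mut in_string = false;
    let mut escaped = false;

    for ch in input.chars() {
        if in_string {
            if escaped {
                escaped = false; // the escaped character is consumed as-is
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => closers.push('}'),
            '[' => closers.push(']'),
            '}' | ']' => {
                closers.pop();
            }
            _ => {}
        }
    }

    let mut out = input.to_string();
    if in_string {
        if escaped {
            out.pop(); // drop a dangling backslash before closing the string
        }
        out.push('"');
    }
    // Close containers innermost-first.
    while let Some(closer) = closers.pop() {
        out.push(closer);
    }
    out
}
```

A sketch like this only handles truncation inside a string or after a complete value. Input cut right after a key or colon (for example `{"city":`) would still be invalid after the closers are appended, so a real helper would need to validate the result and fall back when the repair fails.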
… family

Pythonic was the only top-N family untouched by the recent coverage PRs (#8888, #8946, #8846, #9035, #8852). Add three small batch-mode tests that mirror the top-N quartet pattern landed for harmony/glm47/qwen3/etc. in #9035, plus the parameterless-call shape from vLLM's pythonic test file.

- `test_parse_tool_call_parse_pythonic_empty_args` (PARSER.batch.6): `[get_weather()]` returns one call with `arguments={}`. Mirrors vLLM `test_pythonic_tool_parser.py::test_tool_call[parameterless_*]`.
- `test_parse_pythonic_empty_and_whitespace_inputs` (PARSER.batch.9): empty / whitespace-only inputs return 0 calls and empty content without panicking. Mirrors the #9035 quartet contract.
- `test_parse_pythonic_duplicate_calls_same_name` (PARSER.batch.10): two `get_weather` calls in one list surface with distinct ids (`call-1`, `call-2`). Pins the auto-generated-id contract for the duplicate-name case.

`cargo test -p dynamo-parsers --lib tool_calling::pythonic`: 19/19 pass.

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
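The contracts these three tests pin can be sketched with a toy parser. This is an assumption-laden illustration, not the dynamo-parsers pythonic parser: `Call` and `parse_parameterless` are hypothetical names, and the sketch handles only parameterless calls such as `[get_weather()]`.

```rust
/// Hypothetical call record for this sketch; not the dynamo-parsers type.
#[derive(Debug, PartialEq)]
pub struct Call {
    pub id: String,
    pub name: String,
}

/// Toy parser for parameterless calls only, e.g. `[get_weather(), get_time()]`.
/// Empty or whitespace-only input yields no calls instead of panicking.
pub fn parse_parameterless(input: &str) -> Vec<Call> {
    let trimmed = input.trim();
    let Some(inner) = trimmed
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
    else {
        return Vec::new();
    };
    inner
        .split(',')
        // Keep only items shaped like `name()`.
        .filter_map(|item| item.trim().strip_suffix("()"))
        .filter(|name| !name.is_empty())
        // Number the surviving calls so duplicate names still get distinct ids.
        .enumerate()
        .map(|(i, name)| Call {
            id: format!("call-{}", i + 1),
            name: name.to_string(),
        })
        .collect()
}
```

Numbering after filtering keeps the ids sequential, which is what makes two `get_weather` calls distinguishable downstream even though their names match.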
Overview:
This PR does not modify any parser. It only adds tests that expose existing parser problems. Parser fixes come in a follow-up PR.
Coverage chart — CASE.1–5 × top-7 parsers
Legend: ✓ dedicated test · ✗ not properly covered · N/A not applicable
The new tests target the ✗ cells. Most assert `calls.len() == 0`, which is what the parser does today. When the parser is fixed later, the assertion flips to `1` and the matrix updates in the same follow-up PR. Only Nemotron CASE.2 ✗ → ✓ promotes on this merge (parser already works; test was just missing).

Customer-facing symptom for the silent-drop cases: HTTP 200 with empty `tool_calls` and no error.

Details:
- `glm47` when outer `</tool_call>` is absent (`test_parse_no_end_tag_complete_args_silent_drop`)
- `glm47` when first call is complete and second drops on missing `</tool_call>` (`test_parse_no_end_tag_multiple_calls_silent_drop`)
- `qwen3_coder` via shared `xml/parser.rs` (`test_parse_qwen3_no_outer_close_silent_drop`)
- `minimax_m2` with its own `XmlParserConfig` (`test_parse_minimax_m2_no_outer_close_silent_drop`)
- `nemotron_deci` when outer `</TOOLCALL>` is absent (`test_parse_nemotron_deci_no_outer_close_silent_drop`)
- Relabeled `test_parse_tool_calls_harmony_without_call_token` to also carry CASE.5 (it was tagged CASE.4 only despite exercising missing-`<|call|>` recovery)
- `nemotron_deci` pinning working behavior, promotes ✗ → ✓ on merge (`test_parse_nemotron_deci_multiple_calls`)
- `harmony` when two `<|start|>assistant<|channel|>commentary` blocks appear back-to-back (`test_parse_harmony_multiple_calls_silent_drop`)
- `kimi_k2` when JSON args are truncated mid-value inside otherwise complete fences (`test_parse_truncated_json_inside_complete_fences_silent_drop`)
- `nemotron_deci` truncated JSON (`test_parse_nemotron_deci_truncated_json_silent_drop`)
- `harmony` truncated JSON before `<|call|>` (`test_parse_harmony_truncated_json_silent_drop`)

Where should the reviewer start?
`lib/parsers/src/tool_calling/xml/glm47_parser.rs`, then `lib/parsers/src/tool_calling/json/mod.rs`, `lib/parsers/src/tool_calling/harmony/harmony_parser.rs`

Related Issues:
Relates to DIS-1842
/coderabbit profile chill