test(parsers): harden top-7 parsers with 5 more corner-case tests by keivenchang · Pull Request #8846 · ai-dynamo/dynamo

keivenchang · 2026-04-29T19:15:50Z

Overview:

This PR does not modify any parser. It only adds tests that expose existing parser problems. Parser fixes come in a follow-up PR.

Coverage chart — CASE.1–5 × top-7 parsers

Legend: ✓ dedicated test · ✗ not properly covered · N/A not applicable

Case	Description	Kimi K2.6	GLM 5.1	MiniMax 2.7	Qwen 3.5	Nemotron	gpt-oss	DSv4
CASE.1	Single tool call (happy path)	✓	✓	✓	✓	✓	✓	✓
CASE.2	Multiple tool calls	✓	✓	✓	✓	✗	✗	✓
CASE.3	No tool call (text only)	✓	✓	✓	✓	✓	✓	✓
CASE.4	Malformed / partial JSON args	✗	✓	✓	✓	✗	✗	✓
CASE.5	Missing end-token recovery	✓	✗	✗	✗	✗	✗	✓

The new tests target the ✗ cells. Most assert calls.len() == 0 — that's what the parser does today. When the parser is fixed later, the assertion flips to 1 and the matrix updates in the same follow-up PR. Only Nemotron CASE.2 ✗ → ✓ promotes on this merge (parser already works; test was just missing).

Customer-facing symptom for the silent-drop cases: HTTP 200 with empty tool_calls and no error.

Details:

Add CASE.5 silent-drop test for glm47 when outer </tool_call> is absent (test_parse_no_end_tag_complete_args_silent_drop)
Add CASE.5 partial-recovery test for glm47 when first call is complete and second drops on missing </tool_call> (test_parse_no_end_tag_multiple_calls_silent_drop)
Add CASE.5 silent-drop test for qwen3_coder via shared xml/parser.rs (test_parse_qwen3_no_outer_close_silent_drop)
Add CASE.5 silent-drop test for minimax_m2 with its own XmlParserConfig (test_parse_minimax_m2_no_outer_close_silent_drop)
Add CASE.5 silent-drop test for nemotron_deci when outer </TOOLCALL> is absent (test_parse_nemotron_deci_no_outer_close_silent_drop)
Relabel harmony's test_parse_tool_calls_harmony_without_call_token to also carry CASE.5 (it was tagged CASE.4 only despite exercising missing-<|call|> recovery)
Add CASE.2 multi-call test for nemotron_deci pinning working behavior — promotes ✗ → ✓ on merge (test_parse_nemotron_deci_multiple_calls)
Add CASE.2 silent-drop test for harmony when two <|start|>assistant<|channel|>commentary blocks appear back-to-back (test_parse_harmony_multiple_calls_silent_drop)
Add CASE.4 silent-drop test for kimi_k2 when JSON args are truncated mid-value inside otherwise complete fences (test_parse_truncated_json_inside_complete_fences_silent_drop)
Add CASE.4 silent-drop test for nemotron_deci truncated JSON (test_parse_nemotron_deci_truncated_json_silent_drop)
Add CASE.4 silent-drop test for harmony truncated JSON before <|call|> (test_parse_harmony_truncated_json_silent_drop)

Where should the reviewer start?

lib/parsers/src/tool_calling/xml/glm47_parser.rs, then lib/parsers/src/tool_calling/json/mod.rs, lib/parsers/src/tool_calling/harmony/harmony_parser.rs

Related Issues:

Relates to DIS-1842

/coderabbit profile chill

coderabbitai · 2026-04-29T19:21:01Z

Walkthrough

The pull request adds regression tests across five tool-calling parser implementations (Harmony, JSON, GLM 4.7, Kimi K2, and XML) to validate "silent drop" behavior when tool calls are incomplete due to truncation or missing closing delimiters. No production code is modified.

Changes

Cohort / File(s)	Summary
XML-based parsers `lib/parsers/src/tool_calling/xml/glm47_parser.rs`, `lib/parsers/src/tool_calling/xml/kimi_k2_parser.rs`, `lib/parsers/src/tool_calling/xml/parser.rs`	Added regression tests validating that incomplete tool calls due to missing or truncated closing tags result in zero parsed calls, with full input treated as normal text.
Harmony and JSON parsers `lib/parsers/src/tool_calling/harmony/harmony_parser.rs`, `lib/parsers/src/tool_calling/json/mod.rs`	Added regression tests for truncation scenarios including back-to-back commentary blocks, missing outer closing tags, and mid-value JSON truncation, all expecting silent drop behavior with zero extracted tool calls.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description fully covers all required template sections with clear, comprehensive information.
Title check	✅ Passed	The PR title accurately reflects the main objective: adding regression tests across multiple parser implementations to validate corner-case behavior under missing end-tokens and truncation scenarios.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/parsers/src/tool_calling/harmony/harmony_parser.rs`:
- Around line 273-279: The comment block referencing internal Linear tickets
"DIS-1842" and "DIS-1832" should be updated: remove or replace those internal
references with public GitHub issue links or neutral wording. Edit the comment
around Harmony's `<|call|>` and the surrounding test prose (the lines mentioning
"DIS-1842 / DIS-1832") to either cite corresponding GitHub issues (e.g.,
#<issue>) or rephrase to "see issue tracker" / "see discussion" so no internal
Linear IDs remain, keeping the rest of the explanatory text intact.

In `@lib/parsers/src/tool_calling/json/mod.rs`:
- Around line 153-157: The source comment in
lib/parsers/src/tool_calling/json/mod.rs contains internal Linear ticket IDs
"DIS-1842" and "DIS-1832"; remove or replace those IDs with either the
corresponding public GitHub issue numbers or a generic description (e.g.,
"internal tracker references removed") so no internal ticket identifiers remain
in the checked-in comment that begins "Pin current Nemotron behavior..." and
references recovery/parser change behavior.

In `@lib/parsers/src/tool_calling/xml/glm47_parser.rs`:
- Around line 450-454: The comment in glm47_parser.rs references internal Linear
IDs "DIS-1842 / DIS-1832"; update that comment to remove Linear ticket
identifiers and replace them with GitHub-style issue references (e.g., "#1842"
and "#1832" or "GH-1842/GH-1832") or neutral text. Locate the comment near the
GLM 4.7 parser block mentioning `</tool_call>`, `TEST_CASES.md`, and the
`kimi_k2_parser::test_parse_malformed_no_section_end` test, then edit the text
to drop the internal IDs and use the approved GitHub-format references or a
generic description.

In `@lib/parsers/src/tool_calling/xml/parser.rs`:
- Around line 905-910: The source comment in parser.rs mentions internal Linear
ticket IDs (DIS-1842 / DIS-1832) which must be removed; edit the comment around
the XmlParserConfig::default() / qwen3_coder explanation (the block describing
absent OUTER </tool_call> fence and behavior) to delete or replace the Linear
IDs with a generic reference (e.g., "internal bug reports" or "internal
tickets") or describe the observed failure mode/test case instead, keeping the
rest of the explanatory text intact and preserving references to
XmlParserConfig::default(), qwen3_coder, and the related
test_parse_missing_*_closing_tag note.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9f92d884-6255-44be-a086-1556ae348ed8

📥 Commits

Reviewing files that changed from the base of the PR and between bd90b4e and 2ea6485.

📒 Files selected for processing (5)

lib/parsers/src/tool_calling/harmony/harmony_parser.rs
lib/parsers/src/tool_calling/json/mod.rs
lib/parsers/src/tool_calling/xml/glm47_parser.rs
lib/parsers/src/tool_calling/xml/kimi_k2_parser.rs
lib/parsers/src/tool_calling/xml/parser.rs

keivenchang · 2026-04-29T19:31:47Z

@coderabbitai please see if comments are addressed. If so, Resolve Comment.

coderabbitai · 2026-04-29T19:33:58Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# List all review comments made by coderabbitai on PR `#8846`
gh pr view 8846 --repo ai-dynamo/dynamo --json reviews,comments

# Also get review comments (inline)
gh api repos/ai-dynamo/dynamo/pulls/8846/comments --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, body: .body, path: .path, line: .line, position: .position}'

Length of output: 18795

🏁 Script executed:

#!/bin/bash
# Get all review comments (inline) on PR `#8846`
gh api repos/ai-dynamo/dynamo/pulls/8846/comments --jq '.[] | {id: .id, user: .user.login, body: .body, path: .path, line: .line, created_at: .created_at}' 

echo "---THREADS---"
# Get review threads to check resolution status
gh api graphql -f query='
{
  repository(owner: "ai-dynamo", name: "dynamo") {
    pullRequest(number: 8846) {
      reviewThreads(first: 20) {
        nodes {
          id
          isResolved
          isOutdated
          comments(first: 5) {
            nodes {
              author { login }
              body
              path
              line
              createdAt
            }
          }
        }
      }
    }
  }
}'

Length of output: 82683

3 of the 4 threads are already resolved. Let me resolve the remaining one for harmony_parser.rs:

🏁 Script executed:

#!/bin/bash
# Resolve the remaining unresolved thread for harmony_parser.rs
gh api graphql -f query='
mutation {
  resolveReviewThread(input: {threadId: "PRRT_kwDOOCjvss5-iziu"}) {
    thread {
      id
      isResolved
    }
  }
}'

Length of output: 92

@keivenchang All 4 review comments have been verified as addressed (internal Linear ticket IDs stripped in commit e77b713) and all threads are now resolved ✅:

File	Thread	Status
`harmony/harmony_parser.rs` (Line 279)	Remove `DIS-1842`/`DIS-1832` refs	Resolved ✅
`json/mod.rs` (Line 157)	Remove `DIS-1842`/`DIS-1832` refs	Resolved ✅
`xml/glm47_parser.rs` (Line 454)	Remove `DIS-1842`/`DIS-1832` refs	Resolved ✅
`xml/parser.rs` (Line 910)	Remove `DIS-1842`/`DIS-1832` refs	Resolved ✅

🧠 Learnings used

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 0
File: :0-0
Timestamp: 2026-04-25T01:38:07.316Z
Learning: User keivenchang (repo: ai-dynamo/dynamo) prefers that CodeRabbit automatically resolves its own review comments as soon as the concern has been addressed, without requiring an explicit "resolve" request each time.

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/src/metrics/prometheus_names.rs:49-53
Timestamp: 2025-09-16T00:26:37.092Z
Learning: keivenchang prefers consistency in metric naming standardization over strict adherence to Prometheus conventions about gauge vs counter suffixes. When standardizing metrics naming, prioritize consistency across the codebase rather than technical pedantry about individual metric type conventions.

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3051
File: container/templates/Dockerfile.trtllm.j2:424-437
Timestamp: 2025-09-16T17:16:03.785Z
Learning: keivenchang prioritizes maintaining exact backward compatibility during migration/refactoring PRs, even when bugs are identified in the original code. Fixes should be deferred to separate PRs after the migration is complete.

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 0
File: :0-0
Timestamp: 2026-04-25T01:36:01.212Z
Learning: In `lib/parsers/src/tool_calling/dsml/parser.rs`, inline/parameter regex caching for DSML parsing (analogous to `get_dsml_regexes` via `OnceLock + RwLock` keyed on 6 config strings that already exists for block/outer regexes) is intentionally deferred to a follow-up PR. It is a valid performance concern but not a correctness issue, and was kept out of the DeepSeek V4 hardening PR (`#8670`) to avoid scope creep.

…op-N parsers DIS-1842 / DIS-1832 — add per-parser unit tests that document the current customer-facing failure modes when models hit max_tokens or emit EOS mid-call. No parser code changes in this PR; assertions pin observed behavior so a future parser change has to update the test intentionally. Recovery (matching Kimi K2 post-DIS-1765) is a follow-up parser PR. Tests added (10 new + 1 label correction): CASE.5 — missing outer end-token (silent drop): - glm47_parser: test_parse_no_end_tag_complete_args_silent_drop - glm47_parser: test_parse_no_end_tag_multiple_calls_silent_drop - xml/parser: test_parse_qwen3_no_outer_close_silent_drop - xml/parser: test_parse_minimax_m2_no_outer_close_silent_drop - json/mod: test_parse_nemotron_deci_no_outer_close_silent_drop - harmony_parser: relabeled test_parse_tool_calls_harmony_without_call_token to also carry the CASE.5 marker (it was already exercising missing-call-token recovery). CASE.2 — multiple calls: - json/mod: test_parse_nemotron_deci_multiple_calls (Nemotron parser already handles this correctly; test makes the per-parser surface explicit.) - harmony_parser: test_parse_harmony_multiple_calls_silent_drop (gpt-oss drops both calls when two commentary blocks appear back-to-back.) CASE.4 — truncated JSON args: - kimi_k2_parser: test_parse_truncated_json_inside_complete_fences_silent_drop (Kimi recovers the missing-fence case but drops truncated payload inside complete fences.) - json/mod: test_parse_nemotron_deci_truncated_json_silent_drop - harmony_parser: test_parse_harmony_truncated_json_silent_drop Customer symptom for the silent-drop cases: HTTP 200 with empty tool_calls and no error — no signal that work was lost. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

Address CodeRabbit feedback. The new test comments referenced DIS-1842 / DIS-1832; per repo policy internal Linear IDs do not belong in checked-in source. Comments still convey the intent (this pins broken behavior, follow-up parser change flips the assertion) without naming the tracker. No test logic changes. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

ayushag-nv

Looks really Great. Thanks for working on this

…s top-7 parsers Part 2 of the work tracked by #8846. That PR added 9 _silent_drop tests pinning scenarios where parsers saw a real tool call but dropped it because max_tokens / EOS truncated the closing fence or cut JSON mid-value. This PR makes those 9 cases recover instead. The _silent_drop tests are flipped to _recovers and assert the extracted call. Cells flipped from x to check in the #8846 coverage chart: CASE.2 gpt-oss; CASE.4 Kimi K2 / Nemotron / gpt-oss; CASE.5 GLM / MiniMax / Qwen / Nemotron. - base_json_parser: EOF-as-end-token recovery + try_repair_truncated_json helper for unclosed strings/braces. - glm47_parser: recover when </tool_call> missing, gated on <arg_key>. - xml/parser.rs (qwen3_coder, minimax_m2): same recovery, gated on function_start_token. - kimi_k2_parser: relax args regex to .*? so truncated JSON hits the existing raw-string fallback. - harmony_parser: regex fallback when openai_harmony tokenizer rejects parallel commentary blocks or truncated args. Validation: cargo test -p dynamo-parsers --lib (409 passed, 0 failed), cargo clippy --tests -- -D warnings (clean). Relates to DIS-1842 Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

) Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com> Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>

… family Pythonic was the only top-N family untouched by the recent coverage PRs (#8888, #8946, #8846, #9035, #8852). Add three small batch-mode tests that mirror the top-N quartet pattern landed for harmony/glm47/qwen3/etc. in #9035, plus the parameterless-call shape from vLLM's pythonic test file. - `test_parse_tool_call_parse_pythonic_empty_args` (PARSER.batch.6) — `[get_weather()]` returns one call with `arguments={}`. Mirrors vLLM `test_pythonic_tool_parser.py::test_tool_call[parameterless_*]`. - `test_parse_pythonic_empty_and_whitespace_inputs` (PARSER.batch.9) — empty / whitespace-only inputs return 0 calls and empty content without panicking. Mirrors the #9035 quartet contract. - `test_parse_pythonic_duplicate_calls_same_name` (PARSER.batch.10) — two `get_weather` calls in one list surface with distinct ids (`call-1`, `call-2`). Pins the auto-generated-id contract for the duplicate-name case. `cargo test -p dynamo-parsers --lib tool_calling::pythonic` — 19/19 pass. Signed-off-by: zhongdaor <zhongdaor@nvidia.com>

keivenchang requested a review from a team as a code owner April 29, 2026 19:15

pull-request-size Bot added the size/L label Apr 29, 2026

github-actions Bot added the test label Apr 29, 2026

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread lib/parsers/src/tool_calling/harmony/harmony_parser.rs

Comment thread lib/parsers/src/tool_calling/json/mod.rs

Comment thread lib/parsers/src/tool_calling/xml/glm47_parser.rs

Comment thread lib/parsers/src/tool_calling/xml/parser.rs

keivenchang self-assigned this Apr 29, 2026

copy-pr-bot Bot temporarily deployed to GITLAB April 29, 2026 19:30 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 29, 2026 19:44 Inactive

keivenchang added 2 commits April 29, 2026 14:21

keivenchang force-pushed the keivenchang/DIS-1842__case5-top6-coverage branch from e77b713 to c765450 Compare April 29, 2026 21:22

copy-pr-bot Bot temporarily deployed to GITLAB April 29, 2026 21:22 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 29, 2026 22:19 Inactive

keivenchang changed the title ~~test(parsers): pin missing-end-token + truncation behavior across top-N parsers~~ test(parsers): harden top-7 parsers with 5 more corner-case tests Apr 30, 2026

ayushag-nv approved these changes Apr 30, 2026

View reviewed changes

ayushag-nv merged commit 4fac489 into main Apr 30, 2026
158 of 161 checks passed

ayushag-nv deleted the keivenchang/DIS-1842__case5-top6-coverage branch April 30, 2026 02:08

This was referenced Apr 30, 2026

fix(parsers): part 2 — recover 9 silent-drop scenarios across top-7 parsers #8888

Merged

test(revalidate): test(parsers): harden top-7 parsers with 5 more corner-case tests #8846 #8890

Closed

furionw pushed a commit that referenced this pull request May 2, 2026

test(parsers): harden top-7 parsers with 5 more corner-case tests (#8846

1bc753c

) Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com> Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>

keivenchang mentioned this pull request May 14, 2026

test(parsers): unified cases × all parsers, coverage matrix (step 1) #8728

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(parsers): harden top-7 parsers with 5 more corner-case tests#8846

test(parsers): harden top-7 parsers with 5 more corner-case tests#8846
ayushag-nv merged 2 commits into
mainfrom
keivenchang/DIS-1842__case5-top6-coverage

keivenchang commented Apr 29, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

keivenchang commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026

Uh oh!

ayushag-nv left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

keivenchang commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Coverage chart — CASE.1–5 × top-7 parsers

Details:

Where should the reviewer start?

Related Issues:

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

keivenchang commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026

Uh oh!

ayushag-nv left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

keivenchang commented Apr 29, 2026 •

edited

Loading

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading