Skip to content

test(parsers): harden top-7 parsers with 5 more corner-case tests#8846

Merged
ayushag-nv merged 2 commits into
mainfrom
keivenchang/DIS-1842__case5-top6-coverage
Apr 30, 2026
Merged

test(parsers): harden top-7 parsers with 5 more corner-case tests#8846
ayushag-nv merged 2 commits into
mainfrom
keivenchang/DIS-1842__case5-top6-coverage

Conversation

@keivenchang
Copy link
Copy Markdown
Contributor

@keivenchang keivenchang commented Apr 29, 2026

Overview:

This PR does not modify any parser. It only adds tests that expose existing parser problems. Parser fixes come in a follow-up PR.

Coverage chart — CASE.1–5 × top-7 parsers

Legend: ✓ dedicated test · ✗ not properly covered · N/A not applicable

Case Description Kimi K2.6 GLM 5.1 MiniMax 2.7 Qwen 3.5 Nemotron gpt-oss DSv4
CASE.1 Single tool call (happy path)
CASE.2 Multiple tool calls
CASE.3 No tool call (text only)
CASE.4 Malformed / partial JSON args
CASE.5 Missing end-token recovery

The new tests target the ✗ cells. Most assert calls.len() == 0 — that's what the parser does today. When the parser is fixed later, the assertion flips to 1 and the matrix updates in the same follow-up PR. Only Nemotron CASE.2 ✗ → ✓ promotes on this merge (parser already works; test was just missing).

Customer-facing symptom for the silent-drop cases: HTTP 200 with empty tool_calls and no error.

Details:

  • Add CASE.5 silent-drop test for glm47 when outer </tool_call> is absent (test_parse_no_end_tag_complete_args_silent_drop)
  • Add CASE.5 partial-recovery test for glm47 when first call is complete and second drops on missing </tool_call> (test_parse_no_end_tag_multiple_calls_silent_drop)
  • Add CASE.5 silent-drop test for qwen3_coder via shared xml/parser.rs (test_parse_qwen3_no_outer_close_silent_drop)
  • Add CASE.5 silent-drop test for minimax_m2 with its own XmlParserConfig (test_parse_minimax_m2_no_outer_close_silent_drop)
  • Add CASE.5 silent-drop test for nemotron_deci when outer </TOOLCALL> is absent (test_parse_nemotron_deci_no_outer_close_silent_drop)
  • Relabel harmony's test_parse_tool_calls_harmony_without_call_token to also carry CASE.5 (it was tagged CASE.4 only despite exercising missing-<|call|> recovery)
  • Add CASE.2 multi-call test for nemotron_deci pinning working behavior — promotes ✗ → ✓ on merge (test_parse_nemotron_deci_multiple_calls)
  • Add CASE.2 silent-drop test for harmony when two <|start|>assistant<|channel|>commentary blocks appear back-to-back (test_parse_harmony_multiple_calls_silent_drop)
  • Add CASE.4 silent-drop test for kimi_k2 when JSON args are truncated mid-value inside otherwise complete fences (test_parse_truncated_json_inside_complete_fences_silent_drop)
  • Add CASE.4 silent-drop test for nemotron_deci truncated JSON (test_parse_nemotron_deci_truncated_json_silent_drop)
  • Add CASE.4 silent-drop test for harmony truncated JSON before <|call|> (test_parse_harmony_truncated_json_silent_drop)

Where should the reviewer start?

lib/parsers/src/tool_calling/xml/glm47_parser.rs, then lib/parsers/src/tool_calling/json/mod.rs, lib/parsers/src/tool_calling/harmony/harmony_parser.rs

Related Issues:

Relates to DIS-1842

/coderabbit profile chill

@keivenchang keivenchang requested a review from a team as a code owner April 29, 2026 19:15
@github-actions github-actions Bot added the test label Apr 29, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Walkthrough

The pull request adds regression tests across five tool-calling parser implementations (Harmony, JSON, GLM 4.7, Kimi K2, and XML) to validate "silent drop" behavior when tool calls are incomplete due to truncation or missing closing delimiters. No production code is modified.

Changes

Cohort / File(s) Summary
XML-based parsers
lib/parsers/src/tool_calling/xml/glm47_parser.rs, lib/parsers/src/tool_calling/xml/kimi_k2_parser.rs, lib/parsers/src/tool_calling/xml/parser.rs
Added regression tests validating that incomplete tool calls due to missing or truncated closing tags result in zero parsed calls, with full input treated as normal text.
Harmony and JSON parsers
lib/parsers/src/tool_calling/harmony/harmony_parser.rs, lib/parsers/src/tool_calling/json/mod.rs
Added regression tests for truncation scenarios including back-to-back commentary blocks, missing outer closing tags, and mid-value JSON truncation, all expecting silent drop behavior with zero extracted tool calls.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description fully covers all required template sections with clear, comprehensive information.
Title check ✅ Passed The PR title accurately reflects the main objective: adding regression tests across multiple parser implementations to validate corner-case behavior under missing end-tokens and truncation scenarios.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/parsers/src/tool_calling/harmony/harmony_parser.rs`:
- Around line 273-279: The comment block referencing internal Linear tickets
"DIS-1842" and "DIS-1832" should be updated: remove or replace those internal
references with public GitHub issue links or neutral wording. Edit the comment
around Harmony's `<|call|>` and the surrounding test prose (the lines mentioning
"DIS-1842 / DIS-1832") to either cite corresponding GitHub issues (e.g.,
#<issue>) or rephrase to "see issue tracker" / "see discussion" so no internal
Linear IDs remain, keeping the rest of the explanatory text intact.

In `@lib/parsers/src/tool_calling/json/mod.rs`:
- Around line 153-157: The source comment in
lib/parsers/src/tool_calling/json/mod.rs contains internal Linear ticket IDs
"DIS-1842" and "DIS-1832"; remove or replace those IDs with either the
corresponding public GitHub issue numbers or a generic description (e.g.,
"internal tracker references removed") so no internal ticket identifiers remain
in the checked-in comment that begins "Pin current Nemotron behavior..." and
references recovery/parser change behavior.

In `@lib/parsers/src/tool_calling/xml/glm47_parser.rs`:
- Around line 450-454: The comment in glm47_parser.rs references internal Linear
IDs "DIS-1842 / DIS-1832"; update that comment to remove Linear ticket
identifiers and replace them with GitHub-style issue references (e.g., "#1842"
and "#1832" or "GH-1842/GH-1832") or neutral text. Locate the comment near the
GLM 4.7 parser block mentioning `</tool_call>`, `TEST_CASES.md`, and the
`kimi_k2_parser::test_parse_malformed_no_section_end` test, then edit the text
to drop the internal IDs and use the approved GitHub-format references or a
generic description.

In `@lib/parsers/src/tool_calling/xml/parser.rs`:
- Around line 905-910: The source comment in parser.rs mentions internal Linear
ticket IDs (DIS-1842 / DIS-1832) which must be removed; edit the comment around
the XmlParserConfig::default() / qwen3_coder explanation (the block describing
absent OUTER </tool_call> fence and behavior) to delete or replace the Linear
IDs with a generic reference (e.g., "internal bug reports" or "internal
tickets") or describe the observed failure mode/test case instead, keeping the
rest of the explanatory text intact and preserving references to
XmlParserConfig::default(), qwen3_coder, and the related
test_parse_missing_*_closing_tag note.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9f92d884-6255-44be-a086-1556ae348ed8

📥 Commits

Reviewing files that changed from the base of the PR and between bd90b4e and 2ea6485.

📒 Files selected for processing (5)
  • lib/parsers/src/tool_calling/harmony/harmony_parser.rs
  • lib/parsers/src/tool_calling/json/mod.rs
  • lib/parsers/src/tool_calling/xml/glm47_parser.rs
  • lib/parsers/src/tool_calling/xml/kimi_k2_parser.rs
  • lib/parsers/src/tool_calling/xml/parser.rs

Comment thread lib/parsers/src/tool_calling/harmony/harmony_parser.rs
Comment thread lib/parsers/src/tool_calling/json/mod.rs
Comment thread lib/parsers/src/tool_calling/xml/glm47_parser.rs
Comment thread lib/parsers/src/tool_calling/xml/parser.rs
@keivenchang
Copy link
Copy Markdown
Contributor Author

@coderabbitai please see if comments are addressed. If so, Resolve Comment.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# List all review comments made by coderabbitai on PR `#8846`
gh pr view 8846 --repo ai-dynamo/dynamo --json reviews,comments

# Also get review comments (inline)
gh api repos/ai-dynamo/dynamo/pulls/8846/comments --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, body: .body, path: .path, line: .line, position: .position}'

Length of output: 18795


🏁 Script executed:

#!/bin/bash
# Get all review comments (inline) on PR `#8846`
gh api repos/ai-dynamo/dynamo/pulls/8846/comments --jq '.[] | {id: .id, user: .user.login, body: .body, path: .path, line: .line, created_at: .created_at}' 

echo "---THREADS---"
# Get review threads to check resolution status
gh api graphql -f query='
{
  repository(owner: "ai-dynamo", name: "dynamo") {
    pullRequest(number: 8846) {
      reviewThreads(first: 20) {
        nodes {
          id
          isResolved
          isOutdated
          comments(first: 5) {
            nodes {
              author { login }
              body
              path
              line
              createdAt
            }
          }
        }
      }
    }
  }
}'

Length of output: 82683


3 of the 4 threads are already resolved. Let me resolve the remaining one for harmony_parser.rs:


🏁 Script executed:

#!/bin/bash
# Resolve the remaining unresolved thread for harmony_parser.rs
gh api graphql -f query='
mutation {
  resolveReviewThread(input: {threadId: "PRRT_kwDOOCjvss5-iziu"}) {
    thread {
      id
      isResolved
    }
  }
}'

Length of output: 92


@keivenchang All 4 review comments have been verified as addressed (internal Linear ticket IDs stripped in commit e77b713) and all threads are now resolved ✅:

File Thread Status
harmony/harmony_parser.rs (Line 279) Remove DIS-1842/DIS-1832 refs Resolved ✅
json/mod.rs (Line 157) Remove DIS-1842/DIS-1832 refs Resolved ✅
xml/glm47_parser.rs (Line 454) Remove DIS-1842/DIS-1832 refs Resolved ✅
xml/parser.rs (Line 910) Remove DIS-1842/DIS-1832 refs Resolved ✅

🧠 Learnings used
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 0
File: :0-0
Timestamp: 2026-04-25T01:38:07.316Z
Learning: User keivenchang (repo: ai-dynamo/dynamo) prefers that CodeRabbit automatically resolves its own review comments as soon as the concern has been addressed, without requiring an explicit "resolve" request each time.

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/src/metrics/prometheus_names.rs:49-53
Timestamp: 2025-09-16T00:26:37.092Z
Learning: keivenchang prefers consistency in metric naming standardization over strict adherence to Prometheus conventions about gauge vs counter suffixes. When standardizing metrics naming, prioritize consistency across the codebase rather than technical pedantry about individual metric type conventions.

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3051
File: container/templates/Dockerfile.trtllm.j2:424-437
Timestamp: 2025-09-16T17:16:03.785Z
Learning: keivenchang prioritizes maintaining exact backward compatibility during migration/refactoring PRs, even when bugs are identified in the original code. Fixes should be deferred to separate PRs after the migration is complete.

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 0
File: :0-0
Timestamp: 2026-04-25T01:36:01.212Z
Learning: In `lib/parsers/src/tool_calling/dsml/parser.rs`, inline/parameter regex caching for DSML parsing (analogous to `get_dsml_regexes` via `OnceLock + RwLock` keyed on 6 config strings that already exists for block/outer regexes) is intentionally deferred to a follow-up PR. It is a valid performance concern but not a correctness issue, and was kept out of the DeepSeek V4 hardening PR (`#8670`) to avoid scope creep.

…op-N parsers

DIS-1842 / DIS-1832 — add per-parser unit tests that document the
current customer-facing failure modes when models hit max_tokens or
emit EOS mid-call. No parser code changes in this PR; assertions pin
observed behavior so a future parser change has to update the test
intentionally. Recovery (matching Kimi K2 post-DIS-1765) is a
follow-up parser PR.

Tests added (10 new + 1 label correction):

CASE.5 — missing outer end-token (silent drop):
  - glm47_parser: test_parse_no_end_tag_complete_args_silent_drop
  - glm47_parser: test_parse_no_end_tag_multiple_calls_silent_drop
  - xml/parser:   test_parse_qwen3_no_outer_close_silent_drop
  - xml/parser:   test_parse_minimax_m2_no_outer_close_silent_drop
  - json/mod:     test_parse_nemotron_deci_no_outer_close_silent_drop
  - harmony_parser: relabeled test_parse_tool_calls_harmony_without_call_token
                    to also carry the CASE.5 marker (it was already
                    exercising missing-call-token recovery).

CASE.2 — multiple calls:
  - json/mod:       test_parse_nemotron_deci_multiple_calls
                    (Nemotron parser already handles this correctly;
                    test makes the per-parser surface explicit.)
  - harmony_parser: test_parse_harmony_multiple_calls_silent_drop
                    (gpt-oss drops both calls when two commentary
                    blocks appear back-to-back.)

CASE.4 — truncated JSON args:
  - kimi_k2_parser: test_parse_truncated_json_inside_complete_fences_silent_drop
                    (Kimi recovers the missing-fence case but drops
                    truncated payload inside complete fences.)
  - json/mod:       test_parse_nemotron_deci_truncated_json_silent_drop
  - harmony_parser: test_parse_harmony_truncated_json_silent_drop

Customer symptom for the silent-drop cases: HTTP 200 with empty
tool_calls and no error — no signal that work was lost.

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Address CodeRabbit feedback. The new test comments referenced
DIS-1842 / DIS-1832; per repo policy internal Linear IDs do not
belong in checked-in source. Comments still convey the intent (this
pins broken behavior, follow-up parser change flips the assertion)
without naming the tracker.

No test logic changes.

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
@keivenchang keivenchang force-pushed the keivenchang/DIS-1842__case5-top6-coverage branch from e77b713 to c765450 Compare April 29, 2026 21:22
@keivenchang keivenchang changed the title test(parsers): pin missing-end-token + truncation behavior across top-N parsers test(parsers): harden top-7 parsers with 5 more corner-case tests Apr 30, 2026
Copy link
Copy Markdown
Contributor

@ayushag-nv ayushag-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really Great. Thanks for working on this

@ayushag-nv ayushag-nv merged commit 4fac489 into main Apr 30, 2026
158 of 161 checks passed
@ayushag-nv ayushag-nv deleted the keivenchang/DIS-1842__case5-top6-coverage branch April 30, 2026 02:08
keivenchang added a commit that referenced this pull request May 1, 2026
…s top-7 parsers

Part 2 of the work tracked by #8846. That PR added 9 _silent_drop tests
pinning scenarios where parsers saw a real tool call but dropped it because
max_tokens / EOS truncated the closing fence or cut JSON mid-value. This
PR makes those 9 cases recover instead. The _silent_drop tests are flipped
to _recovers and assert the extracted call.

Cells flipped from x to check in the #8846 coverage chart:
  CASE.2 gpt-oss; CASE.4 Kimi K2 / Nemotron / gpt-oss; CASE.5 GLM /
  MiniMax / Qwen / Nemotron.

- base_json_parser: EOF-as-end-token recovery + try_repair_truncated_json
  helper for unclosed strings/braces.
- glm47_parser: recover when </tool_call> missing, gated on <arg_key>.
- xml/parser.rs (qwen3_coder, minimax_m2): same recovery, gated on
  function_start_token.
- kimi_k2_parser: relax args regex to .*? so truncated JSON hits the
  existing raw-string fallback.
- harmony_parser: regex fallback when openai_harmony tokenizer rejects
  parallel commentary blocks or truncated args.

Validation: cargo test -p dynamo-parsers --lib (409 passed, 0 failed),
cargo clippy --tests -- -D warnings (clean).

Relates to DIS-1842

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
furionw pushed a commit that referenced this pull request May 2, 2026
)

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
zhongdaor-nv added a commit that referenced this pull request May 8, 2026
… family

Pythonic was the only top-N family untouched by the recent coverage PRs
(#8888, #8946, #8846, #9035, #8852). Add three small batch-mode tests
that mirror the top-N quartet pattern landed for harmony/glm47/qwen3/etc.
in #9035, plus the parameterless-call shape from vLLM's pythonic test
file.

- `test_parse_tool_call_parse_pythonic_empty_args` (PARSER.batch.6) —
  `[get_weather()]` returns one call with `arguments={}`. Mirrors
  vLLM `test_pythonic_tool_parser.py::test_tool_call[parameterless_*]`.
- `test_parse_pythonic_empty_and_whitespace_inputs` (PARSER.batch.9) —
  empty / whitespace-only inputs return 0 calls and empty content
  without panicking. Mirrors the #9035 quartet contract.
- `test_parse_pythonic_duplicate_calls_same_name` (PARSER.batch.10) —
  two `get_weather` calls in one list surface with distinct ids
  (`call-1`, `call-2`). Pins the auto-generated-id contract for the
  duplicate-name case.

`cargo test -p dynamo-parsers --lib tool_calling::pythonic` — 19/19 pass.

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants