docs(parsers): add DIS-1926 vLLM tool-parser test audit by zhongdaor-nv · Pull Request #9329 · ai-dynamo/dynamo

zhongdaor-nv · 2026-05-08T18:48:37Z

Overview

Companion artifact to #9290 (already merged — refined PARSER_CASES.md taxonomy from this audit). This PR adds the full per-test bidirectional audit that informed every change in #9290.

Marked draft because reviewers may want to argue per-row bucket assignments; the audit is a working reference doc rather than a stable contract.

What's in this PR

lib/parsers/VLLM_TEST_AUDIT.md (new file, 906 lines, 493 distinct test rows):

Source: vLLM main at b53c507bc91f87e28b03e9b54bbff7c76e97d58b (vllm/tool_parsers/*, tests/tool_parsers/*, tests/tool_use/*, tests/entrypoints/openai/tool_parsers/*).
Scope: 421 explicit test functions + 72 inherited common-suite rows from ToolParserTests.
Bucketing: every row carries one or more PARSER_CASES.md / REASONING_CASES.md / PIPELINE_CASES.md / FRONTEND_CASES.md tags, plus a clickable GitHub source link, plus a one-line behavioral note.

Re-bucketing summary (post PR #9127 taxonomy)

The audit was originally written against the old CASE.* labels in lib/parsers/TEST_CASES.md (deleted by #9127). The following mechanical renames and per-row reclassifications were applied:

244 streaming rows split per-row into PARSER.stream.{1,2,3,4} (single-call assembly / multi-call assembly / partial-token chunking / streaming termination)
26 fmt rows split per-row into PARSER.fmt.1 (function-name surface) vs PARSER.fmt.5 (argument-envelope shape: native ID, JSON field-order, arguments ↔ parameters alias)
Out-of-PARSER-scope buckets relocated: CASE.{11,18,25} → FRONTEND.{1,3,5,6}; CASE.12 → PIPELINE.finish_reason; CASE.{9,10,17} → REASONING.batch.{1,2}; CASE.20 → // helper; CASE.16 → inline-regression annotation; CASE.26 dissolved into PARSER.batch.4 impl-defined recovery contract
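
The brace shorthand used above (`CASE.{11,18,25}`, `PARSER.stream.{1..4}`) can be expanded mechanically when cross-checking rows. The following is a minimal sketch, not part of the audit tooling: `expand_label` and `RELOCATIONS` are hypothetical names, and the relocations are group-to-group as written in the summary, not a one-to-one label map.

```python
import re

# One brace group per label, e.g. "CASE.{11,18,25}" or "PARSER.stream.{1..4}".
_BRACE = re.compile(r"\{([^{}]*)\}")


def expand_label(label: str) -> list[str]:
    """Expand a single brace group into the concrete labels it stands for."""
    match = _BRACE.search(label)
    if match is None:
        return [label]
    body = match.group(1)
    if ".." in body:
        # Range form: "{1..4}" means 1, 2, 3, 4 (inclusive).
        lo, hi = body.split("..")
        items = [str(n) for n in range(int(lo), int(hi) + 1)]
    else:
        # List form: "{11,18,25}".
        items = [part.strip() for part in body.split(",")]
    head, tail = label[: match.start()], label[match.end():]
    return [f"{head}{item}{tail}" for item in items]


# Transcribed from the summary above (hypothetical structure; the old group
# maps onto the new group as a whole, not label by label).
RELOCATIONS = {
    "CASE.{11,18,25}": "FRONTEND.{1,3,5,6}",
    "CASE.12": "PIPELINE.finish_reason",
    "CASE.{9,10,17}": "REASONING.batch.{1,2}",
}

for old, new in RELOCATIONS.items():
    print(expand_label(old), "->", expand_label(new))
```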

Mis-bucket fixes caught by review

FunctionGemma::test_multiple_tool_calls and Gemma4::TestExtractToolCalls.test_multiple_tool_calls were both labeled CASE.1 but assert len(tool_calls) == 2 — corrected to PARSER.batch.2.
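
This kind of mis-bucketing can be surfaced with a simple text heuristic: look for `len(...tool_calls) == N` assertions with N of 2 or more in a test body, and flag the row if it is not tagged with a multi-call bucket. The sketch below assumes PARSER.batch.2 is the multi-call batch bucket, as the correction above suggests. The helper names and the sample test body are hypothetical, not taken from vLLM.

```python
import re

# Matches `assert len(tool_calls) == N` and `assert len(x.tool_calls) == N`.
_LEN_ASSERT = re.compile(
    r"assert\s+len\(\s*(?:\w+\.)*tool_calls\s*\)\s*==\s*(\d+)"
)


def asserted_call_counts(test_source: str) -> list[int]:
    """Return every N found in a `len(...tool_calls) == N` assertion."""
    return [int(n) for n in _LEN_ASSERT.findall(test_source)]


def looks_multi_call(test_source: str) -> bool:
    """True if the test asserts two or more tool calls anywhere in its body."""
    return any(n >= 2 for n in asserted_call_counts(test_source))


# Hypothetical test body, used only to exercise the heuristic.
SAMPLE_TEST = """
def test_multiple_tool_calls(parser):
    result = parser.extract_tool_calls(MODEL_OUTPUT, request=None)
    assert len(result.tool_calls) == 2
"""

if looks_multi_call(SAMPLE_TEST):
    print("candidate for a multi-call bucket such as PARSER.batch.2")
```

A text match like this only nominates rows for a second look; the final bucket still depends on reading what the test asserts.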

Bucket-assignment refinements caught by review

test_unique_tool_call_ids (DSv3.2): drops fmt.5 (no native call-ID surface)
test_invalid_funcall_id_skipped (Kimi K2): fmt.5 → fmt.1 (validation, not preservation)
3 Mistral argument_before_name* parametrized rows: added missing fmt.5 tag

A staleness banner at the top documents the re-bucketing transformation and mis-bucket fixes for traceability.

Top findings (already in PR #9290 or flagged for follow-up)

Mistral v11+ wire format — STILL OPEN (parser doesn't exist; flagged in PARSER_CASES.md "Known production gaps")
PARSER.stream.{1..4} parser-tier coverage gap in Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral: partial closure via DSv4 (#8946) and Gemma 4 (#8852); 5 families remain
FRONTEND.3 (adjust_request): CLOSED for 7 families via 28 new tests in lib/llm/tests/tool_choice.rs (#8946 + #9035)

Test plan

cargo check -p dynamo-parsers --tests passes (docs-only; new file)
All 493 test rows carry at least one new-taxonomy bucket
Bucket-summary tally adds up; "Total (distinct test rows) = 493" matches header
No leftover old CASE.* labels outside the staleness banner / "Old label" column / mis-bucket annotation notes
Internal review caught 4 bucket-assignment issues + 2 mis-bucketings; all fixed
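
The row-level items in this test plan lend themselves to a scripted check. The sketch below is illustrative only: it assumes the audit rows live in a single Markdown table whose first column is the test name, which may not match the actual layout of VLLM_TEST_AUDIT.md, and `table_rows` and `check_audit` are hypothetical helpers.

```python
import re

# New-taxonomy tags named in this PR, plus the `// helper` marker.
NEW_TAG = re.compile(r"\b(?:PARSER|REASONING|PIPELINE|FRONTEND)\.[\w.]+|// helper")
# Old-taxonomy labels such as CASE.1 or CASE.25.
OLD_TAG = re.compile(r"\bCASE\.\d+")


def table_rows(markdown: str) -> list[list[str]]:
    """Return the data rows of one Markdown table (header and separator dropped)."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # separator row such as |---|---|
        rows.append(cells)
    return rows[1:]  # first remaining row is the header


def check_audit(markdown: str, expected_total: int) -> list[str]:
    """Collect violations of the row-count and tagging rules."""
    problems = []
    rows = table_rows(markdown)
    if len(rows) != expected_total:
        problems.append(f"expected {expected_total} rows, found {len(rows)}")
    for cells in rows:
        text = " | ".join(cells)
        if not NEW_TAG.search(text):
            problems.append(f"no new-taxonomy tag: {cells[0]}")
        if OLD_TAG.search(text):
            problems.append(f"leftover old label: {cells[0]}")
    return problems


# Hypothetical two-row excerpt; the second row is deliberately left un-migrated.
SAMPLE = """
| Test | Buckets | Note |
|------|---------|------|
| test_multiple_tool_calls | PARSER.batch.2 | two calls in one output |
| test_streaming_basic | CASE.3 | not yet re-bucketed |
"""

for problem in check_audit(SAMPLE, expected_total=2):
    print(problem)

# The header total is the sum of the two scope figures: 421 explicit + 72 inherited.
assert 421 + 72 == 493
```

On the real file, the old-label check would also need to skip the places where CASE.* is explicitly allowed (the staleness banner, the "Old label" column, and mis-bucket annotation notes).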

Out of scope / follow-ups

Per-family ticket spawning: the 5 PARSER.stream.* gaps and the Mistral v11 parser are separate work items
DIS-1906 (cross-impl parser parity harness) can use this audit as fixture-row source

Closes the audit half of DIS-1926 (the taxonomy half landed in #9290).

github-actions · 2026-05-08T20:25:30Z

🌿 Fern Docs Preview: https://nvidia-preview-f6c88b3d-9658-488b-9af2-dc83b9126bca.docs.buildwithfern.com/dynamo/dev

Companion artifact to PR #9290 (PARSER_CASES.md taxonomy refinement). Adds the full per-test bidirectional audit that informed every change in that PR: every vLLM tool-parser test mapped onto the new (PR #9127) taxonomy with a clickable source link.

`lib/parsers/VLLM_TEST_AUDIT.md` (new file, 906 lines, 493 distinct test rows):

- **Source**: vLLM `main` at commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b (`vllm/tool_parsers/*`, `tests/tool_parsers/*`, `tests/tool_use/*`, `tests/entrypoints/openai/tool_parsers/*`).
- **Scope**: 421 explicit test functions + 72 inherited common-suite rows from `ToolParserTests`.
- **Bucketing**: every row carries one or more `PARSER_CASES.md` / `REASONING_CASES.md` / `PIPELINE_CASES.md` / `FRONTEND_CASES.md` tags, plus a one-line behavioral note.

Re-bucketing transformations applied (vs the original CASE.* labels the audit was first written against, before PR #9127):

- 244 streaming rows split per-row into PARSER.stream.{1,2,3,4} (single-call assembly / multi-call assembly / partial-token chunking / streaming termination)
- 26 fmt rows split per-row into PARSER.fmt.1 (function-name) vs PARSER.fmt.5 (argument-shape: native ID, JSON field-order, arguments↔parameters alias)
- Out-of-PARSER-scope buckets relocated to sibling docs: CASE.{11,18,25} → FRONTEND.{1,3,5,6}; CASE.12 → PIPELINE.finish_reason; CASE.{9,10,17} → REASONING.batch.{1,2}; CASE.20 → `// helper`; CASE.16 → inline-regression annotation; CASE.26 dissolved into PARSER.batch.4 impl-defined recovery contract

Two mis-bucketings caught and fixed during review:

- FunctionGemma::test_multiple_tool_calls and Gemma4::TestExtractToolCalls.test_multiple_tool_calls were both labeled CASE.1 but assert len(tool_calls) == 2; corrected to PARSER.batch.2.

Four bucket-assignment refinements caught by review:

- test_unique_tool_call_ids (DSv3.2) drops fmt.5 (no native call-ID surface; just parallel-call distinctness).
- test_invalid_funcall_id_skipped (Kimi K2) moves fmt.5 → fmt.1 (validation, not preservation).
- 3 Mistral `argument_before_name*` parametrized rows gain fmt.5 (canonical field-order swap test set referenced by PARSER_CASES.md).

A staleness banner at the top documents the re-bucketing transformation and mis-bucket fixes for traceability.

Top findings the audit informed (already addressed in PR #9290 or flagged for follow-up):

1. Mistral v11+ wire format: STILL OPEN (parser doesn't exist; flagged in PARSER_CASES.md "Known production gaps").
2. PARSER.stream.{1..4} parser-tier coverage gap in 5 families (Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral): partial closure via DSv4 (#8946) and Gemma 4 (#8852).
3. CASE.25 / FRONTEND.3 (`adjust_request`): CLOSED for 7 families via 28 new tests in `lib/llm/tests/tool_choice.rs` (#8946 + #9035).

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>

…er tags

- Split PARSER.batch.8 into .a/.b/.c/.d sub-buckets per narration position (before / after / sandwich / between-multi); 43 rows updated.
- Helper-tag dedup per PARSER_CASES.md:35-38: rows previously double-tagged as PARSER.<bucket>.<n> + // helper now carry // helper only. 35 rows updated; PARSER.batch.7 86 -> 58.
- Drop "Old label" column and staleness banner from Bucket Summary (taxonomy migration is settled). Adds 8.a-d, PARSER.harmony.2, and the sibling-doc dissolved row.

Count refinements: stream.1 178 -> 177, stream.3 63 -> 58, fmt.2 9 -> 8, REASONING.batch.2 18 -> 36.

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

pull-request-size Bot added the size/XL label May 8, 2026

github-actions Bot added the docs and documentation labels May 8, 2026

copy-pr-bot Bot temporarily deployed to GITLAB May 8, 2026 21:52 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB May 8, 2026 23:18 Inactive

zhongdaor-nv and others added 3 commits May 15, 2026 09:34

Move parser audits into tool_calling docs

fd87db4

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

keivenchang force-pushed the zhongdaor/dis-1926-vllm-test-audit-doc branch from f10e682 to fd87db4 Compare May 15, 2026 16:39

pull-request-size Bot added size/XXL and removed size/XL labels May 15, 2026

copy-pr-bot Bot temporarily deployed to GITLAB May 15, 2026 16:40 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB May 15, 2026 16:47 Inactive

zhongdaor-nv wants to merge 3 commits into main from zhongdaor/dis-1926-vllm-test-audit-doc
