chore(parsers): Mapping vllm parser tests to new PARSER_CASES.md taxonomy#9290
Conversation
|
Note: lib/parsers/VLLM_TEST_AUDIT.md is only for reviewers to check correctness. I will remove this file before merging. |
WalkthroughThis PR expands the tool-call parser taxonomy documentation by introducing ChangesParser Taxonomy: Argument-Shape & Wire-Format
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
lib/parsers/PARSER_CASES.md (1)
121-126: ⚡ Quick winAdd direct GitHub links for cited PR references.
You reference incidents as
PR #...; please also include clickable GitHub URLs for those references to improve audit traceability in docs.As per coding guidelines "
**/*.md: Markdown documentation may reference Linear tickets for internal context, but should prefer to also include the matching GitHub link when one exists`."Also applies to: 329-333, 429-432
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/parsers/PARSER_CASES.md` around lines 121 - 126, Update the Markdown references that cite PR numbers to include direct clickable GitHub URLs: replace occurrences like "PR `#32768`" (and the other cited ranges around lines 329-333 and 429-432) with the full GitHub PR link for the corresponding repository/PR, keeping the existing text (e.g., "vLLM Kimi K2 PR `#32768`") but appending or replacing with "([vLLM#32768](https://github.com/vllm-org/vllm/pull/32768))" or the correct repo/PR URL; ensure the sentences mentioning PARSER.fmt.5, Kimi K2, and any other PR references consistently include the clickable link to satisfy the Markdown guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@lib/parsers/PARSER_CASES.md`:
- Around line 121-126: Update the Markdown references that cite PR numbers to
include direct clickable GitHub URLs: replace occurrences like "PR `#32768`" (and
the other cited ranges around lines 329-333 and 429-432) with the full GitHub PR
link for the corresponding repository/PR, keeping the existing text (e.g., "vLLM
Kimi K2 PR `#32768`") but appending or replacing with
"([vLLM#32768](https://github.com/vllm-org/vllm/pull/32768))" or the correct
repo/PR URL; ensure the sentences mentioning PARSER.fmt.5, Kimi K2, and any
other PR references consistently include the clickable link to satisfy the
Markdown guideline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 75224add-ba0e-4dc8-a6ae-d509f3be65f2
📒 Files selected for processing (2)
lib/parsers/PARSER_CASES.mdlib/parsers/VLLM_TEST_AUDIT.md
55fb950 to
e03a365
Compare
ayushag-nv
left a comment
There was a problem hiding this comment.
Please remove VLLM_TEST_AUDIT.md. LGTM.
keivenchang
left a comment
There was a problem hiding this comment.
thanks so much for the research/comparisons! Our next step would be to look at the differences and fill in the gap (if/when possible).
…audit Output of DIS-1926 (research vLLM parser test coverage gaps). Doc-only change to `lib/parsers/PARSER_CASES.md`; no source touched. `cargo check -p dynamo-parsers --tests` passes. Refinements driven by gaps surfaced during a bidirectional diff against vLLM `tests/tool_parsers/*` at commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b: - Split PARSER.fmt.1 (function-name surface) from new PARSER.fmt.5 (argument-envelope shape: native call-ID preservation, JSON field-order tolerance, arguments↔parameters key alias). The old CASE.21 (and an earlier draft of PARSER.fmt.1) conflated both axes. - Broaden PARSER.fmt.3 examples beyond Kimi K2's singular vs plural section tokens to include Mistral pre-v11 vs v11+ wire formats, Llama 3 with vs without `<|python_tag|>`, Hermes `qwen25` registry alias. - Add `Known production gaps` section flagging Mistral v11+ wire format (`[TOOL_CALLS]name{...args}` name-then-object) — Dynamo's `ToolCallConfig::mistral()` only handles pre-v11 (JSON-array body), while vLLM tests v11 extensively. v11 is the current Mistral-Small / Mistral-Large production path. Largest single Dynamo parser gap surfaced by the audit. - Promote regex-timeout / parser-exception containment to Universal Gaps (vLLM has explicit `test_regex_timeout_handling` for llama3_json / llama4_pythonic / pythonic and `*_streaming_exception_returns_none` for Mistral; Dynamo relies on Rust regex linear-time guarantees but does not pin failure-containment paths). - Cross-ref PARSER.batch.1 happy-path → PARSER.fmt.5 native-ID sub-axis. - Update Applicability summary and `Adding a new parser` minimum viable set to cover fmt.{1..5}. The full per-test bidirectional audit (493 test rows across 36 parser families, mapped onto the new taxonomy) lives outside this commit. It informed every refinement above; the audit itself is not committed because it's a working artifact rather than a stable reference doc. Top-3 P0 gap status from the audit: 1. Mistral v11 wire format — STILL OPEN (parser doesn't exist; flagged in the new `Known production gaps` section). 2. PARSER.stream.{1..4} parser-tier — partial; DSv4 (#8946) and Gemma 4 (#8852) added coverage; Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral parser-tier streaming tests still gap. 3. CASE.25 / FRONTEND.3 (`adjust_request`) — CLOSED for 7 families via 28 new tests in `lib/llm/tests/tool_choice.rs` (#8946 + #9035). Coverage PRs since 2026-05-05 baseline: #8888 (silent-drop recoveries), #8946 (DSv4 + Kimi K2 coverage), #9035 (top-N CASE.6+ quartet), #8852 (Gemma 4 family), #9127 (taxonomy rename). Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
e03a365 to
5e06371
Compare
Companion artifact to PR #9290 (PARSER_CASES.md taxonomy refinement). Adds the full per-test bidirectional audit that informed every change in that PR — every vLLM tool-parser test mapped onto the new (PR #9127) taxonomy with a clickable source link. `lib/parsers/VLLM_TEST_AUDIT.md` (new file, 906 lines, 493 distinct test rows): - **Source**: vLLM `main` at commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b (`vllm/tool_parsers/*`, `tests/tool_parsers/*`, `tests/tool_use/*`, `tests/entrypoints/openai/tool_parsers/*`). - **Scope**: 421 explicit test functions + 72 inherited common-suite rows from `ToolParserTests`. - **Bucketing**: every row carries one or more `PARSER_CASES.md` / `REASONING_CASES.md` / `PIPELINE_CASES.md` / `FRONTEND_CASES.md` tags, plus a one-line behavioral note. Re-bucketing transformations applied (vs the original CASE.* labels the audit was first written against, before PR #9127): - 244 streaming rows split per-row into PARSER.stream.{1,2,3,4} (single-call assembly / multi-call assembly / partial-token chunking / streaming termination) - 26 fmt rows split per-row into PARSER.fmt.1 (function-name) vs PARSER.fmt.5 (argument-shape: native ID, JSON field-order, arguments↔parameters alias) - Out-of-PARSER-scope buckets relocated to sibling docs: CASE.{11,18,25} → FRONTEND.{1,3,5,6}; CASE.12 → PIPELINE.finish_reason; CASE.{9,10,17} → REASONING.batch.{1,2}; CASE.20 → `// helper`; CASE.16 → inline-regression annotation; CASE.26 dissolved into PARSER.batch.4 impl-defined recovery contract Two mis-bucketings caught and fixed during review: - FunctionGemma::test_multiple_tool_calls and Gemma4::TestExtractToolCalls.test_multiple_tool_calls were both labeled CASE.1 but assert len(tool_calls) == 2 — corrected to PARSER.batch.2. Four bucket-assignment refinements caught by review: - test_unique_tool_call_ids (DSv3.2) drops fmt.5 (no native call-ID surface; just parallel-call distinctness). - test_invalid_funcall_id_skipped (Kimi K2) moves fmt.5 → fmt.1 (validation, not preservation). - 3 Mistral `argument_before_name*` parametrized rows gain fmt.5 (canonical field-order swap test set referenced by PARSER_CASES.md). A staleness banner at the top documents the re-bucketing transformation and mis-bucket fixes for traceability. Top findings the audit informed (already addressed in PR #9290 or flagged for follow-up): 1. Mistral v11+ wire format — STILL OPEN (parser doesn't exist; flagged in PARSER_CASES.md "Known production gaps"). 2. PARSER.stream.{1..4} parser-tier coverage gap in 5 families (Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral) — partial closure via DSv4 (#8946) and Gemma 4 (#8852). 3. CASE.25 / FRONTEND.3 (`adjust_request`) — CLOSED for 7 families via 28 new tests in `lib/llm/tests/tool_choice.rs` (#8946 + #9035). Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
Overview
Output of DIS-1926 — bidirectional diff between vLLM and Dynamo tool-parser test corpora, mapped onto the new
PARSER_CASES.mdtaxonomy (PR #9127). Doc-only changes; no source touched.Unblocks DIS-1906 (cross-impl parser parity harness) by giving it a stable, accurate label set with per-test bucket assignments.
What's in this PR
lib/parsers/PARSER_CASES.md(+96 / −12)Taxonomy refinements driven by gaps the audit surfaced:
PARSER.fmt.1→PARSER.fmt.5. OldCASE.21(and an earlier draft ofPARSER.fmt.1) conflated function-name surface concerns with argument-envelope shape concerns. Now:PARSER.fmt.1— function-name surface only (allowed identifier chars,functions.NAMEvs bareNAME, malformed-ID rejection).PARSER.fmt.5— argument-envelope shape: native call-ID preservation (Kimi K2 PR #32768), JSON field-order tolerance ({name, arguments}vs{arguments, name}),arguments↔parameterskey alias.PARSER.fmt.3examples — beyond Kimi K2 singular vs plural section tokens, document Mistral pre-v11 vs v11+ wire format, Llama 3 with vs without<|python_tag|>, Hermesqwen25registry alias.Known production gapssection — flags Mistral v11+ wire format ([TOOL_CALLS]name{...args}name-then-object) as a parser-implementation gap. Dynamo'sToolCallConfig::mistral()currently only handles pre-v11 (JSON-arraybody); vLLM tests v11 extensively. v11 is the current Mistral-Small / Mistral-Large production path.
test_regex_timeout_handlingforllama3_json/llama4_pythonic/pythonicand*_streaming_exception_returns_nonefor Mistral; Dynamo relies onRust regex linear-time guarantees but does not pin failure-containment paths.
PARSER.batch.1happy-path →PARSER.fmt.5native-ID sub-axis.Summary by CodeRabbit