Skip to content

chore(parsers): Mapping vllm parser tests to new PARSER_CASES.md taxonomy#9290

Merged
zhongdaor-nv merged 1 commit into
mainfrom
zhongdaor/dis-1926-research-vllm-parser-test-coverage-gaps
May 8, 2026
Merged

chore(parsers): Mapping vllm parser tests to new PARSER_CASES.md taxonomy#9290
zhongdaor-nv merged 1 commit into
mainfrom
zhongdaor/dis-1926-research-vllm-parser-test-coverage-gaps

Conversation

@zhongdaor-nv
Copy link
Copy Markdown
Contributor

@zhongdaor-nv zhongdaor-nv commented May 7, 2026

Overview

Output of DIS-1926 — bidirectional diff between vLLM and Dynamo tool-parser test corpora, mapped onto the new PARSER_CASES.md taxonomy (PR #9127). Doc-only changes; no source touched.

Unblocks DIS-1906 (cross-impl parser parity harness) by giving it a stable, accurate label set with per-test bucket assignments.

What's in this PR

lib/parsers/PARSER_CASES.md (+96 / −12)

Taxonomy refinements driven by gaps the audit surfaced:

  • Split PARSER.fmt.1PARSER.fmt.5. Old CASE.21 (and an earlier draft of PARSER.fmt.1) conflated function-name surface concerns with argument-envelope shape concerns. Now:
    • PARSER.fmt.1 — function-name surface only (allowed identifier chars, functions.NAME vs bare NAME, malformed-ID rejection).
    • PARSER.fmt.5 — argument-envelope shape: native call-ID preservation (Kimi K2 PR #32768), JSON field-order tolerance ({name, arguments} vs {arguments, name}), argumentsparameters key alias.
  • Broaden PARSER.fmt.3 examples — beyond Kimi K2 singular vs plural section tokens, document Mistral pre-v11 vs v11+ wire format, Llama 3 with vs without <|python_tag|>, Hermes qwen25 registry alias.
  • New Known production gaps section — flags Mistral v11+ wire format ([TOOL_CALLS]name{...args} name-then-object) as a parser-implementation gap. Dynamo's ToolCallConfig::mistral() currently only handles pre-v11 (JSON-array
    body); vLLM tests v11 extensively. v11 is the current Mistral-Small / Mistral-Large production path.
  • Promote regex-timeout / failure containment to Universal Gaps — vLLM has explicit test_regex_timeout_handling for llama3_json / llama4_pythonic / pythonic and *_streaming_exception_returns_none for Mistral; Dynamo relies on
    Rust regex linear-time guarantees but does not pin failure-containment paths.
  • Cross-ref PARSER.batch.1 happy-path → PARSER.fmt.5 native-ID sub-axis.

Summary by CodeRabbit

  • Documentation
    • Updated internal parser implementation guidelines and test coverage documentation to clarify format-conditional variants and argument-envelope shape conventions across different model formats.

@github-actions github-actions Bot added chore documentation Improvements or additions to documentation labels May 7, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

@zhongdaor-nv
Copy link
Copy Markdown
Contributor Author

Note: lib/parsers/VLLM_TEST_AUDIT.md is only for reviewers to check correctness. I will remove this file before merging.

@zhongdaor-nv zhongdaor-nv marked this pull request as ready for review May 7, 2026 23:37
@zhongdaor-nv zhongdaor-nv changed the title chore(parsers): DIS-1926 — bidirectional vLLM↔Dynamo audit + taxonomy… chore(parsers): Mapping vllm parser tests to new PARSER_CASES.md taxonomy May 7, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 7, 2026

Review Change Stack

Walkthrough

This PR expands the tool-call parser taxonomy documentation by introducing PARSER.fmt.5 for argument-envelope shape conventions, detailed wire-format variant coverage in PARSER.fmt.3, clarifies PARSER.fmt.1 scope, and updates all cross-references and applicability checklists to reflect the new taxonomy dimension.

Changes

Parser Taxonomy: Argument-Shape & Wire-Format

Layer / File(s) Summary
Introduce PARSER.fmt.5 & Universal Gaps
lib/parsers/PARSER_CASES.md
First mention of PARSER.fmt.5 argument-envelope conventions; expands universal gaps section with regex-timeout/exception guidance and Mistral v11+ production gap note.
Define PARSER.fmt.5 Argument-Shape Conventions
lib/parsers/PARSER_CASES.md
Full PARSER.fmt.5 section defining three argument-envelope sub-axes: native call-ID preservation, JSON field-order tolerance (including arguments key named name edge case), and argument-key aliasing (arguments vs parameters), with references to named parametrized tests.
Expand PARSER.fmt.3 Wire-Format Variants
lib/parsers/PARSER_CASES.md
Detailed PARSER.fmt.3 documentation enumerating multiple wire-format spellings across Kimi K2, Mistral (pre-v11 vs v11+), Llama 3 (python_tag fence presence), and Hermes (qwen25 alias), emphasizing active-config registration constraints.
Clarify Format-Related Scope & Batch.1 Reference
lib/parsers/PARSER_CASES.md
Clarifies that PARSER.fmt.1 covers only function-name surface and excludes argument-envelope shape (covered by PARSER.fmt.5); updates PARSER.batch.1 happy-path requirements to assert native ToolCall.id preservation with PARSER.fmt.5 cross-reference.
Update Applicability Summary & New-Parser Checklist
lib/parsers/PARSER_CASES.md
Updates applicability summary table and new-parser-checklist to include PARSER.fmt.{1..5} coverage; explicitly calls out PARSER.fmt.5 requirement for JSON-family parsers.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title mentions 'vllm parser tests' and 'taxonomy', but the actual changes are documentation-only updates to PARSER_CASES.md taxonomy definitions themselves, not mapping of vLLM tests to the taxonomy. Revise the title to reflect that this is a documentation update to the PARSER_CASES.md taxonomy (e.g., 'docs(parsers): Update PARSER_CASES taxonomy with fmt.5 and audit findings' or similar).
✅ Passed checks (4 passed)
Check name Status Explanation
Description check ✅ Passed The description includes all required template sections: Overview (with linked issue context), detailed breakdown of What's in this PR with specific taxonomy changes, and implicit related-issue reference through DIS-1926/PR #9127.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
lib/parsers/PARSER_CASES.md (1)

121-126: ⚡ Quick win

Add direct GitHub links for cited PR references.

You reference incidents as PR #...; please also include clickable GitHub URLs for those references to improve audit traceability in docs.

As per coding guidelines "**/*.md: Markdown documentation may reference Linear tickets for internal context, but should prefer to also include the matching GitHub link when one exists`."

Also applies to: 329-333, 429-432

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/parsers/PARSER_CASES.md` around lines 121 - 126, Update the Markdown
references that cite PR numbers to include direct clickable GitHub URLs: replace
occurrences like "PR `#32768`" (and the other cited ranges around lines 329-333
and 429-432) with the full GitHub PR link for the corresponding repository/PR,
keeping the existing text (e.g., "vLLM Kimi K2 PR `#32768`") but appending or
replacing with "([vLLM#32768](https://github.com/vllm-org/vllm/pull/32768))" or
the correct repo/PR URL; ensure the sentences mentioning PARSER.fmt.5, Kimi K2,
and any other PR references consistently include the clickable link to satisfy
the Markdown guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@lib/parsers/PARSER_CASES.md`:
- Around line 121-126: Update the Markdown references that cite PR numbers to
include direct clickable GitHub URLs: replace occurrences like "PR `#32768`" (and
the other cited ranges around lines 329-333 and 429-432) with the full GitHub PR
link for the corresponding repository/PR, keeping the existing text (e.g., "vLLM
Kimi K2 PR `#32768`") but appending or replacing with
"([vLLM#32768](https://github.com/vllm-org/vllm/pull/32768))" or the correct
repo/PR URL; ensure the sentences mentioning PARSER.fmt.5, Kimi K2, and any
other PR references consistently include the clickable link to satisfy the
Markdown guideline.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 75224add-ba0e-4dc8-a6ae-d509f3be65f2

📥 Commits

Reviewing files that changed from the base of the PR and between 73bc969 and e03a365.

📒 Files selected for processing (2)
  • lib/parsers/PARSER_CASES.md
  • lib/parsers/VLLM_TEST_AUDIT.md

@zhongdaor-nv zhongdaor-nv requested a review from a team as a code owner May 8, 2026 00:21
@zhongdaor-nv zhongdaor-nv force-pushed the zhongdaor/dis-1926-research-vllm-parser-test-coverage-gaps branch from 55fb950 to e03a365 Compare May 8, 2026 00:34
Copy link
Copy Markdown
Contributor

@ayushag-nv ayushag-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove VLLM_TEST_AUDIT.md. LGTM.

Copy link
Copy Markdown
Contributor

@keivenchang keivenchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks so much for the research/comparisons! Our next step would be to look at the differences and fill in the gap (if/when possible).

…audit

Output of DIS-1926 (research vLLM parser test coverage gaps). Doc-only
change to `lib/parsers/PARSER_CASES.md`; no source touched. `cargo check
-p dynamo-parsers --tests` passes.

Refinements driven by gaps surfaced during a bidirectional diff against
vLLM `tests/tool_parsers/*` at commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b:

- Split PARSER.fmt.1 (function-name surface) from new PARSER.fmt.5
  (argument-envelope shape: native call-ID preservation, JSON field-order
  tolerance, arguments↔parameters key alias). The old CASE.21 (and an
  earlier draft of PARSER.fmt.1) conflated both axes.
- Broaden PARSER.fmt.3 examples beyond Kimi K2's singular vs plural
  section tokens to include Mistral pre-v11 vs v11+ wire formats, Llama 3
  with vs without `<|python_tag|>`, Hermes `qwen25` registry alias.
- Add `Known production gaps` section flagging Mistral v11+ wire format
  (`[TOOL_CALLS]name{...args}` name-then-object) — Dynamo's
  `ToolCallConfig::mistral()` only handles pre-v11 (JSON-array body),
  while vLLM tests v11 extensively. v11 is the current Mistral-Small /
  Mistral-Large production path. Largest single Dynamo parser gap
  surfaced by the audit.
- Promote regex-timeout / parser-exception containment to Universal Gaps
  (vLLM has explicit `test_regex_timeout_handling` for llama3_json /
  llama4_pythonic / pythonic and `*_streaming_exception_returns_none` for
  Mistral; Dynamo relies on Rust regex linear-time guarantees but does
  not pin failure-containment paths).
- Cross-ref PARSER.batch.1 happy-path → PARSER.fmt.5 native-ID sub-axis.
- Update Applicability summary and `Adding a new parser` minimum viable
  set to cover fmt.{1..5}.

The full per-test bidirectional audit (493 test rows across 36 parser
families, mapped onto the new taxonomy) lives outside this commit. It
informed every refinement above; the audit itself is not committed
because it's a working artifact rather than a stable reference doc.

Top-3 P0 gap status from the audit:

1. Mistral v11 wire format — STILL OPEN (parser doesn't exist; flagged in
   the new `Known production gaps` section).
2. PARSER.stream.{1..4} parser-tier — partial; DSv4 (#8946) and Gemma 4
   (#8852) added coverage; Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral
   parser-tier streaming tests still gap.
3. CASE.25 / FRONTEND.3 (`adjust_request`) — CLOSED for 7 families via 28
   new tests in `lib/llm/tests/tool_choice.rs` (#8946 + #9035).

Coverage PRs since 2026-05-05 baseline: #8888 (silent-drop recoveries),
#8946 (DSv4 + Kimi K2 coverage), #9035 (top-N CASE.6+ quartet), #8852
(Gemma 4 family), #9127 (taxonomy rename).

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
@zhongdaor-nv zhongdaor-nv force-pushed the zhongdaor/dis-1926-research-vllm-parser-test-coverage-gaps branch from e03a365 to 5e06371 Compare May 8, 2026 17:20
@pull-request-size pull-request-size Bot added size/M and removed size/XL labels May 8, 2026
Comment thread lib/parsers/PARSER_CASES.md
@zhongdaor-nv zhongdaor-nv merged commit 3d451f2 into main May 8, 2026
56 checks passed
@zhongdaor-nv zhongdaor-nv deleted the zhongdaor/dis-1926-research-vllm-parser-test-coverage-gaps branch May 8, 2026 18:18
keivenchang pushed a commit that referenced this pull request May 15, 2026
Companion artifact to PR #9290 (PARSER_CASES.md taxonomy refinement).
Adds the full per-test bidirectional audit that informed every change in
that PR — every vLLM tool-parser test mapped onto the new (PR #9127)
taxonomy with a clickable source link.

`lib/parsers/VLLM_TEST_AUDIT.md` (new file, 906 lines, 493 distinct
test rows):

- **Source**: vLLM `main` at commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b
  (`vllm/tool_parsers/*`, `tests/tool_parsers/*`, `tests/tool_use/*`,
  `tests/entrypoints/openai/tool_parsers/*`).
- **Scope**: 421 explicit test functions + 72 inherited common-suite
  rows from `ToolParserTests`.
- **Bucketing**: every row carries one or more `PARSER_CASES.md` /
  `REASONING_CASES.md` / `PIPELINE_CASES.md` / `FRONTEND_CASES.md`
  tags, plus a one-line behavioral note.

Re-bucketing transformations applied (vs the original CASE.* labels
the audit was first written against, before PR #9127):

- 244 streaming rows split per-row into PARSER.stream.{1,2,3,4}
  (single-call assembly / multi-call assembly / partial-token
  chunking / streaming termination)
- 26 fmt rows split per-row into PARSER.fmt.1 (function-name) vs
  PARSER.fmt.5 (argument-shape: native ID, JSON field-order,
  arguments↔parameters alias)
- Out-of-PARSER-scope buckets relocated to sibling docs:
  CASE.{11,18,25} → FRONTEND.{1,3,5,6}; CASE.12 →
  PIPELINE.finish_reason; CASE.{9,10,17} → REASONING.batch.{1,2};
  CASE.20 → `// helper`; CASE.16 → inline-regression annotation;
  CASE.26 dissolved into PARSER.batch.4 impl-defined recovery
  contract

Two mis-bucketings caught and fixed during review:
- FunctionGemma::test_multiple_tool_calls and
  Gemma4::TestExtractToolCalls.test_multiple_tool_calls were both
  labeled CASE.1 but assert len(tool_calls) == 2 — corrected to
  PARSER.batch.2.

Four bucket-assignment refinements caught by review:
- test_unique_tool_call_ids (DSv3.2) drops fmt.5 (no native call-ID
  surface; just parallel-call distinctness).
- test_invalid_funcall_id_skipped (Kimi K2) moves fmt.5 → fmt.1
  (validation, not preservation).
- 3 Mistral `argument_before_name*` parametrized rows gain fmt.5
  (canonical field-order swap test set referenced by PARSER_CASES.md).

A staleness banner at the top documents the re-bucketing transformation
and mis-bucket fixes for traceability.

Top findings the audit informed (already addressed in PR #9290 or
flagged for follow-up):

1. Mistral v11+ wire format — STILL OPEN (parser doesn't exist;
   flagged in PARSER_CASES.md "Known production gaps").
2. PARSER.stream.{1..4} parser-tier coverage gap in 5 families
   (Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral) — partial
   closure via DSv4 (#8946) and Gemma 4 (#8852).
3. CASE.25 / FRONTEND.3 (`adjust_request`) — CLOSED for 7 families
   via 28 new tests in `lib/llm/tests/tool_choice.rs` (#8946 + #9035).

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore documentation Improvements or additions to documentation size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants