chore(frontend): Add Gemma 4 parser support + Test Cases#8852
Conversation
WalkthroughThis PR adds comprehensive Gemma 4 model support by introducing a tool-calling parser with custom Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
lib/parsers/src/tool_calling/gemma4/parser.rs (1)
4-5: Incomplete reference link in comment.The PR reference link on line 5 contains a placeholder (
...) instead of the actual PR number. Consider either completing the link or removing it if the upstream PR hasn't been merged yet.// Reference implementations: -// https://github.com/vllm-project/vllm/pull/.../vllm/tool_parsers/gemma4_tool_parser.py +// https://github.com/vllm-project/vllm/blob/main/vllm/tool_parsers/gemma4_tool_parser.py🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/parsers/src/tool_calling/gemma4/parser.rs` around lines 4 - 5, The inline comment in parser.rs contains an incomplete PR URL ("https://github.com/vllm-project/vllm/pull/.../vllm/tool_parsers/gemma4_tool_parser.py"); update that comment by replacing the "..." with the actual PR number/path if available, or remove the placeholder link entirely (keep or replace with a note like "reference implementation in vllm/tool_parsers/gemma4_tool_parser.py" if the PR is not merged) so the reference in the header comment of lib/parsers/src/tool_calling/gemma4/parser.rs is accurate and not misleading.lib/parsers/src/reasoning/gemma4_parser.rs (1)
102-122: Verify underflow safety inresolve_prefix.Line 104 computes
accum.len() - raw_reasoning.len(). While the current call sites ensureraw_reasoningis always a suffix ofaccum(pushed before calling), this invariant isn't documented or enforced at the function boundary.Consider adding a debug assertion or a brief comment documenting this precondition:
📝 Suggested documentation
/// Case 3: accumulated reasoning diverged from the prefix — emit the /// buffered reasoning verbatim (data preservation). +/// +/// # Precondition +/// `raw_reasoning` must be a suffix of `accum` (i.e., `accum.len() >= raw_reasoning.len()`). fn resolve_prefix<'a>(accum: &'a str, raw_reasoning: &'a str) -> (&'a str, bool) { + debug_assert!(accum.len() >= raw_reasoning.len(), "raw_reasoning must be a suffix of accum"); if accum.starts_with(THOUGHT_PREFIX) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/parsers/src/reasoning/gemma4_parser.rs` around lines 102 - 122, The function resolve_prefix relies on the precondition that raw_reasoning is a suffix of accum (used when computing accum.len() - raw_reasoning.len()), so add a debug assertion and brief comment documenting this invariant: insert something like a comment above resolve_prefix mentioning "precondition: raw_reasoning must be a suffix of accum" and a debug_assert!(accum.ends_with(raw_reasoning)) at the start of resolve_prefix to make the precondition explicit and fail fast in debug builds; reference resolve_prefix and the THOUGHT_PREFIX logic when adding the assertion and comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@lib/parsers/src/reasoning/gemma4_parser.rs`:
- Around line 102-122: The function resolve_prefix relies on the precondition
that raw_reasoning is a suffix of accum (used when computing accum.len() -
raw_reasoning.len()), so add a debug assertion and brief comment documenting
this invariant: insert something like a comment above resolve_prefix mentioning
"precondition: raw_reasoning must be a suffix of accum" and a
debug_assert!(accum.ends_with(raw_reasoning)) at the start of resolve_prefix to
make the precondition explicit and fail fast in debug builds; reference
resolve_prefix and the THOUGHT_PREFIX logic when adding the assertion and
comment.
In `@lib/parsers/src/tool_calling/gemma4/parser.rs`:
- Around line 4-5: The inline comment in parser.rs contains an incomplete PR URL
("https://github.com/vllm-project/vllm/pull/.../vllm/tool_parsers/gemma4_tool_parser.py");
update that comment by replacing the "..." with the actual PR number/path if
available, or remove the placeholder link entirely (keep or replace with a note
like "reference implementation in vllm/tool_parsers/gemma4_tool_parser.py" if
the PR is not merged) so the reference in the header comment of
lib/parsers/src/tool_calling/gemma4/parser.rs is accurate and not misleading.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7704350f-d1fc-436c-8990-6ab2d110a063
📒 Files selected for processing (13)
docs/agents/reasoning.mddocs/agents/tool-calling.mdexamples/chat_templates/README.mdexamples/chat_templates/gemma4_tool.jinjalib/llm/src/preprocessor.rslib/parsers/README.mdlib/parsers/src/reasoning/gemma4_parser.rslib/parsers/src/reasoning/mod.rslib/parsers/src/tool_calling/config.rslib/parsers/src/tool_calling/gemma4/mod.rslib/parsers/src/tool_calling/gemma4/parser.rslib/parsers/src/tool_calling/mod.rslib/parsers/src/tool_calling/parsers.rs
Add tool-call and reasoning parsers for Google Gemma 4 thinking models.
Gemma 4 uses a custom non-JSON serialization grammar that does not match any
existing parser family:
<|tool_call>call:func_name{location:<|"|>Tokyo<|"|>,count:42}<tool_call|>
Bare unquoted keys, custom <|"|> string delimiter, supports nested objects
and arrays, multiple tool calls concatenated without separators.
Reasoning grammar is wrapped between channel markers with a `thought\n`
role-label prefix that this parser strips:
<|channel>thought\n...chain of thought...<channel|>
Changes:
* Tool-call parser (lib/parsers/src/tool_calling/gemma4/) — new ParserConfig
variant `Gemma4`; recursive-descent parser into `serde_json::Value`;
registered as `gemma4` (alias `gemma-4`); inline #[cfg(test)] coverage of
CASE.1..CASE.24 from TEST_CASES.md including malformed args, truncation
recovery, parallel calls, function names with hyphens/dots, and HTML/newlines
inside string values.
* Reasoning parser (lib/parsers/src/reasoning/gemma4_parser.rs) — implements
the `ReasoningParser` trait directly (mirrors `granite_parser.rs` style)
with streaming `thought\n` prefix-stripping (3-case logic: full prefix,
strict prefix buffered, diverged emitted verbatim) and partial-marker
overlap detection at chunk boundaries; registered as `gemma4`
(alias `gemma-4`).
* Vendored chat template (examples/chat_templates/gemma4_tool.jinja) —
verbatim copy of vLLM's tool_chat_template_gemma4.jinja with SPDX
attribution. Required because the upstream HF Gemma 4 chat template does
not emit the <|"|>-delimited tool-definition encoding the parser expects.
Pass via `--custom-jinja-template`.
* Docs updates (docs/agents/{tool-calling,reasoning}.md, lib/parsers/README.md)
with parser table rows and a recommended-pairing line:
`--dyn-tool-call-parser gemma4 --dyn-reasoning-parser gemma4
--custom-jinja-template examples/chat_templates/gemma4_tool.jinja`
Reference: vllm-project/vllm tool_parsers/gemma4_tool_parser.py and
reasoning/gemma4_reasoning_parser.py.
AI assistance was used to draft this change; all changed lines were
manually reviewed against the upstream vLLM reference implementation.
Signed-off-by: David Oy <david.oy@baseten.co>
Behavior-diff against upstream vLLM main turned up four real bugs in
the initial port plus one reasoning-parser gap and one optimization
gap. All fixed; tests added.
Tool-call parser (lib/parsers/src/tool_calling/gemma4/parser.rs):
* Null-keyword case-sensitivity (upstream `value_str.lower() in (...)`):
the previous hard-coded list missed `NONE`, `Nul`, etc. Replaced with
`try_consume_keyword` that does ASCII-case-insensitive matching with
a trailing-word-char guard so `nullable` doesn't get mis-matched as
the `null` keyword plus leftovers.
* Unterminated `<|"|>` string in args (upstream
`_parse_gemma4_args:146-149`): when the closing delimiter is missing,
upstream takes "everything after the opening delimiter" as the value.
We were bailing instead, which then triggered the entire-args-object
empty-fallback in `parse_single_call` — i.e. a single truncated
trailing arg silently lost ALL prior args. Now we mirror upstream and
return the truncated tail as the string value.
* Empty value `key:` followed by `,` / `}` / EOF (upstream
`_parse_gemma4_args:128-131`): upstream emits `{"key": ""}`. We were
erroring out of `parse_value` and dropping the whole args object.
Detect this case in `parse_object_body` and insert an empty-string.
* Truncated tool call (start marker, no end marker): upstream
`extract_tool_calls:421-424` returns the raw `model_output` as
content when no complete tool call could be parsed. We were
consuming the bytes silently, hiding model truncation from callers.
Echo the bytes back as `normal_text` instead.
Reasoning parser (lib/parsers/src/reasoning/gemma4_parser.rs):
* Dangling end marker (`<channel|>` present, `<|channel>` absent): this
happens when a tokenizer with `skip_special_tokens=True` strips the
start token but leaves the end token. Upstream's offline parser
still extracts text-before as reasoning. We were returning the whole
text as `normal_text` and losing the reasoning content. Added a
match arm for `(None, Some(end))` that splits on the end marker.
Preprocessor optimization (lib/llm/src/preprocessor.rs):
* `is_reasoning_disabled_by_request` now recognizes `gemma4`/`gemma-4`
with `enable_thinking: false` and short-circuits the parser, matching
the existing pattern for `kimi_k25` / `deepseek_v4` / `nemotron_nano`.
Without this, the parser still ran on every chunk just to fall
through (since no `<|channel>` markers arrive when the chat template
doesn't inject `<|think|>`).
Tests added/renamed:
* `truncated_tail_echoed_complete_prior_survives` (was
`truncated_call_dropped_complete_prior_survives`).
* `args_grammar_unterminated_string_takes_remainder` (was
`args_grammar_unterminated_string_errors`).
* `args_grammar_empty_value_yields_empty_string`,
`args_grammar_trailing_empty_value`,
`args_grammar_null_aliases_case_insensitive`,
`args_grammar_keyword_prefix_not_consumed`,
`incomplete_tool_call_echoes_raw_bytes`.
* `detect_dangling_end_marker_extracts_prefix_as_reasoning`,
`detect_dangling_end_marker_strips_thought_prefix`.
* Four new cases in `test_is_reasoning_disabled_by_request` covering
`gemma4` and `gemma-4` with `enable_thinking: false` / `true` / no args.
Behavior intentionally diverges from upstream in one place: Rust's
between-call text join (`Sure thing. <call> All set.` → preserves
`"Sure thing. All set."`) follows the established kimi_k2 Dynamo
pattern rather than upstream Gemma 4's prefix-only behavior. This
keeps the parser registry internally consistent.
AI assistance was used to draft this change; all changed lines were
manually reviewed against the upstream vLLM reference implementation.
Signed-off-by: David Oy <david.oy@baseten.co>
* parser.rs:4 — replace placeholder PR URL (`/pull/.../...`) with the resolved upstream link (`/blob/main/...`) and a note that vLLM PR #38826 / #39027 are merged. Reviewers can now click through. * gemma4_parser.rs::resolve_prefix — make the "raw_reasoning is a suffix of accum" precondition explicit with a doc-comment block and a `debug_assert!` that fails fast in debug builds. Documents the invariant that the streaming driver upholds (it always pushes the raw delta into self.reasoning_accum immediately before calling resolve_prefix). No behavior change in release builds; only docstring clarity and a debug-mode assertion. Tests still 44/44. Signed-off-by: David Oy <david.oy@baseten.co>
c65e5da to
f9c7a6d
Compare
|
Pushed
Pre-commit job failureThe
This is due to:
So three test files now block every external-contributor PR rebased onto main:
|
e1954c6 to
f9c7a6d
Compare
|
Thanks @the-david-oy ! CC @richardhuo-nv @keivenchang to help review |
|
/ok to test f9c7a6d |
keivenchang
left a comment
There was a problem hiding this comment.
FYI I added comments on test-case categories (and pushed it). No logic change.
Augment the 12 previously-unannotated tests across the gemma4 tool-call and reasoning parsers with their corresponding CASE.* numbers from lib/parsers/TEST_CASES.md, matching the convention already used by the other tests in this PR and across the rest of the parser suite. Comment-only — no functional changes; cargo test output unchanged. Mapping summary: - args_grammar_*: CASE.4/5/6/7 (grammar entry-point + truncation paths) - incomplete_tool_call_echoes_raw_bytes: CASE.5 (missing end marker) - detect_dangling_end_marker_*: CASE.5, CASE.13 (missing-start-marker recovery) - streaming_multiple_reasoning_spans: CASE.2 (multiple spans) Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
keivenchang
left a comment
There was a problem hiding this comment.
two notes — both pointers, no inline comments needed.
1. Pre-commit hook unrelated to this PR. pytest-marker-report currently fails on this branch with 49 missing-marker tests across test_audio_protocol.py, test_video_protocol.py, test_video_utils.py. those 49 are pre-existing on the branch — main has 0 missing. fix already merged on main as cc1af6a90 (#8849). merging main / rebasing should clear the hook.
2. CASE. coverage gaps to close (per lib/parsers/TEST_CASES.md).*
- To add: CASE.9 (paired reasoning + tool in same response), CASE.14 (empty
tool_calls/ null at API edge), CASE.15 (duplicate tool calls — universal gap, no parser covers this today). - Framework-level (cross-parser suites, not in this PR's scope): CASE.11 (tool_choice modes), CASE.12 (finish_reason).
- N/A (worth a one-line
// CASE.X — N/Aper the doc's stated convention): CASE.23 (markers fixed, no variants), CASE.xml1/2 (not XML-family), CASE.harmony1 (not Harmony).
Two small changes from @keivenchang on PR ai-dynamo#8852: * `try_tool_call_parse_gemma4`: add a one-line note explaining that the first `<tool_call|>` after the start marker wins and that we don't scan inside `<|"|>` string values for embedded markers — so a model emitting a tool-call literal inside a string argument will truncate the call. Matches upstream vLLM regex behavior. * `is_reasoning_disabled_by_request` `gemma4` arm: switch from a hand-rolled `enable_thinking` check to the shared `thinking_bool_from_args` helper that the `deepseek_v4` arm uses, for symmetry. The helper already accepts both `thinking` and `enable_thinking` keys, so behavior is unchanged. No test changes needed; existing parametric cases continue to pass. Signed-off-by: David Oy <david.oy@baseten.co>
Replace the manual `find(TOOL_CALL_START)` + `find(TOOL_CALL_END)` extraction in `try_tool_call_parse_gemma4` with a direct `regex.captures_iter(message)` pass, mirroring upstream's `extract_tool_calls` in vllm/tool_parsers/gemma4_tool_parser.py. Why: the manual approach used flat byte-search for the end marker without requiring `}<tool_call|>` adjacency. A `<tool_call|>` literal embedded inside a `<|"|>` string value (e.g. a `description` arg documenting the tool-call format) would falsely terminate the call, causing the regex on the truncated block to fail and the call to be silently dropped. The regex-based approach requires the `}` adjacency and is robust to the embedded case. Also fix `find_tool_call_end_position_gemma4` similarly so the streaming jail respects the same adjacency. Tests: 46/46 (added `embedded_tool_call_marker_in_string_value` and `find_end_position_skips_embedded_marker`). Plus a comment-cleanup pass on this PR's earlier task-historical inline comments — keeping only those that document non-obvious invariants (e.g. resolve_prefix precondition, regex adjacency note). Signed-off-by: David Oy <david.oy@baseten.co>
vLLM's `Gemma4ToolParser` and `Gemma4ReasoningParser` rely on the engine emitting `<|tool_call>`, `<|channel>`, etc. — all special-token IDs in the Gemma 4 tokenizer — verbatim in the decoded text. Upstream's OAI server flips `skip_special_tokens=False` per-request via the `adjust_request` hook on each parser. Dynamo's mode-A frontend bypasses that hook, so requests against `--dyn-tool-call-parser gemma4` / `--dyn-reasoning-parser gemma4` get the engine default (`skip_special_tokens=true`), which strips the markers before the parser sees them. End result: tool_calls + reasoning_content silently empty, raw text dumped into `content`. Add a `parser_requires_special_tokens` helper in the preprocessor and intercept `output_options.skip_special_tokens` in `preprocess_request`: when the helper returns true and the user hasn't explicitly set the field, default it to `Some(false)` so the parsers actually see the markers they're matching on. Narrow per-parser table for now (gemma4 + gemma-4 alias only). Other upstream parsers (hermes, kimi_k2, jamba, deepseek_v3.2, etc.) also have `adjust_request` hooks for the same reason and may be silently broken in mode A today; that's a separate audit / follow-up. Tests: parametric `test_parser_requires_special_tokens` covers the positive (gemma4/gemma-4, paired or single) and negative (hermes, kimi_k2, no parsers) cases. CI will exercise the integration via the existing dynamo-llm test suite on Linux. Signed-off-by: David Oy <david.oy@baseten.co>
CI's `cargo fmt --check` flagged the parametric tuples in `test_parser_requires_special_tokens` — single-line form exceeded the max-width and rustfmt wanted them expanded. Run `cargo fmt -p dynamo-llm` to apply. No behavior change. Signed-off-by: David Oy <david.oy@baseten.co>
keivenchang
left a comment
There was a problem hiding this comment.
hey David, both round-1 comments addressed cleanly — regex-based extraction is the right shape, and the skip_special_tokens hook is a nice unprompted add. tiny future-facing nit inline.
can you also go through all the cases in lib/parsers/TEST_CASES.md and confirm coverage / mark explicit N/A for the ones that don't apply? i think 9, 14, 15 are still uncovered.
Almost there, thanks!
|
/ok to test bd44405 |
Per reviewer audit against lib/parsers/TEST_CASES.md, three generic categories were uncovered in the gemma4 tests: * CASE.9 — paired reasoning + tool call. Added one test on each side: the tool-call parser must extract the call when `<|channel>` markers are also present in the input (production runs the reasoning parser first, but the tool-call parser should be robust on its own); the reasoning parser must NOT consume the following `<|tool_call>` markers as part of reasoning content. * CASE.14 — empty content / null args. Two tests: empty input yields zero calls + empty normal_text; `null` / `none` / `nil` arg values round-trip as JSON null. * CASE.15 — duplicate tool calls (same function name). Verifies both calls appear with distinct UUIDs; client decides whether duplicate invocation is intended (per the documented contract in TEST_CASES.md). Plus explicit N/A annotations for cases that don't apply to gemma4 in both test modules — universal cross-parser gaps (CASE.11 tool_choice, CASE.12 finish_reason — hermes-only suites today), family-specific categories (CASE.xml*, CASE.harmony1), and CASE.8 streaming (covered indirectly via the streaming jail's CASE.20 helpers, same pattern as kimi_k2). Surfaces gaps to a future reader rather than leaving them ambiguously absent. Tests: 51/51 pass (was 46). Signed-off-by: David Oy <david.oy@baseten.co>
Absolutely, done! |
|
/ok to test 54a5fca |
|
The failing docs link check is coming from the main branch, not any of the changes in this PR. |
|
Thank you! Are you able to merge? I don't think I have permission. |
Signed-off-by: David Oy <david.oy@baseten.co> Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com> Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
… family Pythonic was the only top-N family untouched by the recent coverage PRs (#8888, #8946, #8846, #9035, #8852). Add three small batch-mode tests that mirror the top-N quartet pattern landed for harmony/glm47/qwen3/etc. in #9035, plus the parameterless-call shape from vLLM's pythonic test file. - `test_parse_tool_call_parse_pythonic_empty_args` (PARSER.batch.6) — `[get_weather()]` returns one call with `arguments={}`. Mirrors vLLM `test_pythonic_tool_parser.py::test_tool_call[parameterless_*]`. - `test_parse_pythonic_empty_and_whitespace_inputs` (PARSER.batch.9) — empty / whitespace-only inputs return 0 calls and empty content without panicking. Mirrors the #9035 quartet contract. - `test_parse_pythonic_duplicate_calls_same_name` (PARSER.batch.10) — two `get_weather` calls in one list surface with distinct ids (`call-1`, `call-2`). Pins the auto-generated-id contract for the duplicate-name case. `cargo test -p dynamo-parsers --lib tool_calling::pythonic` — 19/19 pass. Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
…audit Output of DIS-1926 (research vLLM parser test coverage gaps). Doc-only change to `lib/parsers/PARSER_CASES.md`; no source touched. `cargo check -p dynamo-parsers --tests` passes. Refinements driven by gaps surfaced during a bidirectional diff against vLLM `tests/tool_parsers/*` at commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b: - Split PARSER.fmt.1 (function-name surface) from new PARSER.fmt.5 (argument-envelope shape: native call-ID preservation, JSON field-order tolerance, arguments↔parameters key alias). The old CASE.21 (and an earlier draft of PARSER.fmt.1) conflated both axes. - Broaden PARSER.fmt.3 examples beyond Kimi K2's singular vs plural section tokens to include Mistral pre-v11 vs v11+ wire formats, Llama 3 with vs without `<|python_tag|>`, Hermes `qwen25` registry alias. - Add `Known production gaps` section flagging Mistral v11+ wire format (`[TOOL_CALLS]name{...args}` name-then-object) — Dynamo's `ToolCallConfig::mistral()` only handles pre-v11 (JSON-array body), while vLLM tests v11 extensively. v11 is the current Mistral-Small / Mistral-Large production path. Largest single Dynamo parser gap surfaced by the audit. - Promote regex-timeout / parser-exception containment to Universal Gaps (vLLM has explicit `test_regex_timeout_handling` for llama3_json / llama4_pythonic / pythonic and `*_streaming_exception_returns_none` for Mistral; Dynamo relies on Rust regex linear-time guarantees but does not pin failure-containment paths). - Cross-ref PARSER.batch.1 happy-path → PARSER.fmt.5 native-ID sub-axis. - Update Applicability summary and `Adding a new parser` minimum viable set to cover fmt.{1..5}. The full per-test bidirectional audit (493 test rows across 36 parser families, mapped onto the new taxonomy) lives outside this commit. It informed every refinement above; the audit itself is not committed because it's a working artifact rather than a stable reference doc. Top-3 P0 gap status from the audit: 1. Mistral v11 wire format — STILL OPEN (parser doesn't exist; flagged in the new `Known production gaps` section). 2. PARSER.stream.{1..4} parser-tier — partial; DSv4 (#8946) and Gemma 4 (#8852) added coverage; Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral parser-tier streaming tests still gap. 3. CASE.25 / FRONTEND.3 (`adjust_request`) — CLOSED for 7 families via 28 new tests in `lib/llm/tests/tool_choice.rs` (#8946 + #9035). Coverage PRs since 2026-05-05 baseline: #8888 (silent-drop recoveries), #8946 (DSv4 + Kimi K2 coverage), #9035 (top-N CASE.6+ quartet), #8852 (Gemma 4 family), #9127 (taxonomy rename). Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
Companion artifact to PR #9290 (PARSER_CASES.md taxonomy refinement). Adds the full per-test bidirectional audit that informed every change in that PR — every vLLM tool-parser test mapped onto the new (PR #9127) taxonomy with a clickable source link. `lib/parsers/VLLM_TEST_AUDIT.md` (new file, 906 lines, 493 distinct test rows): - **Source**: vLLM `main` at commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b (`vllm/tool_parsers/*`, `tests/tool_parsers/*`, `tests/tool_use/*`, `tests/entrypoints/openai/tool_parsers/*`). - **Scope**: 421 explicit test functions + 72 inherited common-suite rows from `ToolParserTests`. - **Bucketing**: every row carries one or more `PARSER_CASES.md` / `REASONING_CASES.md` / `PIPELINE_CASES.md` / `FRONTEND_CASES.md` tags, plus a one-line behavioral note. Re-bucketing transformations applied (vs the original CASE.* labels the audit was first written against, before PR #9127): - 244 streaming rows split per-row into PARSER.stream.{1,2,3,4} (single-call assembly / multi-call assembly / partial-token chunking / streaming termination) - 26 fmt rows split per-row into PARSER.fmt.1 (function-name) vs PARSER.fmt.5 (argument-shape: native ID, JSON field-order, arguments↔parameters alias) - Out-of-PARSER-scope buckets relocated to sibling docs: CASE.{11,18,25} → FRONTEND.{1,3,5,6}; CASE.12 → PIPELINE.finish_reason; CASE.{9,10,17} → REASONING.batch.{1,2}; CASE.20 → `// helper`; CASE.16 → inline-regression annotation; CASE.26 dissolved into PARSER.batch.4 impl-defined recovery contract Two mis-bucketings caught and fixed during review: - FunctionGemma::test_multiple_tool_calls and Gemma4::TestExtractToolCalls.test_multiple_tool_calls were both labeled CASE.1 but assert len(tool_calls) == 2 — corrected to PARSER.batch.2. Four bucket-assignment refinements caught by review: - test_unique_tool_call_ids (DSv3.2) drops fmt.5 (no native call-ID surface; just parallel-call distinctness). - test_invalid_funcall_id_skipped (Kimi K2) moves fmt.5 → fmt.1 (validation, not preservation). - 3 Mistral `argument_before_name*` parametrized rows gain fmt.5 (canonical field-order swap test set referenced by PARSER_CASES.md). A staleness banner at the top documents the re-bucketing transformation and mis-bucket fixes for traceability. Top findings the audit informed (already addressed in PR #9290 or flagged for follow-up): 1. Mistral v11+ wire format — STILL OPEN (parser doesn't exist; flagged in PARSER_CASES.md "Known production gaps"). 2. PARSER.stream.{1..4} parser-tier coverage gap in 5 families (Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral) — partial closure via DSv4 (#8946) and Gemma 4 (#8852). 3. CASE.25 / FRONTEND.3 (`adjust_request`) — CLOSED for 7 families via 28 new tests in `lib/llm/tests/tool_choice.rs` (#8946 + #9035). Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
Overview:
Add tool-call and reasoning parsers for Google Gemma 4 thinking models to Dynamo's frontend, plus a vendored chat template. With this change, Dynamo's native preprocessor (mode A in
docs/agents/chat-processor-options.md) can serve any Gemma 4 deployment that runs on vLLM ≥ v0.19.0 (where Gemma 4 support landed via vllm-project/vllm#38826 and #39027) end-to-end with structuredtool_callsandreasoning_content.Gemma 4's serialization grammar does not match any existing Dynamo parser family, so this PR introduces a new
Gemma4variant inParserConfigrather than force-fitting it into JSON/XML/DSML. The grammar uses bare unquoted keys, a custom<|"|>string delimiter, and fixed<|tool_call>...<tool_call|>markers:Reasoning is wrapped between channel markers with a
thought\nrole-label prefix that this parser strips:Recommended pairing for a Gemma 4 deployment:
Details:
The PR is two commits to make the upstream-diff alignments easy to review separately from the initial port.
Commit 1 —
chore(frontend): Add Gemma 4 parser support + Test CasesNew files:
lib/parsers/src/tool_calling/gemma4/{mod.rs,parser.rs}— recursive-descent parser for the Gemma 4 argument grammar (booleans, numbers,<|"|>-delimited strings, nested objects, arrays) emittingserde_json::Value. Inline#[cfg(test)]coverage of CASE.1–CASE.24 fromlib/parsers/TEST_CASES.md.lib/parsers/src/reasoning/gemma4_parser.rs—Gemma4ReasoningParserimplementing theReasoningParsertrait directly (mirrorsgranite_parser.rsshape) with<|channel>...<channel|>delimiters,thought\nprefix stripping (the 3-case streaming logic from upstream's reasoning parser), and partial-marker overlap detection at chunk boundaries.examples/chat_templates/gemma4_tool.jinja— verbatim copy of upstream vLLMexamples/tool_chat_template_gemma4.jinjawith SPDX attribution. Required because the stock HF Gemma 4 chat template does not emit the<|"|>-delimited tool-definition encoding the parser expects.examples/chat_templates/README.md— per-template usage notes.Modified files:
lib/parsers/src/tool_calling/{config.rs,parsers.rs,mod.rs}— addedParserConfig::Gemma4variant,ToolCallConfig::gemma4()factory, registered namesgemma4(aliasgemma-4), and dispatchedtry_tool_call_parse_gemma4/detect_tool_call_start_gemma4/find_tool_call_end_position_gemma4from the central registry.lib/parsers/src/reasoning/mod.rs— addedReasoningParserType::Gemma4variant and registered the same names.lib/parsers/README.md— bumped parser counts (17 → 18 tool-call, 14 → 15 reasoning) and added rows in the parser-family cheat sheets.docs/agents/{tool-calling,reasoning}.md— added parser-table rows and the recommended-pairing line.Commit 2 —
fix(frontend): align Gemma 4 parsers with upstream vLLM mainBehavior-diff against upstream vLLM
main(recently-merged Gemma 4 PRs and bugfix follow-ups) surfaced four real bugs in the initial port plus one reasoning-parser gap and one preprocessor optimization gap. All fixed:parser.rs) — upstream's_parse_gemma4_valuedoesvalue_str.lower() in ("null","none","nil"). The first-pass port hard-coded a small set of mixed-case spellings and missedNONE,Nul, etc. Replaced withtry_consume_keywordthat does ASCII-case-insensitive matching with a trailing-word-char guard sonullabledoesn't get mis-matched asnullplus leftovers.<|"|>string in args (parser.rs) — upstream's_parse_gemma4_args:146-149takes "everything after the opening delimiter" as the value when the closing delimiter is missing. The first-pass port bailed instead, which then triggered the entire-args-object empty-fallback inparse_single_call— i.e. a single truncated trailing arg silently lost ALL prior args. Now mirrors upstream.key:followed by,/}/EOF (parser.rs) — upstream's_parse_gemma4_args:128-131emits{"key": ""}. The first-pass port errored out ofparse_valueand dropped the whole args object. Detect this case inparse_object_bodyand insert an empty-string.parser.rs) — upstreamextract_tool_calls:421-424returns the rawmodel_outputas content when no complete tool call could be parsed. The first-pass port consumed those bytes silently, hiding model truncation from callers. Now echoes the bytes back asnormal_text.gemma4_parser.rs) — when<channel|>is present but<|channel>is absent (tokenizer withskip_special_tokens=Truestrips the start token), upstream's offline parser still extracts text-before as reasoning. Added a(None, Some(end))match arm.is_reasoning_disabled_by_requestpolish (lib/llm/src/preprocessor.rs) — addedgemma4/gemma-4recognition withenable_thinking: falseshort-circuit, matching the existing pattern forkimi_k25/deepseek_v4/nemotron_nano. Without this, the parser still ran on every chunk just to fall through (since no<|channel>markers arrive when the chat template doesn't inject<|think|>).One intentional divergence from upstream: between-call text is preserved (
Sure thing. <call> All set.→"Sure thing. All set.") rather than dropped, because that matches the establishedkimi_k2_parser.rspattern in the rest of Dynamo's parser registry. Documented inline.Where should the reviewer start?
lib/parsers/src/tool_calling/gemma4/parser.rs— the centerpiece. Pay particular attention to:Cursor/parse_object_body/parse_value/parse_array/parse_number/try_consume_keywordchain (lines ~198–402). The grammar parses straight intoserde_json::Value.try_tool_call_parse_gemma4(lines ~91–131) for the outer call-extraction loop and the truncated-call echo behavior.#[cfg(test)] mod testsblock — 31 tests covering CASE.1–CASE.24 plus targeted cases from upstream'stests/tool_parsers/test_gemma4_tool_parser.py(test_unterminated_string,test_empty_value,test_incomplete_tool_call, parallel calls, function names with hyphens/dots, HTML/newlines in string values, case-insensitive null aliases, keyword-prefix non-consumption).lib/parsers/src/reasoning/gemma4_parser.rs— 12 tests including the newdetect_dangling_end_marker_*cases that mirror upstream'sINVALID_SIMPLE/INVALID_COMPLETEtest fixtures, plus the streamingthought\nprefix-split-across-deltas cases.lib/parsers/src/tool_calling/config.rs— the newParserConfig::Gemma4unit variant and theToolCallConfig::gemma4()factory. The variant is unit (no per-instance config) because Gemma 4's markers are fixed and not user-tunable, unlikeKimiK2ParserConfig/Glm47ParserConfig.lib/parsers/src/tool_calling/parsers.rs— the registry/dispatch edits:map.insert("gemma4", ...)+ the new dispatch arm intry_tool_call_parseand the partial-prefix detection arm indetect_tool_call_start.lib/llm/src/preprocessor.rs— only theis_reasoning_disabled_by_requesttable entry plus four parametric test cases. Worth glancing at to confirm it follows the existingkimi_k25/deepseek_v4pattern.examples/chat_templates/gemma4_tool.jinja— verbatim copy of upstream vLLM's template, only a 3-line SPDX/comment header added at the top. Reviewers candiffagainsthttps://github.com/vllm-project/vllm/blob/main/examples/tool_chat_template_gemma4.jinjato verify.Validation:
cargo build -p dynamo-parsers— clean (0 errors, 0 warnings).cargo test -p dynamo-parsers --lib gemma4— 44/44 pass.cargo test -p dynamo-parsers(full crate) — 443 pass, 5 ignored, no regressions in other parsers.cargo fmt --check --all— workspace clean.cargo clippy -p dynamo-parsers— no warnings on the new code (one pre-existing warning inglm47_parser.rs:177, untouched by this PR).pre-commit run --files <13 changed files>— all hooks pass (codespell, case-conflict, merge-conflict, mixed-line-ending, trailing-whitespace, pytest-marker-report).cargo test -p dynamo-llm --lib is_reasoning_disabled_by_request— could not run on the development host because thedynamo-llmcrate has a pre-existing macOS link error (libstdc++not available on modern macOS, replaced bylibc++). The four new parametric test cases are small and surgical, mirroring the existing pattern forkimi_k25anddeepseek_v4; CI on Linux will exercise them.Out of scope (explicit non-goals — file as follow-ups if desired):
lib/llm/src/preprocessor/prompt/deepseek_v4.rs). The vendored jinja covers Phase 1 via--custom-jinja-template; a native formatter is a several-day port of the 333-line jinja and not blocking.components/src/dynamo/frontend/sglang_prepost.py). Only relevant for--dyn-chat-processor sglang(mode C). Whether SGLang has its owngemma4parser is a separate question.lib/llm/tests/data/vllm/gemma4/. Those need real captured chunks from a Gemma 4 vLLM deployment; will follow up after a real-world capture.Watch list (upstream vLLM bugfix PRs the maintainers may want to track):
The vLLM Gemma 4 parsers are still settling — there are several open bugfix PRs touching internals (#39484 STRING_DELIM leak, #39311 nested-array bracket handling, #39070 argument truncation, #39588 selective reasoning, #39885 multi-turn streaming leak, #39570 chat-template HF resync). The public marker contract and parser name
gemma4are stable, so this PR is unaffected, but the vendored jinja may want a periodic re-sync against upstream once #39570 lands. Filing a tracking issue is recommended.AI-Generated Code Disclosure:
Per Dynamo's contribution guide (
docs/contribution-guide.md§AI-Generated Code): AI assistance was used to draft this change. Every changed line was manually reviewed against the upstream vLLM reference implementations (vllm/tool_parsers/gemma4_tool_parser.py,vllm/reasoning/gemma4_reasoning_parser.py,examples/tool_chat_template_gemma4.jinja) and against the analog Dynamo parsers (kimi_k2_parser.rs,granite_parser.rs,dsml/parser.rs). The submitter understands and can defend every line of the change end-to-end, including the recursive-descent grammar parser, the streamingthought\nprefix-stripping state machine, and the four upstream-diff fixes in commit 2.Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Documentation