chore(frontend): Add Gemma 4 parser support + Test Cases by the-david-oy · Pull Request #8852 · ai-dynamo/dynamo

the-david-oy · 2026-04-29T20:04:19Z

Overview:

Add tool-call and reasoning parsers for Google Gemma 4 thinking models to Dynamo's frontend, plus a vendored chat template. With this change, Dynamo's native preprocessor (mode A in docs/agents/chat-processor-options.md) can serve any Gemma 4 deployment that runs on vLLM ≥ v0.19.0 (where Gemma 4 support landed via vllm-project/vllm#38826 and #39027) end-to-end with structured tool_calls and reasoning_content.

Gemma 4's serialization grammar does not match any existing Dynamo parser family, so this PR introduces a new Gemma4 variant in ParserConfig rather than force-fitting it into JSON/XML/DSML. The grammar uses bare unquoted keys, a custom <|"|> string delimiter, and fixed <|tool_call>...<tool_call|> markers:

<|tool_call>call:get_weather{location:<|"|>Tokyo<|"|>,unit:<|"|>celsius<|"|>,count:5}<tool_call|>

Reasoning is wrapped between channel markers with a thought\n role-label prefix that this parser strips:

<|channel>thought
...chain of thought reasoning...<channel|>
Final answer text.

Recommended pairing for a Gemma 4 deployment:

--dyn-tool-call-parser gemma4 \
--dyn-reasoning-parser gemma4 \
--custom-jinja-template examples/chat_templates/gemma4_tool.jinja

Details:

The PR is two commits to make the upstream-diff alignments easy to review separately from the initial port.

Commit 1 — chore(frontend): Add Gemma 4 parser support + Test Cases

New files:

lib/parsers/src/tool_calling/gemma4/{mod.rs,parser.rs} — recursive-descent parser for the Gemma 4 argument grammar (booleans, numbers, <|"|>-delimited strings, nested objects, arrays) emitting serde_json::Value. Inline #[cfg(test)] coverage of CASE.1–CASE.24 from lib/parsers/TEST_CASES.md.
lib/parsers/src/reasoning/gemma4_parser.rs — Gemma4ReasoningParser implementing the ReasoningParser trait directly (mirrors granite_parser.rs shape) with <|channel>...<channel|> delimiters, thought\n prefix stripping (the 3-case streaming logic from upstream's reasoning parser), and partial-marker overlap detection at chunk boundaries.
examples/chat_templates/gemma4_tool.jinja — verbatim copy of upstream vLLM examples/tool_chat_template_gemma4.jinja with SPDX attribution. Required because the stock HF Gemma 4 chat template does not emit the <|"|>-delimited tool-definition encoding the parser expects.
examples/chat_templates/README.md — per-template usage notes.

Modified files:

lib/parsers/src/tool_calling/{config.rs,parsers.rs,mod.rs} — added ParserConfig::Gemma4 variant, ToolCallConfig::gemma4() factory, registered names gemma4 (alias gemma-4), and dispatched try_tool_call_parse_gemma4 / detect_tool_call_start_gemma4 / find_tool_call_end_position_gemma4 from the central registry.
lib/parsers/src/reasoning/mod.rs — added ReasoningParserType::Gemma4 variant and registered the same names.
lib/parsers/README.md — bumped parser counts (17 → 18 tool-call, 14 → 15 reasoning) and added rows in the parser-family cheat sheets.
docs/agents/{tool-calling,reasoning}.md — added parser-table rows and the recommended-pairing line.

Commit 2 — fix(frontend): align Gemma 4 parsers with upstream vLLM main

Behavior-diff against upstream vLLM main (recently-merged Gemma 4 PRs and bugfix follow-ups) surfaced four real bugs in the initial port plus one reasoning-parser gap and one preprocessor optimization gap. All fixed:

Null-keyword case-insensitivity (parser.rs) — upstream's _parse_gemma4_value does value_str.lower() in ("null","none","nil"). The first-pass port hard-coded a small set of mixed-case spellings and missed NONE, Nul, etc. Replaced with try_consume_keyword that does ASCII-case-insensitive matching with a trailing-word-char guard so nullable doesn't get mis-matched as null plus leftovers.
Unterminated <|"|> string in args (parser.rs) — upstream's _parse_gemma4_args:146-149 takes "everything after the opening delimiter" as the value when the closing delimiter is missing. The first-pass port bailed instead, which then triggered the entire-args-object empty-fallback in parse_single_call — i.e. a single truncated trailing arg silently lost ALL prior args. Now mirrors upstream.
Empty value key: followed by ,/}/EOF (parser.rs) — upstream's _parse_gemma4_args:128-131 emits {"key": ""}. The first-pass port errored out of parse_value and dropped the whole args object. Detect this case in parse_object_body and insert an empty-string.
Truncated tool call (parser.rs) — upstream extract_tool_calls:421-424 returns the raw model_output as content when no complete tool call could be parsed. The first-pass port consumed those bytes silently, hiding model truncation from callers. Now echoes the bytes back as normal_text.
Reasoning: dangling end marker (gemma4_parser.rs) — when <channel|> is present but <|channel> is absent (tokenizer with skip_special_tokens=True strips the start token), upstream's offline parser still extracts text-before as reasoning. Added a (None, Some(end)) match arm.
is_reasoning_disabled_by_request polish (lib/llm/src/preprocessor.rs) — added gemma4/gemma-4 recognition with enable_thinking: false short-circuit, matching the existing pattern for kimi_k25 / deepseek_v4 / nemotron_nano. Without this, the parser still ran on every chunk just to fall through (since no <|channel> markers arrive when the chat template doesn't inject <|think|>).

One intentional divergence from upstream: between-call text is preserved (Sure thing. <call> All set. → "Sure thing. All set.") rather than dropped, because that matches the established kimi_k2_parser.rs pattern in the rest of Dynamo's parser registry. Documented inline.

Where should the reviewer start?

lib/parsers/src/tool_calling/gemma4/parser.rs — the centerpiece. Pay particular attention to:
- The recursive-descent Cursor / parse_object_body / parse_value / parse_array / parse_number / try_consume_keyword chain (lines ~198–402). The grammar parses straight into serde_json::Value.
- try_tool_call_parse_gemma4 (lines ~91–131) for the outer call-extraction loop and the truncated-call echo behavior.
- The #[cfg(test)] mod tests block — 31 tests covering CASE.1–CASE.24 plus targeted cases from upstream's tests/tool_parsers/test_gemma4_tool_parser.py (test_unterminated_string, test_empty_value, test_incomplete_tool_call, parallel calls, function names with hyphens/dots, HTML/newlines in string values, case-insensitive null aliases, keyword-prefix non-consumption).
lib/parsers/src/reasoning/gemma4_parser.rs — 12 tests including the new detect_dangling_end_marker_* cases that mirror upstream's INVALID_SIMPLE / INVALID_COMPLETE test fixtures, plus the streaming thought\n prefix-split-across-deltas cases.
lib/parsers/src/tool_calling/config.rs — the new ParserConfig::Gemma4 unit variant and the ToolCallConfig::gemma4() factory. The variant is unit (no per-instance config) because Gemma 4's markers are fixed and not user-tunable, unlike KimiK2ParserConfig / Glm47ParserConfig.
lib/parsers/src/tool_calling/parsers.rs — the registry/dispatch edits: map.insert("gemma4", ...) + the new dispatch arm in try_tool_call_parse and the partial-prefix detection arm in detect_tool_call_start.
lib/llm/src/preprocessor.rs — only the is_reasoning_disabled_by_request table entry plus four parametric test cases. Worth glancing at to confirm it follows the existing kimi_k25 / deepseek_v4 pattern.
examples/chat_templates/gemma4_tool.jinja — verbatim copy of upstream vLLM's template, only a 3-line SPDX/comment header added at the top. Reviewers can diff against https://github.com/vllm-project/vllm/blob/main/examples/tool_chat_template_gemma4.jinja to verify.

Validation:

cargo build -p dynamo-parsers — clean (0 errors, 0 warnings).
cargo test -p dynamo-parsers --lib gemma4 — 44/44 pass.
cargo test -p dynamo-parsers (full crate) — 443 pass, 5 ignored, no regressions in other parsers.
cargo fmt --check --all — workspace clean.
cargo clippy -p dynamo-parsers — no warnings on the new code (one pre-existing warning in glm47_parser.rs:177, untouched by this PR).
pre-commit run --files <13 changed files> — all hooks pass (codespell, case-conflict, merge-conflict, mixed-line-ending, trailing-whitespace, pytest-marker-report).
cargo test -p dynamo-llm --lib is_reasoning_disabled_by_request — could not run on the development host because the dynamo-llm crate has a pre-existing macOS link error (libstdc++ not available on modern macOS, replaced by libc++). The four new parametric test cases are small and surgical, mirroring the existing pattern for kimi_k25 and deepseek_v4; CI on Linux will exercise them.

Out of scope (explicit non-goals — file as follow-ups if desired):

Native Rust chat-template formatter for Gemma 4 (analogous to lib/llm/src/preprocessor/prompt/deepseek_v4.rs). The vendored jinja covers Phase 1 via --custom-jinja-template; a native formatter is a several-day port of the 333-line jinja and not blocking.
SGLang frontend wiring (components/src/dynamo/frontend/sglang_prepost.py). Only relevant for --dyn-chat-processor sglang (mode C). Whether SGLang has its own gemma4 parser is a separate question.
Streaming-jail integration tests under lib/llm/tests/data/vllm/gemma4/. Those need real captured chunks from a Gemma 4 vLLM deployment; will follow up after a real-world capture.
Backporting to older Dynamo release branches.

Watch list (upstream vLLM bugfix PRs the maintainers may want to track):

The vLLM Gemma 4 parsers are still settling — there are several open bugfix PRs touching internals (#39484 STRING_DELIM leak, #39311 nested-array bracket handling, #39070 argument truncation, #39588 selective reasoning, #39885 multi-turn streaming leak, #39570 chat-template HF resync). The public marker contract and parser name gemma4 are stable, so this PR is unaffected, but the vendored jinja may want a periodic re-sync against upstream once #39570 lands. Filing a tracking issue is recommended.

AI-Generated Code Disclosure:

Per Dynamo's contribution guide (docs/contribution-guide.md §AI-Generated Code): AI assistance was used to draft this change. Every changed line was manually reviewed against the upstream vLLM reference implementations (vllm/tool_parsers/gemma4_tool_parser.py, vllm/reasoning/gemma4_reasoning_parser.py, examples/tool_chat_template_gemma4.jinja) and against the analog Dynamo parsers (kimi_k2_parser.rs, granite_parser.rs, dsml/parser.rs). The submitter understands and can defend every line of the change end-to-end, including the recursive-descent grammar parser, the streaming thought\n prefix-stripping state machine, and the four upstream-diff fixes in commit 2.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Relates to GitHub issue: [CONTRIBUTION]: Add Gemma 4 tool-call and reasoning parsers #8851

Summary by CodeRabbit

New Features
- Added support for Gemma 4 models with tool-calling and reasoning parsing capabilities
- New custom Jinja chat template for Gemma 4 tool formatting
Documentation
- Added documentation for Gemma 4 tool-calling and reasoning parsers
- Added README guide for custom Jinja chat templates with Gemma 4 example

copy-pr-bot · 2026-04-29T20:04:22Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-04-29T20:04:34Z

🌿 Fern Docs Preview: https://nvidia-preview-c0db9638-dc48-4efd-adfc-a8f8ee236842.docs.buildwithfern.com/dynamo/dev

coderabbitai · 2026-04-29T20:10:57Z

Walkthrough

This PR adds comprehensive Gemma 4 model support by introducing a tool-calling parser with custom <|tool_call> grammar, a channel-based reasoning parser for thinking outputs, a corresponding Jinja chat template, and infrastructure to register these parsers in the existing framework.

Changes

Cohort / File(s)	Summary
Documentation & Examples `docs/agents/reasoning.md`, `docs/agents/tool-calling.md`, `examples/chat_templates/README.md`, `lib/parsers/README.md`	Added documentation for new `gemma4` reasoning and tool-calling parsers, including model family support, vLLM aliases, grammar details, configuration guidance, and updated parser registry counts (tool-calling: 17→18, reasoning: 14→15).
Chat Template `examples/chat_templates/gemma4_tool.jinja`	New Jinja template for Gemma 4 tool formatting, with macros for tool/function declarations, argument serialization, reasoning/thought channels, and tool response blocks. Handles multimodal content (text/image/audio/video) and streaming generation prompts.
Reasoning Parser Implementation `lib/parsers/src/reasoning/gemma4_parser.rs`	New `Gemma4ReasoningParser` struct extracting thinking content from `<
Reasoning Parser Infrastructure `lib/parsers/src/reasoning/mod.rs`	Added `Gemma4ReasoningParser` re-export, `ReasoningParserType::Gemma4` enum variant, and registry entries for `"gemma4"` and `"gemma-4"` aliases.
Tool-Calling Parser Implementation `lib/parsers/src/tool_calling/gemma4/parser.rs`, `lib/parsers/src/tool_calling/gemma4/mod.rs`	New Gemma 4 tool-calling parser detecting `<
Tool-Calling Infrastructure `lib/parsers/src/tool_calling/mod.rs`, `lib/parsers/src/tool_calling/config.rs`, `lib/parsers/src/tool_calling/parsers.rs`	Added `gemma4` module, `ParserConfig::Gemma4` variant, `ToolCallConfig::gemma4()` factory, and registry dispatch for both `"gemma4"` and `"gemma-4"` aliases routing to Gemma 4 parser helpers.
Preprocessor Logic `lib/llm/src/preprocessor.rs`	Extended `is_reasoning_disabled_by_request` to recognize `gemma4`/`gemma-4` aliases and consult `chat_template_args["enable_thinking"]` to determine reasoning disable state, with fallback to `false` when missing.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main changes: addition of Gemma 4 parser support and test cases for the frontend.
Description check	✅ Passed	The PR description comprehensively covers the required template sections: Overview, Details, Where should the reviewer start, and Related Issues, with extensive technical depth and validation results.
Docstring Coverage	✅ Passed	Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (2)

lib/parsers/src/tool_calling/gemma4/parser.rs (1)

4-5: Incomplete reference link in comment.

The PR reference link on line 5 contains a placeholder (...) instead of the actual PR number. Consider either completing the link or removing it if the upstream PR hasn't been merged yet.

 // Reference implementations:
-// https://github.com/vllm-project/vllm/pull/.../vllm/tool_parsers/gemma4_tool_parser.py
+// https://github.com/vllm-project/vllm/blob/main/vllm/tool_parsers/gemma4_tool_parser.py

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/parsers/src/tool_calling/gemma4/parser.rs` around lines 4 - 5, The inline
comment in parser.rs contains an incomplete PR URL
("https://github.com/vllm-project/vllm/pull/.../vllm/tool_parsers/gemma4_tool_parser.py");
update that comment by replacing the "..." with the actual PR number/path if
available, or remove the placeholder link entirely (keep or replace with a note
like "reference implementation in vllm/tool_parsers/gemma4_tool_parser.py" if
the PR is not merged) so the reference in the header comment of
lib/parsers/src/tool_calling/gemma4/parser.rs is accurate and not misleading.

lib/parsers/src/reasoning/gemma4_parser.rs (1)

102-122: Verify underflow safety in resolve_prefix.

Line 104 computes accum.len() - raw_reasoning.len(). While the current call sites ensure raw_reasoning is always a suffix of accum (pushed before calling), this invariant isn't documented or enforced at the function boundary.

Consider adding a debug assertion or a brief comment documenting this precondition:

📝 Suggested documentation

 /// Case 3: accumulated reasoning diverged from the prefix — emit the
 ///   buffered reasoning verbatim (data preservation).
+///
+/// # Precondition
+/// `raw_reasoning` must be a suffix of `accum` (i.e., `accum.len() >= raw_reasoning.len()`).
 fn resolve_prefix<'a>(accum: &'a str, raw_reasoning: &'a str) -> (&'a str, bool) {
+    debug_assert!(accum.len() >= raw_reasoning.len(), "raw_reasoning must be a suffix of accum");
     if accum.starts_with(THOUGHT_PREFIX) {

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/parsers/src/reasoning/gemma4_parser.rs` around lines 102 - 122, The
function resolve_prefix relies on the precondition that raw_reasoning is a
suffix of accum (used when computing accum.len() - raw_reasoning.len()), so add
a debug assertion and brief comment documenting this invariant: insert something
like a comment above resolve_prefix mentioning "precondition: raw_reasoning must
be a suffix of accum" and a debug_assert!(accum.ends_with(raw_reasoning)) at the
start of resolve_prefix to make the precondition explicit and fail fast in debug
builds; reference resolve_prefix and the THOUGHT_PREFIX logic when adding the
assertion and comment.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@lib/parsers/src/reasoning/gemma4_parser.rs`:
- Around line 102-122: The function resolve_prefix relies on the precondition
that raw_reasoning is a suffix of accum (used when computing accum.len() -
raw_reasoning.len()), so add a debug assertion and brief comment documenting
this invariant: insert something like a comment above resolve_prefix mentioning
"precondition: raw_reasoning must be a suffix of accum" and a
debug_assert!(accum.ends_with(raw_reasoning)) at the start of resolve_prefix to
make the precondition explicit and fail fast in debug builds; reference
resolve_prefix and the THOUGHT_PREFIX logic when adding the assertion and
comment.

In `@lib/parsers/src/tool_calling/gemma4/parser.rs`:
- Around line 4-5: The inline comment in parser.rs contains an incomplete PR URL
("https://github.com/vllm-project/vllm/pull/.../vllm/tool_parsers/gemma4_tool_parser.py");
update that comment by replacing the "..." with the actual PR number/path if
available, or remove the placeholder link entirely (keep or replace with a note
like "reference implementation in vllm/tool_parsers/gemma4_tool_parser.py" if
the PR is not merged) so the reference in the header comment of
lib/parsers/src/tool_calling/gemma4/parser.rs is accurate and not misleading.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7704350f-d1fc-436c-8990-6ab2d110a063

📥 Commits

Reviewing files that changed from the base of the PR and between 665b54b and c65e5da.

📒 Files selected for processing (13)

docs/agents/reasoning.md
docs/agents/tool-calling.md
examples/chat_templates/README.md
examples/chat_templates/gemma4_tool.jinja
lib/llm/src/preprocessor.rs
lib/parsers/README.md
lib/parsers/src/reasoning/gemma4_parser.rs
lib/parsers/src/reasoning/mod.rs
lib/parsers/src/tool_calling/config.rs
lib/parsers/src/tool_calling/gemma4/mod.rs
lib/parsers/src/tool_calling/gemma4/parser.rs
lib/parsers/src/tool_calling/mod.rs
lib/parsers/src/tool_calling/parsers.rs

Add tool-call and reasoning parsers for Google Gemma 4 thinking models. Gemma 4 uses a custom non-JSON serialization grammar that does not match any existing parser family: <|tool_call>call:func_name{location:<|"|>Tokyo<|"|>,count:42}<tool_call|> Bare unquoted keys, custom <|"|> string delimiter, supports nested objects and arrays, multiple tool calls concatenated without separators. Reasoning grammar is wrapped between channel markers with a `thought\n` role-label prefix that this parser strips: <|channel>thought\n...chain of thought...<channel|> Changes: * Tool-call parser (lib/parsers/src/tool_calling/gemma4/) — new ParserConfig variant `Gemma4`; recursive-descent parser into `serde_json::Value`; registered as `gemma4` (alias `gemma-4`); inline #[cfg(test)] coverage of CASE.1..CASE.24 from TEST_CASES.md including malformed args, truncation recovery, parallel calls, function names with hyphens/dots, and HTML/newlines inside string values. * Reasoning parser (lib/parsers/src/reasoning/gemma4_parser.rs) — implements the `ReasoningParser` trait directly (mirrors `granite_parser.rs` style) with streaming `thought\n` prefix-stripping (3-case logic: full prefix, strict prefix buffered, diverged emitted verbatim) and partial-marker overlap detection at chunk boundaries; registered as `gemma4` (alias `gemma-4`). * Vendored chat template (examples/chat_templates/gemma4_tool.jinja) — verbatim copy of vLLM's tool_chat_template_gemma4.jinja with SPDX attribution. Required because the upstream HF Gemma 4 chat template does not emit the <|"|>-delimited tool-definition encoding the parser expects. Pass via `--custom-jinja-template`. * Docs updates (docs/agents/{tool-calling,reasoning}.md, lib/parsers/README.md) with parser table rows and a recommended-pairing line: `--dyn-tool-call-parser gemma4 --dyn-reasoning-parser gemma4 --custom-jinja-template examples/chat_templates/gemma4_tool.jinja` Reference: vllm-project/vllm tool_parsers/gemma4_tool_parser.py and reasoning/gemma4_reasoning_parser.py. AI assistance was used to draft this change; all changed lines were manually reviewed against the upstream vLLM reference implementation. Signed-off-by: David Oy <david.oy@baseten.co>

Behavior-diff against upstream vLLM main turned up four real bugs in the initial port plus one reasoning-parser gap and one optimization gap. All fixed; tests added. Tool-call parser (lib/parsers/src/tool_calling/gemma4/parser.rs): * Null-keyword case-sensitivity (upstream `value_str.lower() in (...)`): the previous hard-coded list missed `NONE`, `Nul`, etc. Replaced with `try_consume_keyword` that does ASCII-case-insensitive matching with a trailing-word-char guard so `nullable` doesn't get mis-matched as the `null` keyword plus leftovers. * Unterminated `<|"|>` string in args (upstream `_parse_gemma4_args:146-149`): when the closing delimiter is missing, upstream takes "everything after the opening delimiter" as the value. We were bailing instead, which then triggered the entire-args-object empty-fallback in `parse_single_call` — i.e. a single truncated trailing arg silently lost ALL prior args. Now we mirror upstream and return the truncated tail as the string value. * Empty value `key:` followed by `,` / `}` / EOF (upstream `_parse_gemma4_args:128-131`): upstream emits `{"key": ""}`. We were erroring out of `parse_value` and dropping the whole args object. Detect this case in `parse_object_body` and insert an empty-string. * Truncated tool call (start marker, no end marker): upstream `extract_tool_calls:421-424` returns the raw `model_output` as content when no complete tool call could be parsed. We were consuming the bytes silently, hiding model truncation from callers. Echo the bytes back as `normal_text` instead. Reasoning parser (lib/parsers/src/reasoning/gemma4_parser.rs): * Dangling end marker (`<channel|>` present, `<|channel>` absent): this happens when a tokenizer with `skip_special_tokens=True` strips the start token but leaves the end token. Upstream's offline parser still extracts text-before as reasoning. We were returning the whole text as `normal_text` and losing the reasoning content. Added a match arm for `(None, Some(end))` that splits on the end marker. Preprocessor optimization (lib/llm/src/preprocessor.rs): * `is_reasoning_disabled_by_request` now recognizes `gemma4`/`gemma-4` with `enable_thinking: false` and short-circuits the parser, matching the existing pattern for `kimi_k25` / `deepseek_v4` / `nemotron_nano`. Without this, the parser still ran on every chunk just to fall through (since no `<|channel>` markers arrive when the chat template doesn't inject `<|think|>`). Tests added/renamed: * `truncated_tail_echoed_complete_prior_survives` (was `truncated_call_dropped_complete_prior_survives`). * `args_grammar_unterminated_string_takes_remainder` (was `args_grammar_unterminated_string_errors`). * `args_grammar_empty_value_yields_empty_string`, `args_grammar_trailing_empty_value`, `args_grammar_null_aliases_case_insensitive`, `args_grammar_keyword_prefix_not_consumed`, `incomplete_tool_call_echoes_raw_bytes`. * `detect_dangling_end_marker_extracts_prefix_as_reasoning`, `detect_dangling_end_marker_strips_thought_prefix`. * Four new cases in `test_is_reasoning_disabled_by_request` covering `gemma4` and `gemma-4` with `enable_thinking: false` / `true` / no args. Behavior intentionally diverges from upstream in one place: Rust's between-call text join (`Sure thing. <call> All set.` → preserves `"Sure thing. All set."`) follows the established kimi_k2 Dynamo pattern rather than upstream Gemma 4's prefix-only behavior. This keeps the parser registry internally consistent. AI assistance was used to draft this change; all changed lines were manually reviewed against the upstream vLLM reference implementation. Signed-off-by: David Oy <david.oy@baseten.co>

* parser.rs:4 — replace placeholder PR URL (`/pull/.../...`) with the resolved upstream link (`/blob/main/...`) and a note that vLLM PR #38826 / #39027 are merged. Reviewers can now click through. * gemma4_parser.rs::resolve_prefix — make the "raw_reasoning is a suffix of accum" precondition explicit with a doc-comment block and a `debug_assert!` that fails fast in debug builds. Documents the invariant that the streaming driver upholds (it always pushes the raw delta into self.reasoning_accum immediately before calling resolve_prefix). No behavior change in release builds; only docstring clarity and a debug-mode assertion. Tests still 44/44. Signed-off-by: David Oy <david.oy@baseten.co>

the-david-oy · 2026-04-29T20:14:27Z

Pushed f9c7a6d with two updates:

Rebased onto current main (was on 2e31245, now on 665b54b). The rebase was clean, no conflicts.
Addressed both CodeRabbit nits in a third commit chore(frontend): address CodeRabbit nits on Gemma 4 parsers:
- parser.rs:4 — replaced the placeholder PR URL with the resolved upstream link and noted vLLM PR #38826 / #39027 are merged.
- gemma4_parser.rs::resolve_prefix — made the "raw_reasoning is a suffix of accum" precondition explicit via doc-block + debug_assert!. Behavior unchanged in release builds.

Pre-commit job failure

The pre-commit check was failing on pre-existing breakage in main, not on anything in this PR. Specifically the pytest-marker-report hook reports 49 tests missing required markers across files I do not touch — most prominently components/src/dynamo/common/tests/test_video_protocol.py and test_video_utils.py, which were added 2 hours ago by #8491 (feat: formalize video streaming). Confirming this is upstream:

gh run list --repo ai-dynamo/dynamo --branch main shows the most recent Pre Merge run on main commit 665b54b (the one I rebased onto) is also failure.
The hook has pass_filenames: false in .pre-commit-config.yaml, so it always validates the full repo regardless of which files a given PR changed.
All other CI checks on this PR are passing or pending: cargo clippy, DCO, copyright-checks, lychee, dco-comment, changed-files, pr_reminder, Validate PR title, rust-clippy (.) etc. all green.

This is due to:

test: enforce pytest markers on components/src tests #7796 (test: enforce pytest markers on components/src tests) merged at 2026-04-29 18:06 UTC, making the pytest-marker-report hook strict for tests under components/src/.
feat: formalize video streaming #8491 (feat: formalize video streaming) merged at 2026-04-29 18:45 UTC — its CI ran against the pre-strict hook, so it passed; once test: enforce pytest markers on components/src tests #7796 was merged 39 minutes earlier, feat: formalize video streaming #8491's tests retroactively became non-compliant.
The same situation applies to test_audio_protocol.py (added by an earlier PR whose CI ran before test: enforce pytest markers on components/src tests #7796).

So three test files now block every external-contributor PR rebased onto main:

components/src/dynamo/common/tests/test_video_protocol.py (25 tests)
components/src/dynamo/common/tests/test_video_utils.py (7 tests)
components/src/dynamo/common/tests/test_audio_protocol.py (17 tests)

rmccorm4 · 2026-04-29T20:24:12Z

Thanks @the-david-oy ! CC @richardhuo-nv @keivenchang to help review

rmccorm4 · 2026-04-29T20:24:35Z

/ok to test f9c7a6d

keivenchang

FYI I added comments on test-case categories (and pushed it). No logic change.

Augment the 12 previously-unannotated tests across the gemma4 tool-call and reasoning parsers with their corresponding CASE.* numbers from lib/parsers/TEST_CASES.md, matching the convention already used by the other tests in this PR and across the rest of the parser suite. Comment-only — no functional changes; cargo test output unchanged. Mapping summary: - args_grammar_*: CASE.4/5/6/7 (grammar entry-point + truncation paths) - incomplete_tool_call_echoes_raw_bytes: CASE.5 (missing end marker) - detect_dangling_end_marker_*: CASE.5, CASE.13 (missing-start-marker recovery) - streaming_multiple_reasoning_spans: CASE.2 (multiple spans) Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

keivenchang

two notes — both pointers, no inline comments needed.

1. Pre-commit hook unrelated to this PR. pytest-marker-report currently fails on this branch with 49 missing-marker tests across test_audio_protocol.py, test_video_protocol.py, test_video_utils.py. those 49 are pre-existing on the branch — main has 0 missing. fix already merged on main as cc1af6a90 (#8849). merging main / rebasing should clear the hook.

2. CASE. coverage gaps to close (per lib/parsers/TEST_CASES.md).*

To add: CASE.9 (paired reasoning + tool in same response), CASE.14 (empty tool_calls / null at API edge), CASE.15 (duplicate tool calls — universal gap, no parser covers this today).
Framework-level (cross-parser suites, not in this PR's scope): CASE.11 (tool_choice modes), CASE.12 (finish_reason).
N/A (worth a one-line // CASE.X — N/A per the doc's stated convention): CASE.23 (markers fixed, no variants), CASE.xml1/2 (not XML-family), CASE.harmony1 (not Harmony).

@keivenchang

Two small changes from @keivenchang on PR ai-dynamo#8852: * `try_tool_call_parse_gemma4`: add a one-line note explaining that the first `<tool_call|>` after the start marker wins and that we don't scan inside `<|"|>` string values for embedded markers — so a model emitting a tool-call literal inside a string argument will truncate the call. Matches upstream vLLM regex behavior. * `is_reasoning_disabled_by_request` `gemma4` arm: switch from a hand-rolled `enable_thinking` check to the shared `thinking_bool_from_args` helper that the `deepseek_v4` arm uses, for symmetry. The helper already accepts both `thinking` and `enable_thinking` keys, so behavior is unchanged. No test changes needed; existing parametric cases continue to pass. Signed-off-by: David Oy <david.oy@baseten.co>

Replace the manual `find(TOOL_CALL_START)` + `find(TOOL_CALL_END)` extraction in `try_tool_call_parse_gemma4` with a direct `regex.captures_iter(message)` pass, mirroring upstream's `extract_tool_calls` in vllm/tool_parsers/gemma4_tool_parser.py. Why: the manual approach used flat byte-search for the end marker without requiring `}<tool_call|>` adjacency. A `<tool_call|>` literal embedded inside a `<|"|>` string value (e.g. a `description` arg documenting the tool-call format) would falsely terminate the call, causing the regex on the truncated block to fail and the call to be silently dropped. The regex-based approach requires the `}` adjacency and is robust to the embedded case. Also fix `find_tool_call_end_position_gemma4` similarly so the streaming jail respects the same adjacency. Tests: 46/46 (added `embedded_tool_call_marker_in_string_value` and `find_end_position_skips_embedded_marker`). Plus a comment-cleanup pass on this PR's earlier task-historical inline comments — keeping only those that document non-obvious invariants (e.g. resolve_prefix precondition, regex adjacency note). Signed-off-by: David Oy <david.oy@baseten.co>

vLLM's `Gemma4ToolParser` and `Gemma4ReasoningParser` rely on the engine emitting `<|tool_call>`, `<|channel>`, etc. — all special-token IDs in the Gemma 4 tokenizer — verbatim in the decoded text. Upstream's OAI server flips `skip_special_tokens=False` per-request via the `adjust_request` hook on each parser. Dynamo's mode-A frontend bypasses that hook, so requests against `--dyn-tool-call-parser gemma4` / `--dyn-reasoning-parser gemma4` get the engine default (`skip_special_tokens=true`), which strips the markers before the parser sees them. End result: tool_calls + reasoning_content silently empty, raw text dumped into `content`. Add a `parser_requires_special_tokens` helper in the preprocessor and intercept `output_options.skip_special_tokens` in `preprocess_request`: when the helper returns true and the user hasn't explicitly set the field, default it to `Some(false)` so the parsers actually see the markers they're matching on. Narrow per-parser table for now (gemma4 + gemma-4 alias only). Other upstream parsers (hermes, kimi_k2, jamba, deepseek_v3.2, etc.) also have `adjust_request` hooks for the same reason and may be silently broken in mode A today; that's a separate audit / follow-up. Tests: parametric `test_parser_requires_special_tokens` covers the positive (gemma4/gemma-4, paired or single) and negative (hermes, kimi_k2, no parsers) cases. CI will exercise the integration via the existing dynamo-llm test suite on Linux. Signed-off-by: David Oy <david.oy@baseten.co>

CI's `cargo fmt --check` flagged the parametric tuples in `test_parser_requires_special_tokens` — single-line form exceeded the max-width and rustfmt wanted them expanded. Run `cargo fmt -p dynamo-llm` to apply. No behavior change. Signed-off-by: David Oy <david.oy@baseten.co>

keivenchang

hey David, both round-1 comments addressed cleanly — regex-based extraction is the right shape, and the skip_special_tokens hook is a nice unprompted add. tiny future-facing nit inline.

can you also go through all the cases in lib/parsers/TEST_CASES.md and confirm coverage / mark explicit N/A for the ones that don't apply? i think 9, 14, 15 are still uncovered.

Almost there, thanks!

richardhuo-nv · 2026-04-30T05:59:57Z

/ok to test bd44405

Per reviewer audit against lib/parsers/TEST_CASES.md, three generic categories were uncovered in the gemma4 tests: * CASE.9 — paired reasoning + tool call. Added one test on each side: the tool-call parser must extract the call when `<|channel>` markers are also present in the input (production runs the reasoning parser first, but the tool-call parser should be robust on its own); the reasoning parser must NOT consume the following `<|tool_call>` markers as part of reasoning content. * CASE.14 — empty content / null args. Two tests: empty input yields zero calls + empty normal_text; `null` / `none` / `nil` arg values round-trip as JSON null. * CASE.15 — duplicate tool calls (same function name). Verifies both calls appear with distinct UUIDs; client decides whether duplicate invocation is intended (per the documented contract in TEST_CASES.md). Plus explicit N/A annotations for cases that don't apply to gemma4 in both test modules — universal cross-parser gaps (CASE.11 tool_choice, CASE.12 finish_reason — hermes-only suites today), family-specific categories (CASE.xml*, CASE.harmony1), and CASE.8 streaming (covered indirectly via the streaming jail's CASE.20 helpers, same pattern as kimi_k2). Surfaces gaps to a future reader rather than leaving them ambiguously absent. Tests: 51/51 pass (was 46). Signed-off-by: David Oy <david.oy@baseten.co>

the-david-oy · 2026-04-30T14:51:32Z

can you also go through all the cases in lib/parsers/TEST_CASES.md and confirm coverage / mark explicit N/A for the ones that don't apply? i think 9, 14, 15 are still uncovered.

Absolutely, done!

richardhuo-nv · 2026-04-30T16:31:58Z

/ok to test 54a5fca

the-david-oy · 2026-04-30T17:08:02Z

The failing docs link check is coming from the main branch, not any of the changes in this PR.

keivenchang

nice work David — all CASE gaps closed, N/A annotations match the doc convention. thanks, ship it. Oh, and that Doc link error, it is not a Mandatory check.

the-david-oy · 2026-04-30T18:12:36Z

Thank you! Are you able to merge? I don't think I have permission.

Signed-off-by: David Oy <david.oy@baseten.co> Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com> Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>

… family Pythonic was the only top-N family untouched by the recent coverage PRs (#8888, #8946, #8846, #9035, #8852). Add three small batch-mode tests that mirror the top-N quartet pattern landed for harmony/glm47/qwen3/etc. in #9035, plus the parameterless-call shape from vLLM's pythonic test file. - `test_parse_tool_call_parse_pythonic_empty_args` (PARSER.batch.6) — `[get_weather()]` returns one call with `arguments={}`. Mirrors vLLM `test_pythonic_tool_parser.py::test_tool_call[parameterless_*]`. - `test_parse_pythonic_empty_and_whitespace_inputs` (PARSER.batch.9) — empty / whitespace-only inputs return 0 calls and empty content without panicking. Mirrors the #9035 quartet contract. - `test_parse_pythonic_duplicate_calls_same_name` (PARSER.batch.10) — two `get_weather` calls in one list surface with distinct ids (`call-1`, `call-2`). Pins the auto-generated-id contract for the duplicate-name case. `cargo test -p dynamo-parsers --lib tool_calling::pythonic` — 19/19 pass. Signed-off-by: zhongdaor <zhongdaor@nvidia.com>

…audit Output of DIS-1926 (research vLLM parser test coverage gaps). Doc-only change to `lib/parsers/PARSER_CASES.md`; no source touched. `cargo check -p dynamo-parsers --tests` passes. Refinements driven by gaps surfaced during a bidirectional diff against vLLM `tests/tool_parsers/*` at commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b: - Split PARSER.fmt.1 (function-name surface) from new PARSER.fmt.5 (argument-envelope shape: native call-ID preservation, JSON field-order tolerance, arguments↔parameters key alias). The old CASE.21 (and an earlier draft of PARSER.fmt.1) conflated both axes. - Broaden PARSER.fmt.3 examples beyond Kimi K2's singular vs plural section tokens to include Mistral pre-v11 vs v11+ wire formats, Llama 3 with vs without `<|python_tag|>`, Hermes `qwen25` registry alias. - Add `Known production gaps` section flagging Mistral v11+ wire format (`[TOOL_CALLS]name{...args}` name-then-object) — Dynamo's `ToolCallConfig::mistral()` only handles pre-v11 (JSON-array body), while vLLM tests v11 extensively. v11 is the current Mistral-Small / Mistral-Large production path. Largest single Dynamo parser gap surfaced by the audit. - Promote regex-timeout / parser-exception containment to Universal Gaps (vLLM has explicit `test_regex_timeout_handling` for llama3_json / llama4_pythonic / pythonic and `*_streaming_exception_returns_none` for Mistral; Dynamo relies on Rust regex linear-time guarantees but does not pin failure-containment paths). - Cross-ref PARSER.batch.1 happy-path → PARSER.fmt.5 native-ID sub-axis. - Update Applicability summary and `Adding a new parser` minimum viable set to cover fmt.{1..5}. The full per-test bidirectional audit (493 test rows across 36 parser families, mapped onto the new taxonomy) lives outside this commit. It informed every refinement above; the audit itself is not committed because it's a working artifact rather than a stable reference doc. Top-3 P0 gap status from the audit: 1. Mistral v11 wire format — STILL OPEN (parser doesn't exist; flagged in the new `Known production gaps` section). 2. PARSER.stream.{1..4} parser-tier — partial; DSv4 (#8946) and Gemma 4 (#8852) added coverage; Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral parser-tier streaming tests still gap. 3. CASE.25 / FRONTEND.3 (`adjust_request`) — CLOSED for 7 families via 28 new tests in `lib/llm/tests/tool_choice.rs` (#8946 + #9035). Coverage PRs since 2026-05-05 baseline: #8888 (silent-drop recoveries), #8946 (DSv4 + Kimi K2 coverage), #9035 (top-N CASE.6+ quartet), #8852 (Gemma 4 family), #9127 (taxonomy rename). Signed-off-by: zhongdaor <zhongdaor@nvidia.com>

Companion artifact to PR #9290 (PARSER_CASES.md taxonomy refinement). Adds the full per-test bidirectional audit that informed every change in that PR — every vLLM tool-parser test mapped onto the new (PR #9127) taxonomy with a clickable source link. `lib/parsers/VLLM_TEST_AUDIT.md` (new file, 906 lines, 493 distinct test rows): - **Source**: vLLM `main` at commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b (`vllm/tool_parsers/*`, `tests/tool_parsers/*`, `tests/tool_use/*`, `tests/entrypoints/openai/tool_parsers/*`). - **Scope**: 421 explicit test functions + 72 inherited common-suite rows from `ToolParserTests`. - **Bucketing**: every row carries one or more `PARSER_CASES.md` / `REASONING_CASES.md` / `PIPELINE_CASES.md` / `FRONTEND_CASES.md` tags, plus a one-line behavioral note. Re-bucketing transformations applied (vs the original CASE.* labels the audit was first written against, before PR #9127): - 244 streaming rows split per-row into PARSER.stream.{1,2,3,4} (single-call assembly / multi-call assembly / partial-token chunking / streaming termination) - 26 fmt rows split per-row into PARSER.fmt.1 (function-name) vs PARSER.fmt.5 (argument-shape: native ID, JSON field-order, arguments↔parameters alias) - Out-of-PARSER-scope buckets relocated to sibling docs: CASE.{11,18,25} → FRONTEND.{1,3,5,6}; CASE.12 → PIPELINE.finish_reason; CASE.{9,10,17} → REASONING.batch.{1,2}; CASE.20 → `// helper`; CASE.16 → inline-regression annotation; CASE.26 dissolved into PARSER.batch.4 impl-defined recovery contract Two mis-bucketings caught and fixed during review: - FunctionGemma::test_multiple_tool_calls and Gemma4::TestExtractToolCalls.test_multiple_tool_calls were both labeled CASE.1 but assert len(tool_calls) == 2 — corrected to PARSER.batch.2. Four bucket-assignment refinements caught by review: - test_unique_tool_call_ids (DSv3.2) drops fmt.5 (no native call-ID surface; just parallel-call distinctness). - test_invalid_funcall_id_skipped (Kimi K2) moves fmt.5 → fmt.1 (validation, not preservation). - 3 Mistral `argument_before_name*` parametrized rows gain fmt.5 (canonical field-order swap test set referenced by PARSER_CASES.md). A staleness banner at the top documents the re-bucketing transformation and mis-bucket fixes for traceability. Top findings the audit informed (already addressed in PR #9290 or flagged for follow-up): 1. Mistral v11+ wire format — STILL OPEN (parser doesn't exist; flagged in PARSER_CASES.md "Known production gaps"). 2. PARSER.stream.{1..4} parser-tier coverage gap in 5 families (Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral) — partial closure via DSv4 (#8946) and Gemma 4 (#8852). 3. CASE.25 / FRONTEND.3 (`adjust_request`) — CLOSED for 7 families via 28 new tests in `lib/llm/tests/tool_choice.rs` (#8946 + #9035). Signed-off-by: zhongdaor <zhongdaor@nvidia.com>

the-david-oy requested review from a team as code owners April 29, 2026 20:04

pull-request-size Bot added the size/XXL label Apr 29, 2026

github-actions Bot added external-contribution Pull request is from an external contributor chore documentation Improvements or additions to documentation frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Apr 29, 2026

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

the-david-oy added 3 commits April 29, 2026 13:11

the-david-oy force-pushed the feat/gemma4-parsers branch from c65e5da to f9c7a6d Compare April 29, 2026 20:14

the-david-oy force-pushed the feat/gemma4-parsers branch from e1954c6 to f9c7a6d Compare April 29, 2026 20:23

rmccorm4 requested review from keivenchang and richardhuo-nv April 29, 2026 20:23

copy-pr-bot Bot temporarily deployed to GITLAB April 29, 2026 20:24 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 29, 2026 21:10 Inactive

keivenchang reviewed Apr 29, 2026

View reviewed changes

Comment thread lib/parsers/src/tool_calling/gemma4/parser.rs Outdated

Comment thread lib/llm/src/preprocessor.rs

keivenchang reviewed Apr 29, 2026

View reviewed changes

the-david-oy added 2 commits April 29, 2026 16:01

the-david-oy added 2 commits April 29, 2026 17:18

the-david-oy requested a review from keivenchang April 30, 2026 00:54

keivenchang reviewed Apr 30, 2026

View reviewed changes

Comment thread lib/llm/src/preprocessor.rs

copy-pr-bot Bot temporarily deployed to GITLAB April 30, 2026 06:00 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 30, 2026 16:32 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 30, 2026 16:41 Inactive

keivenchang approved these changes Apr 30, 2026

View reviewed changes

richardhuo-nv approved these changes Apr 30, 2026

View reviewed changes

rmccorm4 merged commit ce3e228 into ai-dynamo:main Apr 30, 2026
93 of 95 checks passed

keivenchang mentioned this pull request May 1, 2026

test(revalidate): chore(frontend): Add Gemma 4 parser support + Test Cases #8852 #8969

Closed

zhongdaor-nv mentioned this pull request May 7, 2026

chore(parsers): Mapping vllm parser tests to new PARSER_CASES.md taxonomy #9290

Merged

zhongdaor-nv mentioned this pull request May 8, 2026

docs(parsers): add DIS-1926 vLLM tool-parser test audit #9329

Draft

5 tasks

Conversation

the-david-oy commented Apr 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Validation:

Out of scope (explicit non-goals — file as follow-ups if desired):

Watch list (upstream vLLM bugfix PRs the maintainers may want to track):

AI-Generated Code Disclosure:

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented Apr 29, 2026

Uh oh!

github-actions Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Apr 29, 2026

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

the-david-oy commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pre-commit job failure

Uh oh!

rmccorm4 commented Apr 29, 2026

Uh oh!

rmccorm4 commented Apr 29, 2026

Uh oh!

keivenchang left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

keivenchang left a comment

Choose a reason for hiding this comment

Uh oh!

keivenchang left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

richardhuo-nv commented Apr 30, 2026

Uh oh!

the-david-oy commented Apr 30, 2026

Uh oh!

richardhuo-nv commented Apr 30, 2026

Uh oh!

the-david-oy commented Apr 30, 2026

Uh oh!

keivenchang left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

the-david-oy commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

the-david-oy commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

github-actions Bot commented Apr 29, 2026 •

edited

Loading

the-david-oy commented Apr 29, 2026 •

edited

Loading

keivenchang left a comment •

edited

Loading