Skip to content

chore(frontend): Add Gemma 4 parser support + Test Cases#8852

Merged
rmccorm4 merged 9 commits into
ai-dynamo:mainfrom
the-david-oy:feat/gemma4-parsers
Apr 30, 2026
Merged

chore(frontend): Add Gemma 4 parser support + Test Cases#8852
rmccorm4 merged 9 commits into
ai-dynamo:mainfrom
the-david-oy:feat/gemma4-parsers

Conversation

@the-david-oy
Copy link
Copy Markdown
Contributor

@the-david-oy the-david-oy commented Apr 29, 2026

Overview:

Add tool-call and reasoning parsers for Google Gemma 4 thinking models to Dynamo's frontend, plus a vendored chat template. With this change, Dynamo's native preprocessor (mode A in docs/agents/chat-processor-options.md) can serve any Gemma 4 deployment that runs on vLLM ≥ v0.19.0 (where Gemma 4 support landed via vllm-project/vllm#38826 and #39027) end-to-end with structured tool_calls and reasoning_content.

Gemma 4's serialization grammar does not match any existing Dynamo parser family, so this PR introduces a new Gemma4 variant in ParserConfig rather than force-fitting it into JSON/XML/DSML. The grammar uses bare unquoted keys, a custom <|"|> string delimiter, and fixed <|tool_call>...<tool_call|> markers:

<|tool_call>call:get_weather{location:<|"|>Tokyo<|"|>,unit:<|"|>celsius<|"|>,count:5}<tool_call|>

Reasoning is wrapped between channel markers with a thought\n role-label prefix that this parser strips:

<|channel>thought
...chain of thought reasoning...<channel|>
Final answer text.

Recommended pairing for a Gemma 4 deployment:

--dyn-tool-call-parser gemma4 \
--dyn-reasoning-parser gemma4 \
--custom-jinja-template examples/chat_templates/gemma4_tool.jinja

Details:

The PR is two commits to make the upstream-diff alignments easy to review separately from the initial port.

Commit 1 — chore(frontend): Add Gemma 4 parser support + Test Cases

New files:

  • lib/parsers/src/tool_calling/gemma4/{mod.rs,parser.rs} — recursive-descent parser for the Gemma 4 argument grammar (booleans, numbers, <|"|>-delimited strings, nested objects, arrays) emitting serde_json::Value. Inline #[cfg(test)] coverage of CASE.1–CASE.24 from lib/parsers/TEST_CASES.md.
  • lib/parsers/src/reasoning/gemma4_parser.rsGemma4ReasoningParser implementing the ReasoningParser trait directly (mirrors granite_parser.rs shape) with <|channel>...<channel|> delimiters, thought\n prefix stripping (the 3-case streaming logic from upstream's reasoning parser), and partial-marker overlap detection at chunk boundaries.
  • examples/chat_templates/gemma4_tool.jinja — verbatim copy of upstream vLLM examples/tool_chat_template_gemma4.jinja with SPDX attribution. Required because the stock HF Gemma 4 chat template does not emit the <|"|>-delimited tool-definition encoding the parser expects.
  • examples/chat_templates/README.md — per-template usage notes.

Modified files:

  • lib/parsers/src/tool_calling/{config.rs,parsers.rs,mod.rs} — added ParserConfig::Gemma4 variant, ToolCallConfig::gemma4() factory, registered names gemma4 (alias gemma-4), and dispatched try_tool_call_parse_gemma4 / detect_tool_call_start_gemma4 / find_tool_call_end_position_gemma4 from the central registry.
  • lib/parsers/src/reasoning/mod.rs — added ReasoningParserType::Gemma4 variant and registered the same names.
  • lib/parsers/README.md — bumped parser counts (17 → 18 tool-call, 14 → 15 reasoning) and added rows in the parser-family cheat sheets.
  • docs/agents/{tool-calling,reasoning}.md — added parser-table rows and the recommended-pairing line.

Commit 2 — fix(frontend): align Gemma 4 parsers with upstream vLLM main

Behavior-diff against upstream vLLM main (recently-merged Gemma 4 PRs and bugfix follow-ups) surfaced four real bugs in the initial port plus one reasoning-parser gap and one preprocessor optimization gap. All fixed:

  1. Null-keyword case-insensitivity (parser.rs) — upstream's _parse_gemma4_value does value_str.lower() in ("null","none","nil"). The first-pass port hard-coded a small set of mixed-case spellings and missed NONE, Nul, etc. Replaced with try_consume_keyword that does ASCII-case-insensitive matching with a trailing-word-char guard so nullable doesn't get mis-matched as null plus leftovers.
  2. Unterminated <|"|> string in args (parser.rs) — upstream's _parse_gemma4_args:146-149 takes "everything after the opening delimiter" as the value when the closing delimiter is missing. The first-pass port bailed instead, which then triggered the entire-args-object empty-fallback in parse_single_call — i.e. a single truncated trailing arg silently lost ALL prior args. Now mirrors upstream.
  3. Empty value key: followed by ,/}/EOF (parser.rs) — upstream's _parse_gemma4_args:128-131 emits {"key": ""}. The first-pass port errored out of parse_value and dropped the whole args object. Detect this case in parse_object_body and insert an empty-string.
  4. Truncated tool call (parser.rs) — upstream extract_tool_calls:421-424 returns the raw model_output as content when no complete tool call could be parsed. The first-pass port consumed those bytes silently, hiding model truncation from callers. Now echoes the bytes back as normal_text.
  5. Reasoning: dangling end marker (gemma4_parser.rs) — when <channel|> is present but <|channel> is absent (tokenizer with skip_special_tokens=True strips the start token), upstream's offline parser still extracts text-before as reasoning. Added a (None, Some(end)) match arm.
  6. is_reasoning_disabled_by_request polish (lib/llm/src/preprocessor.rs) — added gemma4/gemma-4 recognition with enable_thinking: false short-circuit, matching the existing pattern for kimi_k25 / deepseek_v4 / nemotron_nano. Without this, the parser still ran on every chunk just to fall through (since no <|channel> markers arrive when the chat template doesn't inject <|think|>).

One intentional divergence from upstream: between-call text is preserved (Sure thing. <call> All set."Sure thing. All set.") rather than dropped, because that matches the established kimi_k2_parser.rs pattern in the rest of Dynamo's parser registry. Documented inline.

Where should the reviewer start?

  1. lib/parsers/src/tool_calling/gemma4/parser.rs — the centerpiece. Pay particular attention to:
    • The recursive-descent Cursor / parse_object_body / parse_value / parse_array / parse_number / try_consume_keyword chain (lines ~198–402). The grammar parses straight into serde_json::Value.
    • try_tool_call_parse_gemma4 (lines ~91–131) for the outer call-extraction loop and the truncated-call echo behavior.
    • The #[cfg(test)] mod tests block — 31 tests covering CASE.1–CASE.24 plus targeted cases from upstream's tests/tool_parsers/test_gemma4_tool_parser.py (test_unterminated_string, test_empty_value, test_incomplete_tool_call, parallel calls, function names with hyphens/dots, HTML/newlines in string values, case-insensitive null aliases, keyword-prefix non-consumption).
  2. lib/parsers/src/reasoning/gemma4_parser.rs — 12 tests including the new detect_dangling_end_marker_* cases that mirror upstream's INVALID_SIMPLE / INVALID_COMPLETE test fixtures, plus the streaming thought\n prefix-split-across-deltas cases.
  3. lib/parsers/src/tool_calling/config.rs — the new ParserConfig::Gemma4 unit variant and the ToolCallConfig::gemma4() factory. The variant is unit (no per-instance config) because Gemma 4's markers are fixed and not user-tunable, unlike KimiK2ParserConfig / Glm47ParserConfig.
  4. lib/parsers/src/tool_calling/parsers.rs — the registry/dispatch edits: map.insert("gemma4", ...) + the new dispatch arm in try_tool_call_parse and the partial-prefix detection arm in detect_tool_call_start.
  5. lib/llm/src/preprocessor.rs — only the is_reasoning_disabled_by_request table entry plus four parametric test cases. Worth glancing at to confirm it follows the existing kimi_k25 / deepseek_v4 pattern.
  6. examples/chat_templates/gemma4_tool.jinja — verbatim copy of upstream vLLM's template, only a 3-line SPDX/comment header added at the top. Reviewers can diff against https://github.com/vllm-project/vllm/blob/main/examples/tool_chat_template_gemma4.jinja to verify.

Validation:

  • cargo build -p dynamo-parsers — clean (0 errors, 0 warnings).
  • cargo test -p dynamo-parsers --lib gemma444/44 pass.
  • cargo test -p dynamo-parsers (full crate) — 443 pass, 5 ignored, no regressions in other parsers.
  • cargo fmt --check --all — workspace clean.
  • cargo clippy -p dynamo-parsers — no warnings on the new code (one pre-existing warning in glm47_parser.rs:177, untouched by this PR).
  • pre-commit run --files <13 changed files> — all hooks pass (codespell, case-conflict, merge-conflict, mixed-line-ending, trailing-whitespace, pytest-marker-report).
  • cargo test -p dynamo-llm --lib is_reasoning_disabled_by_request — could not run on the development host because the dynamo-llm crate has a pre-existing macOS link error (libstdc++ not available on modern macOS, replaced by libc++). The four new parametric test cases are small and surgical, mirroring the existing pattern for kimi_k25 and deepseek_v4; CI on Linux will exercise them.

Out of scope (explicit non-goals — file as follow-ups if desired):

  • Native Rust chat-template formatter for Gemma 4 (analogous to lib/llm/src/preprocessor/prompt/deepseek_v4.rs). The vendored jinja covers Phase 1 via --custom-jinja-template; a native formatter is a several-day port of the 333-line jinja and not blocking.
  • SGLang frontend wiring (components/src/dynamo/frontend/sglang_prepost.py). Only relevant for --dyn-chat-processor sglang (mode C). Whether SGLang has its own gemma4 parser is a separate question.
  • Streaming-jail integration tests under lib/llm/tests/data/vllm/gemma4/. Those need real captured chunks from a Gemma 4 vLLM deployment; will follow up after a real-world capture.
  • Backporting to older Dynamo release branches.

Watch list (upstream vLLM bugfix PRs the maintainers may want to track):

The vLLM Gemma 4 parsers are still settling — there are several open bugfix PRs touching internals (#39484 STRING_DELIM leak, #39311 nested-array bracket handling, #39070 argument truncation, #39588 selective reasoning, #39885 multi-turn streaming leak, #39570 chat-template HF resync). The public marker contract and parser name gemma4 are stable, so this PR is unaffected, but the vendored jinja may want a periodic re-sync against upstream once #39570 lands. Filing a tracking issue is recommended.

AI-Generated Code Disclosure:

Per Dynamo's contribution guide (docs/contribution-guide.md §AI-Generated Code): AI assistance was used to draft this change. Every changed line was manually reviewed against the upstream vLLM reference implementations (vllm/tool_parsers/gemma4_tool_parser.py, vllm/reasoning/gemma4_reasoning_parser.py, examples/tool_chat_template_gemma4.jinja) and against the analog Dynamo parsers (kimi_k2_parser.rs, granite_parser.rs, dsml/parser.rs). The submitter understands and can defend every line of the change end-to-end, including the recursive-descent grammar parser, the streaming thought\n prefix-stripping state machine, and the four upstream-diff fixes in commit 2.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)


Open in Devin Review

Summary by CodeRabbit

  • New Features

    • Added support for Gemma 4 models with tool-calling and reasoning parsing capabilities
    • New custom Jinja chat template for Gemma 4 tool formatting
  • Documentation

    • Added documentation for Gemma 4 tool-calling and reasoning parsers
    • Added README guide for custom Jinja chat templates with Gemma 4 example

@the-david-oy the-david-oy requested review from a team as code owners April 29, 2026 20:04
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 29, 2026

@github-actions github-actions Bot added external-contribution Pull request is from an external contributor chore documentation Improvements or additions to documentation frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Apr 29, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Walkthrough

This PR adds comprehensive Gemma 4 model support by introducing a tool-calling parser with custom <|tool_call> grammar, a channel-based reasoning parser for thinking outputs, a corresponding Jinja chat template, and infrastructure to register these parsers in the existing framework.

Changes

Cohort / File(s) Summary
Documentation & Examples
docs/agents/reasoning.md, docs/agents/tool-calling.md, examples/chat_templates/README.md, lib/parsers/README.md
Added documentation for new gemma4 reasoning and tool-calling parsers, including model family support, vLLM aliases, grammar details, configuration guidance, and updated parser registry counts (tool-calling: 17→18, reasoning: 14→15).
Chat Template
examples/chat_templates/gemma4_tool.jinja
New Jinja template for Gemma 4 tool formatting, with macros for tool/function declarations, argument serialization, reasoning/thought channels, and tool response blocks. Handles multimodal content (text/image/audio/video) and streaming generation prompts.
Reasoning Parser Implementation
lib/parsers/src/reasoning/gemma4_parser.rs
New Gemma4ReasoningParser struct extracting thinking content from `<
Reasoning Parser Infrastructure
lib/parsers/src/reasoning/mod.rs
Added Gemma4ReasoningParser re-export, ReasoningParserType::Gemma4 enum variant, and registry entries for "gemma4" and "gemma-4" aliases.
Tool-Calling Parser Implementation
lib/parsers/src/tool_calling/gemma4/parser.rs, lib/parsers/src/tool_calling/gemma4/mod.rs
New Gemma 4 tool-calling parser detecting `<
Tool-Calling Infrastructure
lib/parsers/src/tool_calling/mod.rs, lib/parsers/src/tool_calling/config.rs, lib/parsers/src/tool_calling/parsers.rs
Added gemma4 module, ParserConfig::Gemma4 variant, ToolCallConfig::gemma4() factory, and registry dispatch for both "gemma4" and "gemma-4" aliases routing to Gemma 4 parser helpers.
Preprocessor Logic
lib/llm/src/preprocessor.rs
Extended is_reasoning_disabled_by_request to recognize gemma4/gemma-4 aliases and consult chat_template_args["enable_thinking"] to determine reasoning disable state, with fallback to false when missing.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main changes: addition of Gemma 4 parser support and test cases for the frontend.
Description check ✅ Passed The PR description comprehensively covers the required template sections: Overview, Details, Where should the reviewer start, and Related Issues, with extensive technical depth and validation results.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
lib/parsers/src/tool_calling/gemma4/parser.rs (1)

4-5: Incomplete reference link in comment.

The PR reference link on line 5 contains a placeholder (...) instead of the actual PR number. Consider either completing the link or removing it if the upstream PR hasn't been merged yet.

 // Reference implementations:
-// https://github.com/vllm-project/vllm/pull/.../vllm/tool_parsers/gemma4_tool_parser.py
+// https://github.com/vllm-project/vllm/blob/main/vllm/tool_parsers/gemma4_tool_parser.py
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/parsers/src/tool_calling/gemma4/parser.rs` around lines 4 - 5, The inline
comment in parser.rs contains an incomplete PR URL
("https://github.com/vllm-project/vllm/pull/.../vllm/tool_parsers/gemma4_tool_parser.py");
update that comment by replacing the "..." with the actual PR number/path if
available, or remove the placeholder link entirely (keep or replace with a note
like "reference implementation in vllm/tool_parsers/gemma4_tool_parser.py" if
the PR is not merged) so the reference in the header comment of
lib/parsers/src/tool_calling/gemma4/parser.rs is accurate and not misleading.
lib/parsers/src/reasoning/gemma4_parser.rs (1)

102-122: Verify underflow safety in resolve_prefix.

Line 104 computes accum.len() - raw_reasoning.len(). While the current call sites ensure raw_reasoning is always a suffix of accum (pushed before calling), this invariant isn't documented or enforced at the function boundary.

Consider adding a debug assertion or a brief comment documenting this precondition:

📝 Suggested documentation
 /// Case 3: accumulated reasoning diverged from the prefix — emit the
 ///   buffered reasoning verbatim (data preservation).
+///
+/// # Precondition
+/// `raw_reasoning` must be a suffix of `accum` (i.e., `accum.len() >= raw_reasoning.len()`).
 fn resolve_prefix<'a>(accum: &'a str, raw_reasoning: &'a str) -> (&'a str, bool) {
+    debug_assert!(accum.len() >= raw_reasoning.len(), "raw_reasoning must be a suffix of accum");
     if accum.starts_with(THOUGHT_PREFIX) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/parsers/src/reasoning/gemma4_parser.rs` around lines 102 - 122, The
function resolve_prefix relies on the precondition that raw_reasoning is a
suffix of accum (used when computing accum.len() - raw_reasoning.len()), so add
a debug assertion and brief comment documenting this invariant: insert something
like a comment above resolve_prefix mentioning "precondition: raw_reasoning must
be a suffix of accum" and a debug_assert!(accum.ends_with(raw_reasoning)) at the
start of resolve_prefix to make the precondition explicit and fail fast in debug
builds; reference resolve_prefix and the THOUGHT_PREFIX logic when adding the
assertion and comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@lib/parsers/src/reasoning/gemma4_parser.rs`:
- Around line 102-122: The function resolve_prefix relies on the precondition
that raw_reasoning is a suffix of accum (used when computing accum.len() -
raw_reasoning.len()), so add a debug assertion and brief comment documenting
this invariant: insert something like a comment above resolve_prefix mentioning
"precondition: raw_reasoning must be a suffix of accum" and a
debug_assert!(accum.ends_with(raw_reasoning)) at the start of resolve_prefix to
make the precondition explicit and fail fast in debug builds; reference
resolve_prefix and the THOUGHT_PREFIX logic when adding the assertion and
comment.

In `@lib/parsers/src/tool_calling/gemma4/parser.rs`:
- Around line 4-5: The inline comment in parser.rs contains an incomplete PR URL
("https://github.com/vllm-project/vllm/pull/.../vllm/tool_parsers/gemma4_tool_parser.py");
update that comment by replacing the "..." with the actual PR number/path if
available, or remove the placeholder link entirely (keep or replace with a note
like "reference implementation in vllm/tool_parsers/gemma4_tool_parser.py" if
the PR is not merged) so the reference in the header comment of
lib/parsers/src/tool_calling/gemma4/parser.rs is accurate and not misleading.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7704350f-d1fc-436c-8990-6ab2d110a063

📥 Commits

Reviewing files that changed from the base of the PR and between 665b54b and c65e5da.

📒 Files selected for processing (13)
  • docs/agents/reasoning.md
  • docs/agents/tool-calling.md
  • examples/chat_templates/README.md
  • examples/chat_templates/gemma4_tool.jinja
  • lib/llm/src/preprocessor.rs
  • lib/parsers/README.md
  • lib/parsers/src/reasoning/gemma4_parser.rs
  • lib/parsers/src/reasoning/mod.rs
  • lib/parsers/src/tool_calling/config.rs
  • lib/parsers/src/tool_calling/gemma4/mod.rs
  • lib/parsers/src/tool_calling/gemma4/parser.rs
  • lib/parsers/src/tool_calling/mod.rs
  • lib/parsers/src/tool_calling/parsers.rs

Add tool-call and reasoning parsers for Google Gemma 4 thinking models.

Gemma 4 uses a custom non-JSON serialization grammar that does not match any
existing parser family:

  <|tool_call>call:func_name{location:<|"|>Tokyo<|"|>,count:42}<tool_call|>

Bare unquoted keys, custom <|"|> string delimiter, supports nested objects
and arrays, multiple tool calls concatenated without separators.

Reasoning grammar is wrapped between channel markers with a `thought\n`
role-label prefix that this parser strips:

  <|channel>thought\n...chain of thought...<channel|>

Changes:

* Tool-call parser (lib/parsers/src/tool_calling/gemma4/) — new ParserConfig
  variant `Gemma4`; recursive-descent parser into `serde_json::Value`;
  registered as `gemma4` (alias `gemma-4`); inline #[cfg(test)] coverage of
  CASE.1..CASE.24 from TEST_CASES.md including malformed args, truncation
  recovery, parallel calls, function names with hyphens/dots, and HTML/newlines
  inside string values.
* Reasoning parser (lib/parsers/src/reasoning/gemma4_parser.rs) — implements
  the `ReasoningParser` trait directly (mirrors `granite_parser.rs` style)
  with streaming `thought\n` prefix-stripping (3-case logic: full prefix,
  strict prefix buffered, diverged emitted verbatim) and partial-marker
  overlap detection at chunk boundaries; registered as `gemma4`
  (alias `gemma-4`).
* Vendored chat template (examples/chat_templates/gemma4_tool.jinja) —
  verbatim copy of vLLM's tool_chat_template_gemma4.jinja with SPDX
  attribution. Required because the upstream HF Gemma 4 chat template does
  not emit the <|"|>-delimited tool-definition encoding the parser expects.
  Pass via `--custom-jinja-template`.
* Docs updates (docs/agents/{tool-calling,reasoning}.md, lib/parsers/README.md)
  with parser table rows and a recommended-pairing line:
  `--dyn-tool-call-parser gemma4 --dyn-reasoning-parser gemma4
   --custom-jinja-template examples/chat_templates/gemma4_tool.jinja`

Reference: vllm-project/vllm tool_parsers/gemma4_tool_parser.py and
reasoning/gemma4_reasoning_parser.py.

AI assistance was used to draft this change; all changed lines were
manually reviewed against the upstream vLLM reference implementation.

Signed-off-by: David Oy <david.oy@baseten.co>
Behavior-diff against upstream vLLM main turned up four real bugs in
the initial port plus one reasoning-parser gap and one optimization
gap. All fixed; tests added.

Tool-call parser (lib/parsers/src/tool_calling/gemma4/parser.rs):

* Null-keyword case-sensitivity (upstream `value_str.lower() in (...)`):
  the previous hard-coded list missed `NONE`, `Nul`, etc. Replaced with
  `try_consume_keyword` that does ASCII-case-insensitive matching with
  a trailing-word-char guard so `nullable` doesn't get mis-matched as
  the `null` keyword plus leftovers.
* Unterminated `<|"|>` string in args (upstream
  `_parse_gemma4_args:146-149`): when the closing delimiter is missing,
  upstream takes "everything after the opening delimiter" as the value.
  We were bailing instead, which then triggered the entire-args-object
  empty-fallback in `parse_single_call` — i.e. a single truncated
  trailing arg silently lost ALL prior args. Now we mirror upstream and
  return the truncated tail as the string value.
* Empty value `key:` followed by `,` / `}` / EOF (upstream
  `_parse_gemma4_args:128-131`): upstream emits `{"key": ""}`. We were
  erroring out of `parse_value` and dropping the whole args object.
  Detect this case in `parse_object_body` and insert an empty-string.
* Truncated tool call (start marker, no end marker): upstream
  `extract_tool_calls:421-424` returns the raw `model_output` as
  content when no complete tool call could be parsed. We were
  consuming the bytes silently, hiding model truncation from callers.
  Echo the bytes back as `normal_text` instead.

Reasoning parser (lib/parsers/src/reasoning/gemma4_parser.rs):

* Dangling end marker (`<channel|>` present, `<|channel>` absent): this
  happens when a tokenizer with `skip_special_tokens=True` strips the
  start token but leaves the end token. Upstream's offline parser
  still extracts text-before as reasoning. We were returning the whole
  text as `normal_text` and losing the reasoning content. Added a
  match arm for `(None, Some(end))` that splits on the end marker.

Preprocessor optimization (lib/llm/src/preprocessor.rs):

* `is_reasoning_disabled_by_request` now recognizes `gemma4`/`gemma-4`
  with `enable_thinking: false` and short-circuits the parser, matching
  the existing pattern for `kimi_k25` / `deepseek_v4` / `nemotron_nano`.
  Without this, the parser still ran on every chunk just to fall
  through (since no `<|channel>` markers arrive when the chat template
  doesn't inject `<|think|>`).

Tests added/renamed:

* `truncated_tail_echoed_complete_prior_survives` (was
  `truncated_call_dropped_complete_prior_survives`).
* `args_grammar_unterminated_string_takes_remainder` (was
  `args_grammar_unterminated_string_errors`).
* `args_grammar_empty_value_yields_empty_string`,
  `args_grammar_trailing_empty_value`,
  `args_grammar_null_aliases_case_insensitive`,
  `args_grammar_keyword_prefix_not_consumed`,
  `incomplete_tool_call_echoes_raw_bytes`.
* `detect_dangling_end_marker_extracts_prefix_as_reasoning`,
  `detect_dangling_end_marker_strips_thought_prefix`.
* Four new cases in `test_is_reasoning_disabled_by_request` covering
  `gemma4` and `gemma-4` with `enable_thinking: false` / `true` / no args.

Behavior intentionally diverges from upstream in one place: Rust's
between-call text join (`Sure thing. <call> All set.` → preserves
`"Sure thing.  All set."`) follows the established kimi_k2 Dynamo
pattern rather than upstream Gemma 4's prefix-only behavior. This
keeps the parser registry internally consistent.

AI assistance was used to draft this change; all changed lines were
manually reviewed against the upstream vLLM reference implementation.

Signed-off-by: David Oy <david.oy@baseten.co>
* parser.rs:4 — replace placeholder PR URL (`/pull/.../...`) with the
  resolved upstream link (`/blob/main/...`) and a note that vLLM PR
  #38826 / #39027 are merged. Reviewers can now click through.
* gemma4_parser.rs::resolve_prefix — make the "raw_reasoning is a
  suffix of accum" precondition explicit with a doc-comment block and
  a `debug_assert!` that fails fast in debug builds. Documents the
  invariant that the streaming driver upholds (it always pushes the
  raw delta into self.reasoning_accum immediately before calling
  resolve_prefix).

No behavior change in release builds; only docstring clarity and a
debug-mode assertion. Tests still 44/44.

Signed-off-by: David Oy <david.oy@baseten.co>
@the-david-oy
Copy link
Copy Markdown
Contributor Author

the-david-oy commented Apr 29, 2026

Pushed f9c7a6d with two updates:

  1. Rebased onto current main (was on 2e31245, now on 665b54b). The rebase was clean, no conflicts.
  2. Addressed both CodeRabbit nits in a third commit chore(frontend): address CodeRabbit nits on Gemma 4 parsers:
    • parser.rs:4 — replaced the placeholder PR URL with the resolved upstream link and noted vLLM PR #38826 / #39027 are merged.
    • gemma4_parser.rs::resolve_prefix — made the "raw_reasoning is a suffix of accum" precondition explicit via doc-block + debug_assert!. Behavior unchanged in release builds.

Pre-commit job failure

The pre-commit check was failing on pre-existing breakage in main, not on anything in this PR. Specifically the pytest-marker-report hook reports 49 tests missing required markers across files I do not touch — most prominently components/src/dynamo/common/tests/test_video_protocol.py and test_video_utils.py, which were added 2 hours ago by #8491 (feat: formalize video streaming). Confirming this is upstream:

  • gh run list --repo ai-dynamo/dynamo --branch main shows the most recent Pre Merge run on main commit 665b54b (the one I rebased onto) is also failure.
  • The hook has pass_filenames: false in .pre-commit-config.yaml, so it always validates the full repo regardless of which files a given PR changed.
  • All other CI checks on this PR are passing or pending: cargo clippy, DCO, copyright-checks, lychee, dco-comment, changed-files, pr_reminder, Validate PR title, rust-clippy (.) etc. all green.

This is due to:

So three test files now block every external-contributor PR rebased onto main:

  • components/src/dynamo/common/tests/test_video_protocol.py (25 tests)
  • components/src/dynamo/common/tests/test_video_utils.py (7 tests)
  • components/src/dynamo/common/tests/test_audio_protocol.py (17 tests)

@rmccorm4
Copy link
Copy Markdown
Contributor

Thanks @the-david-oy ! CC @richardhuo-nv @keivenchang to help review

@rmccorm4
Copy link
Copy Markdown
Contributor

/ok to test f9c7a6d

Copy link
Copy Markdown
Contributor

@keivenchang keivenchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I added comments on test-case categories (and pushed it). No logic change.

Comment thread lib/parsers/src/tool_calling/gemma4/parser.rs Outdated
Comment thread lib/llm/src/preprocessor.rs
Augment the 12 previously-unannotated tests across the gemma4 tool-call
and reasoning parsers with their corresponding CASE.* numbers from
lib/parsers/TEST_CASES.md, matching the convention already used by the
other tests in this PR and across the rest of the parser suite.

Comment-only — no functional changes; cargo test output unchanged.

Mapping summary:
- args_grammar_*: CASE.4/5/6/7 (grammar entry-point + truncation paths)
- incomplete_tool_call_echoes_raw_bytes: CASE.5 (missing end marker)
- detect_dangling_end_marker_*: CASE.5, CASE.13 (missing-start-marker recovery)
- streaming_multiple_reasoning_spans: CASE.2 (multiple spans)

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@keivenchang keivenchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two notes — both pointers, no inline comments needed.

1. Pre-commit hook unrelated to this PR. pytest-marker-report currently fails on this branch with 49 missing-marker tests across test_audio_protocol.py, test_video_protocol.py, test_video_utils.py. those 49 are pre-existing on the branch — main has 0 missing. fix already merged on main as cc1af6a90 (#8849). merging main / rebasing should clear the hook.

2. CASE. coverage gaps to close (per lib/parsers/TEST_CASES.md).*

  • To add: CASE.9 (paired reasoning + tool in same response), CASE.14 (empty tool_calls / null at API edge), CASE.15 (duplicate tool calls — universal gap, no parser covers this today).
  • Framework-level (cross-parser suites, not in this PR's scope): CASE.11 (tool_choice modes), CASE.12 (finish_reason).
  • N/A (worth a one-line // CASE.X — N/A per the doc's stated convention): CASE.23 (markers fixed, no variants), CASE.xml1/2 (not XML-family), CASE.harmony1 (not Harmony).

Two small changes from @keivenchang on PR ai-dynamo#8852:

* `try_tool_call_parse_gemma4`: add a one-line note explaining that
  the first `<tool_call|>` after the start marker wins and that we
  don't scan inside `<|"|>` string values for embedded markers — so
  a model emitting a tool-call literal inside a string argument will
  truncate the call. Matches upstream vLLM regex behavior.
* `is_reasoning_disabled_by_request` `gemma4` arm: switch from a
  hand-rolled `enable_thinking` check to the shared
  `thinking_bool_from_args` helper that the `deepseek_v4` arm uses,
  for symmetry. The helper already accepts both `thinking` and
  `enable_thinking` keys, so behavior is unchanged.

No test changes needed; existing parametric cases continue to pass.

Signed-off-by: David Oy <david.oy@baseten.co>
Replace the manual `find(TOOL_CALL_START)` + `find(TOOL_CALL_END)`
extraction in `try_tool_call_parse_gemma4` with a direct
`regex.captures_iter(message)` pass, mirroring upstream's
`extract_tool_calls` in vllm/tool_parsers/gemma4_tool_parser.py.

Why: the manual approach used flat byte-search for the end marker
without requiring `}<tool_call|>` adjacency. A `<tool_call|>` literal
embedded inside a `<|"|>` string value (e.g. a `description` arg
documenting the tool-call format) would falsely terminate the call,
causing the regex on the truncated block to fail and the call to be
silently dropped. The regex-based approach requires the `}` adjacency
and is robust to the embedded case.

Also fix `find_tool_call_end_position_gemma4` similarly so the
streaming jail respects the same adjacency.

Tests: 46/46 (added `embedded_tool_call_marker_in_string_value` and
`find_end_position_skips_embedded_marker`).

Plus a comment-cleanup pass on this PR's earlier task-historical
inline comments — keeping only those that document non-obvious
invariants (e.g. resolve_prefix precondition, regex adjacency note).

Signed-off-by: David Oy <david.oy@baseten.co>
vLLM's `Gemma4ToolParser` and `Gemma4ReasoningParser` rely on the
engine emitting `<|tool_call>`, `<|channel>`, etc. — all special-token
IDs in the Gemma 4 tokenizer — verbatim in the decoded text. Upstream's
OAI server flips `skip_special_tokens=False` per-request via the
`adjust_request` hook on each parser. Dynamo's mode-A frontend bypasses
that hook, so requests against `--dyn-tool-call-parser gemma4` /
`--dyn-reasoning-parser gemma4` get the engine default
(`skip_special_tokens=true`), which strips the markers before the
parser sees them. End result: tool_calls + reasoning_content silently
empty, raw text dumped into `content`.

Add a `parser_requires_special_tokens` helper in the preprocessor and
intercept `output_options.skip_special_tokens` in `preprocess_request`:
when the helper returns true and the user hasn't explicitly set the
field, default it to `Some(false)` so the parsers actually see the
markers they're matching on.

Narrow per-parser table for now (gemma4 + gemma-4 alias only). Other
upstream parsers (hermes, kimi_k2, jamba, deepseek_v3.2, etc.) also
have `adjust_request` hooks for the same reason and may be silently
broken in mode A today; that's a separate audit / follow-up.

Tests: parametric `test_parser_requires_special_tokens` covers the
positive (gemma4/gemma-4, paired or single) and negative (hermes,
kimi_k2, no parsers) cases. CI will exercise the integration via
the existing dynamo-llm test suite on Linux.

Signed-off-by: David Oy <david.oy@baseten.co>
CI's `cargo fmt --check` flagged the parametric tuples in
`test_parser_requires_special_tokens` — single-line form exceeded the
max-width and rustfmt wanted them expanded. Run `cargo fmt -p
dynamo-llm` to apply. No behavior change.

Signed-off-by: David Oy <david.oy@baseten.co>
Copy link
Copy Markdown
Contributor

@keivenchang keivenchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey David, both round-1 comments addressed cleanly — regex-based extraction is the right shape, and the skip_special_tokens hook is a nice unprompted add. tiny future-facing nit inline.

can you also go through all the cases in lib/parsers/TEST_CASES.md and confirm coverage / mark explicit N/A for the ones that don't apply? i think 9, 14, 15 are still uncovered.

Almost there, thanks!

Comment thread lib/llm/src/preprocessor.rs
@richardhuo-nv
Copy link
Copy Markdown
Contributor

/ok to test bd44405

Per reviewer audit against lib/parsers/TEST_CASES.md, three generic
categories were uncovered in the gemma4 tests:

* CASE.9  — paired reasoning + tool call. Added one test on each side:
  the tool-call parser must extract the call when `<|channel>` markers
  are also present in the input (production runs the reasoning parser
  first, but the tool-call parser should be robust on its own); the
  reasoning parser must NOT consume the following `<|tool_call>` markers
  as part of reasoning content.
* CASE.14 — empty content / null args. Two tests: empty input yields
  zero calls + empty normal_text; `null` / `none` / `nil` arg values
  round-trip as JSON null.
* CASE.15 — duplicate tool calls (same function name). Verifies both
  calls appear with distinct UUIDs; client decides whether duplicate
  invocation is intended (per the documented contract in TEST_CASES.md).

Plus explicit N/A annotations for cases that don't apply to gemma4 in
both test modules — universal cross-parser gaps (CASE.11 tool_choice,
CASE.12 finish_reason — hermes-only suites today), family-specific
categories (CASE.xml*, CASE.harmony1), and CASE.8 streaming (covered
indirectly via the streaming jail's CASE.20 helpers, same pattern as
kimi_k2). Surfaces gaps to a future reader rather than leaving them
ambiguously absent.

Tests: 51/51 pass (was 46).
Signed-off-by: David Oy <david.oy@baseten.co>
@the-david-oy
Copy link
Copy Markdown
Contributor Author

can you also go through all the cases in lib/parsers/TEST_CASES.md and confirm coverage / mark explicit N/A for the ones that don't apply? i think 9, 14, 15 are still uncovered.

Absolutely, done!

@richardhuo-nv
Copy link
Copy Markdown
Contributor

/ok to test 54a5fca

@the-david-oy
Copy link
Copy Markdown
Contributor Author

The failing docs link check is coming from the main branch, not any of the changes in this PR.

Copy link
Copy Markdown
Contributor

@keivenchang keivenchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work David — all CASE gaps closed, N/A annotations match the doc convention. thanks, ship it. Oh, and that Doc link error, it is not a Mandatory check.

@the-david-oy
Copy link
Copy Markdown
Contributor Author

Thank you! Are you able to merge? I don't think I have permission.

@rmccorm4 rmccorm4 merged commit ce3e228 into ai-dynamo:main Apr 30, 2026
93 of 95 checks passed
furionw pushed a commit that referenced this pull request May 2, 2026
Signed-off-by: David Oy <david.oy@baseten.co>
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
zhongdaor-nv added a commit that referenced this pull request May 8, 2026
… family

Pythonic was the only top-N family untouched by the recent coverage PRs
(#8888, #8946, #8846, #9035, #8852). Add three small batch-mode tests
that mirror the top-N quartet pattern landed for harmony/glm47/qwen3/etc.
in #9035, plus the parameterless-call shape from vLLM's pythonic test
file.

- `test_parse_tool_call_parse_pythonic_empty_args` (PARSER.batch.6) —
  `[get_weather()]` returns one call with `arguments={}`. Mirrors
  vLLM `test_pythonic_tool_parser.py::test_tool_call[parameterless_*]`.
- `test_parse_pythonic_empty_and_whitespace_inputs` (PARSER.batch.9) —
  empty / whitespace-only inputs return 0 calls and empty content
  without panicking. Mirrors the #9035 quartet contract.
- `test_parse_pythonic_duplicate_calls_same_name` (PARSER.batch.10) —
  two `get_weather` calls in one list surface with distinct ids
  (`call-1`, `call-2`). Pins the auto-generated-id contract for the
  duplicate-name case.

`cargo test -p dynamo-parsers --lib tool_calling::pythonic` — 19/19 pass.

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
zhongdaor-nv added a commit that referenced this pull request May 8, 2026
…audit

Output of DIS-1926 (research vLLM parser test coverage gaps). Doc-only
change to `lib/parsers/PARSER_CASES.md`; no source touched. `cargo check
-p dynamo-parsers --tests` passes.

Refinements driven by gaps surfaced during a bidirectional diff against
vLLM `tests/tool_parsers/*` at commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b:

- Split PARSER.fmt.1 (function-name surface) from new PARSER.fmt.5
  (argument-envelope shape: native call-ID preservation, JSON field-order
  tolerance, arguments↔parameters key alias). The old CASE.21 (and an
  earlier draft of PARSER.fmt.1) conflated both axes.
- Broaden PARSER.fmt.3 examples beyond Kimi K2's singular vs plural
  section tokens to include Mistral pre-v11 vs v11+ wire formats, Llama 3
  with vs without `<|python_tag|>`, Hermes `qwen25` registry alias.
- Add `Known production gaps` section flagging Mistral v11+ wire format
  (`[TOOL_CALLS]name{...args}` name-then-object) — Dynamo's
  `ToolCallConfig::mistral()` only handles pre-v11 (JSON-array body),
  while vLLM tests v11 extensively. v11 is the current Mistral-Small /
  Mistral-Large production path. Largest single Dynamo parser gap
  surfaced by the audit.
- Promote regex-timeout / parser-exception containment to Universal Gaps
  (vLLM has explicit `test_regex_timeout_handling` for llama3_json /
  llama4_pythonic / pythonic and `*_streaming_exception_returns_none` for
  Mistral; Dynamo relies on Rust regex linear-time guarantees but does
  not pin failure-containment paths).
- Cross-ref PARSER.batch.1 happy-path → PARSER.fmt.5 native-ID sub-axis.
- Update Applicability summary and `Adding a new parser` minimum viable
  set to cover fmt.{1..5}.

The full per-test bidirectional audit (493 test rows across 36 parser
families, mapped onto the new taxonomy) lives outside this commit. It
informed every refinement above; the audit itself is not committed
because it's a working artifact rather than a stable reference doc.

Top-3 P0 gap status from the audit:

1. Mistral v11 wire format — STILL OPEN (parser doesn't exist; flagged in
   the new `Known production gaps` section).
2. PARSER.stream.{1..4} parser-tier — partial; DSv4 (#8946) and Gemma 4
   (#8852) added coverage; Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral
   parser-tier streaming tests still gap.
3. CASE.25 / FRONTEND.3 (`adjust_request`) — CLOSED for 7 families via 28
   new tests in `lib/llm/tests/tool_choice.rs` (#8946 + #9035).

Coverage PRs since 2026-05-05 baseline: #8888 (silent-drop recoveries),
#8946 (DSv4 + Kimi K2 coverage), #9035 (top-N CASE.6+ quartet), #8852
(Gemma 4 family), #9127 (taxonomy rename).

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
keivenchang pushed a commit that referenced this pull request May 15, 2026
Companion artifact to PR #9290 (PARSER_CASES.md taxonomy refinement).
Adds the full per-test bidirectional audit that informed every change in
that PR — every vLLM tool-parser test mapped onto the new (PR #9127)
taxonomy with a clickable source link.

`lib/parsers/VLLM_TEST_AUDIT.md` (new file, 906 lines, 493 distinct
test rows):

- **Source**: vLLM `main` at commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b
  (`vllm/tool_parsers/*`, `tests/tool_parsers/*`, `tests/tool_use/*`,
  `tests/entrypoints/openai/tool_parsers/*`).
- **Scope**: 421 explicit test functions + 72 inherited common-suite
  rows from `ToolParserTests`.
- **Bucketing**: every row carries one or more `PARSER_CASES.md` /
  `REASONING_CASES.md` / `PIPELINE_CASES.md` / `FRONTEND_CASES.md`
  tags, plus a one-line behavioral note.

Re-bucketing transformations applied (vs the original CASE.* labels
the audit was first written against, before PR #9127):

- 244 streaming rows split per-row into PARSER.stream.{1,2,3,4}
  (single-call assembly / multi-call assembly / partial-token
  chunking / streaming termination)
- 26 fmt rows split per-row into PARSER.fmt.1 (function-name) vs
  PARSER.fmt.5 (argument-shape: native ID, JSON field-order,
  arguments↔parameters alias)
- Out-of-PARSER-scope buckets relocated to sibling docs:
  CASE.{11,18,25} → FRONTEND.{1,3,5,6}; CASE.12 →
  PIPELINE.finish_reason; CASE.{9,10,17} → REASONING.batch.{1,2};
  CASE.20 → `// helper`; CASE.16 → inline-regression annotation;
  CASE.26 dissolved into PARSER.batch.4 impl-defined recovery
  contract

Two mis-bucketings caught and fixed during review:
- FunctionGemma::test_multiple_tool_calls and
  Gemma4::TestExtractToolCalls.test_multiple_tool_calls were both
  labeled CASE.1 but assert len(tool_calls) == 2 — corrected to
  PARSER.batch.2.

Four bucket-assignment refinements caught by review:
- test_unique_tool_call_ids (DSv3.2) drops fmt.5 (no native call-ID
  surface; just parallel-call distinctness).
- test_invalid_funcall_id_skipped (Kimi K2) moves fmt.5 → fmt.1
  (validation, not preservation).
- 3 Mistral `argument_before_name*` parametrized rows gain fmt.5
  (canonical field-order swap test set referenced by PARSER_CASES.md).

A staleness banner at the top documents the re-bucketing transformation
and mis-bucket fixes for traceability.

Top findings the audit informed (already addressed in PR #9290 or
flagged for follow-up):

1. Mistral v11+ wire format — STILL OPEN (parser doesn't exist;
   flagged in PARSER_CASES.md "Known production gaps").
2. PARSER.stream.{1..4} parser-tier coverage gap in 5 families
   (Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral) — partial
   closure via DSv4 (#8946) and Gemma 4 (#8852).
3. CASE.25 / FRONTEND.3 (`adjust_request`) — CLOSED for 7 families
   via 28 new tests in `lib/llm/tests/tool_choice.rs` (#8946 + #9035).

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore documentation Improvements or additions to documentation external-contribution Pull request is from an external contributor frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/XXL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants