Fix gemma4 prefill parsing #22325

Open
Quairon-Nailo wants to merge 2 commits into ggml-org:master from Quairon-Nailo:fix-gemma4-prefill-parsing
@Quairon-Nailo

Overview

This PR fixes an issue in the peg-gemma4 chat format parser where prefilled reasoning blocks cause the parser to fail and silently drop the rest of the model's response.

The Problem:
As reported and discussed here, the parser's thought rule strictly expects the generation to begin with the <|channel> token. When a frontend (like SillyTavern) prefills the <|channel> token into the context to force reasoning, the newly generated text begins with thought\n... or similar. Because the generated text lacks the leading <|channel>, the thought rule fails. The parser then falls back to treating the text as standard content, and the content rule is designed to halt at an unmatched <channel|> token to prevent trailing hallucinations. As a result, parsing stops at the end of the reasoning block and the actual response that follows it is dropped.
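To make the failure concrete, here is a toy model of the behavior described above. It is not the real peg-gemma4 parser: `ParsedOutput` and `naive_parse` are hypothetical names, and the logic only mimics the two rules mentioned in this description (a thought rule that requires a leading <|channel>, and a content rule that halts at an unmatched <channel|>).

```cpp
#include <string>

// Toy model only. These names are hypothetical and do not exist in llama.cpp.
struct ParsedOutput {
    std::string reasoning;
    std::string content;
};

// Reasoning is recognized only when the generated text starts with
// "<|channel>". Otherwise everything is treated as content, and content
// stops at the first "<channel|>".
ParsedOutput naive_parse(const std::string & generated) {
    static const std::string open  = "<|channel>";
    static const std::string close = "<channel|>";

    ParsedOutput out;
    if (generated.compare(0, open.size(), open) == 0) {
        const size_t end = generated.find(close, open.size());
        if (end == std::string::npos) {
            // Reasoning block is still open: everything so far is reasoning.
            out.reasoning = generated.substr(open.size());
            return out;
        }
        out.reasoning = generated.substr(open.size(), end - open.size());
        out.content   = generated.substr(end + close.size());
        return out;
    }

    // Fallback path: content halts at an unmatched closing tag, so anything
    // after "<channel|>" is lost.
    out.content = generated.substr(0, generated.find(close));
    return out;
}
```

With a prefilled <|channel>, the generated text starts at `thought\n...`, so the fallback path runs: the reasoning text ends up in `content` and the real answer after <channel|> never reaches the caller.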

The Solution:

  1. The parser now scans data.prompt within the current assistant turn only (after <|turn>model) to detect an unclosed <|channel> tag.
  2. If an unclosed tag is detected in the current turn, the parser modifies its root structure to use a first-message rule. This bypasses standard choice (|) evaluations, eliminating PEG buffering and allowing the prefilled reasoning block to stream live to the user.
  3. A new resumed-thought rule accepts an optional prefix ("thought") and captures all text up to the closing <channel|> tag. This handles both standard tag-only prefills and arbitrary text prefills.
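The detection and resumed-thought steps above can be sketched with plain string operations. This is a minimal illustration under stated assumptions, not the code in this PR: `has_unclosed_channel`, `ResumedOutput`, and `parse_resumed` are hypothetical names, the real change is expressed as PEG rules rather than `std::string` searches, and stripping the "thought" prefix from the captured reasoning is an assumption.

```cpp
#include <string>

// Hypothetical helper: true if the current assistant turn in `prompt`
// contains a <|channel> tag that has not been closed by <channel|>.
bool has_unclosed_channel(const std::string & prompt) {
    static const std::string turn  = "<|turn>model";
    static const std::string open  = "<|channel>";
    static const std::string close = "<channel|>";

    // Only look at the last assistant turn, so closed reasoning blocks from
    // earlier turns cannot trigger a false positive.
    const size_t turn_pos = prompt.rfind(turn);
    if (turn_pos == std::string::npos) {
        return false;
    }
    const size_t start = turn_pos + turn.size();

    const size_t last_open = prompt.rfind(open);
    if (last_open == std::string::npos || last_open < start) {
        return false; // no opening tag inside the current turn
    }
    const size_t last_close = prompt.rfind(close);
    return last_close == std::string::npos || last_close < last_open;
}

// Hypothetical result type for the resumed parse.
struct ResumedOutput {
    std::string reasoning;
    std::string content;
};

// Sketch of the resumed-thought idea: skip an optional "thought" prefix,
// capture everything up to <channel|> as reasoning, and keep the rest as
// content.
ResumedOutput parse_resumed(const std::string & generated) {
    static const std::string prefix = "thought";
    static const std::string close  = "<channel|>";

    const size_t begin =
        generated.compare(0, prefix.size(), prefix) == 0 ? prefix.size() : 0;
    const size_t end = generated.find(close, begin);

    ResumedOutput out;
    if (end == std::string::npos) {
        out.reasoning = generated.substr(begin); // block still streaming
        return out;
    }
    out.reasoning = generated.substr(begin, end - begin);
    out.content   = generated.substr(end + close.size());
    return out;
}
```

Restricting the search to the text after the last <|turn>model is what keeps earlier, already closed reasoning blocks in the history from switching the parser into resumed mode.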

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES - AI assisted in diagnosing the PEG grammar limitations with prefilled tags, formulating the resumed-thought fallback rule, and adding the context-boundary check to restore streaming functionality. I have manually reviewed, rigorously tested edge cases, and fully understand the implemented logic.

@aldehir
Contributor

aldehir commented Apr 24, 2026

If it's primarily a prefill issue, I guess I'll see how we can accommodate. Seems to be a hot issue lately...

2 participants