Fix gemma4 prefill parsing by Quairon-Nailo · Pull Request #22325 · ggml-org/llama.cpp

Quairon-Nailo · 2026-04-24T15:29:11Z

Overview

This PR fixes an issue in the peg-gemma4 chat format parser where prefilled reasoning blocks cause the parser to fail and silently drop the rest of the model's response.

The Problem:
As reported and discussed here, the parser's thought rule currently strictly expects the generation to begin with the <|channel> token. When a frontend (like SillyTavern) prefills the <|channel> token into the context to force reasoning, the newly generated text begins with thought\n... or similar. Because it lacks the leading <|channel>, the thought rule fails. The parser falls back to evaluating it as standard content, which is designed to halt at an unmatched <channel|> token to prevent trailing hallucinations. Consequently, the parser stops prematurely right after the reasoning block, truncating the actual response.
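To make the failure mode concrete, here is a minimal sketch in Python. It is a toy model, not the actual peg-gemma4 parser: `parse_strict` and its return shape are hypothetical, and only the `<|channel>` and `<channel|>` tokens come from the description above.

```python
# Toy model of the failure mode; NOT the llama.cpp PEG parser.
CHANNEL_OPEN = "<|channel>"
CHANNEL_CLOSE = "<channel|>"


def parse_strict(generated: str) -> dict:
    """Parse output, requiring a thought block to start with CHANNEL_OPEN."""
    reasoning = ""
    rest = generated
    if rest.startswith(CHANNEL_OPEN):
        body, sep, tail = rest[len(CHANNEL_OPEN):].partition(CHANNEL_CLOSE)
        if sep:  # only accept the thought block if it is closed
            reasoning, rest = body, tail
    # Content rule: halt at an unmatched closing tag.
    content = rest.split(CHANNEL_CLOSE, 1)[0]
    return {"reasoning": reasoning, "content": content}


# Normal case: the model emits the opening tag itself.
full = parse_strict("<|channel>thought\nplan<channel|>Hello!")

# Prefill case: the frontend already placed "<|channel>" in the prompt,
# so the generated text starts mid-block.
prefilled = parse_strict("thought\nplan<channel|>Hello!")
```

In the prefill case the reasoning text ends up in `content` and `Hello!` is never returned, which mirrors the truncation described above.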

The Solution:

The parser now scans data.prompt within the current assistant turn (after <|turn>model) for an unclosed <|channel> tag. If it finds one, the parser changes its root structure to use a first-message rule. This bypasses the standard choice (|) evaluation, which eliminates PEG buffering and lets the prefilled reasoning block stream live to the user. A new resumed-thought rule accepts an optional prefix ("thought") and captures all text up to the closing <channel|> tag, so it handles both standard tag-only prefills and arbitrary text prefills.
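The detection step and the resumed-thought rule can be sketched roughly as follows. This is an illustrative Python approximation, not the C++ code in this PR: `has_unclosed_channel` and `parse_resumed` are hypothetical names, and dropping the `thought` prefix and surrounding whitespace from the captured reasoning is an assumption.

```python
# Illustrative sketch only; the real change lives in the C++ PEG parser.
TURN_MODEL = "<|turn>model"
CHANNEL_OPEN = "<|channel>"
CHANNEL_CLOSE = "<channel|>"


def has_unclosed_channel(prompt: str) -> bool:
    """True if the current assistant turn leaves CHANNEL_OPEN unclosed."""
    turn_start = prompt.rfind(TURN_MODEL)
    if turn_start == -1:
        return False  # no assistant turn in the prompt
    turn = prompt[turn_start + len(TURN_MODEL):]
    last_open = turn.rfind(CHANNEL_OPEN)
    if last_open == -1:
        return False  # nothing was prefilled
    return CHANNEL_CLOSE not in turn[last_open + len(CHANNEL_OPEN):]


def parse_resumed(generated: str) -> dict:
    """Optional "thought" prefix, then capture text up to CHANNEL_CLOSE."""
    body, sep, tail = generated.partition(CHANNEL_CLOSE)
    if body.startswith("thought"):
        body = body[len("thought"):]  # assumption: prefix is not captured
    return {"reasoning": body.strip(), "content": tail if sep else ""}


# Tag-only prefill: the prompt ends right after the opening tag.
tag_only = has_unclosed_channel("<|turn>model\n<|channel>")
# An earlier, already closed thought block must not trigger the rule.
closed_earlier = has_unclosed_channel(
    "<|turn>model\n<|channel>thought\nx<channel|>Hi\n<|turn>model\n"
)
resumed = parse_resumed("thought\nplan<channel|>Hello!")
# Arbitrary text prefill: generation resumes in the middle of the reasoning.
mid_text = parse_resumed("think.<channel|>Answer")
```

Restricting the scan to the text after the last `<|turn>model` marker is what keeps closed reasoning blocks from earlier turns from switching the parser into resumed mode.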

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES - AI assisted in diagnosing the PEG grammar limitations with prefilled tags, formulating the resumed-thought fallback rule, and adding the context-boundary check to restore streaming functionality. I have manually reviewed, rigorously tested edge cases, and fully understand the implemented logic.

aldehir · 2026-04-24T16:47:13Z

If it's primarily a prefill issue, I guess I'll see how we can accommodate it. Seems to be a hot issue lately...

Quairon-Nailo added 2 commits April 24, 2026 12:41

common/gemma4 : fix parsing of prefilled reasoning blocks

3229eb2

common/gemma4 : fix parsing of prefilled reasoning blocks

3f168c0

Quairon-Nailo requested a review from a team as a code owner April 24, 2026 15:29

Quairon-Nailo mentioned this pull request Apr 24, 2026

common/gemma4 : handle parsing edge cases #21760

Merged

Quairon-Nailo wants to merge 2 commits into ggml-org:master from Quairon-Nailo:fix-gemma4-prefill-parsing
