When using the newly implemented Gemma 4 (peg-gemma4) chat format in llama-server, the model enters an infinite repetition loop during tool-calling. The server appears to re-parse the entire model response turn for every single token generated, and the model continuously repeats the same tool call without ever reaching an EOT (End of Turn) or EOS.
Environment
Build: llama.cpp (full-vulkan)
WARNING: radv is not a conformant Vulkan implementation, testing use only.
load_backend: loaded Vulkan backend from /app/libggml-vulkan.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 8643 (f49e917)
built with GNU 15.2.0 for Linux x86_64
Server: llama-server
Model: gemma-4-26B-A4B-it-UD-Q4_K_M.gguf
Actual Behavior
The model generates a tool call, but instead of finishing the turn, it repeats the same call indefinitely. The logs (see the log output below) show the PEG parser running a full re-parse of the accumulated turn for every generated token.
The server keeps streaming chunks for the same tool-call index (e.g., index: 83), and the model never exits the <|tool_call|> block.
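For context on why the per-token re-parse matters: if the parser re-reads the entire accumulated turn after each new token, total parsing work grows quadratically with turn length. A minimal sketch of the cost model (not llama.cpp code, just an illustration):

```python
# Sketch of the cost model, not actual llama.cpp code.

def full_reparse_work(n_tokens: int) -> int:
    # token i triggers a parse over all i tokens emitted so far
    return sum(i for i in range(1, n_tokens + 1))

def incremental_work(n_tokens: int) -> int:
    # an incremental parser consumes each token exactly once
    return n_tokens

print(full_reparse_work(1000))  # 500500 token-parses
print(incremental_work(1000))   # 1000 token-parses
```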
Expected Behavior
The model should generate the tool call once, close the tags correctly (e.g., <tool_call|>), and stop generating to wait for the tool output.
Operating systems
Linux
GGML backends
Vulkan
Hardware
E5-2640v4 on Supermicro X10DRi with ReBAR BIOS Patch applied (32G BAR available)
Radeon AI Pro R9700 32G
Models
gemma-4-26B-A4B-it-UD-Q4_K_M.gguf
Problem description & steps to reproduce
Steps to Reproduce
1. Run llama-server with a Gemma 4 model using the default auto-detected peg-gemma4 format.
2. Provide a system prompt containing tool definitions (e.g., Home Assistant Assist tools).
3. Issue a prompt that triggers a tool call (e.g., "What is the outside temperature?").
4. Observe the server logs and the client response.
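A request along these lines reproduces it via llama-server's OpenAI-compatible /v1/chat/completions endpoint. The tool schema here is a hypothetical stand-in for the Home Assistant Assist tools; only the tool name assist__GetLiveContext is taken from the logs:

```python
# Hypothetical reproduction payload; the tool definition is a stand-in
# for the Home Assistant Assist tools mentioned in the steps above.
import json

payload = {
    "model": "gemma-4-26B-A4B-it-UD-Q4_K_M.gguf",
    "stream": True,
    "messages": [
        {"role": "user", "content": "What is the outside temperature?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "assist__GetLiveContext",
                "description": "Fetch live state from Home Assistant.",
                "parameters": {"type": "object", "properties": {}},
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```

POST the printed JSON to the running server (e.g., http://localhost:8080/v1/chat/completions) and watch the streamed chunks repeat the same tool-call index.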
First Bad Commit
No response
Relevant log output
[llama-cpp-vulkan] Parsing PEG input with format peg-gemma4: <|turn>model
[llama-cpp-vulkan] <|channel>thought
[llama-cpp-vulkan] <channel|><|tool_call>call:assist__GetLiveContext{}...
[llama-cpp-vulkan] slot process_toke: id 0 | next token: 70940 'assist'
[llama-cpp-vulkan] Parsing PEG input with format peg-gemma4: <|turn>model
[llama-cpp-vulkan] <|channel>thought
[llama-cpp-vulkan] <channel|><|tool_call>call:assist__GetLiveContext{}...
[llama-cpp-vulkan] slot process_toke: id 0 | next token: 1269 '__'
[llama-cpp-vulkan] Parsing PEG input with format peg-gemma4: <|turn>model
...