
Eval bug: Infinite repetition loop in llama-server with peg-gemma4 parser during tool calls #21375

@mamama1

Description


When using the newly implemented Gemma 4 (peg-gemma4) chat format in llama-server, the model enters an infinite repetition loop during tool calling. The server appears to re-parse the entire model response turn for every single token generated, and the model repeats the same tool call indefinitely without ever emitting an EOT (end of turn) or EOS token.
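The cost of a full re-parse per token grows quadratically with the length of the response. A toy sketch (illustration only, not llama.cpp code) of the difference between restarting the parser on every token and keeping parser state between tokens:

```python
# Hypothetical cost model: compares restarting a parser from the beginning of
# the partial response on every new token against an incremental parser that
# keeps its state and consumes only the new suffix.

def full_reparse_cost(n_tokens: int) -> int:
    """Characters scanned if the parser restarts from scratch each token."""
    return sum(range(1, n_tokens + 1))  # 1 + 2 + ... + n  ->  O(n^2)

def incremental_cost(n_tokens: int) -> int:
    """Characters scanned if the parser keeps its state across tokens."""
    return n_tokens  # O(n)

print(full_reparse_cost(1000))  # 500500
print(incremental_cost(1000))   # 1000
```

At a few thousand generated tokens, the gap between the two is large enough to dominate per-token latency, which matches the logs below where a full `Parsing PEG input` pass appears for every single token.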
Environment

Build: llama.cpp (full-vulkan)

WARNING: radv is not a conformant Vulkan implementation, testing use only.
load_backend: loaded Vulkan backend from /app/libggml-vulkan.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 8643 (f49e917)
built with GNU 15.2.0 for Linux x86_64

Server: llama-server

Model: gemma-4-26B-A4B-it-UD-Q4_K_M.gguf

Actual Behavior

The model generates a tool call, but instead of finishing the turn, it repeats the same call indefinitely. The logs show the PEG parser running a full re-parse for every token:

The server continues to send streamed chunks for the same tool call index (e.g., index: 83) repeatedly. The model never exits the <|tool_call|> block.
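As a client-side stopgap (hypothetical, not a llama-server feature), a stream consumer could abort when the same tool-call fragment repeats too many times consecutively; the threshold here is an arbitrary illustration:

```python
def detect_repetition(deltas, max_repeats=5):
    """Return True if any streamed fragment repeats max_repeats times in a row.

    `deltas` is a list of raw tool-call argument fragments from the stream.
    The max_repeats threshold is arbitrary, chosen for illustration.
    """
    run = 1
    for prev, cur in zip(deltas, deltas[1:]):
        run = run + 1 if cur == prev else 1
        if run >= max_repeats:
            return True
    return False

# A healthy stream vs. a looping stream like the one in this report:
print(detect_repetition(["call:", "assist", "__", "GetLiveContext", "{}"]))  # False
print(detect_repetition(["assist"] * 10))  # True
```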
Expected Behavior

The model should generate the tool call once, close the tags correctly (e.g., <tool_call|>), and stop generating to wait for the tool output.

Operating systems

Linux

GGML backends

Vulkan

Hardware

E5-2640v4 on Supermicro X10DRi with ReBAR BIOS Patch applied (32G BAR available)
Radeon AI Pro R9700 32G

Models

gemma-4-26B-A4B-it-UD-Q4_K_M.gguf

Problem description & steps to reproduce

Steps to Reproduce

1. Run llama-server with a Gemma 4 model using the default auto-detected peg-gemma4 format.
2. Provide a system prompt containing tool definitions (e.g., Home Assistant Assist tools).
3. Issue a prompt that triggers a tool call (e.g., "What is the outside temperature?").
4. Observe the server logs and the client response.

First Bad Commit

No response

Relevant log output


[llama-cpp-vulkan] Parsing PEG input with format peg-gemma4: <|turn>model
[llama-cpp-vulkan] <|channel>thought
[llama-cpp-vulkan] <channel|><|tool_call>call:assist__GetLiveContext{}...
[llama-cpp-vulkan] slot process_toke: id 0 | next token: 70940 'assist'

[llama-cpp-vulkan] Parsing PEG input with format peg-gemma4: <|turn>model
[llama-cpp-vulkan] <|channel>thought
[llama-cpp-vulkan] <channel|><|tool_call>call:assist__GetLiveContext{}...
[llama-cpp-vulkan] slot process_toke: id 0 | next token: 1269 '__'

[llama-cpp-vulkan] Parsing PEG input with format peg-gemma4: <|turn>model
...
