Eval bug: Chat template verification prevents running strict templates #18895

@Myp3a

Description

Name and Version

% llama-server --version
version: 7705 (9789e28)
built with GNU 15.2.1 for Linux x86_64

Operating systems

Linux

GGML backends

CPU, CUDA

Hardware

GPU: Nvidia RTX 3090

Models

Google TranslateGemma 27B (original, used GGUF)

Problem description & steps to reproduce

When running the TranslateGemma model family with --jinja and --chat-template-file (the model requires a custom template), llama-server fails to start, complaining Failed to generate tool call example: raised_excepton_from_template.
I'm using a custom template that allows running the model with the current llama.cpp implementation: translategemma-template.txt
The template explicitly checks that the first message is a system message. However, the example conversations used during chat-template verification don't start with one, so the check raises and llama-server stops.

The issue isn't linked to a specific OS or backend.
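To illustrate, the template's strict checks can be mirrored in plain Python (this is an illustrative analogue only; the real checks run inside the Jinja template via raise_exception):

```python
# Plain-Python analogue of the strict checks in the TranslateGemma
# template (illustrative only; the real checks are Jinja expressions
# that call raise_exception).

def validate_conversation(messages):
    if messages[0]["role"] != "system":
        raise ValueError("Conversations must start with a system context.")
    if messages[1]["role"] != "user":
        raise ValueError("Conversations must continue with user request.")
    for i, m in enumerate(messages):
        if (m["role"] == "user") != (i % 2 == 1):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/...")

# A system-first conversation passes:
validate_conversation([
    {"role": "system", "content": "en_US;ru_RU;casual chat"},
    {"role": "user", "content": "hello"},
])

# But a verification probe that doesn't begin with a system message
# trips the first check, which is what aborts llama-server startup:
try:
    validate_conversation([{"role": "user", "content": "hello"}])
except ValueError as e:
    print(e)  # Conversations must start with a system context.
```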


Additional research: I'm able to run the model successfully by gating the common/arg.cpp:L629 check behind an additional flag. After loading, the template is correctly recognized as one with system prompt support and without tool support.
I can open a PR adding a flag to manually override this check. However, I'm also open to discussing alternative ways of running TranslateGemma.
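The workaround I have in mind can be sketched like this (the flag name --no-validate-template and the verify_chat_template helper are hypothetical stand-ins, not actual llama-server options or llama.cpp functions; in the real code the gated call sits at common/arg.cpp:L629):

```python
# Hypothetical sketch of gating chat-template verification behind a
# CLI flag. The flag and helper names are illustrative only.
import argparse

def verify_chat_template(template_text: str) -> None:
    # Stand-in for the verification step: probing a strict template
    # with a generic example conversation raises.
    raise RuntimeError("Conversations must start with a system context.")

def apply_chat_template_arg(template_text: str, skip_verify: bool) -> str:
    if not skip_verify:
        verify_chat_template(template_text)  # today: aborts startup
    return template_text  # with the flag set: template loads as-is

parser = argparse.ArgumentParser()
parser.add_argument("--no-validate-template", action="store_true")
args = parser.parse_args(["--no-validate-template"])

tmpl = apply_chat_template_arg("{%- if ... -%}", args.no_validate_template)
print("template accepted")
```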

First Bad Commit

No response

Relevant log output

Logs
% llama-server --device CUDA0 --model /opt/llamacppmodels/translategemma-27b-it.Q6_K.gguf --mmproj /opt/llamacppmodels/translategemma-27b-it.mmproj-Q8_0.gguf --port 6670 --host 0.0.0.0 -ctk q8_0 -ctv q8_0 --ctx-size 2000 --jinja --split-mode none --chat-template-file /opt/llamacppmodels/translategemma-template.jinja -v --no-warmup --verbose-prompt --reasoning_budget 0                                 
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: Tesla P100-PCIE-16GB, compute capability 6.0, VMM: yes
register_backend: registered backend CUDA (2 devices)
register_device: registered device CUDA0 (NVIDIA GeForce RTX 3090)
register_device: registered device CUDA1 (Tesla P100-PCIE-16GB)
ggml_vulkan: Found 3 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV RENOIR) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = NVIDIA GeForce RTX 3090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
ggml_vulkan: 2 = Tesla P100-PCIE-16GB (NVIDIA) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
register_backend: registered backend Vulkan (3 devices)
register_device: registered device Vulkan0 (AMD Radeon Graphics (RADV RENOIR))
register_device: registered device Vulkan1 (NVIDIA GeForce RTX 3090)
register_device: registered device Vulkan2 (Tesla P100-PCIE-16GB)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (OpenBLAS)
register_backend: registered backend RPC (0 devices)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD Ryzen 7 5700G with Radeon Graphics)
Failed to generate tool call example: Conversations must start with a system context. at row 8, column 74:
{%- if (messages[0]['role'] != 'system') -%}
    {{ raise_exception("Conversations must start with a system context.") }}
                                                                         ^
{%- endif -%}
 at row 8, column 5:
{%- if (messages[0]['role'] != 'system') -%}
    {{ raise_exception("Conversations must start with a system context.") }}
    ^
{%- endif -%}
 at row 7, column 45:
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
                                            ^
    {{ raise_exception("Conversations must start with a system context.") }}
 at row 7, column 1:
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
^
    {{ raise_exception("Conversations must start with a system context.") }}
 at row 1, column 1:
{%- set languages = {
^
    "en-US": "English",

common_chat_verify_template: failed to apply template: Conversations must start with a system context. at row 8, column 74:
{%- if (messages[0]['role'] != 'system') -%}
    {{ raise_exception("Conversations must start with a system context.") }}
                                                                         ^
{%- endif -%}
 at row 8, column 5:
{%- if (messages[0]['role'] != 'system') -%}
    {{ raise_exception("Conversations must start with a system context.") }}
    ^
{%- endif -%}
 at row 7, column 45:
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
                                            ^
    {{ raise_exception("Conversations must start with a system context.") }}
 at row 7, column 1:
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
^
    {{ raise_exception("Conversations must start with a system context.") }}
 at row 1, column 1:
{%- set languages = {
^
    "en-US": "English",

error: the supplied chat template is not supported: {%- set languages = {
    "en-US": "English",
    "ru-RU": "Russian"
}
-%}
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
    {{ raise_exception("Conversations must start with a system context.") }}
{%- endif -%}
{%- if (messages[1]['role'] != 'user') -%}
    {{ raise_exception("Conversations must continue with user request.") }}
{%- endif -%}
{%- for message in messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 1) -%}
        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif -%}
    {%- if (message['role'] == 'assistant') -%}
        {%- if message['content'] is none or message['content'] is not string -%}
            {{ raise_exception("Assistant role must provide content as a string") }}
        {%- endif -%}
        {{ '<start_of_turn>model\n'}}
        {{ message["content"] | trim }}
        {{ '<end_of_turn>\n' }}
    {%- elif (message['role'] == 'user') -%}
        {%- if messages[0]["content"].split(";")|length == 3 -%}
            {%- set content = message["content"] -%}
            {%- set source_lang_code = messages[0]["content"].split(";")[0] | replace("_", "-")  -%}
            {%- set source_lang = languages[source_lang_code] -%}
            {%- set target_lang_code = messages[0]["content"].split(";")[1] | replace("_", "-") -%}
            {%- set target_lang = languages[target_lang_code] -%}
            {%- set context = messages[0]["content"].split(";")[2] -%}
            {{ '<start_of_turn>user\nYou are a professional ' + source_lang + ' (' + source_lang_code + ') to ' +
            target_lang + ' (' + target_lang_code + ') translator. Your goal is to accurately convey the meaning and ' +
            'nuances of the original ' + source_lang + ' text while adhering to ' + target_lang + ' grammar, ' +
            'vocabulary, and cultural sensitivities. Context: ' + context + '\n'
            }}
            {{
                    'Produce only the ' + target_lang + ' translation, without any additional explanations or ' +
                    'commentary. Please translate the following ' + source_lang + ' text into ' + target_lang + ':\n\n\n' +
                    content | trim
            }}
        {%- else -%}
            {{ '<start_of_turn>user\n' + message['content'] + '\n' }}
        {%- endif -%}
        {{ '<end_of_turn>\n' }}
    {%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{'<start_of_turn>model\n'}}
{%- endif -%}
