% llama-server --device CUDA0 --model /opt/llamacppmodels/translategemma-27b-it.Q6_K.gguf --mmproj /opt/llamacppmodels/translategemma-27b-it.mmproj-Q8_0.gguf --port 6670 --host 0.0.0.0 -ctk q8_0 -ctv q8_0 --ctx-size 2000 --jinja --split-mode none --chat-template-file /opt/llamacppmodels/translategemma-template.jinja -v --no-warmup --verbose-prompt --reasoning_budget 0
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 1: Tesla P100-PCIE-16GB, compute capability 6.0, VMM: yes
register_backend: registered backend CUDA (2 devices)
register_device: registered device CUDA0 (NVIDIA GeForce RTX 3090)
register_device: registered device CUDA1 (Tesla P100-PCIE-16GB)
ggml_vulkan: Found 3 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV RENOIR) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = NVIDIA GeForce RTX 3090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
ggml_vulkan: 2 = Tesla P100-PCIE-16GB (NVIDIA) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
register_backend: registered backend Vulkan (3 devices)
register_device: registered device Vulkan0 (AMD Radeon Graphics (RADV RENOIR))
register_device: registered device Vulkan1 (NVIDIA GeForce RTX 3090)
register_device: registered device Vulkan2 (Tesla P100-PCIE-16GB)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (OpenBLAS)
register_backend: registered backend RPC (0 devices)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD Ryzen 7 5700G with Radeon Graphics)
Failed to generate tool call example: Conversations must start with a system context. at row 8, column 74:
{%- if (messages[0]['role'] != 'system') -%}
{{ raise_exception("Conversations must start with a system context.") }}
^
{%- endif -%}
at row 8, column 5:
{%- if (messages[0]['role'] != 'system') -%}
{{ raise_exception("Conversations must start with a system context.") }}
^
{%- endif -%}
at row 7, column 45:
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
^
{{ raise_exception("Conversations must start with a system context.") }}
at row 7, column 1:
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
^
{{ raise_exception("Conversations must start with a system context.") }}
at row 1, column 1:
{%- set languages = {
^
"en-US": "English",
common_chat_verify_template: failed to apply template: Conversations must start with a system context. at row 8, column 74:
{%- if (messages[0]['role'] != 'system') -%}
{{ raise_exception("Conversations must start with a system context.") }}
^
{%- endif -%}
at row 8, column 5:
{%- if (messages[0]['role'] != 'system') -%}
{{ raise_exception("Conversations must start with a system context.") }}
^
{%- endif -%}
at row 7, column 45:
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
^
{{ raise_exception("Conversations must start with a system context.") }}
at row 7, column 1:
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
^
{{ raise_exception("Conversations must start with a system context.") }}
at row 1, column 1:
{%- set languages = {
^
"en-US": "English",
error: the supplied chat template is not supported: {%- set languages = {
"en-US": "English",
"ru-RU": "Russian"
}
-%}
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
{{ raise_exception("Conversations must start with a system context.") }}
{%- endif -%}
{%- if (messages[1]['role'] != 'user') -%}
{{ raise_exception("Conversations must continue with user request.") }}
{%- endif -%}
{%- for message in messages -%}
{%- if (message['role'] == 'user') != (loop.index0 % 2 == 1) -%}
{{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
{%- endif -%}
{%- if (message['role'] == 'assistant') -%}
{%- if message['content'] is none or message['content'] is not string -%}
{{ raise_exception("Assistant role must provide content as a string") }}
{%- endif -%}
{{ '<start_of_turn>model\n'}}
{{ message["content"] | trim }}
{{ '<end_of_turn>\n' }}
{%- elif (message['role'] == 'user') -%}
{%- if messages[0]["content"].split(";")|length == 3 -%}
{%- set content = message["content"] -%}
{%- set source_lang_code = messages[0]["content"].split(";")[0] | replace("_", "-") -%}
{%- set source_lang = languages[source_lang_code] -%}
{%- set target_lang_code = messages[0]["content"].split(";")[1] | replace("_", "-") -%}
{%- set target_lang = languages[target_lang_code] -%}
{%- set context = messages[0]["content"].split(";")[2] -%}
{{ '<start_of_turn>user\nYou are a professional ' + source_lang + ' (' + source_lang_code + ') to ' +
target_lang + ' (' + target_lang_code + ') translator. Your goal is to accurately convey the meaning and ' +
'nuances of the original ' + source_lang + ' text while adhering to ' + target_lang + ' grammar, ' +
'vocabulary, and cultural sensitivities. Context: ' + context + '\n'
}}
{{
'Produce only the ' + target_lang + ' translation, without any additional explanations or ' +
'commentary. Please translate the following ' + source_lang + ' text into ' + target_lang + ':\n\n\n' +
content | trim
}}
{%- else -%}
{{ '<start_of_turn>user\n' + message['content'] + '\n' }}
{%- endif -%}
{{ '<end_of_turn>\n' }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{'<start_of_turn>model\n'}}
{%- endif -%}
Name and Version
% llama-server --version
version: 7705 (9789e28)
built with GNU 15.2.1 for Linux x86_64
Operating systems
Linux
GGML backends
CPU, CUDA
Hardware
GPU: Nvidia RTX 3090
Models
Google TranslateGemma 27B (original, used GGUF)
Problem description & steps to reproduce
When running the TranslateGemma model family with --jinja and --chat-template-file (as the model requires a custom template), llama-server fails to run, complaining "Failed to generate tool call example: Conversations must start with a system context." I'm using a custom template that allows running the model with the current llama.cpp implementation: translategemma-template.txt

The template explicitly checks that the first message is a system message. However, the example conversations that llama-server builds while verifying the chat template don't start with one, so the check trips and llama-server stops. The issue isn't linked to a specific OS or backend.
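For reference, once the server is running, the template expects the first message to be a system message of the form source_lang;target_lang;context (language codes may use either _ or -, since the template normalizes underscores to dashes), followed by strictly alternating user/assistant turns. A request that passes the template's own checks looks roughly like this (the port matches the command above; the context string is purely illustrative):

% curl http://localhost:6670/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "messages": [
        {"role": "system", "content": "en_US;ru_RU;Casual forum post"},
        {"role": "user", "content": "Hello, how are you?"}
      ]
    }'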
Additional research: I'm able to run the model successfully by gating the check at common/arg.cpp:L629 behind an additional flag, roughly as sketched below. After load, the template is correctly recognized as one with system-prompt support and without tool support.
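As an illustration of what I have in mind (the --no-verify-chat-template flag here is hypothetical and does not exist in llama.cpp today; the name is of course up for discussion):

% llama-server --device CUDA0 --model /opt/llamacppmodels/translategemma-27b-it.Q6_K.gguf \
    --jinja --chat-template-file /opt/llamacppmodels/translategemma-template.jinja \
    --no-verify-chat-template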
I can open a PR adding a flag to manually override this check. However, I'm also open to a discussion about alternative methods of running TranslateGemma.
First Bad Commit
No response
Relevant log output
See the full server log at the top of this report.