% llama-server --device CUDA0 --model /opt/llamacppmodels/translategemma-27b-it.Q6_K.gguf --mmproj /opt/llamacppmodels/translategemma-27b-it.mmproj-Q8_0.gguf --port 6670 --host 0.0.0.0 -ctk q8_0 -ctv q8_0 --ctx-size 2000 --jinja --split-mode none --chat-template-file /opt/llamacppmodels/translategemma-template.jinja -v --no-warmup --verbose-prompt --reasoning_budget 0
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 1: Tesla P100-PCIE-16GB, compute capability 6.0, VMM: yes
register_backend: registered backend CUDA (2 devices)
register_device: registered device CUDA0 (NVIDIA GeForce RTX 3090)
register_device: registered device CUDA1 (Tesla P100-PCIE-16GB)
ggml_vulkan: Found 3 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV RENOIR) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = NVIDIA GeForce RTX 3090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
ggml_vulkan: 2 = Tesla P100-PCIE-16GB (NVIDIA) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
register_backend: registered backend Vulkan (3 devices)
register_device: registered device Vulkan0 (AMD Radeon Graphics (RADV RENOIR))
register_device: registered device Vulkan1 (NVIDIA GeForce RTX 3090)
register_device: registered device Vulkan2 (Tesla P100-PCIE-16GB)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (OpenBLAS)
register_backend: registered backend RPC (0 devices)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD Ryzen 7 5700G with Radeon Graphics)
Failed to generate tool call example: Conversations must start with a system context. at row 8, column 74:
{%- if (messages[0]['role'] != 'system') -%}
{{ raise_exception("Conversations must start with a system context.") }}
^
{%- endif -%}
at row 8, column 5:
{%- if (messages[0]['role'] != 'system') -%}
{{ raise_exception("Conversations must start with a system context.") }}
^
{%- endif -%}
at row 7, column 45:
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
^
{{ raise_exception("Conversations must start with a system context.") }}
at row 7, column 1:
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
^
{{ raise_exception("Conversations must start with a system context.") }}
at row 1, column 1:
{%- set languages = {
^
"en-US": "English",
common_chat_verify_template: failed to apply template: Conversations must start with a system context. at row 8, column 74:
{%- if (messages[0]['role'] != 'system') -%}
{{ raise_exception("Conversations must start with a system context.") }}
^
{%- endif -%}
at row 8, column 5:
{%- if (messages[0]['role'] != 'system') -%}
{{ raise_exception("Conversations must start with a system context.") }}
^
{%- endif -%}
at row 7, column 45:
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
^
{{ raise_exception("Conversations must start with a system context.") }}
at row 7, column 1:
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
^
{{ raise_exception("Conversations must start with a system context.") }}
at row 1, column 1:
{%- set languages = {
^
"en-US": "English",
error: the supplied chat template is not supported: {%- set languages = {
"en-US": "English",
"ru-RU": "Russian"
}
-%}
{{ bos_token }}
{%- if (messages[0]['role'] != 'system') -%}
{{ raise_exception("Conversations must start with a system context.") }}
{%- endif -%}
{%- if (messages[1]['role'] != 'user') -%}
{{ raise_exception("Conversations must continue with user request.") }}
{%- endif -%}
{%- for message in messages -%}
{%- if (message['role'] == 'user') != (loop.index0 % 2 == 1) -%}
{{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
{%- endif -%}
{%- if (message['role'] == 'assistant') -%}
{%- if message['content'] is none or message['content'] is not string -%}
{{ raise_exception("Assistant role must provide content as a string") }}
{%- endif -%}
{{ '<start_of_turn>model\n'}}
{{ message["content"] | trim }}
{{ '<end_of_turn>\n' }}
{%- elif (message['role'] == 'user') -%}
{%- if messages[0]["content"].split(";")|length == 3 -%}
{%- set content = message["content"] -%}
{%- set source_lang_code = messages[0]["content"].split(";")[0] | replace("_", "-") -%}
{%- set source_lang = languages[source_lang_code] -%}
{%- set target_lang_code = messages[0]["content"].split(";")[1] | replace("_", "-") -%}
{%- set target_lang = languages[target_lang_code] -%}
{%- set context = messages[0]["content"].split(";")[2] -%}
{{ '<start_of_turn>user\nYou are a professional ' + source_lang + ' (' + source_lang_code + ') to ' +
target_lang + ' (' + target_lang_code + ') translator. Your goal is to accurately convey the meaning and ' +
'nuances of the original ' + source_lang + ' text while adhering to ' + target_lang + ' grammar, ' +
'vocabulary, and cultural sensitivities. Context: ' + context + '\n'
}}
{{
'Produce only the ' + target_lang + ' translation, without any additional explanations or ' +
'commentary. Please translate the following ' + source_lang + ' text into ' + target_lang + ':\n\n\n' +
content | trim
}}
{%- else -%}
{{ '<start_of_turn>user\n' + message['content'] + '\n' }}
{%- endif -%}
{{ '<end_of_turn>\n' }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{'<start_of_turn>model\n'}}
{%- endif -%}
Name and Version
% llama-server --version
version: 7705 (9789e28)
built with GNU 15.2.1 for Linux x86_64
Operating systems
Linux
GGML backends
CPU, CUDA
Hardware
GPU: Nvidia RTX 3090
Models
Google TranslateGemma 27B (original, used GGUF)
Problem description & steps to reproduce
When running the TranslateGemma model family with --jinja and --chat-template-file (as the model requires a custom template), llama-server fails to run, complaining "Failed to generate tool call example: Conversations must start with a system context." I'm using a custom template that allows running the model with the current llama.cpp implementation: translategemma-template.txt

The template explicitly checks that the first message is a system message. However, the example conversations that llama-server builds while verifying the chat template don't start with one, so the check trips and llama-server stops. The issue isn't linked to a specific OS or backend.
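For reference, once the server is running, the template expects the first message to be a system message of the form source_lang;target_lang;context (language codes may use either _ or -, since the template normalizes underscores to dashes), followed by strictly alternating user/assistant turns. A request that passes the template's own checks looks roughly like this (the port matches the command above; the context string is purely illustrative):

% curl http://localhost:6670/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "messages": [
        {"role": "system", "content": "en_US;ru_RU;Casual forum post"},
        {"role": "user", "content": "Hello, how are you?"}
      ]
    }'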
Additional research: I'm able to run the model successfully by gating the check at common/arg.cpp:L629 behind an additional flag, roughly as sketched below. After load, the template is correctly recognized as one with system-prompt support and without tool support.
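As an illustration of what I have in mind (the --no-verify-chat-template flag here is hypothetical and does not exist in llama.cpp today; the name is of course up for discussion):

% llama-server --device CUDA0 --model /opt/llamacppmodels/translategemma-27b-it.Q6_K.gguf \
    --jinja --chat-template-file /opt/llamacppmodels/translategemma-template.jinja \
    --no-verify-chat-template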
I can open a PR adding a flag to manually override this check. However, I'm also open to a discussion about alternative methods of running TranslateGemma.
First Bad Commit
No response
Relevant log output
See the full server log at the top of this report.