Translategemma: Enhance common/jinja's ability to detect model chat templates #19043
xiaobing318 wants to merge 10 commits into ggml-org:master
Conversation
Modified docker.yml so that this workflow no longer runs periodically; it can still be started manually when needed.
… Jinja's ability to detect array content in typed content, thus correctly detecting TranslateGemma chat template capabilities.
Adding a cap for a single-use-case chat template is a bad idea. We already know that the chat template with this feature is translategemma, so why do we need a cap?
Caps are only for detecting features that are shared by multiple templates.
// TranslateGemma format detection
if (src.find("source_lang_code") != std::string::npos &&
    src.find("target_lang_code") != std::string::npos &&
    src.find("You are a professional") != std::string::npos) {
I think you haven't yet looked into this PR: #19019. I prefer the implementation in the mentioned PR as it is simpler. Will close this one to avoid duplicate efforts.
@ngxson |
The language is included per message, not as a global-level kwarg.
The following is the test case I used and the result I obtained.

Request body:

{
"model": "translategemma-4b-it",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"source_lang_code": "en",
"target_lang_code": "cs",
"text": "this is a test"
}
]
}
],
"temperature": 0,
"max_tokens": 2048
}

Response Body:

{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "This is a test."
}
}
],
"created": 1769170032,
"model": "translategemma-4b-it.Q8_0.gguf",
"system_fingerprint": "b7827-de4e19dde",
"object": "chat.completion",
"usage": {
"completion_tokens": 6,
"prompt_tokens": 83,
"total_tokens": 89
},
"id": "chatcmpl-B5OPiRcZpSzkVg8rdlmmbhdXHRu6ES6M",
"timings": {
"cache_n": 0,
"prompt_n": 83,
"prompt_ms": 96.262,
"prompt_per_token_ms": 1.1597831325301204,
"prompt_per_second": 862.2301635120815,
"predicted_n": 6,
"predicted_ms": 185.518,
"predicted_per_token_ms": 30.919666666666668,
"predicted_per_second": 32.34187518192305
}
}
Hmm ok, it seems like the language fields are discarded when converting from/to the internal representation of common_chat_msg. I think the simplest way is to allow them to be passed via a global kwarg; that should be ~10 lines of code to add. I will push a fix.
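A minimal sketch of what such a fix could look like, assuming hypothetical names (apply_template_kwargs, context) rather than the actual common_chat_msg plumbing:

// Hypothetical sketch: inject request-level chat_template_kwargs into the
// variable map handed to the Jinja template, so per-request fields such as
// source_lang_code / target_lang_code become visible to the template globally.
#include <nlohmann/json.hpp>

using json = nlohmann::ordered_json;

// 'context' stands in for whatever variable set the renderer already sees;
// 'kwargs' is the parsed "chat_template_kwargs" object from the request.
static void apply_template_kwargs(json & context, const json & kwargs) {
    for (const auto & [key, value] : kwargs.items()) {
        context[key] = value; // last writer wins on name collisions
    }
}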
Thank you very much. It would be great if you could also provide examples of using images and text (via the request body).
PTAL: #19052
Questions and motivation
When running llama-server with this command, the logs (shown below) indicate that an error occurred while parsing TranslateGemma's built-in chat template, so it fell back to the default chatml chat template. After tracing the issue, I found that capability probing in common/jinja for the TranslateGemma chat template produced a false negative, so I enhanced common/jinja's detection logic. In addition, I added initialization for the TranslateGemma chat template.
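For context, capability probing of this kind works by rendering the template against a trial message and checking what survives in the output. A minimal sketch of the idea, assuming a hypothetical try_render entry point rather than the actual common/jinja API:

// Hypothetical sketch of capability probing: render the chat template with a
// message whose content is an array of typed parts, then check whether the
// per-part language fields survive into the rendered prompt.
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::ordered_json;

// Stand-in for the real template-rendering entry point; assumed to throw on
// template errors.
std::string try_render(const json & messages);

static bool template_supports_typed_content_array() {
    const json probe = json::array({
        {
            {"role", "user"},
            {"content", json::array({
                {
                    {"type", "text"},
                    {"source_lang_code", "en"},
                    {"target_lang_code", "cs"},
                    {"text", "probe"}
                }
            })}
        }
    });
    try {
        const std::string out = try_render(probe);
        // If rendering throws, or drops the language codes, the template
        // does not understand typed content arrays.
        return out.find("en") != std::string::npos &&
               out.find("cs") != std::string::npos;
    } catch (...) {
        return false;
    }
}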
Test examples

Note
For flexibility (rather than targeting only the specific TranslateGemma model), the request body does not follow TranslateGemma's officially recommended request format; it has to be adjusted to the request format used by llama.cpp. The differences are shown below.

Recommended request format
{ "model": "translategemma-4b-it", "messages": [ { "role": "user", "content": [ { "type": "image", "source_lang_code": "en", "target_lang_code": "zh", "url": "https://c7.alamy.com/comp/2YAX36N/traffic-signs-in-czech-republic-pedestrian-zone-2YAX36N.jpg" } ] } ], "temperature": 0, "max_tokens": 2048 }Request format compatible with
{
  "model": "translategemma-4b-it",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://c7.alamy.com/comp/2YAX36N/traffic-signs-in-czech-republic-pedestrian-zone-2YAX36N.jpg",
            "detail": "auto"
          }
        }
      ]
    }
  ],
  "chat_template_kwargs": {
    "source_lang_code": "en",
    "target_lang_code": "zh"
  },
  "temperature": 0,
  "max_tokens": 2048
}
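The adjustment amounts to hoisting the per-part language fields into a request-level chat_template_kwargs object (image parts additionally change from type "image" with a bare url to type "image_url" with a nested object, which the sketch below leaves aside). A hypothetical helper illustrating the idea:

// Hypothetical sketch: convert a request in TranslateGemma's recommended
// format (per-part source_lang_code/target_lang_code) into the
// llama.cpp-compatible shape (request-level chat_template_kwargs).
#include <nlohmann/json.hpp>

using json = nlohmann::ordered_json;

static json to_llamacpp_request(json req) {
    json kwargs = json::object();
    for (auto & msg : req["messages"]) {
        if (!msg["content"].is_array()) {
            continue;
        }
        for (auto & part : msg["content"]) {
            // Move the language fields out of the content part.
            if (part.contains("source_lang_code")) {
                kwargs["source_lang_code"] = part["source_lang_code"];
                part.erase("source_lang_code");
            }
            if (part.contains("target_lang_code")) {
                kwargs["target_lang_code"] = part["target_lang_code"];
                part.erase("target_lang_code");
            }
        }
    }
    if (!kwargs.empty()) {
        req["chat_template_kwargs"] = kwargs;
    }
    return req;
}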
Text data

1. Single message
{ "model": "translategemma-4b-it", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "welcome, translategemma model." } ] } ], "chat_template_kwargs": { "source_lang_code": "en", "target_lang_code": "ar-QA" }, "temperature": 0, "max_tokens": 2048 }2. Multiple messages
{ "model": "translategemma-4b-it", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "welcome, translategemma model." } ] }, { "role": "assistant", "content": "أهلاً وسهلاً، نموذج ترجمة جيما." }, { "role": "user", "content": [ { "type": "text", "text": "Parameter support can differ depending on the model used to generate the response, particularly for newer reasoning models. Parameters that are only supported for reasoning models are noted below. " } ] } ], "chat_template_kwargs": { "source_lang_code": "en", "target_lang_code": "ar-QA" }, "temperature": 0, "max_tokens": 2048 }Internet image data
1. Single message
{ "model": "translategemma-4b-it", "messages": [ { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "https://c7.alamy.com/comp/2YAX36N/traffic-signs-in-czech-republic-pedestrian-zone-2YAX36N.jpg", "detail": "auto" } } ] } ], "chat_template_kwargs": { "source_lang_code": "cs", "target_lang_code": "zh" }, "temperature": 0, "max_tokens": 2048 }Local image data
1. Single message
{ "model": "translategemma-4b-it", "messages": [ { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "file://cs.png", "detail": "auto" } } ] } ], "chat_template_kwargs": { "source_lang_code": "cs", "target_lang_code": "zh" }, "temperature": 0, "max_tokens": 2048 }2. Single message
{ "model": "translategemma-4b-it", "messages": [ { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "file://cs.png", "detail": "auto" } } ] }, { "role": "assistant", "content": "Pedestrian zone" }, { "role": "user", "content": [ { "type": "text", "text": "Toto je čínská věta." } ] } ], "chat_template_kwargs": { "source_lang_code": "cs", "target_lang_code": "zh" }, "temperature": 0, "max_tokens": 2048 }