Eval bug: Trailing quotation marks dropped from chat completion response on Kimi K2.5 and Minimax M2.5 #19795

@DocShotgun

Name and Version

llama-server
version: 8124 (3571565)
built with GNU 14.2.0 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA, CPU

Hardware

Dual 5th gen Xeon with 768 GB DDR5 + RTX Pro 6000

Models

Kimi K2.5 (q8_0 + q4_0 mix)
Minimax M2.5 (q8_0 + q4_k + q4_k + q5_k mix)

Of note, I could NOT reproduce this on Qwen 3.5 (q8_0 + q4_k + q4_k + q5_k mix)

Perhaps there is some issue common to the chat parsers for Kimi K2.5 and Minimax M2.5 but not Qwen 3.5?

Problem description & steps to reproduce

Alrighty, this is a weird one.

I noticed that when using Kimi K2.5 via llama-server's chat completions endpoint (/v1/chat/completions), the final " is always missing from the returned content if the response ends with a ". However, if I format the prompt manually (by rendering the Jinja template) and send it as a raw request to /v1/completions, this doesn't happen, so the model itself does generate the closing quote. @ddh0 also encountered this issue on Minimax M2.5.

To reproduce, hit the endpoint with a curl request:

curl http://0.0.0.0:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer AINT_GOT_NO_API_KEY" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Write the word test wrapped in quotes."
      }
    ],
    "top_k": 1,
    "chat_template_kwargs": {"thinking": false}
  }'

The response is:

{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"\"test"}}],"created":1771735211,"model":"moonshotai/Kimi-K2.5","system_fingerprint":"b8124-35715657c","object":"chat.completion","usage":{"completion_tokens":4,"prompt_tokens":17,"total_tokens":21},"id":"chatcmpl-CESqRF1bX2EM29fGt6aHtIBBU9g3zNQD","timings":{"cache_n":16,"prompt_n":1,"prompt_ms":122.831,"prompt_per_token_ms":122.831,"prompt_per_second":8.141267269663196,"predicted_n":4,"predicted_ms":220.697,"predicted_per_token_ms":55.17425,"predicted_per_second":18.124396797419084}}

where the returned string is "test (the closing quote is missing).

If we instead manually format the prompt into a string:

curl http://0.0.0.0:8080/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer AINT_GOT_NO_API_KEY" \
  -d '{
    "prompt": "<|im_user|>user<|im_middle|>Write the word test wrapped in quotes.<|im_end|><|im_assistant|>assistant<|im_middle|><think></think>",
    "top_k": 1
  }'

there's no problem with the trailing quote:

{"choices":[{"text":"\"test\"","index":0,"logprobs":null,"finish_reason":"stop"}],"created":1771735231,"model":"moonshotai/Kimi-K2.5","system_fingerprint":"b8124-35715657c","object":"text_completion","usage":{"completion_tokens":4,"prompt_tokens":17,"total_tokens":21},"id":"chatcmpl-jrXhbDqotkrbzsMGXMul1X5tAmpUNGlm","timings":{"cache_n":16,"prompt_n":1,"prompt_ms":79.296,"prompt_per_token_ms":79.296,"prompt_per_second":12.61097659402744,"predicted_n":4,"predicted_ms":191.489,"predicted_per_token_ms":47.87225,"predicted_per_second":20.88892834575354}}

where the returned string is "test".
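
The difference between the two responses can also be checked mechanically. The following is a minimal Python sketch, not part of llama.cpp, that pulls the generated text out of each response body and reports whether the chat endpoint lost exactly one trailing quote. The helper names (extract_text, lost_trailing_quote) are made up for illustration, and the JSON strings are trimmed copies of the two responses above.

```python
import json

# Trimmed copies of the two response bodies shown above (timings, ids, etc. removed).
CHAT_RESPONSE = r'{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"\"test"}}],"object":"chat.completion"}'
RAW_RESPONSE = r'{"choices":[{"text":"\"test\"","index":0,"logprobs":null,"finish_reason":"stop"}],"object":"text_completion"}'


def extract_text(body: str) -> str:
    """Return the generated text from a chat.completion or text_completion body."""
    choice = json.loads(body)["choices"][0]
    if "message" in choice:  # shape returned by /v1/chat/completions
        return choice["message"]["content"]
    return choice["text"]  # shape returned by /v1/completions


def lost_trailing_quote(chat_text: str, raw_text: str) -> bool:
    """True if chat_text equals raw_text with one trailing double quote removed."""
    return raw_text.endswith('"') and chat_text == raw_text[:-1]


chat_text = extract_text(CHAT_RESPONSE)
raw_text = extract_text(RAW_RESPONSE)
print(repr(chat_text), repr(raw_text), lost_trailing_quote(chat_text, raw_text))
```

Both responses report completion_tokens of 4, which is consistent with the quote being generated in both cases and dropped somewhere after sampling on the chat path.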


For Minimax M2.5, the curl requests would be:

curl http://0.0.0.0:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer AINT_GOT_NO_API_KEY" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Write the word test wrapped in quotes."
      }
    ],
    "top_k": 1
  }'

and

curl http://0.0.0.0:8080/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer AINT_GOT_NO_API_KEY" \
  -d '{
    "prompt": "]~!b[]~b]system\nYou are a helpful assistant. Your name is MiniMax-M2.5 and is built by MiniMax.[e~[\n]~b]user\nWrite the word test wrapped in quotes.[e~[\n]~b]ai\n<think>\n",
    "top_k": 1
  }'
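
For anyone who wants to run both requests back to back, here is a rough Python equivalent of the curl commands above, using only the standard library. It is a sketch under assumptions: the function names (build_chat_payload, build_raw_payload, post_json, compare) are hypothetical, and the server address passed to compare is assumed to be the same http://0.0.0.0:8080 used in the curl commands. The payload fields and the Minimax prompt are copied from the requests above.

```python
import json
import urllib.request
from typing import Optional, Tuple

QUESTION = "Write the word test wrapped in quotes."

# Manually rendered Minimax M2.5 prompt, copied from the curl request above.
MINIMAX_PROMPT = (
    "]~!b[]~b]system\nYou are a helpful assistant. Your name is MiniMax-M2.5 "
    "and is built by MiniMax.[e~[\n]~b]user\n"
    "Write the word test wrapped in quotes.[e~[\n]~b]ai\n<think>\n"
)


def build_chat_payload(question: str, template_kwargs: Optional[dict] = None) -> dict:
    """Body for /v1/chat/completions; top_k=1 keeps sampling greedy."""
    payload = {"messages": [{"role": "user", "content": question}], "top_k": 1}
    if template_kwargs:
        payload["chat_template_kwargs"] = template_kwargs
    return payload


def build_raw_payload(prompt: str) -> dict:
    """Body for /v1/completions with an already-rendered prompt."""
    return {"prompt": prompt, "top_k": 1}


def post_json(url: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer AINT_GOT_NO_API_KEY",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def compare(base_url: str, raw_prompt: str,
            template_kwargs: Optional[dict] = None) -> Tuple[str, str]:
    """Return (chat content, raw completion text) for the same question."""
    chat = post_json(f"{base_url}/v1/chat/completions",
                     build_chat_payload(QUESTION, template_kwargs))
    raw = post_json(f"{base_url}/v1/completions", build_raw_payload(raw_prompt))
    return chat["choices"][0]["message"]["content"], raw["choices"][0]["text"]
```

Calling compare("http://0.0.0.0:8080", MINIMAX_PROMPT) against a running server should return the two strings to compare. Since the raw Minimax prompt ends with an open <think> tag, the raw text will likely contain the model's reasoning before the answer, so only the final characters of the two strings are directly comparable.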

First Bad Commit

No response

Relevant log output

curl http://0.0.0.0:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer AINT_GOT_NO_API_KEY" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Write the word test wrapped in quotes."
      }
    ],
    "top_k": 1,
    "chat_template_kwargs": {"thinking": false}
  }'
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"\"test"}}],"created":1771735211,"model":"moonshotai/Kimi-K2.5","system_fingerprint":"b8124-35715657c","object":"chat.completion","usage":{"completion_tokens":4,"prompt_tokens":17,"total_tokens":21},"id":"chatcmpl-CESqRF1bX2EM29fGt6aHtIBBU9g3zNQD","timings":{"cache_n":16,"prompt_n":1,"prompt_ms":122.831,"prompt_per_token_ms":122.831,"prompt_per_second":8.141267269663196,"predicted_n":4,"predicted_ms":220.697,"predicted_per_token_ms":55.17425,"predicted_per_second":18.124396797419084}}
curl http://0.0.0.0:8080/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer AINT_GOT_NO_API_KEY" \
  -d '{
    "prompt": "<|im_user|>user<|im_middle|>Write the word test wrapped in quotes.<|im_end|><|im_assistant|>assistant<|im_middle|><think></think>",
    "top_k": 1
  }'
{"choices":[{"text":"\"test\"","index":0,"logprobs":null,"finish_reason":"stop"}],"created":1771735231,"model":"moonshotai/Kimi-K2.5","system_fingerprint":"b8124-35715657c","object":"text_completion","usage":{"completion_tokens":4,"prompt_tokens":17,"total_tokens":21},"id":"chatcmpl-jrXhbDqotkrbzsMGXMul1X5tAmpUNGlm","timings":{"cache_n":16,"prompt_n":1,"prompt_ms":79.296,"prompt_per_token_ms":79.296,"prompt_per_second":12.61097659402744,"predicted_n":4,"predicted_ms":191.489,"predicted_per_token_ms":47.87225,"predicted_per_second":20.88892834575354}}
