Name and Version
llama-server
version: 8124 (3571565)
built with GNU 14.2.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA, CPU
Hardware
Dual 5th gen Xeon with 768 GB DDR5 + RTX Pro 6000
Models
Kimi K2.5 (q8_0 + q4_0 mix)
Minimax M2.5 (q8_0 + q4_k + q4_k + q5_k mix)
Of note, I could NOT reproduce this on Qwen 3.5 (q8_0 + q4_k + q4_k + q5_k mix)
Perhaps there is some issue common to the chat parsers for Kimi K2.5 and Minimax M2.5 but not Qwen 3.5?
Problem description & steps to reproduce
Alrighty, this is a weird one.
I noticed that when using Kimi K2.5 via llama-server's chat completions endpoint, the final " is always omitted whenever the response ends with a ". However, if I manually format the prompt (by rendering the Jinja template) and send it as a raw request to /v1/completions, this doesn't happen. @ddh0 also encountered this issue on Minimax M2.5.
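A guess at the mechanism (purely hypothetical — I have not verified this against llama.cpp's actual chat-parser code): if the chat-side parser withholds any trailing text that could be the prefix of an upcoming format marker (e.g. a tool-call opener or JSON argument string), and never flushes that held-back buffer when generation stops, a trailing " would vanish exactly as observed. A minimal sketch of that failure mode, with made-up marker strings:

```python
# Hypothetical sketch of a stream parser that withholds a suffix which
# might be the prefix of an upcoming marker, and forgets to flush it at
# end of generation. NOT llama.cpp code; markers are illustrative only.
MARKERS = ['"arguments":', '<|tool_call_begin|>']

def could_be_marker_prefix(suffix: str) -> bool:
    return any(m.startswith(suffix) for m in MARKERS)

def parse_stream(chunks):
    emitted, held = "", ""
    for chunk in chunks:
        held += chunk
        # Emit everything except the longest tail that might start a marker.
        cut = len(held)
        for i in range(len(held)):
            if could_be_marker_prefix(held[i:]):
                cut = i
                break
        emitted += held[:cut]
        held = held[cut:]
    # BUG: a correct parser would flush `held` here; this one drops it.
    return emitted

print(parse_stream(['"', 'test', '"']))  # prints "test  (closing quote lost)
```

Here the final chunk " matches the prefix of a marker, gets held back, and is silently discarded at end of stream — the same symptom as the chat completions response.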
To reproduce, hit the endpoint with a curl request:
curl http://0.0.0.0:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer AINT_GOT_NO_API_KEY" \
-d '{
"messages": [
{
"role": "user",
"content": "Write the word test wrapped in quotes."
}
],
"top_k": 1,
"chat_template_kwargs": {"thinking": false}
}'
which returns:
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"\"test"}}],"created":1771735211,"model":"moonshotai/Kimi-K2.5","system_fingerprint":"b8124-35715657c","object":"chat.completion","usage":{"completion_tokens":4,"prompt_tokens":17,"total_tokens":21},"id":"chatcmpl-CESqRF1bX2EM29fGt6aHtIBBU9g3zNQD","timings":{"cache_n":16,"prompt_n":1,"prompt_ms":122.831,"prompt_per_token_ms":122.831,"prompt_per_second":8.141267269663196,"predicted_n":4,"predicted_ms":220.697,"predicted_per_token_ms":55.17425,"predicted_per_second":18.124396797419084}}
where the returned string is "test — the closing quote is missing.
If we instead manually format the prompt into a string:
curl http://0.0.0.0:8080/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer AINT_GOT_NO_API_KEY" \
-d '{
"prompt": "<|im_user|>user<|im_middle|>Write the word test wrapped in quotes.<|im_end|><|im_assistant|>assistant<|im_middle|><think></think>",
"top_k": 1
}'
there's no problem with the trailing quote:
{"choices":[{"text":"\"test\"","index":0,"logprobs":null,"finish_reason":"stop"}],"created":1771735231,"model":"moonshotai/Kimi-K2.5","system_fingerprint":"b8124-35715657c","object":"text_completion","usage":{"completion_tokens":4,"prompt_tokens":17,"total_tokens":21},"id":"chatcmpl-jrXhbDqotkrbzsMGXMul1X5tAmpUNGlm","timings":{"cache_n":16,"prompt_n":1,"prompt_ms":79.296,"prompt_per_token_ms":79.296,"prompt_per_second":12.61097659402744,"predicted_n":4,"predicted_ms":191.489,"predicted_per_token_ms":47.87225,"predicted_per_second":20.88892834575354}}
where the returned string is "test".
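The discrepancy can be checked mechanically by parsing the two JSON responses above and comparing the returned strings (a minimal sketch; the payloads are trimmed copies of the responses pasted above):

```python
import json

# Trimmed from the /v1/chat/completions response above.
chat_resp = json.loads(
    r'''{"choices":[{"finish_reason":"stop","index":0,
         "message":{"role":"assistant","content":"\"test"}}]}''')
# Trimmed from the /v1/completions response above.
raw_resp = json.loads(
    r'''{"choices":[{"text":"\"test\"","index":0,
         "logprobs":null,"finish_reason":"stop"}]}''')

chat_text = chat_resp["choices"][0]["message"]["content"]
raw_text = raw_resp["choices"][0]["text"]

print(repr(chat_text))  # '"test'   <- trailing quote missing
print(repr(raw_text))   # '"test"'  <- intact
```

The chat completions content is exactly the raw completion minus its final quote.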
For Minimax M2.5, the curl requests would be:
curl http://0.0.0.0:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer AINT_GOT_NO_API_KEY" \
-d '{
"messages": [
{
"role": "user",
"content": "Write the word test wrapped in quotes."
}
],
"top_k": 1
}'
and
curl http://0.0.0.0:8080/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer AINT_GOT_NO_API_KEY" \
-d '{
"prompt": "]~!b[]~b]system\nYou are a helpful assistant. Your name is MiniMax-M2.5 and is built by MiniMax.[e~[\n]~b]user\nWrite the word test wrapped in quotes.[e~[\n]~b]ai\n<think>\n",
"top_k": 1
}'
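As a stopgap, the raw /v1/completions route demonstrated for Kimi K2.5 above can be wrapped in a small helper that renders the prompt string itself (a sketch; the marker strings are copied from the working curl example, and the server URL is the one used throughout this report):

```python
import json
import urllib.request

def kimi_raw_request(user_msg: str,
                     url: str = "http://0.0.0.0:8080/v1/completions"):
    # Prompt format copied from the working /v1/completions example above.
    prompt = ("<|im_user|>user<|im_middle|>" + user_msg +
              "<|im_end|><|im_assistant|>assistant<|im_middle|>"
              "<think></think>")
    body = json.dumps({"prompt": prompt, "top_k": 1}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})

req = kimi_raw_request("Write the word test wrapped in quotes.")
# Send with urllib.request.urlopen(req) against a running llama-server.
```

This bypasses the chat parser entirely, so the trailing quote survives — useful for confirming the bug is in chat-side post-processing rather than in the model's sampling.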
First Bad Commit
No response
Relevant log output
curl http://0.0.0.0:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer AINT_GOT_NO_API_KEY" \
-d '{
"messages": [
{
"role": "user",
"content": "Write the word test wrapped in quotes."
}
],
"top_k": 1,
"chat_template_kwargs": {"thinking": false}
}'
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"\"test"}}],"created":1771735211,"model":"moonshotai/Kimi-K2.5","system_fingerprint":"b8124-35715657c","object":"chat.completion","usage":{"completion_tokens":4,"prompt_tokens":17,"total_tokens":21},"id":"chatcmpl-CESqRF1bX2EM29fGt6aHtIBBU9g3zNQD","timings":{"cache_n":16,"prompt_n":1,"prompt_ms":122.831,"prompt_per_token_ms":122.831,"prompt_per_second":8.141267269663196,"predicted_n":4,"predicted_ms":220.697,"predicted_per_token_ms":55.17425,"predicted_per_second":18.124396797419084}}
curl http://0.0.0.0:8080/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer AINT_GOT_NO_API_KEY" \
-d '{
"prompt": "<|im_user|>user<|im_middle|>Write the word test wrapped in quotes.<|im_end|><|im_assistant|>assistant<|im_middle|><think></think>",
"top_k": 1
}'
{"choices":[{"text":"\"test\"","index":0,"logprobs":null,"finish_reason":"stop"}],"created":1771735231,"model":"moonshotai/Kimi-K2.5","system_fingerprint":"b8124-35715657c","object":"text_completion","usage":{"completion_tokens":4,"prompt_tokens":17,"total_tokens":21},"id":"chatcmpl-jrXhbDqotkrbzsMGXMul1X5tAmpUNGlm","timings":{"cache_n":16,"prompt_n":1,"prompt_ms":79.296,"prompt_per_token_ms":79.296,"prompt_per_second":12.61097659402744,"predicted_n":4,"predicted_ms":191.489,"predicted_per_token_ms":47.87225,"predicted_per_second":20.88892834575354}}