$ ./llama.cpp/build/bin/llama-server \
--model /home/alvis/Workspace/MachineLearning/Data/LinkedData/Qwen3-Coder-Next-UD-Q4_K_XL.gguf \
--alias "Qwen3-Coder-Next" \
--ctx-size 16384 \
--seed 3407 \
--temp 1.0 \
--top-p 0.95 \
--min-p 0.01 \
--top-k 40 \
--host 0.0.0.0 \
--port 8001 \
--jinja
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
build: 8233 (c5a778891) with GNU 11.4.0 for Linux x86_64
system info: n_threads = 6, n_threads_batch = 6, total_threads = 20
system_info: n_threads = 6 (n_threads_batch = 6) / 20 | CUDA : ARCHS = 750,800,860,890,1200,1210 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | BLACKWELL_NATIVE_FP4 = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
init: using 19 threads for HTTP server
start: binding port with default address family
main: loading model
...
main: server is listening on http://0.0.0.0:8001
main: starting the main loop...
# Then triggered the API request:
$ curl -s http://127.0.0.1:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen3-Coder-Next",
"messages": [{"role": "user", "content": "What is 1+1? Use the add tool."}],
"tools": [{"type": "function", "function": {"name": "add", "description": "Add two numbers", "parameters": {"type": "object", "properties": {"a": {"type": "string"}, "b": {"type": "string"}}, "required": ["a", "b"]}}}],
"tool_choice": "auto"
}' | python3 -m json.tool
# Server output:
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-native
slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 3 | task 0 | processing task, is_child = 0
slot update_slots: id 3 | task 0 | new prompt, n_ctx_slot = 16384, n_keep = 0, task.n_tokens = 288
slot init_sampler: id 3 | task 0 | init sampler, took 0.04 ms, tokens: text = 288, total = 288
slot update_slots: id 3 | task 0 | prompt processing done, n_tokens = 288, batch.n_tokens = 288
slot release: id 3 | task 0 | stop processing: n_tokens = 311, truncated = 0
srv update_slots: all slots are idle
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
# Received JSON response violating the string arguments spec:
{
"choices": [
{
"finish_reason": "tool_calls",
"index": 0,
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"type": "function",
"function": {
"name": "add",
"arguments": {
"a": "1",
"b": "1"
}
},
"id": "VR8m59fStegbYHZWeoJlj4nI0j9hhTXt"
}
]
}
}
],
...
}
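To check the violation programmatically without the SDK in the way, a small script (mine, not part of the original report) can assert the OpenAI contract directly on the raw HTTP response; it assumes the server and tool schema from the transcript above:

# spec_check.py - sketch; asserts the contract on the raw response,
# bypassing the openai SDK's Pydantic models entirely.
import json

import requests

payload = {
    "model": "Qwen3-Coder-Next",
    "messages": [{"role": "user", "content": "What is 1+1? Use the add tool."}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two numbers",
            "parameters": {
                "type": "object",
                "properties": {"a": {"type": "string"}, "b": {"type": "string"}},
                "required": ["a", "b"],
            },
        },
    }],
    "tool_choice": "auto",
}

r = requests.post("http://127.0.0.1:8001/v1/chat/completions", json=payload)
args = r.json()["choices"][0]["message"]["tool_calls"][0]["function"]["arguments"]

# Per the OpenAI API reference, arguments must be a JSON-encoded string.
assert isinstance(args, str), f"spec violation: arguments is a {type(args).__name__}"
json.loads(args)  # and that string must round-trip as JSON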
Name and Version
build: 8233 (c5a7788) with GNU 11.4.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA GeForce RTX 5090
Models
Qwen3-Coder-Next-UD-Q4_K_XL.gguf
https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF
Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf
https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF
Problem description & steps to reproduce
Dear mods, I am trying to run a quantized model in llama.cpp following the instructions. However, since the recent Autoparser refactoring PR (#18675), llama-server returns the arguments field in tool_calls as a parsed JSON object rather than a JSON string. This breaks strict OpenAI API compatibility: according to the OpenAI API Reference, tool_calls[].function.arguments must be a string containing JSON, not a parsed object. Because of this change, the official openai Python SDK (which uses Pydantic for strict type checking) crashes with a TypeError when attempting to process tool calls.
What I tried
Neither --jinja nor --chat-template chatml restored OpenAI API compatibility, so the *claw frameworks (openclaw, nanoclaw, zeroclaw, ironclaw) fail to work as expected.
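Until this is fixed upstream, the only mitigation I can suggest is a client-side shim. The sketch below is my code (not llama.cpp's), assuming you can intercept the raw response dict before a strict client validates it; it re-serializes any object-valued arguments back into a string:

import json

def normalize_tool_calls(response: dict) -> dict:
    """Re-serialize object-valued tool-call arguments into JSON strings."""
    for choice in response.get("choices", []):
        for tc in choice.get("message", {}).get("tool_calls") or []:
            args = tc.get("function", {}).get("arguments")
            if isinstance(args, (dict, list)):
                # The OpenAI spec requires a string; dump the parsed object back.
                tc["function"]["arguments"] = json.dumps(args)
    return response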
To Reproduce
1. Start llama-server with a tool-capable model (e.g., Qwen3-Coder-Next) and the Jinja template enabled (see the full command at the top of this report).
2. Send a curl request to the /v1/chat/completions endpoint with tools provided (see the curl invocation above).
3. Observe the tool_calls block in the raw JSON response. It shows

       "arguments": { "a": "1", "b": "1" }

   instead of the expected OpenAI-compatible format:

       "arguments": "{\"a\": \"1\", \"b\": \"1\"}"

4. If you run the official openai Python SDK (v2.21.0), it immediately crashes upon receiving the tool call response; a sketch of this is shown below.
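The script below is my sketch of that last step rather than part of the original report; it assumes the server from the transcript above is still listening on 127.0.0.1:8001 and that any placeholder API key is accepted:

# sdk_repro.py - sketch, not from the original transcript
import json

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8001/v1", api_key="sk-placeholder")

resp = client.chat.completions.create(
    model="Qwen3-Coder-Next",
    messages=[{"role": "user", "content": "What is 1+1? Use the add tool."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two numbers",
            "parameters": {
                "type": "object",
                "properties": {"a": {"type": "string"}, "b": {"type": "string"}},
                "required": ["a", "b"],
            },
        },
    }],
    tool_choice="auto",
)

tool_call = resp.choices[0].message.tool_calls[0]
# The SDK types function.arguments as str. Depending on the SDK version, the
# object returned by build 8233 either fails Pydantic validation above or
# blows up here with:
# TypeError: the JSON object must be str, bytes or bytearray, not dict
print(json.loads(tool_call.function.arguments))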
Root Cause
I traced this back to the massive parser refactoring in PR #18675 (commit 566059a26b0ce8faec4ea053605719d399c64cc5). In common/chat.cpp around line 132, the arguments field is explicitly parsed into a JSON object:

    {"type", "function"},
    {"function", {
        {"name", tool_call.name},
        {"arguments", json::parse(tool_call.arguments)}, // <-- This causes the issue
    }},

It should output the raw serialized JSON string instead of parsing it.
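To make the contrast concrete, here is a tiny Python illustration (mine, not llama.cpp code) of why only the string form survives a client's round-trip:

import json

raw = '{"a": "1", "b": "1"}'                 # text as emitted by the model
compliant = {"arguments": raw}               # string, per the OpenAI spec
current   = {"arguments": json.loads(raw)}   # object, as build 8233 returns

json.loads(compliant["arguments"])  # fine
json.loads(current["arguments"])    # TypeError: ... not dict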
First Bad Commit
566059a
(From PR #18675: Autoparser - complete refactoring of parser architecture)
Relevant log output