## Bug Description

When serving a gpt-oss model (e.g. `ggml-org/gpt-oss-120b-GGUF`) with `--reasoning-format auto`, the first request succeeds, but subsequent multi-turn requests with tool calls fail with HTTP 500:

```
Jinja Exception: Cannot pass both content and thinking in an assistant message with tool calls!
```
## Regression

Introduced in PR #16937 (commit 87c9efc3b), "common : move gpt-oss reasoning processing to init params". That PR moved the `thinking` field assignment from output serialization (`common_chat_msgs_to_json_oaicompat`) to input processing (`common_chat_params_init_gpt_oss`). The new code conditionally adds `thinking` from `reasoning_content` when tool calls are present, but does not erase `content` from the adjusted message.
## Root Cause

In `common/chat.cpp`, `common_chat_params_init_gpt_oss()`:

```cpp
if (has_reasoning_content && has_tool_calls) {
    auto adjusted_message = msg;
    adjusted_message["thinking"] = msg.at("reasoning_content");
    // BUG: "content" is not erased -- the template forbids having both
    adjusted_messages.push_back(adjusted_message);
}
```
The gpt-oss Jinja template (`models/templates/openai-gpt-oss-120b.jinja`) explicitly checks that tool-call messages have either `content` or `thinking`, not both: they render to the same `<|channel|>analysis` slot. When the client sends back conversation history containing assistant messages with `content`, `reasoning_content`, and `tool_calls`, the adjusted message ends up with `{content, thinking, tool_calls}` and the template raises an error.
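For illustration, a history entry of the shape that triggers the error (field names are from the report; the values and the `get_weather` tool are made up):

```json
{
  "role": "assistant",
  "content": "I'll check the weather for you.",
  "reasoning_content": "The user wants the weather; call get_weather.",
  "tool_calls": [
    {
      "type": "function",
      "function": { "name": "get_weather", "arguments": "{\"city\": \"Paris\"}" }
    }
  ]
}
```

After the buggy adjustment, this message carries `content`, `thinking`, and `tool_calls` at once, which the template rejects.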
## Steps to Reproduce

- Start `llama-server` with a gpt-oss model (e.g. `llama-server -hf ggml-org/gpt-oss-120b-GGUF --reasoning-format auto --jinja -fa on`)
- Send a chat completion request that triggers tool calls
- Send a follow-up request including the full conversation history (with the assistant's `content`, `reasoning_content`, and `tool_calls`)
- The server returns HTTP 500 with the Jinja exception
## Fix

Add `adjusted_message.erase("content")` after setting `thinking`:

```cpp
if (has_reasoning_content && has_tool_calls) {
    auto adjusted_message = msg;
    adjusted_message["thinking"] = msg.at("reasoning_content");
    adjusted_message.erase("content"); // template forbids both content and thinking with tool_calls
    adjusted_messages.push_back(adjusted_message);
}
```