
Send reasoning content back to the model across turns via the reasoning_content API field #21036

Merged
ServeurpersoCom merged 3 commits into ggml-org:master from ServeurpersoCom:webui/preserve-reasoning-in-context on Mar 27, 2026

Conversation

@ServeurpersoCom
Contributor

Overview

Send reasoning content back to the model across turns via the reasoning_content API field instead of stripping it.

Currently, the WebUI strips all reasoning from previous assistant messages before sending them to /v1/chat/completions. As a result, models like GLM-4.7-Flash, DeepSeek-R1, QwQ, and others that support multi-turn chain-of-thought lose their own reasoning history on every new turn.

The server already supports reasoning_content as a first-class input field: common_chat_msgs_parse_oaicompat parses it, to_json_oaicompat serializes it, and Jinja templates consume it natively (e.g. GLM maps it to <think> blocks via its clear_thinking flag). The WebUI already stores reasoning inline in content, wrapped in internal tags. The only missing piece was extracting it and sending it back as a proper API field.
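
For illustration, here is roughly what a request body looks like once a previous assistant turn carries its reasoning back as a first-class field (a minimal sketch; the model name and message contents are hypothetical, only the message shape matters):

```ts
// Minimal sketch of a /v1/chat/completions request body where a previous
// assistant turn sends its reasoning back as a separate field.
// Model name and message contents are hypothetical; only the shape matters.
const body = {
  model: "glm-4.7-flash",
  messages: [
    { role: "user", content: "What is 17 * 24?" },
    {
      role: "assistant",
      content: "17 * 24 = 408.",
      // Sent back as-is; the Jinja template decides how (and whether)
      // to render it, e.g. inside <think> blocks.
      reasoning_content: "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
    },
    { role: "user", content: "Now divide that by 6." },
  ],
};
```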

Changes:

  • Extract reasoning from internal tags and send it as a separate reasoning_content field in the API payload, so no internal tags leak into the request (see the sketch after this list)
  • Add "Exclude reasoning from context" toggle in Settings > Developer, unchecked by default so reasoning is preserved
  • Add corresponding syncable parameter so server admins can pre-configure the default
  • Add 12 unit tests covering extraction, stripping, and the conditional mapping logic
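
A minimal sketch of the extraction step from the first bullet, assuming hypothetical internal tag markers (the real markers and helper names in the WebUI differ; this only illustrates the mapping logic):

```ts
// Hypothetical stand-ins for the WebUI's internal reasoning tag markers.
const REASONING_OPEN = "<|reasoning|>";
const REASONING_CLOSE = "<|/reasoning|>";

interface ApiAssistantMessage {
  role: "assistant";
  content: string;
  reasoning_content?: string;
}

// Map a stored assistant message to the API payload: strip the internal
// tags from content and, unless the toggle excludes it, emit the wrapped
// span as a separate reasoning_content field.
function toApiMessage(stored: string, excludeReasoning: boolean): ApiAssistantMessage {
  const start = stored.indexOf(REASONING_OPEN);
  const end = stored.indexOf(REASONING_CLOSE);
  if (start === -1 || end === -1 || end < start) {
    return { role: "assistant", content: stored };
  }
  const reasoning = stored.slice(start + REASONING_OPEN.length, end).trim();
  // Remove the whole wrapped span so no internal tags leak into the request.
  const content = (stored.slice(0, start) + stored.slice(end + REASONING_CLOSE.length)).trim();
  return excludeReasoning || reasoning.length === 0
    ? { role: "assistant", content }
    : { role: "assistant", content, reasoning_content: reasoning };
}
```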

Tested live with MoE-GLM-4.7-Flash-30B-A3B: verified the payload in the DevTools Network tab across all three toggle states (default on, toggled off at runtime, re-enabled at runtime) without a page reload.

Additional information

Closes #19449

Related: PR #18994 (server-side reasoning input support)

Note: for GLM-4.7-Flash to actually preserve reasoning in the rendered prompt, and not just receive it, the template also needs clear_thinking: false via chat_template_kwargs. That is a separate concern, outside the scope of this PR; a minimal example follows.
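
For reference, passing the flag looks roughly like this (a minimal sketch; chat_template_kwargs and clear_thinking are as described above, while the endpoint and message contents are illustrative):

```ts
// Minimal sketch: ask the server to keep prior reasoning in the rendered
// prompt by passing clear_thinking: false through chat_template_kwargs.
// Endpoint URL and message contents are illustrative.
await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Hello" }],
    chat_template_kwargs: { clear_thinking: false },
  }),
});
```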

Requirements

  • I have read and agree with the contributing guidelines
    AI usage disclosure: YES. Claude Opus 4.6 Extended was used inside a disposable local container for code audit/generation, with no privilege/write access; all changes were reviewed, tested, and committed manually.

Preserve assistant reasoning across turns by extracting it from
internal tags and sending it as a separate reasoning_content field
in the API payload. The server and Jinja templates handle native
formatting (e.g. <think> tags for Qwen, GLM, DeepSeek...).

Adds "Exclude reasoning from context" toggle in Settings > Developer
(off by default, so reasoning is preserved). Includes unit tests.
@ServeurpersoCom requested a review from a team as a code owner on March 26, 2026 at 17:42
Contributor

@ngxson left a comment

Not sure if this will be supported by many models (some also require explicitly setting a model-specific kwarg like exclude_thinking=False), but if you're confident with this, we can give it a try

@ServeurpersoCom
Contributor Author

Absolutely; my translator (an LLM) ate the most important part: this will rarely be useful, but it will work when needed.

The API includes a symmetric field for returning reasoning_content in the context, and the Jinja template handles rejecting it if necessary, which is the case for most models.

@ServeurpersoCom ServeurpersoCom merged commit d0fa2c9 into ggml-org:master Mar 27, 2026
6 checks passed
@ZUIcat

ZUIcat commented Mar 28, 2026

May I ask: I recall that, previously, most models did not recommend including the reasoning content from earlier responses in subsequent turns. Will this modification break that behavior, or does the model's Jinja template prevent it, so there's no need to worry?

@ServeurpersoCom
Contributor Author

Absolutely: the Jinja template "filters" and cleans the CoT text for the vast majority of models, and if it ever causes a problem on a particular model, simply check the box in Settings > Developer to test that case. Furthermore, I'm certain that everything can be overridden in various ways on the backend: CLI flags, presets.ini for router mode, etc. (even externalizing a modified Jinja template).

slartibardfast pushed a commit to slartibardfast/llama.cpp that referenced this pull request Apr 12, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026

Development

Successfully merging this pull request may close these issues.

Feature Request: Setting to preserve Reasoning Content in WebUI

4 participants