[Mirror] server: /v1/responses (partial) #85

Open

ngxson wants to merge 25 commits into ngxson:master from openingnow:v1_responses

Conversation

@ngxson
Owner

@ngxson ngxson commented Jan 21, 2026

Mirror from upstream PR: ggml-org#18486

Note: @coderabbitai use my 'Mirror PR' preset for reviewing this.

Summary by CodeRabbit

  • New Features

    • Added OpenAI-compatible Responses API endpoint (/v1/responses) with streaming support and automatic translation to chat-completion format.
  • Documentation

    • Updated server README with Responses endpoint docs and usage examples.
  • Tests

    • Added unit tests validating Responses API behavior with the OpenAI Python client (streaming and non-streaming scenarios).
  • Chores

    • Bumped OpenAI package dependency from ~=1.55.3 to ~=2.14.0.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai bot commented Jan 21, 2026

📝 Walkthrough

Walkthrough

Adds OpenAI Responses API support: new /v1/responses route, conversion utility to translate Responses requests into Chat Completions format, new TASK_RESPONSE_TYPE_OAI_RESP with streaming/state handling, SSE formatting, unit tests, and dependency bumps for the OpenAI client.

Changes

Cohort / File(s) — Summary
Dependency Updates
requirements/requirements-tool_bench.txt, tools/server/tests/requirements.txt
Bumped openai from ~=1.55.3 to ~=2.14.0
Documentation
tools/server/README.md
Added Responses API to features and documented POST /v1/responses with examples
Conversion Utilities
tools/server/server-common.h, tools/server/server-common.cpp
Added convert_responses_to_chatcmpl(const json&) to map Responses payloads to Chat Completions format and format_oai_resp_sse(const json&) to produce OpenAI-style SSE event formatting
Routing & Endpoints
tools/server/server.cpp, tools/server/server-context.h, tools/server/server-context.cpp
Added post_responses_oai route and /v1/responses endpoint; route uses conversion utility and delegates to completions handler with TASK_RESPONSE_TYPE_OAI_RESP; streaming and non‑streaming flows updated to handle new type
Task Types & State
tools/server/server-task.h, tools/server/server-task.cpp
Introduced TASK_RESPONSE_TYPE_OAI_RESP; added OAI response IDs and generalized thinking/text block state; added update() on partial results; implemented to_json_oaicompat_resp() and to_json_oaicompat_resp_stream() for final/partial streaming and non-streaming outputs
Tests
tools/server/tests/unit/test_compat_oai_responses.py
Added tests exercising OpenAI Python client compatibility for Responses (standard and streaming), asserting IDs, event sequence, and content

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server as HTTP Server
    participant Conv as Conversion Layer
    participant Handler as Completions Handler
    participant Task as Task Processor

    Client->>Server: POST /v1/responses
    Server->>Conv: convert_responses_to_chatcmpl(request_body)
    Conv-->>Server: chat-completions-format JSON
    Server->>Handler: handle_completions_impl(..., TASK_RESPONSE_TYPE_OAI_RESP)
    Handler->>Task: create/process task (streaming state)
    Task->>Handler: emit SSE events or final JSON (using format_oai_resp_sse)
    Handler-->>Server: stream or response payload
    Server-->>Client: SSE stream or JSON response
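The SSE leg of the diagram above can be sketched as follows. The server's actual formatter is `format_oai_resp_sse()` in tools/server/server-common.cpp; this Python sketch only illustrates the OpenAI-style "event:"/"data:" wire framing, assuming each event payload names its type in a `type` field.

```python
import json

# Sketch of OpenAI-style SSE framing for Responses events. Assumes each
# event JSON carries its name in "type" (e.g. "response.output_text.delta");
# the blank line terminates the SSE frame.

def format_sse(event: dict) -> str:
    return f"event: {event['type']}\ndata: {json.dumps(event)}\n\n"
```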

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

🐰 A tiny hop, a clever swap,
Responses translate and never stop,
Streams of thought and message IDs,
Tools and reasons tumble with ease,
Hooray — the warren sings, we hop on top!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 18.52%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check — ❓ Inconclusive. The PR description identifies the upstream source and requests a specific review approach, but lacks the details on objectives, changes, testing, or breaking changes that would be typical for a substantive implementation PR. Resolution: consider adding a brief summary of the changes (OpenAI Responses API support), any testing performed, and a note on what 'partial' means for this implementation's scope.

✅ Passed checks (1 passed)

  • Title check — ✅ Passed. The title '[Mirror] server: /v1/responses (partial)' clearly indicates this is a mirrored upstream PR adding partial support for the /v1/responses server endpoint, matching the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@tools/server/server-common.cpp`:
- Around line 1156-1173: The current branch that handles "input_file" pushes a
content entry with {"type":"file"} into chatcmpl_content which
oaicompat_chat_params_parse does not accept (it only supports text, image_url,
input_audio); update server-common.cpp to reject "input_file" early instead of
creating an unsupported content type: in the else-if for type == "input_file"
(the block that currently checks for file_url/file_data/filename and pushes into
chatcmpl_content), throw a clear std::invalid_argument like "'input_file' is not
supported for chat content; use an alternative upload/attachment flow" (or
similar) and remove the code that injects {"type":"file"} so parsing via
oaicompat_chat_params_parse will not later fail; alternatively, if you prefer to
support files, implement corresponding handling in oaicompat_chat_params_parse
to accept "file" content types.
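
A minimal sketch of the suggested early rejection. The real change belongs in C++ in tools/server/server-common.cpp; this Python version only mirrors the control flow, and the `input_text`/`input_image` mappings shown alongside it are simplified assumptions.

```python
# Sketch of rejecting "input_file" content parts up front, since downstream
# parsing (oaicompat_chat_params_parse) only accepts text, image_url, and
# input_audio content types.

def convert_content_part(part: dict) -> dict:
    t = part.get("type")
    if t == "input_text":
        return {"type": "text", "text": part.get("text", "")}
    if t == "input_image":
        return {"type": "image_url", "image_url": {"url": part.get("image_url", "")}}
    if t == "input_file":
        # Fail fast with a clear message instead of emitting an
        # unsupported {"type": "file"} entry that fails later.
        raise ValueError("'input_file' is not supported for chat content")
    raise ValueError(f"unsupported content type: {t}")
```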

In `@tools/server/server-task.cpp`:
- Around line 879-899: The final streaming reasoning output (created when
oaicompat_msg.reasoning_content is non-empty) is missing the "status" field;
update the output_item construction (the json assigned to output_item in the
block that builds the "response.output_item.done" event) to include
"status":"completed" so the streaming final item matches the non-streaming
schema and the intermediate "in_progress" items.
- Around line 806-854: The non-streaming output in
server_task_result_cmpl_final::to_json_oaicompat_resp() currently uses raw
tool_call.id for the function_call "call_id", while
to_json_oaicompat_resp_stream() prefixes IDs with "fc_", causing inconsistent
IDs; add a small helper (e.g., normalize_fc_id(const std::string&)) that returns
tool_call.id if already prefixed or prefixes with "fc_" otherwise, replace
direct uses of tool_call.id in to_json_oaicompat_resp() and the corresponding
places in to_json_oaicompat_resp_stream() to call this helper, and ensure
common_chat_tool_call.id is wrapped via normalize_fc_id when building the
{"call_id", ...} JSON field so both streaming and non‑streaming paths produce
consistent "fc_"‑prefixed IDs.
- Around line 1502-1527: The code currently stores a single oai_resp_fc_id and
overwrites it when multiple function-call deltas interleave; instead maintain a
map keyed by diff.tool_call_index (e.g., std::unordered_map<int, std::string>
tool_call_ids) and update/lookup entries when processing diff.tool_call_delta in
functions that build events (similar to to_json_anthropic()); on name/create
deltas set tool_call_ids[diff.tool_call_index] = diff.tool_call_delta.id (or
"fc_"+id for item_id), and when argument deltas arrive use
tool_call_ids.at(diff.tool_call_index) to produce the correct "item_id" so each
interleaved call keeps its own ID; ensure any cleanup (erase) happens when a
call completes.
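
The two ID fixes suggested above can be sketched together. The real code is C++ in tools/server/server-task.cpp; the helper and class names below are illustrative, showing a "fc_"-prefix normalizer shared by the streaming and non-streaming paths plus a per-`tool_call_index` map so interleaved function-call deltas keep their own IDs.

```python
# Sketch of the suggested ID handling: normalize_fc_id() makes both output
# paths agree on "fc_"-prefixed call IDs, and ToolCallIdTracker keeps one ID
# per tool_call_index instead of a single overwritable field.

def normalize_fc_id(raw_id: str) -> str:
    return raw_id if raw_id.startswith("fc_") else "fc_" + raw_id

class ToolCallIdTracker:
    def __init__(self):
        self.ids = {}  # tool_call_index -> normalized item_id

    def on_delta(self, index: int, call_id=None) -> str:
        # A delta carrying an id (name/create delta) registers it;
        # argument-only deltas look up the previously stored id.
        if call_id:
            self.ids[index] = normalize_fc_id(call_id)
        return self.ids[index]

    def on_done(self, index: int) -> None:
        self.ids.pop(index, None)  # drop state once the call completes
```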
🧹 Nitpick comments (1)
tools/server/tests/unit/test_compat_oai_responses.py (1)

1-3: Prefer explicit imports over from utils import *.
Ruff already flags this (F403/F405). Explicit imports reduce namespace ambiguity in tests.

♻️ Suggested change
-import pytest
-from openai import OpenAI
-from utils import *
+import pytest
+from openai import OpenAI
+from utils import ServerProcess, ServerPreset, match_regex

@ngxson
Owner Author

ngxson commented Jan 21, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Jan 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@openingnow openingnow deleted the v1_responses branch January 22, 2026 10:31
