[Mirror] server: /v1/responses (partial) #85

Open

ngxson wants to merge 25 commits into ngxson:master from openingnow:v1_responses

Conversation

@ngxson
Owner

@ngxson ngxson commented Jan 21, 2026

Mirror from upstream PR: ggml-org#18486

Note: @coderabbitai use my 'Mirror PR' preset for reviewing this.

Summary by CodeRabbit

  • New Features

    • Added OpenAI-compatible Responses API endpoint (/v1/responses) with streaming support and automatic translation to chat-completion format.
  • Documentation

    • Updated server README with Responses endpoint docs and usage examples.
  • Tests

    • Added unit tests validating Responses API behavior with the OpenAI Python client (streaming and non-streaming scenarios).
  • Chores

    • Bumped OpenAI package dependency from ~=1.55.3 to ~=2.14.0.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai bot commented Jan 21, 2026

📝 Walkthrough

Walkthrough

Adds OpenAI Responses API support: new /v1/responses route, conversion utility to translate Responses requests into Chat Completions format, new TASK_RESPONSE_TYPE_OAI_RESP with streaming/state handling, SSE formatting, unit tests, and dependency bumps for the OpenAI client.

Changes

Cohort / File(s) — Summary
Dependency Updates
requirements/requirements-tool_bench.txt, tools/server/tests/requirements.txt
Bumped openai from ~=1.55.3 to ~=2.14.0
Documentation
tools/server/README.md
Added Responses API to features and documented POST /v1/responses with examples
Conversion Utilities
tools/server/server-common.h, tools/server/server-common.cpp
Added convert_responses_to_chatcmpl(const json&) to map Responses payloads to Chat Completions format and format_oai_resp_sse(const json&) to produce OpenAI-style SSE event formatting
Routing & Endpoints
tools/server/server.cpp, tools/server/server-context.h, tools/server/server-context.cpp
Added post_responses_oai route and /v1/responses endpoint; route uses conversion utility and delegates to completions handler with TASK_RESPONSE_TYPE_OAI_RESP; streaming and non‑streaming flows updated to handle new type
Task Types & State
tools/server/server-task.h, tools/server/server-task.cpp
Introduced TASK_RESPONSE_TYPE_OAI_RESP; added OAI response IDs and generalized thinking/text block state; added update() on partial results; implemented to_json_oaicompat_resp() and to_json_oaicompat_resp_stream() for final/partial streaming and non-streaming outputs
Tests
tools/server/tests/unit/test_compat_oai_responses.py
Added tests exercising OpenAI Python client compatibility for Responses (standard and streaming), asserting IDs, event sequence, and content

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server as HTTP Server
    participant Conv as Conversion Layer
    participant Handler as Completions Handler
    participant Task as Task Processor

    Client->>Server: POST /v1/responses
    Server->>Conv: convert_responses_to_chatcmpl(request_body)
    Conv-->>Server: chat-completions-format JSON
    Server->>Handler: handle_completions_impl(..., TASK_RESPONSE_TYPE_OAI_RESP)
    Handler->>Task: create/process task (streaming state)
    Task->>Handler: emit SSE events or final JSON (using format_oai_resp_sse)
    Handler-->>Server: stream or response payload
    Server-->>Client: SSE stream or JSON response
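The SSE leg of the diagram above can be sketched as follows. The server's actual formatter is `format_oai_resp_sse()` in tools/server/server-common.cpp; this Python sketch only illustrates the OpenAI-style "event:"/"data:" wire framing, assuming each event payload names its type in a `type` field.

```python
import json

# Sketch of OpenAI-style SSE framing for Responses events. Assumes each
# event JSON carries its name in "type" (e.g. "response.output_text.delta");
# the blank line terminates the SSE frame.

def format_sse(event: dict) -> str:
    return f"event: {event['type']}\ndata: {json.dumps(event)}\n\n"
```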

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

🐰 A tiny hop, a clever swap,
Responses translate and never stop,
Streams of thought and message IDs,
Tools and reasons tumble with ease,
Hooray — the warren sings, we hop on top!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 18.52%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check — ❓ Inconclusive. The PR description identifies the upstream source and requests a specific review approach, but lacks the details on objectives, changes, testing, or breaking changes that would be typical for a substantive implementation PR. Resolution: consider adding a brief summary of the changes (OpenAI Responses API support), any testing performed, and a note on what 'partial' means for this implementation's scope.

✅ Passed checks (1 passed)

  • Title check — ✅ Passed. The title '[Mirror] server: /v1/responses (partial)' clearly indicates this is a mirrored upstream PR adding partial support for the /v1/responses server endpoint, matching the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@tools/server/server-common.cpp`:
- Around line 1156-1173: The current branch that handles "input_file" pushes a
content entry with {"type":"file"} into chatcmpl_content which
oaicompat_chat_params_parse does not accept (it only supports text, image_url,
input_audio); update server-common.cpp to reject "input_file" early instead of
creating an unsupported content type: in the else-if for type == "input_file"
(the block that currently checks for file_url/file_data/filename and pushes into
chatcmpl_content), throw a clear std::invalid_argument like "'input_file' is not
supported for chat content; use an alternative upload/attachment flow" (or
similar) and remove the code that injects {"type":"file"} so parsing via
oaicompat_chat_params_parse will not later fail; alternatively, if you prefer to
support files, implement corresponding handling in oaicompat_chat_params_parse
to accept "file" content types.
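
A minimal sketch of the suggested early rejection. The real change belongs in C++ in tools/server/server-common.cpp; this Python version only mirrors the control flow, and the `input_text`/`input_image` mappings shown alongside it are simplified assumptions.

```python
# Sketch of rejecting "input_file" content parts up front, since downstream
# parsing (oaicompat_chat_params_parse) only accepts text, image_url, and
# input_audio content types.

def convert_content_part(part: dict) -> dict:
    t = part.get("type")
    if t == "input_text":
        return {"type": "text", "text": part.get("text", "")}
    if t == "input_image":
        return {"type": "image_url", "image_url": {"url": part.get("image_url", "")}}
    if t == "input_file":
        # Fail fast with a clear message instead of emitting an
        # unsupported {"type": "file"} entry that fails later.
        raise ValueError("'input_file' is not supported for chat content")
    raise ValueError(f"unsupported content type: {t}")
```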

In `@tools/server/server-task.cpp`:
- Around line 879-899: The final streaming reasoning output (created when
oaicompat_msg.reasoning_content is non-empty) is missing the "status" field;
update the output_item construction (the json assigned to output_item in the
block that builds the "response.output_item.done" event) to include
"status":"completed" so the streaming final item matches the non-streaming
schema and the intermediate "in_progress" items.
- Around line 806-854: The non-streaming output in
server_task_result_cmpl_final::to_json_oaicompat_resp() currently uses raw
tool_call.id for the function_call "call_id", while
to_json_oaicompat_resp_stream() prefixes IDs with "fc_", causing inconsistent
IDs; add a small helper (e.g., normalize_fc_id(const std::string&)) that returns
tool_call.id if already prefixed or prefixes with "fc_" otherwise, replace
direct uses of tool_call.id in to_json_oaicompat_resp() and the corresponding
places in to_json_oaicompat_resp_stream() to call this helper, and ensure
common_chat_tool_call.id is wrapped via normalize_fc_id when building the
{"call_id", ...} JSON field so both streaming and non‑streaming paths produce
consistent "fc_"‑prefixed IDs.
- Around line 1502-1527: The code currently stores a single oai_resp_fc_id and
overwrites it when multiple function-call deltas interleave; instead maintain a
map keyed by diff.tool_call_index (e.g., std::unordered_map<int, std::string>
tool_call_ids) and update/lookup entries when processing diff.tool_call_delta in
functions that build events (similar to to_json_anthropic()); on name/create
deltas set tool_call_ids[diff.tool_call_index] = diff.tool_call_delta.id (or
"fc_"+id for item_id), and when argument deltas arrive use
tool_call_ids.at(diff.tool_call_index) to produce the correct "item_id" so each
interleaved call keeps its own ID; ensure any cleanup (erase) happens when a
call completes.
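
The two ID fixes suggested above can be sketched together. The real code is C++ in tools/server/server-task.cpp; the helper and class names below are illustrative, showing a "fc_"-prefix normalizer shared by the streaming and non-streaming paths plus a per-`tool_call_index` map so interleaved function-call deltas keep their own IDs.

```python
# Sketch of the suggested ID handling: normalize_fc_id() makes both output
# paths agree on "fc_"-prefixed call IDs, and ToolCallIdTracker keeps one ID
# per tool_call_index instead of a single overwritable field.

def normalize_fc_id(raw_id: str) -> str:
    return raw_id if raw_id.startswith("fc_") else "fc_" + raw_id

class ToolCallIdTracker:
    def __init__(self):
        self.ids = {}  # tool_call_index -> normalized item_id

    def on_delta(self, index: int, call_id=None) -> str:
        # A delta carrying an id (name/create delta) registers it;
        # argument-only deltas look up the previously stored id.
        if call_id:
            self.ids[index] = normalize_fc_id(call_id)
        return self.ids[index]

    def on_done(self, index: int) -> None:
        self.ids.pop(index, None)  # drop state once the call completes
```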
🧹 Nitpick comments (1)
tools/server/tests/unit/test_compat_oai_responses.py (1)

1-3: Prefer explicit imports over from utils import *.
Ruff already flags this (F403/F405). Explicit imports reduce namespace ambiguity in tests.

♻️ Suggested change
-import pytest
-from openai import OpenAI
-from utils import *
+import pytest
+from openai import OpenAI
+from utils import ServerProcess, ServerPreset, match_regex

@ngxson
Owner Author

ngxson commented Jan 21, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Jan 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@openingnow openingnow deleted the v1_responses branch January 22, 2026 10:31
