Skip to content

refactor(3/3): switch dynamo-protocols to upstream async-openai types#7625

Merged
ishandhanani merged 50 commits into
mainfrom
idhanani/protocols-cleanup
Apr 8, 2026
Merged

refactor(3/3): switch dynamo-protocols to upstream async-openai types#7625
ishandhanani merged 50 commits into
mainfrom
idhanani/protocols-cleanup

Conversation

@ishandhanani
Copy link
Copy Markdown
Contributor

@ishandhanani ishandhanani commented Mar 25, 2026

Summary

Replace the entire forked async-openai codebase in dynamo-protocols with upstream async-openai v0.34 re-exports + locally-defined inference-serving extensions.

19,108 deletions and 1,089 additions in the current PR diff (net -18,019 lines). dynamo-protocols now contains a minimal crate root, slimmed error types, and protocol type definitions -- no HTTP client, no API handlers, and no forked copies of upstream types.

Approach

The fork originally vendored the entire async-openai crate (HTTP client, API handlers, type definitions) and added NVIDIA-specific fields directly onto base OpenAI types. This PR reverses that by:

  1. Adding upstream async-openai as a types-only dependency (default-features = false with chat-completion-types, response-types, completion-types, embedding-types, image-types)

  2. Re-exporting upstream types where identical -- ~50 chat types (CompletionUsage, FunctionCall, Role, Prompt, ChatCompletionMessageToolCall, etc.), all embedding types, all response types, all image types. Consumers still import from dynamo_protocols::types::* -- the re-exports are transparent.

  3. Keeping local definitions for types we extend -- when a type has inference-serving fields that upstream doesn't have, we define it locally and document the delta. For example:

    /// Chat completion response message with multimodal content and reasoning.
    ///
    /// Extends upstream `ChatCompletionResponseMessage` with:
    /// - `content`: `Option<ChatCompletionMessageContent>` (multimodal) instead of `Option<String>`
    /// - `reasoning_content`: model reasoning output (DeepSeek-R1, QwQ)
    pub struct ChatCompletionResponseMessage {
        pub content: Option<ChatCompletionMessageContent>,  // our extension
        pub reasoning_content: Option<String>,               // our extension
        // ... standard fields (refusal, tool_calls, role, etc.)
    }

    Types form a containment tree -- if a child type is extended, all parent containers must also be locally defined. So ChatChoice is local because it contains our extended ChatCompletionResponseMessage, CreateChatCompletionResponse is local because it contains ChatChoice, etc.

  4. Adapting consumers for upstream API changes that differ from the fork:

    • ChatCompletionMessageToolCall no longer has a type field (always "function" in the spec)
    • ChatCompletionToolType renamed to FunctionType in upstream
    • Stop renamed to StopConfiguration in upstream
    • ImagesResponse gained new fields (background, output_format, quality, size, usage)
    • ImageModel gained new variants (GptImage1, GptImage1dot5, GptImage1Mini)
  5. Deleting everything else -- 31 API handler modules, HTTP client, config, 24 forked type modules that were either identical to upstream or unused. Removed client-only dependencies (reqwest, backoff, secrecy, async-openai-macros, etc.).

Types retained locally (inference-serving extensions)

Type What we added Why
ChatCompletionResponseMessage content: Option<ChatCompletionMessageContent> (enum: Text or multimodal Parts), reasoning_content: Option<String> Backends return multimodal content (images, video, audio) and reasoning tokens
ChatCompletionStreamResponseDelta Same content + reasoning_content fields Streaming variant of the above
ChatChoice / ChatChoiceStream stop_reason: Option<StopReason> Backends report which stop condition triggered (matched string or token ID)
CreateChatCompletionRequest mm_processor_kwargs: Option<Value> vLLM multimodal processor configuration
ChatCompletionStreamOptions continuous_usage_stats: bool Per-chunk usage stats (not just final chunk)
ChatCompletionRequestAssistantMessage reasoning_content: Option<ReasoningContent> Interleaved reasoning segments for KV cache-correct context reconstruction
ChatCompletionRequestUserMessageContentPart VideoUrl(...), AudioUrl(...) variants Multimodal input beyond images
ImageUrl url: url::Url (not String), uuid: Option<Uuid> Strongly-typed URLs + asset tracking through pipeline
CreateCompletionRequest prompt_embeds: Option<String>, strict echo validation Pre-computed embeddings, reject non-bool echo values
ReasoningContent Fully custom enum (Text or Segments) No upstream equivalent
StopReason Fully custom enum (String or Int) No upstream equivalent
ChatCompletionMessageContent Fully custom enum (Text or Parts) No upstream equivalent
All anthropic/ types Fully custom (~870 lines) Anthropic Messages API

What remains in dynamo-protocols

lib/protocols/src/
├── lib.rs              # Crate root
├── error.rs            # Slimmed error types
└── types/
    ├── mod.rs          # Re-exports upstream + local extensions
    ├── anthropic.rs    # Full Anthropic Messages API
    ├── chat.rs         # Upstream chat re-exports + local multimodal/reasoning extensions
    ├── completion.rs   # Upstream completion re-export + local CreateCompletionRequest
    ├── impls.rs        # From/Default impls for local types
    └── responses/
        └── mod.rs      # Upstream response re-exports + aliases

Stack: 3/3 -- depends on #7565, see RFC #7563

Test plan

  • cargo test -p dynamo-protocols -- pass
  • cargo test -p dynamo-parsers -- 354 pass
  • cargo test -p dynamo-llm --lib -- 759 pass
  • cargo check -- clean across all crates

Open with Devin

Summary by CodeRabbit

  • New Features

    • Added multimodal support (video and audio inputs) for chat completions.
    • Extended response handling with reasoning content and streaming options.
    • Improved tool calling with updated type definitions.
  • Breaking Changes

    • Removed HTTP client public exports; simplified HTTP service interface.
    • Removed support for assistants, threads, runs, and file operations.
    • Removed realtime and BYOT feature support.
    • Updated protocol types to align with upstream async-openai library.
  • Refactor

    • Consolidated protocol types and dependencies.

ishandhanani and others added 12 commits March 21, 2026 07:51
…c-openai

Move NVIDIA-specific nvext fields from the base response types
(CreateChatCompletionResponse, CreateChatCompletionStreamResponse,
CreateCompletionResponse) into dynamo-llm's Nv* wrapper structs where
they belong. This makes the base types pure OpenAI spec, preparing
dynamo-async-openai for standalone publishing.

No behavior change -- nvext is still carried on the wrapper types
and the JSON wire format is identical thanks to serde(flatten).
Move pure Anthropic Messages API types (request, response, streaming,
error, count-tokens) and CacheControl into dynamo-async-openai as
types::anthropic module. dynamo-llm re-exports them and retains only
the bidirectional conversion logic (Anthropic <-> NvChat).

This makes Anthropic protocol types available to standalone consumers
alongside the existing OpenAI and Responses API types.
Rename crate and directory (lib/async-openai -> lib/protocols) to
better reflect the crate's role as protocol type definitions rather
than an OpenAI client fork. Global find-replace of
dynamo_async_openai -> dynamo_protocols across 63 files.

Mechanical rename only -- no logic changes.
…tests

Handle empty choices chunks in text and batch stream readers to avoid
panics on usage-only events. Add regression tests verifying nvext
passthrough during stream aggregation.
ToSchema was derived on ~91 types in the protocols crate but never
consumed for OpenAPI schema generation -- only the Nv* wrapper types
in dynamo-llm are referenced in openapi_docs.rs. Removing it drops
the utoipa dependency from dynamo-protocols, unblocking the eventual
switch to upstream async-openai.

dynamo-llm wrapper types that flatten protocol types now use
#[schema(value_type = Object)] to avoid requiring ToSchema on the
inner protocol types.
Replace forked chat type definitions with re-exports from upstream
async-openai v0.34. Types that are structurally identical to upstream
are now re-exported directly. Types with inference-serving extensions
(reasoning_content, stop_reason, multimodal content, video/audio
inputs) remain locally defined with documentation of the delta.

Key changes:
- Add async-openai = "0.34" dependency (types-only features)
- Re-export ~50 upstream types from async_openai::types::chat
- Keep local definitions for extended types: ChatChoice,
  ChatChoiceStream, ChatCompletionStreamResponseDelta,
  ChatCompletionResponseMessage, CreateChatCompletionRequest,
  ChatCompletionStreamOptions, ChatCompletionRequestMessage,
  ChatCompletionRequestAssistantMessage, ImageUrl, VideoUrl, AudioUrl
- Adapt consumers for upstream API changes:
  - ChatCompletionMessageToolCall no longer has `type` field
  - FunctionType replaces ChatCompletionToolType for tool call chunks
  - StopConfiguration replaces Stop
…exports

- completion.rs: re-export CreateCompletionResponse from upstream,
  keep CreateCompletionRequest locally (has prompt_embeds, echo
  validation, extended stream_options)
- embedding.rs: full re-export from upstream (identical types)
- responses/: full re-export from upstream async-openai v0.34,
  delete 4526 lines of forked type definitions (api.rs, conversation.rs,
  impls.rs, response.rs, sdk.rs, stream.rs)
- Update dynamo-llm response consumers for upstream API changes

Total: -4886 lines removed from fork.
The fork's client create_stream expects the fork's internal
CreateChatCompletionRequest, but our chat types are now locally
defined (re-exported from upstream). Switch to create_stream_byot
which accepts any Serialize type, bridging the type mismatch.

Note: these client wrappers are only used in tests, not production.
@ishandhanani ishandhanani requested a review from a team as a code owner March 25, 2026 08:08
@github-actions github-actions Bot added refactor frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Mar 25, 2026
@ishandhanani ishandhanani marked this pull request as draft March 25, 2026 08:19
@ishandhanani ishandhanani force-pushed the idhanani/protocols-cleanup branch from 01aa666 to b7f9fa6 Compare March 25, 2026 08:28
Delete the forked async-openai HTTP client, API handler modules,
and remaining forked type definitions. dynamo-protocols now contains
only:
- Upstream re-exports (chat, completion, embedding, responses, images)
- Locally-defined inference-serving extensions (documented)
- Anthropic Messages API types
- MCP types
- Error types (slimmed down, no reqwest dependency)

Removed:
- 31 forked API handler modules (client.rs, config.rs, chat.rs, etc.)
- 24 forked type modules (assistant, audio, batch, file, model, etc.)
- shared/ directory (re-exported through upstream responses)
- lib/llm/src/http/client.rs (dead code, never imported)
- Forked tests that depended on the client (byot, whisper)
- async-openai-macros, reqwest, backoff, secrecy, and other
  client-only dependencies

Total: -10,718 lines
- Use FunctionType instead of ChatCompletionToolType for streaming
  tool call chunk assertions (upstream renamed the enum)
- Remove deleted HTTP client tests (PureOpenAI/NvCustom/GenericBYOT
  clients were part of the fork, not needed)
- Normalize fixture comparison in postprocessor_parsing_stream to
  handle upstream serde skip_serializing_if differences
- Fix stale comments referencing old crate/path names
Copy link
Copy Markdown
Contributor

@grahamking grahamking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work.

@MatejKosec
Copy link
Copy Markdown
Contributor

Review: Type Migration Risk Assessment

Thorough analysis of the type conversion changes. Overall the refactor is clean and well-structured — the -18K line reduction is a big maintenance win. A few things that might be overlooked:

1. stream field serializes null (Medium Risk)

CreateChatCompletionRequest.stream lost its #[serde(skip_serializing_if = "Option::is_none")] attribute and now only has #[serde(default)]. This means stream: None will serialize as "stream": null instead of being omitted. Some backends reject explicit null on this field. Fix: add skip_serializing_if back alongside default.

2. v1/responses has near-zero test coverage (High Risk)

The PR rewrites the type conversion layer for Responses (chat_completion_to_response, ResponseStreamConverter, InputContent handling), but there are almost no E2E tests exercising the /v1/responses endpoint. The only usage is in test_sglang.py — no coverage for vLLM, TRT-LLM, streaming responses, structured output through responses, or error paths. The 36/36 E2E pass is mostly chat completions.

Specifically untested after this PR:

  • convert_content() in responses/mod.rs (InputText/InputImage/InputFile conversion)
  • chat_completion_to_response() (the core translation layer)
  • ResponseStreamConverter (streaming delta → response event)
  • Error propagation through the responses path

3. InputVideo/InputAudio handling silently removed

InputContent conversion drops InputVideo, InputAudio, OutputText, Refusal variants. Correct today (upstream v0.34 doesn't have them), but when upstream adds them, the code will silently skip them. A TODO comment would help.

4. store behavioral change

store: true is now accepted (previously rejected) in Responses API requests, enabling audit gating. This is intentional but worth noting in the PR description since it's a behavioral change, not just a refactor.

5. convert_image_detail_str does JSON round-trip per image

Minor perf concern: serializes to serde_json::Value then matches string for every image. A direct enum match or From impl would avoid the allocation.

Recommendation: Items 1 and 2 should be addressed before merge. The rest are non-blocking follow-ups.

@MatejKosec
Copy link
Copy Markdown
Contributor

Follow-up: ImageModel Wildcard Bug (New Finding)

The wildcard match at lib/llm/src/http/service/openai.rs:1944:

_ => format!("{:?}", m).to_lowercase()

produces wrong model names for upstream ImageModel variants because Debug doesn't respect #[serde(rename)]:

Variant Debug output (lowercased) Correct serde name
GptImage1 gptimage1 gpt-image-1
GptImage1dot5 gptimage1dot5 gpt-image-1.5
GptImage1Mini gptimage1mini gpt-image-1-mini

This would cause state.manager().get_images_engine(&model) to fail to find the engine for GPT-Image models. Latent bug — won't fire until someone uses these models through Dynamo, but worth fixing.

Fix: Use serde_json::to_value(&m) to get the correctly-renamed string, or match explicitly on known variants.

@MatejKosec
Copy link
Copy Markdown
Contributor

Follow-up: v1/responses Wire-Format Changes & Test Gap

Deep analysis of the v1/responses path reveals visible wire-format changes with no regression test coverage.

Fields removed from Response JSON

Field Old Value After PR
frequency_penalty 0.0 absent
presence_penalty 0.0 absent
store false/true absent
max_tool_calls null absent

Fields that flip from null to omitted

error, incomplete_details, prompt, prompt_cache_key, safety_identifier, conversation, billing — all previously serialized as "field": null, now omitted entirely (upstream uses skip_serializing_if).

Concrete client breakage scenario

response = requests.post("/v1/responses", json={"input": "hello"}).json()
fp = response["frequency_penalty"]  # Was 0.0, now KeyError
store = response.get("store", True)  # Was False, now defaults to True (flipped!)

The gap

No test anywhere serializes a Response to JSON and validates the shape. The Rust unit tests check struct field values but never serialize. The Python E2E tests check object, id, status, output[0].type but none of the fields that changed.

Recommendation

Add a ~30-line wire-format snapshot test:

let json = serde_json::to_value(&response).unwrap();
assert!(json.get("frequency_penalty").is_none());
assert!(json.get("store").is_none());
assert!(json.get("presence_penalty").is_none());
assert_eq!(json["metadata"], serde_json::json!({}));
assert!(json["output"][0].get("id").is_some());
assert!(json["output"][0].get("status").is_some());

Also consider running the OpenResponses compliance suite (bun run test:compliance) which the test file itself references.

- Add skip_serializing_if to stream field on CreateChatCompletionRequest
  to avoid serializing null (some backends reject it)
- Fix ImageModel wildcard using Debug output instead of serde rename
  (gptimage1 vs gpt-image-1) by matching all variants explicitly
- Guard file_id-only InputImage with targeted error before URL parsing
- Add wire-format snapshot test for Response JSON shape documenting
  that frequency_penalty, presence_penalty, store, max_tool_calls are
  intentionally absent (request-level fields, not in OpenAI spec)
@ishandhanani ishandhanani enabled auto-merge (squash) April 8, 2026 18:47
@ishandhanani ishandhanani merged commit fd5cc28 into main Apr 8, 2026
94 of 95 checks passed
@ishandhanani ishandhanani deleted the idhanani/protocols-cleanup branch April 8, 2026 19:29
ishandhanani added a commit that referenced this pull request Apr 16, 2026
…t messages

Two regressions in the Responses API → Chat Completions bridge when a prior
turn contains a `function_call` + assistant text + `function_call_output`
sequence (emitted by OpenAI Agents SDK, Codex, etc.):

1. Tool-call chain broken by interstitial assistant text.
   `convert_input_items_to_messages` emitted three separate chat messages
   (assistant-with-tool-calls, assistant-with-text, tool). Jinja templates
   that require a tool message to directly follow its assistant tool_call
   (e.g. MiniMax) then see the interstitial assistant message reset
   `last_tool_call.name` and reject the payload with
   "Message has tool role, but there was no previous assistant message
   with a tool call!".

   Coalesce adjacent assistant-side items (OutputMessage, FunctionCall,
   assistant EasyMessage) into a single ChatCompletionRequestAssistantMessage
   carrying both `content` and `tool_calls`. This matches the Chat Completions
   spec and lets downstream templates pair tool calls with their outputs.

2. Strict upstream deserialization rejects bare assistant messages.
   After the async-openai 0.34 upgrade (#7625) the lenient fix from #6599
   was dropped: upstream `OutputMessage` requires `id` + `status`, and
   `OutputTextContent` requires `annotations`. Clients like Codex send
   `{"type":"message","role":"assistant","content":[{"type":"output_text",
   "text":"..."}]}` without these, failing with
   "data did not match any variant of untagged enum InputParam".

   Add a custom Deserialize on NvCreateResponse that patches these items
   with synthetic defaults before delegating to the upstream strict types.

Tested live against MiniMax-M2.7 on dynamo.frontend:
- Codex `codex exec "hello minimax"` round-trips successfully
- All three previously-failing repro shapes (simpler, structured w/ id+status,
  structured w/o id+status) now return 200

Covered by three new unit tests in responses/mod.rs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend::sglang Relates to the sglang backend documentation Improvements or additions to documentation frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` refactor size/XXL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants