refactor(3/3): switch dynamo-protocols to upstream async-openai types by ishandhanani · Pull Request #7625 · ai-dynamo/dynamo

ishandhanani · 2026-03-25T08:08:20Z

Summary

Replace the entire forked async-openai codebase in dynamo-protocols with upstream async-openai v0.34 re-exports + locally-defined inference-serving extensions.

19,108 deletions and 1,089 additions in the current PR diff (net -18,019 lines). dynamo-protocols now contains a minimal crate root, slimmed error types, and protocol type definitions -- no HTTP client, no API handlers, and no forked copies of upstream types.

Approach

The fork originally vendored the entire async-openai crate (HTTP client, API handlers, type definitions) and added NVIDIA-specific fields directly onto base OpenAI types. This PR reverses that by:

Adding upstream async-openai as a types-only dependency (default-features = false with chat-completion-types, response-types, completion-types, embedding-types, image-types)
Re-exporting upstream types where identical -- ~50 chat types (CompletionUsage, FunctionCall, Role, Prompt, ChatCompletionMessageToolCall, etc.), all embedding types, all response types, all image types. Consumers still import from dynamo_protocols::types::* -- the re-exports are transparent.

Keeping local definitions for types we extend -- when a type has inference-serving fields that upstream doesn't have, we define it locally and document the delta. For example:

/// Chat completion response message with multimodal content and reasoning.
///
/// Extends upstream `ChatCompletionResponseMessage` with:
/// - `content`: `Option<ChatCompletionMessageContent>` (multimodal) instead of `Option<String>`
/// - `reasoning_content`: model reasoning output (DeepSeek-R1, QwQ)
pub struct ChatCompletionResponseMessage {
    pub content: Option<ChatCompletionMessageContent>,  // our extension
    pub reasoning_content: Option<String>,               // our extension
    // ... standard fields (refusal, tool_calls, role, etc.)
}

Types form a containment tree -- if a child type is extended, all parent containers must also be locally defined. So ChatChoice is local because it contains our extended ChatCompletionResponseMessage, CreateChatCompletionResponse is local because it contains ChatChoice, etc.

Adapting consumers for upstream API changes that differ from the fork:
- ChatCompletionMessageToolCall no longer has a type field (always "function" in the spec)
- ChatCompletionToolType renamed to FunctionType in upstream
- Stop renamed to StopConfiguration in upstream
- ImagesResponse gained new fields (background, output_format, quality, size, usage)
- ImageModel gained new variants (GptImage1, GptImage1dot5, GptImage1Mini)
Deleting everything else -- 31 API handler modules, HTTP client, config, 24 forked type modules that were either identical to upstream or unused. Removed client-only dependencies (reqwest, backoff, secrecy, async-openai-macros, etc.).

Types retained locally (inference-serving extensions)

Type	What we added	Why
`ChatCompletionResponseMessage`	`content: Option<ChatCompletionMessageContent>` (enum: Text or multimodal Parts), `reasoning_content: Option<String>`	Backends return multimodal content (images, video, audio) and reasoning tokens
`ChatCompletionStreamResponseDelta`	Same `content` + `reasoning_content` fields	Streaming variant of the above
`ChatChoice` / `ChatChoiceStream`	`stop_reason: Option<StopReason>`	Backends report which stop condition triggered (matched string or token ID)
`CreateChatCompletionRequest`	`mm_processor_kwargs: Option<Value>`	vLLM multimodal processor configuration
`ChatCompletionStreamOptions`	`continuous_usage_stats: bool`	Per-chunk usage stats (not just final chunk)
`ChatCompletionRequestAssistantMessage`	`reasoning_content: Option<ReasoningContent>`	Interleaved reasoning segments for KV cache-correct context reconstruction
`ChatCompletionRequestUserMessageContentPart`	`VideoUrl(...)`, `AudioUrl(...)` variants	Multimodal input beyond images
`ImageUrl`	`url: url::Url` (not String), `uuid: Option<Uuid>`	Strongly-typed URLs + asset tracking through pipeline
`CreateCompletionRequest`	`prompt_embeds: Option<String>`, strict `echo` validation	Pre-computed embeddings, reject non-bool echo values
`ReasoningContent`	Fully custom enum (Text or Segments)	No upstream equivalent
`StopReason`	Fully custom enum (String or Int)	No upstream equivalent
`ChatCompletionMessageContent`	Fully custom enum (Text or Parts)	No upstream equivalent
All `anthropic/` types	Fully custom (~870 lines)	Anthropic Messages API

What remains in dynamo-protocols

lib/protocols/src/
├── lib.rs              # Crate root
├── error.rs            # Slimmed error types
└── types/
    ├── mod.rs          # Re-exports upstream + local extensions
    ├── anthropic.rs    # Full Anthropic Messages API
    ├── chat.rs         # Upstream chat re-exports + local multimodal/reasoning extensions
    ├── completion.rs   # Upstream completion re-export + local CreateCompletionRequest
    ├── impls.rs        # From/Default impls for local types
    └── responses/
        └── mod.rs      # Upstream response re-exports + aliases

Stack: 3/3 -- depends on #7565, see RFC #7563

Test plan

cargo test -p dynamo-protocols -- pass
cargo test -p dynamo-parsers -- 354 pass
cargo test -p dynamo-llm --lib -- 759 pass
cargo check -- clean across all crates

Summary by CodeRabbit

New Features
- Added multimodal support (video and audio inputs) for chat completions.
- Extended response handling with reasoning content and streaming options.
- Improved tool calling with updated type definitions.
Breaking Changes
- Removed HTTP client public exports; simplified HTTP service interface.
- Removed support for assistants, threads, runs, and file operations.
- Removed realtime and BYOT feature support.
- Updated protocol types to align with upstream async-openai library.
Refactor
- Consolidated protocol types and dependencies.

…c-openai Move NVIDIA-specific nvext fields from the base response types (CreateChatCompletionResponse, CreateChatCompletionStreamResponse, CreateCompletionResponse) into dynamo-llm's Nv* wrapper structs where they belong. This makes the base types pure OpenAI spec, preparing dynamo-async-openai for standalone publishing. No behavior change -- nvext is still carried on the wrapper types and the JSON wire format is identical thanks to serde(flatten).

Move pure Anthropic Messages API types (request, response, streaming, error, count-tokens) and CacheControl into dynamo-async-openai as types::anthropic module. dynamo-llm re-exports them and retains only the bidirectional conversion logic (Anthropic <-> NvChat). This makes Anthropic protocol types available to standalone consumers alongside the existing OpenAI and Responses API types.

Rename crate and directory (lib/async-openai -> lib/protocols) to better reflect the crate's role as protocol type definitions rather than an OpenAI client fork. Global find-replace of dynamo_async_openai -> dynamo_protocols across 63 files. Mechanical rename only -- no logic changes.

…tests Handle empty choices chunks in text and batch stream readers to avoid panics on usage-only events. Add regression tests verifying nvext passthrough during stream aggregation.

ToSchema was derived on ~91 types in the protocols crate but never consumed for OpenAPI schema generation -- only the Nv* wrapper types in dynamo-llm are referenced in openapi_docs.rs. Removing it drops the utoipa dependency from dynamo-protocols, unblocking the eventual switch to upstream async-openai. dynamo-llm wrapper types that flatten protocol types now use #[schema(value_type = Object)] to avoid requiring ToSchema on the inner protocol types.

Replace forked chat type definitions with re-exports from upstream async-openai v0.34. Types that are structurally identical to upstream are now re-exported directly. Types with inference-serving extensions (reasoning_content, stop_reason, multimodal content, video/audio inputs) remain locally defined with documentation of the delta. Key changes: - Add async-openai = "0.34" dependency (types-only features) - Re-export ~50 upstream types from async_openai::types::chat - Keep local definitions for extended types: ChatChoice, ChatChoiceStream, ChatCompletionStreamResponseDelta, ChatCompletionResponseMessage, CreateChatCompletionRequest, ChatCompletionStreamOptions, ChatCompletionRequestMessage, ChatCompletionRequestAssistantMessage, ImageUrl, VideoUrl, AudioUrl - Adapt consumers for upstream API changes: - ChatCompletionMessageToolCall no longer has `type` field - FunctionType replaces ChatCompletionToolType for tool call chunks - StopConfiguration replaces Stop

…exports - completion.rs: re-export CreateCompletionResponse from upstream, keep CreateCompletionRequest locally (has prompt_embeds, echo validation, extended stream_options) - embedding.rs: full re-export from upstream (identical types) - responses/: full re-export from upstream async-openai v0.34, delete 4526 lines of forked type definitions (api.rs, conversation.rs, impls.rs, response.rs, sdk.rs, stream.rs) - Update dynamo-llm response consumers for upstream API changes Total: -4886 lines removed from fork.

The fork's client create_stream expects the fork's internal CreateChatCompletionRequest, but our chat types are now locally defined (re-exported from upstream). Switch to create_stream_byot which accepts any Serialize type, bridging the type mismatch. Note: these client wrappers are only used in tests, not production.

Delete the forked async-openai HTTP client, API handler modules, and remaining forked type definitions. dynamo-protocols now contains only: - Upstream re-exports (chat, completion, embedding, responses, images) - Locally-defined inference-serving extensions (documented) - Anthropic Messages API types - MCP types - Error types (slimmed down, no reqwest dependency) Removed: - 31 forked API handler modules (client.rs, config.rs, chat.rs, etc.) - 24 forked type modules (assistant, audio, batch, file, model, etc.) - shared/ directory (re-exported through upstream responses) - lib/llm/src/http/client.rs (dead code, never imported) - Forked tests that depended on the client (byot, whisper) - async-openai-macros, reqwest, backoff, secrecy, and other client-only dependencies Total: -10,718 lines

- Use FunctionType instead of ChatCompletionToolType for streaming tool call chunk assertions (upstream renamed the enum) - Remove deleted HTTP client tests (PureOpenAI/NvCustom/GenericBYOT clients were part of the fork, not needed) - Normalize fixture comparison in postprocessor_parsing_stream to handle upstream serde skip_serializing_if differences - Fix stale comments referencing old crate/path names

grahamking

Great work.

MatejKosec · 2026-04-06T22:05:47Z

Review: Type Migration Risk Assessment

Thorough analysis of the type conversion changes. Overall the refactor is clean and well-structured — the -18K line reduction is a big maintenance win. A few things that might be overlooked:

1. `stream` field serializes `null` (Medium Risk)

CreateChatCompletionRequest.stream lost its #[serde(skip_serializing_if = "Option::is_none")] attribute and now only has #[serde(default)]. This means stream: None will serialize as "stream": null instead of being omitted. Some backends reject explicit null on this field. Fix: add skip_serializing_if back alongside default.

2. v1/responses has near-zero test coverage (High Risk)

The PR rewrites the type conversion layer for Responses (chat_completion_to_response, ResponseStreamConverter, InputContent handling), but there are almost no E2E tests exercising the /v1/responses endpoint. The only usage is in test_sglang.py — no coverage for vLLM, TRT-LLM, streaming responses, structured output through responses, or error paths. The 36/36 E2E pass is mostly chat completions.

Specifically untested after this PR:

convert_content() in responses/mod.rs (InputText/InputImage/InputFile conversion)
chat_completion_to_response() (the core translation layer)
ResponseStreamConverter (streaming delta → response event)
Error propagation through the responses path

3. `InputVideo`/`InputAudio` handling silently removed

InputContent conversion drops InputVideo, InputAudio, OutputText, Refusal variants. Correct today (upstream v0.34 doesn't have them), but when upstream adds them, the code will silently skip them. A TODO comment would help.

4. `store` behavioral change

store: true is now accepted (previously rejected) in Responses API requests, enabling audit gating. This is intentional but worth noting in the PR description since it's a behavioral change, not just a refactor.

5. `convert_image_detail_str` does JSON round-trip per image

Minor perf concern: serializes to serde_json::Value then matches string for every image. A direct enum match or From impl would avoid the allocation.

Recommendation: Items 1 and 2 should be addressed before merge. The rest are non-blocking follow-ups.

MatejKosec · 2026-04-06T22:07:25Z

Follow-up: ImageModel Wildcard Bug (New Finding)

The wildcard match at lib/llm/src/http/service/openai.rs:1944:

_ => format!("{:?}", m).to_lowercase()

produces wrong model names for upstream ImageModel variants because Debug doesn't respect #[serde(rename)]:

Variant	Debug output (lowercased)	Correct serde name
`GptImage1`	`gptimage1`	`gpt-image-1`
`GptImage1dot5`	`gptimage1dot5`	`gpt-image-1.5`
`GptImage1Mini`	`gptimage1mini`	`gpt-image-1-mini`

This would cause state.manager().get_images_engine(&model) to fail to find the engine for GPT-Image models. Latent bug — won't fire until someone uses these models through Dynamo, but worth fixing.

Fix: Use serde_json::to_value(&m) to get the correctly-renamed string, or match explicitly on known variants.

MatejKosec · 2026-04-06T22:11:37Z

Follow-up: v1/responses Wire-Format Changes & Test Gap

Deep analysis of the v1/responses path reveals visible wire-format changes with no regression test coverage.

Fields removed from Response JSON

Field	Old Value	After PR
`frequency_penalty`	`0.0`	absent
`presence_penalty`	`0.0`	absent
`store`	`false`/`true`	absent
`max_tool_calls`	`null`	absent

Fields that flip from `null` to omitted

error, incomplete_details, prompt, prompt_cache_key, safety_identifier, conversation, billing — all previously serialized as "field": null, now omitted entirely (upstream uses skip_serializing_if).

Concrete client breakage scenario

response = requests.post("/v1/responses", json={"input": "hello"}).json()
fp = response["frequency_penalty"]  # Was 0.0, now KeyError
store = response.get("store", True)  # Was False, now defaults to True (flipped!)

The gap

No test anywhere serializes a Response to JSON and validates the shape. The Rust unit tests check struct field values but never serialize. The Python E2E tests check object, id, status, output[0].type but none of the fields that changed.

Recommendation

Add a ~30-line wire-format snapshot test:

let json = serde_json::to_value(&response).unwrap();
assert!(json.get("frequency_penalty").is_none());
assert!(json.get("store").is_none());
assert!(json.get("presence_penalty").is_none());
assert_eq!(json["metadata"], serde_json::json!({}));
assert!(json["output"][0].get("id").is_some());
assert!(json["output"][0].get("status").is_some());

Also consider running the OpenResponses compliance suite (bun run test:compliance) which the test file itself references.

- Add skip_serializing_if to stream field on CreateChatCompletionRequest to avoid serializing null (some backends reject it) - Fix ImageModel wildcard using Debug output instead of serde rename (gptimage1 vs gpt-image-1) by matching all variants explicitly - Guard file_id-only InputImage with targeted error before URL parsing - Add wire-format snapshot test for Response JSON shape documenting that frequency_penalty, presence_penalty, store, max_tool_calls are intentionally absent (request-level fields, not in OpenAI spec)

…t messages Two regressions in the Responses API → Chat Completions bridge when a prior turn contains a `function_call` + assistant text + `function_call_output` sequence (emitted by OpenAI Agents SDK, Codex, etc.): 1. Tool-call chain broken by interstitial assistant text. `convert_input_items_to_messages` emitted three separate chat messages (assistant-with-tool-calls, assistant-with-text, tool). Jinja templates that require a tool message to directly follow its assistant tool_call (e.g. MiniMax) then see the interstitial assistant message reset `last_tool_call.name` and reject the payload with "Message has tool role, but there was no previous assistant message with a tool call!". Coalesce adjacent assistant-side items (OutputMessage, FunctionCall, assistant EasyMessage) into a single ChatCompletionRequestAssistantMessage carrying both `content` and `tool_calls`. This matches the Chat Completions spec and lets downstream templates pair tool calls with their outputs. 2. Strict upstream deserialization rejects bare assistant messages. After the async-openai 0.34 upgrade (#7625) the lenient fix from #6599 was dropped: upstream `OutputMessage` requires `id` + `status`, and `OutputTextContent` requires `annotations`. Clients like Codex send `{"type":"message","role":"assistant","content":[{"type":"output_text", "text":"..."}]}` without these, failing with "data did not match any variant of untagged enum InputParam". Add a custom Deserialize on NvCreateResponse that patches these items with synthetic defaults before delegating to the upstream strict types. Tested live against MiniMax-M2.7 on dynamo.frontend: - Codex `codex exec "hello minimax"` round-trips successfully - All three previously-failing repro shapes (simpler, structured w/ id+status, structured w/o id+status) now return 200 Covered by three new unit tests in responses/mod.rs.

ishandhanani and others added 12 commits March 21, 2026 07:51

style: cargo fmt

da1b03a

style: remove block comment separators from anthropic types

115e23e

chore: update bindings/python Cargo.lock for nvext removal

f7febca

Merge branch 'main' into idhanani/remove-nvext-from-base-types

ad7a2da

fix: guard empty choices in stream readers and add nvext aggregation …

0135af4

…tests Handle empty choices chunks in text and batch stream readers to avoid panics on usage-only events. Add regression tests verifying nvext passthrough during stream aggregation.

ishandhanani requested a review from a team as a code owner March 25, 2026 08:08

pull-request-size Bot added the size/XXL label Mar 25, 2026

github-actions Bot added refactor frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Mar 25, 2026

ishandhanani mentioned this pull request Mar 25, 2026

RFC: Publish dynamo-protocols and dynamo-parsers as standalone crates #7563

Closed

1 task

ishandhanani marked this pull request as draft March 25, 2026 08:19

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 08:22 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 08:23 Inactive

ishandhanani force-pushed the idhanani/protocols-cleanup branch from 01aa666 to b7f9fa6 Compare March 25, 2026 08:28

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 08:28 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 08:29 Inactive

ishandhanani force-pushed the idhanani/protocols-cleanup branch from b7f9fa6 to f33f0dc Compare March 25, 2026 08:30

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 08:30 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 08:34 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 09:03 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 09:04 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 6, 2026 17:22 Inactive

ishandhanani added 2 commits April 6, 2026 10:34

fix(protocols): remove unused deps

e8fd72c

chore: update lockfile

e0e28b2

copy-pr-bot Bot temporarily deployed to GITLAB April 6, 2026 17:35 Inactive

chore: sync bindings lockfiles

2d4dd6b

copy-pr-bot Bot temporarily deployed to GITLAB April 6, 2026 17:44 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 6, 2026 18:38 Inactive

grahamking approved these changes Apr 6, 2026

View reviewed changes

copy-pr-bot Bot temporarily deployed to GITLAB April 7, 2026 18:44 Inactive

Merge branch 'main' into idhanani/protocols-cleanup

a662453

copy-pr-bot Bot temporarily deployed to GITLAB April 7, 2026 20:08 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 7, 2026 21:34 Inactive

Merge branch 'main' into idhanani/protocols-cleanup

7c11594

copy-pr-bot Bot temporarily deployed to GITLAB April 7, 2026 23:35 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 8, 2026 01:39 Inactive

Merge branch 'main' into idhanani/protocols-cleanup

2ad56f0

copy-pr-bot Bot temporarily deployed to GITLAB April 8, 2026 06:53 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 8, 2026 07:30 Inactive

Merge branch 'main' into idhanani/protocols-cleanup

4d35f18

copy-pr-bot Bot temporarily deployed to GITLAB April 8, 2026 18:44 Inactive

ishandhanani enabled auto-merge (squash) April 8, 2026 18:47

copy-pr-bot Bot had a problem deploying to GITLAB April 8, 2026 19:10 Failure

ishandhanani merged commit fd5cc28 into main Apr 8, 2026
94 of 95 checks passed

ishandhanani deleted the idhanani/protocols-cleanup branch April 8, 2026 19:29

ishandhanani mentioned this pull request Apr 16, 2026

fix(responses): own input chain in dynamo-protocols to accept Codex/Agents-SDK shapes #8275

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(3/3): switch dynamo-protocols to upstream async-openai types#7625

refactor(3/3): switch dynamo-protocols to upstream async-openai types#7625
ishandhanani merged 50 commits into
mainfrom
idhanani/protocols-cleanup

ishandhanani commented Mar 25, 2026 •

edited

Loading

Uh oh!

grahamking left a comment

Uh oh!

MatejKosec commented Apr 6, 2026

Uh oh!

MatejKosec commented Apr 6, 2026

Uh oh!

MatejKosec commented Apr 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

ishandhanani commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Approach

Types retained locally (inference-serving extensions)

What remains in dynamo-protocols

Test plan

Summary by CodeRabbit

Uh oh!

grahamking left a comment

Choose a reason for hiding this comment

Uh oh!

MatejKosec commented Apr 6, 2026

Review: Type Migration Risk Assessment

1. stream field serializes null (Medium Risk)

2. v1/responses has near-zero test coverage (High Risk)

3. InputVideo/InputAudio handling silently removed

4. store behavioral change

5. convert_image_detail_str does JSON round-trip per image

Uh oh!

MatejKosec commented Apr 6, 2026

Follow-up: ImageModel Wildcard Bug (New Finding)

Uh oh!

MatejKosec commented Apr 6, 2026

Follow-up: v1/responses Wire-Format Changes & Test Gap

Fields removed from Response JSON

Fields that flip from null to omitted

Concrete client breakage scenario

The gap

Recommendation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ishandhanani commented Mar 25, 2026 •

edited

Loading

1. `stream` field serializes `null` (Medium Risk)

3. `InputVideo`/`InputAudio` handling silently removed

4. `store` behavioral change

5. `convert_image_detail_str` does JSON round-trip per image

Fields that flip from `null` to omitted