refactor(3/3): switch dynamo-protocols to upstream async-openai types #7625
…c-openai
Move NVIDIA-specific nvext fields from the base response types (CreateChatCompletionResponse, CreateChatCompletionStreamResponse, CreateCompletionResponse) into dynamo-llm's Nv* wrapper structs where they belong. This makes the base types pure OpenAI spec, preparing dynamo-async-openai for standalone publishing. No behavior change -- nvext is still carried on the wrapper types and the JSON wire format is identical thanks to serde(flatten).
Move pure Anthropic Messages API types (request, response, streaming, error, count-tokens) and CacheControl into dynamo-async-openai as types::anthropic module. dynamo-llm re-exports them and retains only the bidirectional conversion logic (Anthropic <-> NvChat). This makes Anthropic protocol types available to standalone consumers alongside the existing OpenAI and Responses API types.
Rename crate and directory (lib/async-openai -> lib/protocols) to better reflect the crate's role as protocol type definitions rather than an OpenAI client fork. Global find-replace of dynamo_async_openai -> dynamo_protocols across 63 files. Mechanical rename only -- no logic changes.
…tests
Handle empty choices chunks in text and batch stream readers to avoid panics on usage-only events. Add regression tests verifying nvext passthrough during stream aggregation.
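The guard described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `Chunk`, `Choice`, `Usage`, `first_choice_text`, and `aggregate_text` are hypothetical, simplified stand-ins for the real stream types and readers. The point is that a usage-only event arrives with an empty `choices` vector, so indexing `choices[0]` would panic while `first()` does not.

```rust
// Hypothetical, simplified stand-ins for the streamed chunk types;
// the real types in dynamo-protocols carry many more fields.
#[derive(Debug, Default)]
pub struct Choice {
    pub text: String,
}

#[derive(Debug, Default)]
pub struct Usage {
    pub total_tokens: u32,
}

#[derive(Debug, Default)]
pub struct Chunk {
    pub choices: Vec<Choice>,
    pub usage: Option<Usage>,
}

/// Returns the text of the first choice, or `None` for a usage-only chunk.
/// Indexing with `chunk.choices[0]` would panic when `choices` is empty.
pub fn first_choice_text(chunk: &Chunk) -> Option<&str> {
    chunk.choices.first().map(|c| c.text.as_str())
}

/// Concatenates text across chunks, skipping chunks that have no choices.
pub fn aggregate_text(chunks: &[Chunk]) -> String {
    chunks.iter().filter_map(first_choice_text).collect()
}
```

Calling `aggregate_text` on a stream whose last chunk has `choices: vec![]` and `usage: Some(..)` simply ignores that chunk for text purposes instead of panicking.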
ToSchema was derived on ~91 types in the protocols crate but never consumed for OpenAPI schema generation -- only the Nv* wrapper types in dynamo-llm are referenced in openapi_docs.rs. Removing it drops the utoipa dependency from dynamo-protocols, unblocking the eventual switch to upstream async-openai. dynamo-llm wrapper types that flatten protocol types now use #[schema(value_type = Object)] to avoid requiring ToSchema on the inner protocol types.
Replace forked chat type definitions with re-exports from upstream async-openai v0.34. Types that are structurally identical to upstream are now re-exported directly. Types with inference-serving extensions (reasoning_content, stop_reason, multimodal content, video/audio inputs) remain locally defined with documentation of the delta.

Key changes:
- Add async-openai = "0.34" dependency (types-only features)
- Re-export ~50 upstream types from async_openai::types::chat
- Keep local definitions for extended types: ChatChoice, ChatChoiceStream, ChatCompletionStreamResponseDelta, ChatCompletionResponseMessage, CreateChatCompletionRequest, ChatCompletionStreamOptions, ChatCompletionRequestMessage, ChatCompletionRequestAssistantMessage, ImageUrl, VideoUrl, AudioUrl
- Adapt consumers for upstream API changes:
  - ChatCompletionMessageToolCall no longer has `type` field
  - FunctionType replaces ChatCompletionToolType for tool call chunks
  - StopConfiguration replaces Stop

…exports
- completion.rs: re-export CreateCompletionResponse from upstream, keep CreateCompletionRequest locally (has prompt_embeds, echo validation, extended stream_options)
- embedding.rs: full re-export from upstream (identical types)
- responses/: full re-export from upstream async-openai v0.34, delete 4526 lines of forked type definitions (api.rs, conversation.rs, impls.rs, response.rs, sdk.rs, stream.rs)
- Update dynamo-llm response consumers for upstream API changes

Total: -4886 lines removed from fork.
The fork's client create_stream expects the fork's internal CreateChatCompletionRequest, but our chat types are now locally defined (re-exported from upstream). Switch to create_stream_byot which accepts any Serialize type, bridging the type mismatch. Note: these client wrappers are only used in tests, not production.
01aa666 to b7f9fa6
Delete the forked async-openai HTTP client, API handler modules, and remaining forked type definitions.

dynamo-protocols now contains only:
- Upstream re-exports (chat, completion, embedding, responses, images)
- Locally-defined inference-serving extensions (documented)
- Anthropic Messages API types
- MCP types
- Error types (slimmed down, no reqwest dependency)

Removed:
- 31 forked API handler modules (client.rs, config.rs, chat.rs, etc.)
- 24 forked type modules (assistant, audio, batch, file, model, etc.)
- shared/ directory (re-exported through upstream responses)
- lib/llm/src/http/client.rs (dead code, never imported)
- Forked tests that depended on the client (byot, whisper)
- async-openai-macros, reqwest, backoff, secrecy, and other client-only dependencies

Total: -10,718 lines
b7f9fa6 to f33f0dc
- Use FunctionType instead of ChatCompletionToolType for streaming tool call chunk assertions (upstream renamed the enum)
- Remove deleted HTTP client tests (PureOpenAI/NvCustom/GenericBYOT clients were part of the fork, not needed)
- Normalize fixture comparison in postprocessor_parsing_stream to handle upstream serde skip_serializing_if differences
- Fix stale comments referencing old crate/path names
Review: Type Migration Risk Assessment
Thorough analysis of the type conversion changes. Overall the refactor is clean and well-structured; the -18K line reduction is a big maintenance win. A few things that might be overlooked:
1.
Follow-up: ImageModel Wildcard Bug (New Finding)
The wildcard match at `_ => format!("{:?}", m).to_lowercase()` produces wrong model names for upstream
This would cause Fix: Use
Follow-up: v1/responses Wire-Format Changes & Test Gap
Deep analysis of the v1/responses path reveals visible wire-format changes with no regression test coverage.
Fields removed from Response JSON
Fields that flip from
- Add skip_serializing_if to stream field on CreateChatCompletionRequest to avoid serializing null (some backends reject it)
- Fix ImageModel wildcard using Debug output instead of serde rename (gptimage1 vs gpt-image-1) by matching all variants explicitly
- Guard file_id-only InputImage with targeted error before URL parsing
- Add wire-format snapshot test for Response JSON shape documenting that frequency_penalty, presence_penalty, store, max_tool_calls are intentionally absent (request-level fields, not in OpenAI spec)
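The ImageModel fix in this commit comes down to the difference between a variant's `Debug` name and its wire name. The sketch below is illustrative only: the enum is a hypothetical local mirror of two upstream variants named in this PR, `name_via_debug` and `name_explicit` are made-up helper names, and the `"gpt-image-1-mini"` string is an assumption (only `gptimage1` vs `gpt-image-1` is stated in the commit message).

```rust
// Hypothetical mirror of two upstream variants named in this PR; the real
// enum lives in async-openai and takes its wire names from serde renames.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ImageModel {
    GptImage1,
    GptImage1Mini,
}

/// The buggy wildcard approach: lowercased Debug output drops the hyphens.
pub fn name_via_debug(m: ImageModel) -> String {
    format!("{:?}", m).to_lowercase()
}

/// The fixed approach: every variant is matched explicitly, so a newly added
/// variant becomes a compile error instead of a silently wrong model name.
pub fn name_explicit(m: ImageModel) -> &'static str {
    match m {
        ImageModel::GptImage1 => "gpt-image-1",
        // Assumed wire name, shown for illustration only.
        ImageModel::GptImage1Mini => "gpt-image-1-mini",
    }
}
```

With these definitions, `name_via_debug(ImageModel::GptImage1)` yields `gptimage1`, while `name_explicit(ImageModel::GptImage1)` yields `gpt-image-1`.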
…t messages
Two regressions in the Responses API → Chat Completions bridge when a prior turn contains a `function_call` + assistant text + `function_call_output` sequence (emitted by OpenAI Agents SDK, Codex, etc.):

1. Tool-call chain broken by interstitial assistant text. `convert_input_items_to_messages` emitted three separate chat messages (assistant-with-tool-calls, assistant-with-text, tool). Jinja templates that require a tool message to directly follow its assistant tool_call (e.g. MiniMax) then see the interstitial assistant message reset `last_tool_call.name` and reject the payload with "Message has tool role, but there was no previous assistant message with a tool call!".

   Coalesce adjacent assistant-side items (OutputMessage, FunctionCall, assistant EasyMessage) into a single ChatCompletionRequestAssistantMessage carrying both `content` and `tool_calls`. This matches the Chat Completions spec and lets downstream templates pair tool calls with their outputs.

2. Strict upstream deserialization rejects bare assistant messages. After the async-openai 0.34 upgrade (#7625) the lenient fix from #6599 was dropped: upstream `OutputMessage` requires `id` + `status`, and `OutputTextContent` requires `annotations`. Clients like Codex send `{"type":"message","role":"assistant","content":[{"type":"output_text", "text":"..."}]}` without these, failing with "data did not match any variant of untagged enum InputParam".

   Add a custom Deserialize on NvCreateResponse that patches these items with synthetic defaults before delegating to the upstream strict types.

Tested live against MiniMax-M2.7 on dynamo.frontend:
- Codex `codex exec "hello minimax"` round-trips successfully
- All three previously-failing repro shapes (simpler, structured w/ id+status, structured w/o id+status) now return 200

Covered by three new unit tests in responses/mod.rs.
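The coalescing step from the first fix can be sketched as below. This is a simplified illustration under stated assumptions, not the actual `convert_input_items_to_messages` implementation: `InputItem`, `ChatMessage`, and `coalesce` are hypothetical names, tool calls are reduced to `(call_id, name)` pairs, and joining multiple assistant texts with a newline is an arbitrary choice made for the example.

```rust
// Hypothetical, simplified stand-ins for Responses API input items and
// Chat Completions messages; the real types have many more fields.
#[derive(Debug, Clone)]
pub enum InputItem {
    AssistantText(String),
    FunctionCall { call_id: String, name: String },
    FunctionCallOutput { call_id: String, output: String },
    UserText(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum ChatMessage {
    User(String),
    Assistant { content: Option<String>, tool_calls: Vec<(String, String)> },
    Tool { call_id: String, content: String },
}

/// Merges each run of adjacent assistant-side items into one assistant
/// message, so a tool message directly follows the message holding its call.
pub fn coalesce(items: &[InputItem]) -> Vec<ChatMessage> {
    let mut out: Vec<ChatMessage> = Vec::new();
    for item in items {
        match item {
            InputItem::AssistantText(_) | InputItem::FunctionCall { .. } => {
                // Open a new assistant message only if the previous one is not
                // already an assistant message we can extend.
                if !matches!(out.last(), Some(ChatMessage::Assistant { .. })) {
                    out.push(ChatMessage::Assistant { content: None, tool_calls: Vec::new() });
                }
                if let Some(ChatMessage::Assistant { content, tool_calls }) = out.last_mut() {
                    match item {
                        InputItem::AssistantText(text) => {
                            let merged = match content.take() {
                                Some(prev) => format!("{prev}\n{text}"),
                                None => text.clone(),
                            };
                            *content = Some(merged);
                        }
                        InputItem::FunctionCall { call_id, name } => {
                            tool_calls.push((call_id.clone(), name.clone()));
                        }
                        _ => {}
                    }
                }
            }
            InputItem::FunctionCallOutput { call_id, output } => out.push(ChatMessage::Tool {
                call_id: call_id.clone(),
                content: output.clone(),
            }),
            InputItem::UserText(text) => out.push(ChatMessage::User(text.clone())),
        }
    }
    out
}
```

For the failing sequence (function call, assistant text, function call output), this produces two messages instead of three: one assistant message carrying both the text and the tool call, followed directly by the tool message.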
Summary
Replace the entire forked async-openai codebase in dynamo-protocols with upstream async-openai v0.34 re-exports + locally-defined inference-serving extensions.
19,108 deletions and 1,089 additions in the current PR diff (net -18,019 lines). dynamo-protocols now contains a minimal crate root, slimmed error types, and protocol type definitions -- no HTTP client, no API handlers, and no forked copies of upstream types.
Approach
The fork originally vendored the entire async-openai crate (HTTP client, API handlers, type definitions) and added NVIDIA-specific fields directly onto base OpenAI types. This PR reverses that by:
1. Adding upstream async-openai as a types-only dependency (`default-features = false` with `chat-completion-types`, `response-types`, `completion-types`, `embedding-types`, `image-types`).
2. Re-exporting upstream types where identical -- ~50 chat types (CompletionUsage, FunctionCall, Role, Prompt, ChatCompletionMessageToolCall, etc.), all embedding types, all response types, all image types. Consumers still import from `dynamo_protocols::types::*` -- the re-exports are transparent.
3. Keeping local definitions for types we extend -- when a type has inference-serving fields that upstream doesn't have, we define it locally and document the delta (see "Types retained locally" below). Types form a containment tree -- if a child type is extended, all parent containers must also be locally defined. So `ChatChoice` is local because it contains our extended `ChatCompletionResponseMessage`, `CreateChatCompletionResponse` is local because it contains `ChatChoice`, etc.
4. Adapting consumers for upstream API changes that differ from the fork:
   - `ChatCompletionMessageToolCall` no longer has a `type` field (always "function" in the spec)
   - `ChatCompletionToolType` renamed to `FunctionType` in upstream
   - `Stop` renamed to `StopConfiguration` in upstream
   - `ImagesResponse` gained new fields (`background`, `output_format`, `quality`, `size`, `usage`)
   - `ImageModel` gained new variants (`GptImage1`, `GptImage1dot5`, `GptImage1Mini`)
5. Deleting everything else -- 31 API handler modules, HTTP client, config, 24 forked type modules that were either identical to upstream or unused. Removed client-only dependencies (reqwest, backoff, secrecy, async-openai-macros, etc.).
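The re-export and containment-tree rules above can be illustrated with a small self-contained sketch. Everything here is a simplified assumption: the `upstream` module stands in for `async_openai::types::chat`, the `types` module stands in for `dynamo_protocols::types`, and the field sets are trimmed (the real local message type uses an enum for `content`, not a plain string).

```rust
// `upstream` is a hypothetical stand-in for `async_openai::types::chat`.
pub mod upstream {
    #[derive(Debug, Clone, PartialEq)]
    pub enum Role {
        Assistant,
        User,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct ChatCompletionResponseMessage {
        pub role: Role,
        pub content: Option<String>,
    }
}

// `types` is a hypothetical stand-in for `dynamo_protocols::types`.
pub mod types {
    // Identical to upstream: re-export, so consumers keep a single import path.
    pub use super::upstream::Role;

    // Extended type: defined locally, with the delta from upstream documented.
    #[derive(Debug, Clone, PartialEq)]
    pub struct ChatCompletionResponseMessage {
        pub role: Role,
        pub content: Option<String>,
        /// Delta from upstream: reasoning text emitted by the model.
        pub reasoning_content: Option<String>,
    }

    // Parent container: must also be local, because it holds the local
    // (extended) message type rather than the upstream one.
    #[derive(Debug, Clone, PartialEq)]
    pub struct ChatChoice {
        pub index: u32,
        pub message: ChatCompletionResponseMessage,
    }
}
```

Because `types::Role` is a re-export, it is the same type as `upstream::Role`, whereas `types::ChatCompletionResponseMessage` and `upstream::ChatCompletionResponseMessage` are distinct types, which is why every container above the extended type has to be redefined locally.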
Types retained locally (inference-serving extensions)
| Type | Extension |
| --- | --- |
| `ChatCompletionResponseMessage` | `content: Option<ChatCompletionMessageContent>` (enum: Text or multimodal Parts), `reasoning_content: Option<String>` |
| `ChatCompletionStreamResponseDelta` | `content` + `reasoning_content` fields |
| `ChatChoice` / `ChatChoiceStream` | `stop_reason: Option<StopReason>` |
| `CreateChatCompletionRequest` | `mm_processor_kwargs: Option<Value>` |
| `ChatCompletionStreamOptions` | `continuous_usage_stats: bool` |
| `ChatCompletionRequestAssistantMessage` | `reasoning_content: Option<ReasoningContent>` |
| `ChatCompletionRequestUserMessageContentPart` | `VideoUrl(...)`, `AudioUrl(...)` variants |
| `ImageUrl` | `url: url::Url` (not String), `uuid: Option<Uuid>` |
| `CreateCompletionRequest` | `prompt_embeds: Option<String>`, strict `echo` validation |
| `ReasoningContent` | |
| `StopReason` | |
| `ChatCompletionMessageContent` | |
| `anthropic/types` | |

What remains in dynamo-protocols
Stack: 3/3 -- depends on #7565, see RFC #7563
Test plan
- `cargo test -p dynamo-protocols` -- pass
- `cargo test -p dynamo-parsers` -- 354 pass
- `cargo test -p dynamo-llm --lib` -- 759 pass
- `cargo check` -- clean across all crates

Summary by CodeRabbit
New Features
Breaking Changes
Refactor