feat: expose service_tier in CompletionUsage from OpenAI Responses API #5341
Merged
davidzhao merged 1 commit into livekit:main on Apr 5, 2026
Conversation
OpenAI returns `service_tier` (e.g. `"default"`, `"priority"`, `"flex"`) in every API response, indicating the processing tier that was actually used. This is important for accurate cost tracking, since the priority tier has different billing rates.

Changes:

- Add a `service_tier` field to `CompletionUsage` (optional, defaults to `None`)
- Read `event.response.service_tier` in the OpenAI Responses plugin's `_handle_response_completed` and pass it through to `CompletionUsage`

This allows downstream consumers (session reports, webhooks, billing) to know which service tier was used for each LLM call.
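A minimal sketch of the shape of the change, with the dataclass fields abbreviated and `usage_from_response` as a hypothetical helper (the PR does this inline in `_handle_response_completed`):

```python
from dataclasses import dataclass


@dataclass
class CompletionUsage:
    # Abbreviated: the real class in livekit-agents carries more fields.
    completion_tokens: int = 0
    prompt_tokens: int = 0
    total_tokens: int = 0
    # New in this PR: the tier OpenAI actually used ("default",
    # "priority", "flex"), or None when the provider doesn't report one.
    service_tier: str | None = None


def usage_from_response(response) -> CompletionUsage:
    # Hypothetical helper: maps a Responses API `response` object to
    # CompletionUsage. The Responses API reports input/output token counts.
    u = response.usage
    return CompletionUsage(
        completion_tokens=u.output_tokens,
        prompt_tokens=u.input_tokens,
        total_tokens=u.total_tokens,
        # getattr guards against SDK objects that lack the attribute
        service_tier=getattr(response, "service_tier", None),
    )
```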
osimhi213 added a commit to de-id/livekit-agents that referenced this pull request on Apr 5, 2026, merging upstream/main:

- fix: add PARTICIPANT_KIND_CONNECTOR to default participant kinds (livekit#5339)
- feat: expose service_tier in CompletionUsage from OpenAI Responses API (livekit#5341)
- feat: answering machine detection (livekit#4906)
- fix: wait_for_participant waits until participant is fully active (livekit#5271)
- (gemini realtime): add warnings in update_chat_ctx and update_instructions (livekit#5332)
- fix: convert oneOf to anyOf in strict schema for discriminated unions (livekit#5324)
- fix(voice): make function call history preservation configurable in AgentTask (livekit#5288)
russellmartin-livekit pushed a commit that referenced this pull request on Apr 13, 2026.
Summary
OpenAI returns `service_tier` (e.g. `"default"`, `"priority"`, `"flex"`) in every API response, indicating the processing tier that was actually used to serve the request. This is important for accurate cost tracking, since the priority tier has different billing rates.

Currently, both the Responses API plugin and the Chat Completions inference layer parse usage data but ignore `service_tier`. This PR adds it to `CompletionUsage` so downstream consumers can access it.

Changes
- `livekit-agents/livekit/agents/llm/llm.py`: add a `service_tier: str | None = None` field to `CompletionUsage`
- `livekit-plugins/livekit-plugins-openai/.../responses/llm.py`: read `event.response.service_tier` in `_handle_response_completed` and pass it to `CompletionUsage`
- `livekit-agents/livekit/agents/inference/llm.py`: read `chunk.service_tier` in the Chat Completions stream and pass it to `CompletionUsage` (see the sketch below)
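The Chat Completions stream path is analogous. A hedged sketch, reusing the `CompletionUsage` sketch above (`usage_from_chunk` is an illustrative helper, not the plugin's actual code):

```python
def usage_from_chunk(chunk) -> CompletionUsage | None:
    # In the OpenAI streaming API, `usage` is only populated on the final
    # chunk (when stream_options={"include_usage": True} is set), while
    # `service_tier` rides along on the chunks themselves.
    if chunk.usage is None:
        return None
    return CompletionUsage(
        completion_tokens=chunk.usage.completion_tokens,
        prompt_tokens=chunk.usage.prompt_tokens,
        total_tokens=chunk.usage.total_tokens,
        service_tier=getattr(chunk, "service_tier", None),
    )
```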
Why

- `service_tier` is configured at the project level, but some requests may be downgraded to `"default"` under ramp rate limits
- The field is already present on the `Response` object and `ChatCompletionChunk` — just not being read
Design Note

`service_tier` is semantically response metadata rather than token usage. Placing it on `CompletionUsage` is a pragmatic choice — `CompletionUsage` is the object that flows through the metrics/usage collection pipeline (`ModelUsageCollector`, session reports, etc.), so it propagates automatically with zero changes to the pipeline.

If the maintainers prefer cleaner separation, this could be moved to `ChatChunk` directly — happy to refactor if that's the preferred approach.
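As an example of what that propagation enables, a downstream consumer could bucket token counts by the tier that actually served each call. Everything below is hypothetical: the hook name and the per-tier rates are made-up placeholders.

```python
from collections import defaultdict

# Made-up per-million-output-token rates, purely for illustration.
RATES = {"default": 8.0, "priority": 14.0, "flex": 4.0}

tokens_by_tier: dict[str, int] = defaultdict(int)


def on_usage(usage: CompletionUsage) -> None:
    # Bucket completion tokens by the tier that actually served the call;
    # providers that don't report a tier fall back to "default".
    tokens_by_tier[usage.service_tier or "default"] += usage.completion_tokens


def estimated_output_cost() -> float:
    return sum(RATES[t] * n / 1_000_000 for t, n in tokens_by_tier.items())
```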
Backward Compatible

- `service_tier` defaults to `None` — no impact on existing code

Related