
feat: expose service_tier in CompletionUsage from OpenAI Responses API#5341

Merged
davidzhao merged 1 commit into livekit:main from piyush-gambhir:feat/expose-service-tier
Apr 5, 2026

Conversation

@piyush-gambhir (Contributor) commented Apr 4, 2026

Summary

OpenAI returns service_tier (e.g. "default", "priority", "flex") in every API response, indicating the processing tier that was actually used to serve the request. This is important for accurate cost tracking since priority tier has different billing rates.

Currently, both the Responses API plugin and the Chat Completions inference layer parse usage data but ignore service_tier. This PR adds it to CompletionUsage so downstream consumers can access it.

Changes

  • livekit-agents/livekit/agents/llm/llm.py: Add service_tier: str | None = None field to CompletionUsage
  • livekit-plugins/livekit-plugins-openai/.../responses/llm.py: Read event.response.service_tier in _handle_response_completed and pass it to CompletionUsage
  • livekit-agents/livekit/agents/inference/llm.py: Read chunk.service_tier in Chat Completions stream and pass it to CompletionUsage

Why

  • OpenAI's Priority Processing bills at a premium rate
  • When service_tier is configured at the project level, some requests may be downgraded to "default" under ramp rate limits
  • Without this field, there's no way to know which tier was actually used for billing reconciliation
  • The field is already present on both the OpenAI Response object and ChatCompletionChunk — just not being read

Design Note

service_tier is semantically response metadata rather than token usage. Placing it on CompletionUsage is a pragmatic choice — CompletionUsage is the object that flows through the metrics/usage collection pipeline (ModelUsageCollector, session reports, etc.), so it propagates automatically with zero changes to the pipeline.

If the maintainers prefer cleaner separation, this could be moved to ChatChunk directly — happy to refactor if that's the preferred approach.
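To illustrate why riding on `CompletionUsage` is convenient: any consumer that already receives usage objects from the pipeline can aggregate by tier without pipeline changes. The sketch below is hypothetical, not the actual `ModelUsageCollector` API.

```python
from collections import defaultdict


def tally_by_tier(usages) -> dict[str, int]:
    """Hypothetical billing-reconciliation helper: total tokens per service tier.

    `usages` is any iterable of objects with `service_tier` and `total_tokens`
    attributes, e.g. CompletionUsage instances collected over a session.
    """
    totals: dict[str, int] = defaultdict(int)
    for usage in usages:
        # Providers that don't report a tier leave the field as None;
        # treat those as the default tier for billing purposes.
        tier = usage.service_tier or "default"
        totals[tier] += usage.total_tokens
    return dict(totals)
```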

Backward Compatible

  • service_tier defaults to None — no impact on existing code
  • Providers that don't support it simply leave it as None

Related

@CLAassistant commented Apr 4, 2026

CLA assistant check
All committers have signed the CLA.

@devin-ai-integration Bot (Contributor) left a comment
✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.


OpenAI returns service_tier (e.g. "default", "priority", "flex") in
every API response, indicating the processing tier that was actually
used. This is important for accurate cost tracking since priority
tier has different billing rates.

Changes:
- Add service_tier field to CompletionUsage (optional, defaults to None)
- Read event.response.service_tier in OpenAI Responses plugin's
  _handle_response_completed and pass it through to CompletionUsage

This allows downstream consumers (session reports, webhooks, billing)
to know which service tier was used for each LLM call.
piyush-gambhir force-pushed the feat/expose-service-tier branch from 7d13649 to 3fa5281 on April 4, 2026 at 23:39
@davidzhao (Member) left a comment

lg

@davidzhao davidzhao merged commit 37ca860 into livekit:main Apr 5, 2026
13 checks passed
osimhi213 added a commit to de-id/livekit-agents that referenced this pull request Apr 5, 2026
* upstream/main:
  fix: add PARTICIPANT_KIND_CONNECTOR to default participant kinds (livekit#5339)
  feat: expose service_tier in CompletionUsage from OpenAI Responses API (livekit#5341)
  feat: answering machine detection (livekit#4906)
  fix: wait_for_participant waits until participant is fully active (livekit#5271)
  (gemini realtime): add warnings in update_chat_ctx and update_instructions (livekit#5332)
  fix: convert oneOf to anyOf in strict schema for discriminated unions (livekit#5324)
  fix(voice): make function call history preservation configurable in AgentTask (livekit#5288)
osimhi213 added a commit to de-id/livekit-agents that referenced this pull request Apr 5, 2026
* fix(voice): make function call history preservation configurable in AgentTask (livekit#5288)

* fix: convert oneOf to anyOf in strict schema for discriminated unions (livekit#5324)

* (gemini realtime): add warnings in update_chat_ctx and update_instructions (livekit#5332)

* fix: wait_for_participant waits until participant is fully active (livekit#5271)

* feat: answering machine detection (livekit#4906)

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* feat: expose service_tier in CompletionUsage from OpenAI Responses API (livekit#5341)

* fix: add PARTICIPANT_KIND_CONNECTOR to default participant kinds (livekit#5339)

---------

Co-authored-by: Gopal Bagaswar <67310594+GopalGB@users.noreply.github.com>
Co-authored-by: Long Chen <longch1024@gmail.com>
Co-authored-by: Tina Nguyen <72938484+tinalenguyen@users.noreply.github.com>
Co-authored-by: David Zhao <dz@livekit.io>
Co-authored-by: Chenghao Mou <chenghao.mou@livekit.io>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Piyush Gambhir <90608533+piyush-gambhir@users.noreply.github.com>
Co-authored-by: Anunay Maheshwari <anunaym14@gmail.com>