OpenAI Realtime: response.error mid-flight causes future to hang indefinitely (TODO at realtime_model.py:2005) #5566

@cphoward

Summary

When the OpenAI Realtime substrate sends a response.error event mid-flight (during a generation), the future associated with that response is never resolved, so any caller awaiting it blocks forever.

The source code at livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/realtime_model.py:2005 acknowledges this with a TODO comment.

Reproduction

import asyncio

from livekit.plugins.openai.realtime import RealtimeModel


async def main() -> None:
    session = RealtimeModel(model="gpt-4o-realtime-preview-2025-06-03").session()
    fut = session.generate_reply(instructions="...")
    # If OpenAI sends response.error mid-stream (e.g., due to substrate
    # rate limit, content policy violation, or similar), the future
    # `fut` will never resolve.
    try:
        result = await asyncio.wait_for(fut, timeout=30.0)
    except asyncio.TimeoutError:
        # A timeout is the only way to detect the hang from the caller side.
        print("future hung; substrate sent response.error and we didn't handle it")


asyncio.run(main())

Source code reference

Around line 2005 in realtime_model.py, in _handle_response_error (or equivalent):

def _handle_response_error(self, event: ResponseErrorEvent) -> None:
    # TODO: handle response.error mid-flight
    # Currently the response future hangs indefinitely.
    ...

(Exact TODO text and surrounding code may differ in current main; the line number reference is from the version this issue was authored against.)

Proposed fix

When response.error arrives mid-flight, the handler should:

  1. Identify the affected response by event.response_id
  2. Pop the future from _response_created_futures (or equivalent state)
  3. Resolve the future with RealtimeError(event.error.message) or similar
  4. Clean up any associated state (e.g., the in-flight _ResponseGeneration entry)

A sketch of the handler:

def _handle_response_error(self, event: ResponseErrorEvent) -> None:
    """Handle response.error: reject the associated future and clean up."""
    response_id = event.response_id  # may need to derive from event shape
    error_msg = event.error.message if event.error else "unknown error"

    # Reject any pending response_created future (matches by client_event_id
    # in metadata, if available; otherwise iterate)
    if event.metadata and (event_id := event.metadata.get("client_event_id")):
        if (fut := self._response_created_futures.pop(event_id, None)) is not None:
            if not fut.done():
                fut.set_exception(llm.RealtimeError(error_msg))

    # Clean up the in-flight generation state
    if self._current_generation is not None:
        # ... existing cleanup logic ...
        self._close_current_generation(reason=f"response.error: {error_msg}")
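
The pop-then-reject pattern in the steps above can be exercised in isolation with plain asyncio. The following is a minimal sketch, not the plugin's code: `PendingResponses`, `register`, `on_error`, and the local `RealtimeError` class are hypothetical stand-ins, and the real handler would key on whatever identifier the event actually carries.

```python
import asyncio


class RealtimeError(Exception):
    """Stand-in for llm.RealtimeError (hypothetical, for illustration only)."""


class PendingResponses:
    """Toy registry mapping a client event id to the future awaiting it."""

    def __init__(self) -> None:
        self._futures = {}

    def register(self, event_id: str) -> asyncio.Future:
        fut = asyncio.get_running_loop().create_future()
        self._futures[event_id] = fut
        return fut

    def on_error(self, event_id: str, message: str) -> None:
        # Pop first so the entry cannot leak, then reject only if still pending.
        fut = self._futures.pop(event_id, None)
        if fut is not None and not fut.done():
            fut.set_exception(RealtimeError(message))


async def demo() -> str:
    pending = PendingResponses()
    fut = pending.register("evt_1")
    # Simulate the server reporting an error while the caller is awaiting.
    asyncio.get_running_loop().call_soon(pending.on_error, "evt_1", "rate limited")
    try:
        await fut
    except RealtimeError as exc:
        return f"rejected: {exc}"
    return "resolved"


result = asyncio.run(demo())
print(result)
```

Checking `fut.done()` before `set_exception` matters because setting an exception on a future that is already resolved or cancelled raises `asyncio.InvalidStateError`.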

Impact

Production agents experiencing this bug see:

  • Hung futures consuming async task slots
  • Apparent "frozen" agent state when the substrate errors out
  • Difficult-to-diagnose timeouts upstream of the call site
  • Need for upstream asyncio.wait_for(...) wrappers (defensive code that shouldn't be necessary)
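
For reference, the defensive wrapper mentioned in the last bullet tends to look something like the sketch below. `await_with_deadline` is a hypothetical helper name, and the bare future in `demo` merely stands in for a response future that is never resolved.

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


async def await_with_deadline(aw: Awaitable[T], timeout: float) -> Optional[T]:
    """Return the awaited value, or None if it does not settle within `timeout`."""
    try:
        return await asyncio.wait_for(aw, timeout=timeout)
    except asyncio.TimeoutError:
        return None


async def demo() -> Optional[str]:
    # A future nobody resolves, standing in for the hung response future.
    hung = asyncio.get_running_loop().create_future()
    return await await_with_deadline(hung, timeout=0.05)


outcome = asyncio.run(demo())
print(outcome)
```

Note that `asyncio.wait_for` cancels the awaited future when the timeout fires, and the caller still cannot tell a slow response apart from one that errored, which is why a fix in the handler is preferable.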

Acceptance criteria

  • _handle_response_error resolves the affected future with RealtimeError instead of leaving it hanging.
  • A test verifies the failure mode: simulate response.error mid-flight, assert the future resolves with RealtimeError.
  • The TODO comment at realtime_model.py:2005 is removed.
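
The test in the second criterion could take roughly the following shape. This is a sketch against a hypothetical test double (`FakeSession` and its simplified method signatures are invented here), so a real test would drive the actual session with a simulated response.error event instead.

```python
import asyncio


class RealtimeError(Exception):
    """Stand-in for the plugin's RealtimeError (hypothetical)."""


class FakeSession:
    """Hypothetical test double; not the real realtime session class."""

    def __init__(self) -> None:
        self._response_created_futures = {}

    def generate_reply(self, event_id: str) -> asyncio.Future:
        fut = asyncio.get_running_loop().create_future()
        self._response_created_futures[event_id] = fut
        return fut

    def handle_response_error(self, event_id: str, message: str) -> None:
        fut = self._response_created_futures.pop(event_id, None)
        if fut is not None and not fut.done():
            fut.set_exception(RealtimeError(message))


async def check_error_rejects_future() -> bool:
    session = FakeSession()
    fut = session.generate_reply("evt_1")
    session.handle_response_error("evt_1", "content policy violation")
    try:
        # The short timeout turns a regression (a hang) into a fast failure.
        await asyncio.wait_for(fut, timeout=1.0)
    except RealtimeError:
        # Also confirm the pending-future entry was cleaned up.
        return not session._response_created_futures
    except asyncio.TimeoutError:
        return False
    return False


passed = asyncio.run(check_error_rejects_future())
print(passed)
```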

Related

  • Source code TODO at realtime_model.py:2005
  • Adjacent race conditions documented in source at realtime_model.py:1870 (response.done without prior response.created)
