fix(voice): add PreemptiveGenerationOptions for fine-grained control #5428
Changes from all commits: `b7a57fb`, `87df85c`, `35b2d14`, `725e55f`, `a5c1678`, `21de229`, `f250d41`
```diff
@@ -64,11 +64,13 @@
 from .turn import (
     EndpointingOptions,
     InterruptionOptions,
+    PreemptiveGenerationOptions,
     TurnDetectionMode,
     TurnHandlingOptions,
     _migrate_turn_handling,
     _resolve_endpointing,
     _resolve_interruption,
+    _resolve_preemptive_generation,
 )

 if TYPE_CHECKING:
```
```diff
@@ -135,7 +137,6 @@ class AgentSessionOptions:
     turn_handling: TurnHandlingOptions
     max_tool_steps: int
     user_away_timeout: float | None
-    preemptive_generation: bool
     min_consecutive_speech_delay: float
     use_tts_aligned_transcript: bool | None
     tts_text_transforms: Sequence[TextTransforms] | None
```
```diff
@@ -151,6 +152,10 @@ def endpointing(self) -> EndpointingOptions:
     def interruption(self) -> InterruptionOptions:
         return self.turn_handling["interruption"]

+    @property
+    def preemptive_generation(self) -> PreemptiveGenerationOptions:
+        return self.turn_handling["preemptive_generation"]
+
```
**Member** commented on lines +155 to +157:

> We don't need to expose it if we expose `turn_handling`

Suggested change
```diff
 Userdata_T = TypeVar("Userdata_T")
 Run_T = TypeVar("Run_T")

@@ -202,6 +207,7 @@ class AgentSession(rtc.EventEmitter[EventTypes], Generic[Userdata_T]):
     "allow_interruptions": "Use turn_handling=TurnHandlingOptions(...) instead",
     "discard_audio_if_uninterruptible": "Use turn_handling=TurnHandlingOptions(...) instead",
     "min_interruption_duration": "Use turn_handling=TurnHandlingOptions(...) instead",
+    "preemptive_generation": "Use turn_handling=TurnHandlingOptions(...) instead",
     "min_interruption_words": "Use turn_handling=TurnHandlingOptions(...) instead",
     "turn_detection": "Use turn_handling=TurnHandlingOptions(...) instead",
     "agent_false_interruption_timeout": "Use turn_handling=TurnHandlingOptions(...) instead",
```
```diff
@@ -227,7 +233,6 @@ def __init__(
     # Misc settings
     userdata: NotGivenOr[Userdata_T] = NOT_GIVEN,
     video_sampler: NotGivenOr[_VideoSampler | None] = NOT_GIVEN,
-    preemptive_generation: bool = True,
     aec_warmup_duration: float | None = 3.0,
     ivr_detection: bool = False,
     user_away_timeout: float | None = 15.0,
```
```diff
@@ -236,6 +241,7 @@ def __init__(
     conn_options: NotGivenOr[SessionConnectOptions] = NOT_GIVEN,
     loop: asyncio.AbstractEventLoop | None = None,
     # deprecated
+    preemptive_generation: NotGivenOr[bool] = NOT_GIVEN,
     min_endpointing_delay: NotGivenOr[float] = NOT_GIVEN,
     max_endpointing_delay: NotGivenOr[float] = NOT_GIVEN,
     false_interruption_timeout: NotGivenOr[float | None] = NOT_GIVEN,
```
```diff
@@ -294,20 +300,14 @@ def __init__(
     user_away_timeout (float, optional): If set, set the user state as
         "away" after this amount of time after user and agent are silent.
         Defaults to ``15.0`` s, set to ``None`` to disable.
-    preemptive_generation (bool):
-        Whether to speculatively begin LLM and TTS requests before an end-of-turn is
-        detected. When True, the agent sends inference calls as soon as a user
-        transcript is received rather than waiting for a definitive turn boundary. This
-        can reduce response latency by overlapping model inference with user audio,
-        but may incur extra compute if the user interrupts or revises mid-utterance.
-        Defaults to ``True``.
     aec_warmup_duration (float, optional): The duration in seconds that the agent
         will ignore user's audio interruptions after the agent starts speaking.
         This is useful to prevent the agent from being interrupted by echo before AEC is ready.
         Set to ``None`` to disable. Default ``3.0`` s.
     session_close_transcript_timeout (float, optional): Seconds to wait for the
         final STT transcript when closing the session (after audio is detached).
         Default ``2.0`` s (independent of ``commit_user_turn``'s ``transcript_timeout``).
+    preemptive_generation (NotGivenOr[bool | PreemptiveGenerationOptions]): Deprecated, use turn_handling=TurnHandlingOptions(...) instead.
     min_endpointing_delay (NotGivenOr[float]): Deprecated, use turn_handling=TurnHandlingOptions(...) instead.
     max_endpointing_delay (NotGivenOr[float]): Deprecated, use turn_handling=TurnHandlingOptions(...) instead.
     false_interruption_timeout (NotGivenOr[float | None]): Deprecated, use turn_handling=TurnHandlingOptions(...) instead.
```
```diff
@@ -330,12 +330,12 @@ def __init__(
     turn_handling = (
         _migrate_turn_handling(
             # backward compatibility for deprecated parameters that had default values
-            min_endpointing_delay=min_endpointing_delay
-            if is_given(min_endpointing_delay)
-            else 0.5,
-            max_endpointing_delay=max_endpointing_delay
-            if is_given(max_endpointing_delay)
-            else 3.0,
+            min_endpointing_delay=(
+                min_endpointing_delay if is_given(min_endpointing_delay) else 0.5
+            ),
+            max_endpointing_delay=(
+                max_endpointing_delay if is_given(max_endpointing_delay) else 3.0
+            ),
             false_interruption_timeout=false_interruption_timeout,
             turn_detection=turn_detection,
             discard_audio_if_uninterruptible=discard_audio_if_uninterruptible,
```
```diff
@@ -344,13 +344,15 @@ def __init__(
             allow_interruptions=allow_interruptions,
             resume_false_interruption=resume_false_interruption,
             agent_false_interruption_timeout=agent_false_interruption_timeout,
+            preemptive_generation=preemptive_generation,
         )
         if not is_given(turn_handling)
         else turn_handling
     )

     endpointing = _resolve_endpointing(turn_handling.get("endpointing"))
     interruption = _resolve_interruption(turn_handling.get("interruption"))
+    preemptive_gen = _resolve_preemptive_generation(turn_handling.get("preemptive_generation"))
     raw_turn_detection = turn_handling.get("turn_detection", None)

     # This is the "global" chat_context, it holds the entire conversation history
```
```diff
@@ -360,10 +362,10 @@ def __init__(
         endpointing=endpointing,
         interruption=interruption,
         turn_detection=raw_turn_detection,
+        preemptive_generation=preemptive_gen,
     ),
     max_tool_steps=max_tool_steps,
     user_away_timeout=user_away_timeout,
-    preemptive_generation=preemptive_generation,
     min_consecutive_speech_delay=min_consecutive_speech_delay,
     tts_text_transforms=(
         tts_text_transforms
```
```diff
@@ -328,7 +328,7 @@ def _serialize_options(opts: AgentSessionOptions) -> dict[str, str]:
         "interruption": str(dict(opts.interruption)),
         "max_tool_steps": str(opts.max_tool_steps),
         "user_away_timeout": str(opts.user_away_timeout),
-        "preemptive_generation": str(opts.preemptive_generation),
+        "preemptive_generation": str(dict(opts.preemptive_generation)),
         "min_consecutive_speech_delay": str(opts.min_consecutive_speech_delay),
         "use_tts_aligned_transcript": str(opts.use_tts_aligned_transcript),
         "ivr_detection": str(opts.ivr_detection),
```

**Contributor** commented:

> 🟡 Existing test mock uses a `bool` for `preemptive_generation`, but `_serialize_options` now calls `dict()` on it.
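The reviewer's concern is concrete: `dict()` over a mapping-like options value works, but `dict()` over a bare `bool` raises at runtime, so any test fixture that still supplies the legacy bool will break on the new serialization path. A minimal standalone reproduction (the literal values here are illustrative, not taken from the real test suite):

```python
# Old-style value, as a stale test mock might still supply it.
legacy_value = True

try:
    dict(legacy_value)  # what the new serialization path effectively does
except TypeError as exc:
    print(f"TypeError: {exc}")  # bool is not iterable / not a mapping

# A dict-like options value (e.g. a TypedDict instance) serializes fine.
options_value = {"enabled": True, "max_retries": 3}
print(str(dict(options_value)))
```

Updating such mocks to a dict-shaped options value (or routing them through the resolver) removes the failure.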
> 🟡 Existing preemptive generation is cancelled before the `max_retries` check, discarding valid work
>
> In `on_preemptive_generation`, `_cancel_preemptive_generation()` is called unconditionally on line 1783, before the `max_retries` check on line 1791. When `_preemptive_generation_count >= max_retries`, the method returns early without starting a new generation, yet the previous (most recent) preemptive generation has already been cancelled and set to `None`. The last successful preemptive generation is therefore destroyed without replacement. Later, in `_user_turn_completed_task` at line 1995, `self._preemptive_generation` is `None`, so the preemptive result can never be used and a fresh (non-preemptive) LLM call is always made instead. This defeats the purpose of the `max_retries` limit, which should keep the last generation alive when retries are exhausted.
>
> The fix is to move `_cancel_preemptive_generation()` after the early-return checks (or at least after the `max_retries` check), so the existing generation is only cancelled when it will actually be replaced by a new one.
Reply:

> This is expected: `on_preemptive_generation` is called when the user transcript changes, so the previous preemptive generation is invalid and we should cancel it as soon as possible.
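A toy model of the control flow the reply describes may make the disagreement clearer: cancellation happens first because a changed transcript makes any in-flight speculative generation stale, even when the retry budget is exhausted and no replacement will be started. The class, its fields, and the string stand-in for a generation are all hypothetical; the real implementation manages asyncio tasks inside `AgentSession`.

```python
class PreemptiveState:
    """Toy sketch of the described on_preemptive_generation flow (assumed names)."""

    def __init__(self, max_retries: int = 3) -> None:
        self.max_retries = max_retries
        self.count = 0
        self.generation: str | None = None  # the in-flight speculative generation

    def cancel(self) -> None:
        self.generation = None

    def on_preemptive_generation(self, transcript: str) -> None:
        # Cancel first: the transcript just changed, so any previous
        # speculative generation is based on stale input and must not be reused.
        self.cancel()
        if self.count >= self.max_retries:
            # Retries exhausted: start nothing; a normal (non-preemptive)
            # generation will run when the turn actually ends.
            return
        self.count += 1
        self.generation = f"response-for:{transcript}"  # stand-in for LLM/TTS work


s = PreemptiveState(max_retries=2)
s.on_preemptive_generation("hel")
s.on_preemptive_generation("hello")
s.on_preemptive_generation("hello there")  # budget exhausted: cancels, starts nothing
print(s.generation)
```

Under this reading, ending the turn with `generation is None` and falling back to a fresh LLM call is the intended behavior, not a bug: the cancelled generation was for an outdated transcript anyway.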