feat(voice): add PreemptiveGenerationOptions for fine-grained control #1265
toubatbrian merged 5 commits into main
Port of livekit/agents#5428. Adds `PreemptiveGenerationOptions` with configurable options to reduce wasted compute during preemptive generation:

- `preemptiveTts` (default `false`): when `false`, only the LLM runs preemptively and TTS starts after the turn is confirmed
- `maxSpeechDuration` (default 10 s): skip preemptive generation when the user has been speaking too long
- `maxRetries` (default 3): cap preemptive LLM requests per user turn; the count resets on turn completion

The `preemptiveGeneration` parameter now lives inside the `turnHandling` options. The old top-level boolean is deprecated with a migration path.

https://claude.ai/code/session_01C6K9wneUorBnm9eK2rZjpt
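The `preemptiveTts: false` behavior (LLM runs early, TTS waits for the confirmed turn) can be modeled roughly as below. This is a simplified sketch, not the code in `_pipelineReplyTaskImpl`: `runReplySketch` and `synthesize` are hypothetical stand-ins for the reply task and `performTTSInference()`, and the `scheduled` promise stands in for `waitForScheduled`. It only shows the ordering idea: the LLM text stream is split with `ReadableStream.tee()`, and the TTS branch is left unread until scheduling unless `preemptiveTts` is set.

```typescript
// Hypothetical model of the TTS deferral; not the real agents-js implementation.
async function runReplySketch(
  llmText: ReadableStream<string>,
  scheduled: Promise<void>, // stand-in for waitForScheduled()
  preemptiveTts: boolean,
  log: string[],
): Promise<void> {
  // Split the LLM output: one branch for text forwarding, one for TTS.
  const [textBranch, ttsBranch] = llmText.tee();

  // Stand-in for TTS inference: records when it starts and what it consumes.
  const synthesize = async (stream: ReadableStream<string>): Promise<void> => {
    log.push("tts:start");
    const reader = stream.getReader();
    for (let r = await reader.read(); !r.done; r = await reader.read()) {
      log.push(`tts:${r.value}`);
    }
  };

  // preemptiveTts: true starts TTS immediately, alongside the LLM.
  const earlyTts = preemptiveTts ? synthesize(ttsBranch) : undefined;

  // The text branch is consumed right away in both modes; tee() buffers
  // chunks for the unread TTS branch in the meantime.
  const textTask = (async () => {
    const reader = textBranch.getReader();
    for (let r = await reader.read(); !r.done; r = await reader.read()) {
      log.push(`text:${r.value}`);
    }
  })();

  await scheduled;
  log.push("scheduled");

  // Default (preemptiveTts: false): TTS only starts once the turn is confirmed.
  await (earlyTts ?? synthesize(ttsBranch));
  await textTask;
}

// Usage: a one-chunk LLM stream with scheduling already confirmed.
const makeStream = () =>
  new ReadableStream<string>({
    start(controller) {
      controller.enqueue("Hello");
      controller.close();
    },
  });

const replyLog: string[] = [];
runReplySketch(makeStream(), Promise.resolve(), false, replyLog).then(() =>
  console.log(replyLog),
);
```

Deferring the read rather than the `tee()` itself keeps the stream plumbing identical in both modes; only the moment TTS starts consuming changes.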
🦋 Changeset detected

Latest commit: e414064

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 25 packages.
`@@ -1193,6 +1195,19 @@ export class AgentActivity implements RecognitionHooks {`

`this.cancelPreemptiveGeneration();`
🔴 `cancelPreemptiveGeneration()` called before guard checks causes premature cancellation of a valid speculative result

`cancelPreemptiveGeneration()` at line 1196 unconditionally cancels the existing preemptive generation before the new `maxSpeechDuration` (lines 1198-1203) and `maxRetries` (lines 1205-1207) guard checks run. When either guard triggers an early return, the previous valid preemptive generation has already been cancelled and `_preemptiveGeneration` has been set to `undefined` (via `cancelPreemptiveGeneration` at agents/src/voice/agent_activity.ts:1242-1246). When `userTurnCompleted` later runs, it checks `this._preemptiveGeneration !== undefined` at line 1705, finds nothing, and falls back to a non-preemptive `generateReply()`.

This defeats the purpose of the feature: the last successful speculative generation (e.g., the Nth attempt before hitting `maxRetries`) is thrown away, which increases response latency. The unit tests don't catch this because `cancelPreemptiveGeneration` is mocked as a no-op `vi.fn()` in the test harness (agent_activity.test.ts:269).
Prompt for agents
In agents/src/voice/agent_activity.ts, the onPreemptiveGeneration method calls this.cancelPreemptiveGeneration() on line 1196 before checking the maxSpeechDuration and maxRetries guards (lines 1198-1207). When either guard triggers an early return, the existing preemptive generation (which was the best speculative result so far) has already been cancelled with no replacement.
The fix is to move this.cancelPreemptiveGeneration() after the guard checks, right before this._preemptiveGenerationCount++ on line 1209. This way, the existing preemptive generation is only cancelled when a new one is about to be created.
Alternatively, move the guard checks (maxSpeechDuration and maxRetries) above the cancelPreemptiveGeneration() call.
The test in agent_activity.test.ts should also be updated: the cancelPreemptiveGeneration mock (line 269) doesn't test the real interaction between cancel and the guards. Consider verifying that _preemptiveGeneration is preserved (not set to undefined) when the guards trigger an early return.
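The reordering suggested above can be sketched with a small stand-alone class. `PreemptiveGuardSketch`, its fields, and the strict `>` comparison in the duration check are illustrative assumptions, not the real `AgentActivity` code; the point is only the order of operations: guards first, then cancel, then increment the counter. A string stands in for the speculative generation.

```typescript
// Hypothetical model of onPreemptiveGeneration with the guards moved first.
interface PreemptiveInfo {
  startedSpeakingAt?: number; // ms timestamp at which the user started speaking
}

interface GuardOptions {
  maxSpeechDuration: number; // ms
  maxRetries: number;
}

class PreemptiveGuardSketch {
  private readonly opts: GuardOptions;
  private current: string | undefined = undefined; // stands in for _preemptiveGeneration
  private count = 0; // stands in for _preemptiveGenerationCount

  constructor(opts: GuardOptions) {
    this.opts = opts;
  }

  onPreemptiveGeneration(info: PreemptiveInfo, now: number): void {
    // 1. Guards run first, so an early return leaves the previous result intact.
    const spokenFor =
      info.startedSpeakingAt === undefined ? 0 : now - info.startedSpeakingAt;
    if (spokenFor > this.opts.maxSpeechDuration) return;
    if (this.count >= this.opts.maxRetries) return;

    // 2. Cancel the old generation only when a replacement is about to start.
    this.cancelPreemptiveGeneration();
    this.count++;
    this.current = `attempt-${this.count}`;
  }

  // Returns the speculative result to reuse; undefined means falling back to
  // a normal, non-preemptive reply.
  onUserTurnCompleted(): string | undefined {
    const result = this.current;
    this.current = undefined;
    this.count = 0; // the retry counter resets on turn completion
    return result;
  }

  private cancelPreemptiveGeneration(): void {
    this.current = undefined;
  }
}

// Five preemptive triggers within one turn, with maxRetries = 3.
const activity = new PreemptiveGuardSketch({ maxSpeechDuration: 10_000, maxRetries: 3 });
for (let i = 0; i < 5; i++) {
  activity.onPreemptiveGeneration({ startedSpeakingAt: 0 }, 1_000);
}
console.log(activity.onUserTurnCompleted()); // prints attempt-3
```

With the original ordering, the fourth and fifth triggers would cancel `attempt-3` before returning early, and turn completion would find nothing to reuse. Because this sketch exercises the real cancel path instead of a `vi.fn()` mock, it also covers the gap in the existing test.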
Summary
Port of livekit/agents#5428: adds `PreemptiveGenerationOptions` with configurable options to reduce wasted compute during preemptive generation.

Changes
- New `PreemptiveGenerationOptions` interface (`turn_config/preemptive_generation.ts`) with:
  - `enabled` (default `true`): whether preemptive generation is active
  - `preemptiveTts` (default `false`): when `false`, only the LLM runs preemptively; TTS starts once the turn is confirmed and speech is scheduled
  - `maxSpeechDuration` (default `10000` ms): skip preemptive generation when the user has been speaking too long, since long utterances are more likely to change
  - `maxRetries` (default `3`): cap preemptive LLM requests per user turn; the counter resets on turn completion
- Moved `preemptiveGeneration` into `TurnHandlingOptions`: the option is now configured via `turnHandling.preemptiveGeneration` alongside endpointing and interruption. The old top-level `preemptiveGeneration: boolean` on `AgentSessionOptions` is deprecated with a backward-compatible migration path.
- Pipeline TTS deferral: in `_pipelineReplyTaskImpl`, TTS inference is now deferred until after `waitForScheduled` by default. When `preemptiveTts: true`, TTS starts immediately alongside the LLM (the previous behavior).
- Speech duration and retry guards in `onPreemptiveGeneration`: preemptive generation is skipped if the user has been speaking longer than `maxSpeechDuration`, or if `maxRetries` attempts have already been made for the current turn. `PreemptiveGenerationInfo` is extended with `startedSpeakingAt` to enable the speech duration check.

Usage
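A minimal sketch of how a caller's configuration could resolve into the final options, assuming the defaults listed above and a `stripUndefined` helper that drops `undefined` values before the spread (the PR names the helper; its body here is an assumption). `SessionOptionsSketch` and `resolvePreemptiveGeneration` are hypothetical, and the precedence rule (an explicit `turnHandling` value beats the deprecated boolean) is assumed rather than taken from `utils.ts`. Wiring into a real `AgentSession` is not shown.

```typescript
// Local mirror of the option shape described in this PR (not the real import).
interface PreemptiveGenerationOptions {
  enabled: boolean;
  preemptiveTts: boolean;
  maxSpeechDuration: number; // milliseconds
  maxRetries: number;
}

// Hypothetical session options: deprecated boolean plus the new nested object.
interface SessionOptionsSketch {
  /** @deprecated configure turnHandling.preemptiveGeneration instead */
  preemptiveGeneration?: boolean;
  turnHandling?: { preemptiveGeneration?: Partial<PreemptiveGenerationOptions> };
}

// Defaults as listed in the PR description.
const DEFAULTS: PreemptiveGenerationOptions = {
  enabled: true,
  preemptiveTts: false,
  maxSpeechDuration: 10_000,
  maxRetries: 3,
};

// Drop keys whose value is undefined, so a plain spread cannot overwrite a
// default with `undefined`.
function stripUndefined<T extends object>(obj: T): Partial<T> {
  return Object.fromEntries(
    Object.entries(obj).filter(([, value]) => value !== undefined),
  ) as Partial<T>;
}

function resolvePreemptiveGeneration(opts: SessionOptionsSketch): PreemptiveGenerationOptions {
  const structured: Partial<PreemptiveGenerationOptions> =
    opts.turnHandling?.preemptiveGeneration ?? {};
  const config: Partial<PreemptiveGenerationOptions> = {
    // Assumption: the deprecated boolean only maps onto `enabled`, and an
    // explicit turnHandling value (spread after it) wins when both are set.
    enabled: opts.preemptiveGeneration,
    ...stripUndefined(structured),
  };
  return { ...DEFAULTS, ...stripUndefined(config) };
}

// Opt in to preemptive TTS and tighten the speech-duration cap.
const sessionOptions: SessionOptionsSketch = {
  turnHandling: {
    preemptiveGeneration: { preemptiveTts: true, maxSpeechDuration: 5_000 },
  },
};
// enabled stays true and maxRetries falls back to the default of 3.
console.log(resolvePreemptiveGeneration(sessionOptions));
```

Filtering `undefined` matters because `{ ...DEFAULTS, ...{ maxRetries: undefined } }` would otherwise set `maxRetries` to `undefined` instead of keeping `3`.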
Implementation nuances (JS vs Python)
| Aspect | Python | JS |
|---|---|---|
| Duration units | Seconds (`max_speech_duration: 10.0`) | Milliseconds (`maxSpeechDuration: 10_000`) |
| Options type | `PreemptiveGenerationOptions(TypedDict, total=False)` | `interface PreemptiveGenerationOptions` with `Partial<>` where needed |
| TTS deferral | `text_tee` with lazy branch creation | Uses `ReadableStream.tee()` but defers the `performTTSInference()` call |
| Defaults merge | `{**defaults, **config}` dict merge | `{ ...defaults, ...stripUndefined(config) }` spread with undefined filtering |
| Retry counter | `_preemptive_generation_count: int` on class | `private _preemptiveGenerationCount = 0` |
| Naming | `snake_case` (`preemptive_tts`, `max_speech_duration`) | `camelCase` (`preemptiveTts`, `maxSpeechDuration`) |

Files changed
- `agents/src/voice/turn_config/preemptive_generation.ts`: new `PreemptiveGenerationOptions` interface and defaults
- `agents/src/voice/turn_config/turn_handling.ts`: added `preemptiveGeneration` to `TurnHandlingOptions` and `InternalTurnHandlingOptions`
- `agents/src/voice/turn_config/utils.ts`: updated migration logic, `mergeWithDefaults`, deprecated boolean migration
- `agents/src/voice/agent_session.ts`: deprecated top-level `preemptiveGeneration`, removed it from defaults
- `agents/src/voice/agent_activity.ts`: added retry count, speech duration check, preemptive TTS deferral
- `agents/src/voice/audio_recognition.ts`: added `startedSpeakingAt` to `PreemptiveGenerationInfo`, passed `speechStartTime`
- `agents/src/voice/remote_session.ts`: updated serialization to use structured options from `turnHandling`
- `agents/src/voice/report.ts`: added `preemptive_generation` to report output
- `agents/src/voice/report.test.ts`: updated test defaults
- `agents/src/voice/turn_config/utils.test.ts`: added preemptive generation migration tests

Test plan
- Tests in `agents/src/` pass
- Build (`pnpm build:agents`)
- Lint (`pnpm lint`)
- Format check (`pnpm format:check`)
- `restaurant_agent.ts` works in Agent Playground with default options (preemptive generation enabled, TTS deferred)
- `restaurant_agent.ts` with `preemptiveTts: true` for the full preemptive pipeline
- `preemptiveGeneration: { enabled: false }` correctly disables preemptive generation

cc @toubatbrian @livekit/agent-devs