feat(interruption): port barge-in cooldown window from python (#5269) #1366
Merged
toubatbrian merged 4 commits into main on May 5, 2026
Adds a configurable backchannel boundary that keeps adaptive/VAD interruption detection active for a short cooldown at the start of agent speech (default 1000 ms) and releases held user transcripts whose end time falls within a trailing cooldown (default 3500 ms) when the agent finishes. This enables quick corrections and premature answers to the agent's last-sentence questions to flow through. Configured via turnHandling.interruption.backchannelBoundary on the session options (number, [start, end] tuple, or null to disable). Ref: livekit/agents#5269 https://claude.ai/code/session_01PR-port-5269
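The three accepted shapes of `backchannelBoundary` (number, `[start, end]` tuple, or `null`) can be reduced to a single `[startMs, endMs]` pair. The sketch below is only an illustration of that idea: `normalizeBackchannelBoundary` and `DEFAULT_BACKCHANNEL_BOUNDARY` are hypothetical names, and representing `null` as `[0, 0]` is an assumption of this sketch, not a statement about the agents-js code.

```typescript
// Hypothetical helper for illustration; not the actual agents-js implementation.
type BackchannelBoundary = number | [number, number] | null;

// Defaults quoted in the PR description: 1000 ms (start) and 3500 ms (end).
const DEFAULT_BACKCHANNEL_BOUNDARY: [number, number] = [1000, 3500];

function normalizeBackchannelBoundary(value?: BackchannelBoundary): [number, number] {
  // Option omitted: fall back to the defaults.
  if (value === undefined) {
    return [DEFAULT_BACKCHANNEL_BOUNDARY[0], DEFAULT_BACKCHANNEL_BOUNDARY[1]];
  }
  // null disables both sides; modeled here as zero-length cooldowns (assumption).
  if (value === null) {
    return [0, 0];
  }
  // A single number applies to both sides; a tuple configures them separately.
  const pair: [number, number] = typeof value === 'number' ? [value, value] : value;
  const [start, end] = pair;
  // Reject negative or non-finite cooldowns.
  if (!Number.isFinite(start) || !Number.isFinite(end) || start < 0 || end < 0) {
    throw new RangeError(`backchannelBoundary must be non-negative, got [${start}, ${end}]`);
  }
  return [start, end];
}
```

With this helper, `normalizeBackchannelBoundary(500)` yields `[500, 500]`, `normalizeBackchannelBoundary([50, 0])` is returned unchanged, and a negative value throws.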
🦋 Changeset detected. Latest commit: 78cc8f8. The changes in this PR will be included in the next version bump. This PR includes changesets to release 29 packages.
theomonnom approved these changes on May 4, 2026
Summary
Automated port of livekit/agents#5269 (`feat(interruption): barge-in cooldown window for corrections`) into agents-js.

cc @toubatbrian @livekit/agent-devs: please review.
The Python PR introduces a configurable cooldown around agent speech that:

- Keeps VAD interruption active for a short cooldown at the start of agent speech (default `1.0 s`) before disabling it. Adaptive interruption stays enabled but its results are dropped during the cooldown. This lets the user quickly correct themselves at the very start of the agent's turn ("Stop!", "No wait, …") instead of being shut out by the agent immediately suppressing VAD interruption.
- Releases held user transcripts whose end time falls within a trailing cooldown when the agent finishes (default `3.5 s`, higher than the start to absorb STT timestamp inaccuracy) by widening the `_ignoreUserTranscriptUntil` cutoff. This surfaces premature answers to the agent's last-sentence questions as normal user input instead of dropping them.

Ported Features
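As a rough illustration of the trailing-side rule from the summary above, the cutoff can be moved earlier by the end cooldown so that transcripts ending shortly before the agent finished are released rather than dropped. Everything named here (`HeldTranscript`, `releaseHeldTranscripts`, the `>=` comparison) is a hypothetical stand-in for this sketch; the real port applies the subtraction inside `onEndOfAgentSpeech`.

```typescript
// Hypothetical types and names for illustration only.
interface HeldTranscript {
  text: string;
  endTimeMs: number; // when the user's utterance ended, per STT timestamps
}

function releaseHeldTranscripts(
  held: HeldTranscript[],
  ignoreUserTranscriptUntilMs: number,
  endCooldownMs: number,
): HeldTranscript[] {
  // Widen the window: subtracting the end cooldown moves the cutoff earlier.
  const cutoffMs = ignoreUserTranscriptUntilMs - endCooldownMs;
  // Assumption of this sketch: transcripts ending at or after the cutoff are
  // released as user input; earlier ones stay ignored.
  return held.filter((t) => t.endTimeMs >= cutoffMs);
}

// Example: the agent finishes at t = 10000 ms with a 3500 ms end cooldown,
// so the effective cutoff is 6500 ms.
const released = releaseHeldTranscripts(
  [
    { text: 'uh huh', endTimeMs: 1000 },
    { text: 'yes, Tuesday works', endTimeMs: 9000 },
  ],
  10000,
  3500,
);
console.log(released.map((t) => t.text));
```

With an end cooldown of `0` the cutoff stays at the original timestamp, so neither transcript in the example would be released.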
1. `turnHandling.interruption.backchannelBoundary` (`agents/src/voice/turn_config/interruption.ts`)

- `InterruptionOptions.backchannelBoundary: number | [number, number] | null`. A single number applies to both sides; a `[start, end]` tuple configures them separately; `null` disables.
- `defaultInterruptionOptions.backchannelBoundary = [1000, 3500]` (ms), matching Python's `(1.0, 3.5)` (s) per the time-unit unification rule in `CLAUDE.md`.

2. Backchannel boundary timer in `AudioRecognition` (`agents/src/voice/audio_recognition.ts`)

- `backchannelBoundary` on `AudioRecognitionOptions`. Normalized to a `[start, end]` (ms) tuple; non-negative validation throws on invalid input.
- `backchannelBoundaryActive: boolean`: whether the start-side timer is currently running.
- `backchannelBoundaryCallback?: () => void`: fired exactly once when the timer expires naturally; `agent_activity` registers a callback that disables `isInterruptionByAudioActivityEnabled` only if the agent is still speaking.
- `cancelBackchannelBoundary()`: clears the timer and the registered callback.
- `onStartOfAgentSpeech()` schedules a `setTimeout(onBackchannelBoundaryDone, startCooldown)` when `start > 0`.
- `onEndOfAgentSpeech(ignoreUserTranscriptUntil)` cancels the timer, then subtracts `endCooldown` from the chosen `ignoreUserTranscriptUntil` cutoff before flushing held transcripts. The `cooldown` is plumbed into `flushHeldTranscripts(cooldown)` for diagnostic logging (`addedDelay`).
- `flushHeldTranscripts` inside `onSTTEvent` (when a buffered transcript needs to be flushed before processing a new event) now also passes the trailing cooldown.
- `onOverlapSpeechEvent` early-returns while `backchannelBoundaryActive` is true, dropping adaptive-interruption events during the cooldown (matches the Python guard).
- `disableInterruptionDetection()` and `close()` cancel any pending timer to avoid leaked handles on shutdown.

3. `disableVadInterruptionSoon` in `AgentActivity` (`agents/src/voice/agent_activity.ts`)

- Replaces the `this.isInterruptionByAudioActivityEnabled = false` assignments at the agent-speech-started callbacks (TTS path, audio-output path, and the false-interruption resume path) with `this.disableVadInterruptionSoon()`.
- Registers a `backchannelBoundaryCallback` on the `AudioRecognition` instance when the boundary timer is active. The callback only flips `isInterruptionByAudioActivityEnabled` to `false` if the agent is still in `'speaking'` state when the timer fires (matching Python's "only disable it if the agent is still speaking" check).
- `restoreInterruptionByAudioActivity` now calls `audioRecognition.cancelBackchannelBoundary()` first so a queued callback cannot fire after the default flag has been restored.
- `backchannelBoundary` is passed at the `new AudioRecognition({ backchannelBoundary: … })` construction site.

Implementation Notes (language-level differences)
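A minimal sketch of the fire-once, cancellable start-side timer and the deferred VAD disable described above. The class and member names (`BackchannelBoundaryTimer`, `ActivitySketch`, `onExpire`) are hypothetical simplifications, not the real `AudioRecognition` or `AgentActivity` classes, and disabling the flag immediately when no timer is running is an assumption of this sketch.

```typescript
// Hypothetical stand-in for the timer state kept on AudioRecognition.
class BackchannelBoundaryTimer {
  private handle: ReturnType<typeof setTimeout> | undefined;
  private readonly startCooldownMs: number;
  // Fired at most once, and only when the timer expires on its own.
  onExpire: (() => void) | undefined;

  constructor(startCooldownMs: number) {
    this.startCooldownMs = startCooldownMs;
  }

  get active(): boolean {
    return this.handle !== undefined;
  }

  // Called when the agent starts speaking.
  start(): void {
    this.cancel();
    if (this.startCooldownMs <= 0) return; // cooldown disabled: nothing to schedule
    this.handle = setTimeout(() => {
      this.handle = undefined;
      const cb = this.onExpire;
      this.onExpire = undefined;
      cb?.();
    }, this.startCooldownMs);
  }

  // Clears both the pending timer and the registered callback.
  cancel(): void {
    if (this.handle !== undefined) clearTimeout(this.handle);
    this.handle = undefined;
    this.onExpire = undefined;
  }
}

// Hypothetical stand-in for the flag handling on AgentActivity.
class ActivitySketch {
  isInterruptionByAudioActivityEnabled = true;
  agentState: 'speaking' | 'listening' = 'listening';
  private readonly boundary: BackchannelBoundaryTimer;

  constructor(boundary: BackchannelBoundaryTimer) {
    this.boundary = boundary;
  }

  disableVadInterruptionSoon(): void {
    if (!this.boundary.active) {
      // No cooldown running: disable right away (assumption of this sketch).
      this.isInterruptionByAudioActivityEnabled = false;
      return;
    }
    // Defer the disable until the cooldown ends, and only if still speaking.
    this.boundary.onExpire = () => {
      if (this.agentState === 'speaking') {
        this.isInterruptionByAudioActivityEnabled = false;
      }
    };
  }
}
```

Keeping the flag on the activity object and handing the timer a closure mirrors the call direction the PR describes: the recognition side owns the timer, while the activity side decides what expiry means.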
- Python uses seconds (`float`); JS uses milliseconds (`number`). All defaults and validation paths multiply by `1000` per `CLAUDE.md` §2. The `setTimeout` call site uses the ms value directly (no `* 1000` at the call site).
- `asyncio.get_running_loop().call_later` → `setTimeout`. Python schedules the boundary expiry via the event loop's `call_later` and stores the returned `TimerHandle`. JS uses `setTimeout` and stores the `ReturnType<typeof setTimeout>` handle; cancellation is `clearTimeout`. Semantics are equivalent: fire-once, cancellable.
- `session.options.interruption` lookup inside `AudioRecognition`. The Python `AudioRecognition` reads `session.options.interruption.get("backchannel_boundary")` directly off the `AgentSession` it was constructed with. The JS class does not hold a reference to `AgentSession`; instead the new `backchannelBoundary` is passed as an explicit constructor option and `AgentActivity` reads it from `agentSession.sessionOptions.turnHandling.interruption.backchannelBoundary` at construction time. Behavior is unchanged because that field is the source of truth on both sides.
- `_disable_vad_interruption_soon` lives on `AgentActivity`, not `AudioRecognition`. Python's `_disable_vad_interruption_soon` reads `self._interruption_by_audio_activity_enabled` (a flag on `AgentActivity`) inside the timer callback. Putting the helper on `AudioRecognition` would have required reaching back into the activity. JS keeps the flag on `AgentActivity` (where it has always lived) and registers a closure as `backchannelBoundaryCallback` on the recognition instance. Same observable behavior, slightly different call direction.
- `_flush_held_transcripts` `added_delay`. Python computes `added_delay` and includes it in a `logger.trace(...)` call but never schedules an actual delay with it. The JS port mirrors this exactly (computed for the trace, not used to gate the emit).
- `on_end_of_agent_speech` early return. Python guards with `if not self._interruption_enabled: return`; JS guards with `if (!this.isInterruptionEnabled)` plus an `isAgentSpeaking = false` assignment. JS now calls `this.cancelBackchannelBoundary()` before this guard so the timer is cleaned up regardless of whether the rest of the function runs (Python does the same).
- `_cancel_backchannel_boundary` from `update_interruption_detection`. Python adds `self._cancel_backchannel_boundary()` inside `update_interruption_detection` when interruption detection is being torn down. The JS equivalent is `disableInterruptionDetection()`, which now ends with `this.cancelBackchannelBoundary()` to match.
- Python has four `_interruption_by_audio_activity_enabled = False` callsites (two TTS paths, one realtime-LLM path, one false-interruption-resume path). agents-js has three (TTS, audio-output, false-interruption-resume) because the realtime-LLM speaking flow shares the audio-output path. All three were updated to call `disableVadInterruptionSoon()`.

Tests
- `pnpm build:agents` passes.
- `pnpm lint --filter=@livekit/agents` passes (only pre-existing `no-explicit-any` and `tsdoc/syntax` warnings on unrelated files remain).
- `pnpm format:check` passes.
- `src/voice/agent_activity.test.ts`: 8 tests, all pass.
- `src/voice/audio_recognition_handoff.test.ts`: 6 tests, all pass.
- `src/voice/audio_recognition_span.test.ts`: 4 tests, all pass.
- `src/voice/turn_config/utils.test.ts`: 14 tests, all pass.
- `src/voice/agent.test.ts`: 18 tests, all pass.
- The `pnpm api:check` failure is pre-existing on `main` (unrelated `export * as ___` syntax limitation in `dist/index.d.ts`).
- The Python tests (`test_start_boundary_does_not_block_vad_interruption`, `test_backchannel_boundary_suppresses_start_boundary_interruption`, `test_backchannel_boundary_releases_end_boundary_transcript`) lean heavily on Python-only test helpers (`FakeActions`, `_TestRecognitionHooks`, `BaseEndpointing`, internal `AudioRecognition` constructor with channel hand-injection) that don't have direct JS counterparts. Equivalent behavior is exercised end-to-end via Agent Playground and is left as a follow-up if a JS-native test harness lands.

Test plan
- With `turnHandling: { interruption: { backchannelBoundary: [50, 0] } }` configured, confirm an adaptive-interruption event fired during the first 50 ms is dropped, and one fired after the cooldown expires is honored.
- With `backchannelBoundary: [0, 500]` and adaptive interruption enabled, ask the agent a question, then begin a transcript that ends within 500 ms of the agent finishing; confirm the transcript is released as user input rather than being held.
- `backchannelBoundary: null` disables both sides cleanly (no timer scheduled, no cooldown applied at end).
- `session.aclose()` does not leak the boundary timer.

https://claude.ai/code/session_01PR-port-5269
Generated by Claude Code