feat(interruption): port barge-in cooldown window from python (#5269) #1366

Merged
toubatbrian merged 4 commits into main from claude/quirky-galileo-9thhp
May 5, 2026

@toubatbrian toubatbrian commented May 1, 2026

Summary

Automated port of livekit/agents#5269 (feat(interruption): barge-in cooldown window for corrections) into agents-js.

cc @toubatbrian @livekit/agent-devs — please review.

This PR was created by an automated Claude Code Routine (currently in experimentation stage). It mirrors the Python backchannel_boundary interruption window into the JS voice pipeline.

The Python PR introduces a configurable cooldown around agent speech that:

  1. At the start of agent speech, keeps VAD-based interruption active for a short cooldown (default 1.0 s) before disabling it. Adaptive interruption stays enabled but its results are dropped during the cooldown. This lets the user quickly correct themselves at the very start of the agent's turn ("Stop!", "No wait, …") instead of being shut out by the agent immediately suppressing VAD interruption.
  2. At the end of agent speech, releases held STT events whose end time falls within a trailing cooldown (default 3.5 s — higher than the start to absorb STT timestamp inaccuracy) by widening the _ignoreUserTranscriptUntil cutoff. This surfaces premature answers to the agent's last-sentence questions as normal user input instead of dropping them.
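
The end-of-speech side (item 2) can be pictured with a minimal sketch. The `HeldTranscript` shape and the `releaseHeldTranscripts` helper are hypothetical and exist only for illustration. The sketch also assumes that held events ending before the cutoff are dropped while later ones are released, which this description implies but does not spell out.

```typescript
// Hypothetical shape for a held STT event; only the end timestamp matters here.
interface HeldTranscript {
  text: string;
  endTimeMs: number;
}

// Assumption: events ending before the cutoff are dropped, later ones are released.
// Subtracting the trailing cooldown moves the cutoff earlier, so more events survive.
function releaseHeldTranscripts(
  held: HeldTranscript[],
  ignoreUserTranscriptUntilMs: number,
  endCooldownMs: number,
): HeldTranscript[] {
  const cutoff = ignoreUserTranscriptUntilMs - endCooldownMs;
  return held.filter((t) => t.endTimeMs >= cutoff);
}

const held: HeldTranscript[] = [
  { text: 'uh huh', endTimeMs: 5000 },
  { text: 'yes please', endTimeMs: 7000 },
  { text: 'tomorrow', endTimeMs: 9800 },
];

// Agent speech ends at t = 10000 ms with the default trailing cooldown of 3500 ms,
// so the cutoff becomes 6500 ms.
const released = releaseHeldTranscripts(held, 10000, 3500);
console.log(released.map((t) => t.text));
```

With a trailing cooldown of 0 the cutoff stays at 10000 ms and none of the three sample events would be released.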

Ported Features

1. turnHandling.interruption.backchannelBoundary

agents/src/voice/turn_config/interruption.ts

  • New InterruptionOptions.backchannelBoundary: number | [number, number] | null. A single number applies to both sides; a [start, end] tuple configures them separately; null disables.
  • defaultInterruptionOptions.backchannelBoundary = [1000, 3500] (ms), matching Python's (1.0, 3.5) (s) per the time-unit unification rule in CLAUDE.md.
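
As an illustration of the three accepted shapes, the sketch below collapses them into a `[start, end]` tuple in milliseconds. The function name `normalizeBackchannelBoundary` and the error message are assumptions made for this example; the actual helper in agents-js may be named and structured differently.

```typescript
// Hypothetical helper; name and error text are illustrative, not taken from agents-js.
type BackchannelBoundary = number | [number, number] | null;

function normalizeBackchannelBoundary(
  value: BackchannelBoundary,
): [number, number] | null {
  // null disables both the start and the end cooldown.
  if (value === null) return null;

  // A single number applies to both sides; a tuple configures them separately.
  const [start, end]: [number, number] =
    typeof value === 'number' ? [value, value] : value;

  if (!Number.isFinite(start) || !Number.isFinite(end) || start < 0 || end < 0) {
    throw new RangeError('backchannelBoundary values must be non-negative (ms)');
  }
  return [start, end];
}

console.log(normalizeBackchannelBoundary(1000));
console.log(normalizeBackchannelBoundary([1000, 3500]));
console.log(normalizeBackchannelBoundary(null));
```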

2. Backchannel boundary timer in AudioRecognition

agents/src/voice/audio_recognition.ts

  • New constructor option backchannelBoundary on AudioRecognitionOptions. Normalized to [start, end] (ms) tuple; non-negative validation throws on invalid input.
  • New public surface:
    • backchannelBoundaryActive: boolean — whether the start-side timer is currently running.
    • backchannelBoundaryCallback?: () => void — fired exactly once when the timer expires naturally; agent_activity registers a callback that disables isInterruptionByAudioActivityEnabled only if the agent is still speaking.
    • cancelBackchannelBoundary() — clears the timer and the registered callback.
  • onStartOfAgentSpeech() schedules a setTimeout(onBackchannelBoundaryDone, startCooldown) when start > 0.
  • onEndOfAgentSpeech(ignoreUserTranscriptUntil) cancels the timer, then subtracts endCooldown from the chosen ignoreUserTranscriptUntil cutoff before flushing held transcripts. The cooldown is plumbed into flushHeldTranscripts(cooldown) for diagnostic logging (addedDelay).
  • The second call site of flushHeldTranscripts inside onSTTEvent (when a buffered transcript needs to be flushed before processing a new event) now also passes the trailing-cooldown.
  • onOverlapSpeechEvent early-returns while backchannelBoundaryActive is true, dropping adaptive-interruption events during the cooldown (matches the Python guard).
  • Both disableInterruptionDetection() and close() cancel any pending timer to avoid leaked handles on shutdown.
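
The start-side timer lifecycle described in this list can be sketched in isolation. `BoundaryTimer` is a hypothetical stand-in, not the real `AudioRecognition` class; it reuses the member names from the list above, and the choice to cancel any pending timer before re-arming is an assumption of this sketch.

```typescript
// Hypothetical, simplified stand-in for the timer surface described above.
class BoundaryTimer {
  backchannelBoundaryCallback?: () => void;
  private handle?: ReturnType<typeof setTimeout>;
  private readonly startCooldownMs: number;

  constructor(startCooldownMs: number) {
    this.startCooldownMs = startCooldownMs;
  }

  // True while the start-side timer is pending.
  get backchannelBoundaryActive(): boolean {
    return this.handle !== undefined;
  }

  onStartOfAgentSpeech(): void {
    // Assumption: re-arming first clears any timer left from a previous turn.
    this.cancelBackchannelBoundary();
    if (this.startCooldownMs > 0) {
      this.handle = setTimeout(() => this.onBackchannelBoundaryDone(), this.startCooldownMs);
    }
  }

  // Clears both the timer and the registered callback.
  cancelBackchannelBoundary(): void {
    if (this.handle !== undefined) clearTimeout(this.handle);
    this.handle = undefined;
    this.backchannelBoundaryCallback = undefined;
  }

  // Runs on natural expiry; the callback is detached first so it fires at most once.
  private onBackchannelBoundaryDone(): void {
    const callback = this.backchannelBoundaryCallback;
    this.handle = undefined;
    this.backchannelBoundaryCallback = undefined;
    callback?.();
  }
}

const timer = new BoundaryTimer(1000);
timer.onStartOfAgentSpeech();
console.log(timer.backchannelBoundaryActive);
timer.cancelBackchannelBoundary(); // e.g. on end of agent speech or close()
console.log(timer.backchannelBoundaryActive);
```

Cancelling in every teardown path is what keeps a pending `setTimeout` handle from holding the Node.js process open after shutdown.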

3. disableVadInterruptionSoon in AgentActivity

agents/src/voice/agent_activity.ts

  • Replaces the three direct this.isInterruptionByAudioActivityEnabled = false assignments at the agent-speech-started callbacks (TTS path, audio-output path, and the false-interruption resume path) with this.disableVadInterruptionSoon().
  • The new method registers a backchannelBoundaryCallback on the AudioRecognition instance when the boundary timer is active. The callback only flips isInterruptionByAudioActivityEnabled to false if the agent is still in 'speaking' state when the timer fires (matching Python's "only disable it if the agent is still speaking" check).
  • When the timer is not active (boundary disabled or already expired), the assignment happens immediately, preserving the previous behavior.
  • restoreInterruptionByAudioActivity now calls audioRecognition.cancelBackchannelBoundary() first so a queued callback cannot fire after the default flag has been restored.
  • The new option is plumbed through the new AudioRecognition({ backchannelBoundary: … }) construction site.
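
A minimal sketch of the deferral logic follows. `RecognitionLike` and `ActivitySketch` are hypothetical reductions of `AudioRecognition` and `AgentActivity`, and the `agentState` field with its three values is an assumption; only the member names quoted in the list above come from the PR.

```typescript
// Hypothetical minimal shapes; the real classes are much larger.
interface RecognitionLike {
  backchannelBoundaryActive: boolean;
  backchannelBoundaryCallback?: () => void;
}

class ActivitySketch {
  isInterruptionByAudioActivityEnabled = true;
  agentState: 'speaking' | 'listening' | 'thinking' = 'speaking';
  private readonly recognition: RecognitionLike;

  constructor(recognition: RecognitionLike) {
    this.recognition = recognition;
  }

  disableVadInterruptionSoon(): void {
    // Boundary disabled or already expired: keep the previous immediate behavior.
    if (!this.recognition.backchannelBoundaryActive) {
      this.isInterruptionByAudioActivityEnabled = false;
      return;
    }
    // Otherwise defer until the timer expires, and only disable VAD interruption
    // if the agent is still speaking at that point.
    this.recognition.backchannelBoundaryCallback = () => {
      if (this.agentState === 'speaking') {
        this.isInterruptionByAudioActivityEnabled = false;
      }
    };
  }
}

const recognition: RecognitionLike = { backchannelBoundaryActive: true };
const activity = new ActivitySketch(recognition);
activity.disableVadInterruptionSoon();
console.log(activity.isInterruptionByAudioActivityEnabled); // still enabled during the cooldown
recognition.backchannelBoundaryCallback?.(); // simulate natural timer expiry
console.log(activity.isInterruptionByAudioActivityEnabled);
```

Registering a closure keeps the flag on the activity side, so the recognition object never needs a reference back to it.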

Implementation Notes (language-level differences)

  • Time units. Python uses seconds (float); JS uses milliseconds (number). All defaults and validation paths multiply by 1000 per CLAUDE.md §2. The setTimeout call site uses the ms value directly (no * 1000 at the call site).
  • asyncio.get_running_loop().call_later → setTimeout. Python schedules the boundary expiry via the event loop's call_later and stores the returned TimerHandle. JS uses setTimeout and stores the ReturnType<typeof setTimeout> handle; cancellation is clearTimeout. Semantics are equivalent: fire-once, cancellable.
  • No session.options.interruption lookup inside AudioRecognition. The Python AudioRecognition reads session.options.interruption.get("backchannel_boundary") directly off the AgentSession it was constructed with. The JS class does not hold a reference to AgentSession; instead the new backchannelBoundary is passed as an explicit constructor option and AgentActivity reads it from agentSession.sessionOptions.turnHandling.interruption.backchannelBoundary at construction time. Behavior is unchanged because that field is the source of truth on both sides.
  • _disable_vad_interruption_soon lives on AgentActivity, not AudioRecognition. Python's _disable_vad_interruption_soon reads self._interruption_by_audio_activity_enabled (a flag on AgentActivity) inside the timer callback. Putting the helper on AudioRecognition would have required reaching back into the activity. JS keeps the flag on AgentActivity (where it has always lived) and registers a closure as backchannelBoundaryCallback on the recognition instance. Same observable behavior, slightly different call direction.
  • Python's _flush_held_transcripts added_delay. Python computes added_delay and includes it in a logger.trace(...) call but never schedules an actual delay with it. The JS port mirrors this exactly (computed for the trace, not used to gate the emit).
  • Python's on_end_of_agent_speech early return. Python guards with if not self._interruption_enabled: return; JS guards with if (!this.isInterruptionEnabled) plus an isAgentSpeaking = false. JS now calls this.cancelBackchannelBoundary() before this guard so the timer is cleaned up regardless of whether the rest of the function runs (Python does the same).
  • _cancel_backchannel_boundary from update_interruption_detection. Python adds self._cancel_backchannel_boundary() inside update_interruption_detection when interruption detection is being torn down. The JS equivalent is disableInterruptionDetection(), which now ends with this.cancelBackchannelBoundary() to match.
  • Number of agent-speech-start callsites differs. Python had four _interruption_by_audio_activity_enabled = False callsites (two TTS paths, one realtime-LLM path, one false-interruption-resume path). agents-js has three (TTS, audio-output, false-interruption-resume) because the realtime-LLM speaking flow shares the audio-output path. All three were updated to call disableVadInterruptionSoon().

Tests

  • pnpm build:agents passes.
  • pnpm lint --filter=@livekit/agents passes (only pre-existing no-explicit-any and tsdoc/syntax warnings on unrelated files remain).
  • pnpm format:check passes.
  • Targeted vitest runs:
    • src/voice/agent_activity.test.ts — 8 tests, all pass.
    • src/voice/audio_recognition_handoff.test.ts — 6 tests, all pass.
    • src/voice/audio_recognition_span.test.ts — 4 tests, all pass.
    • src/voice/turn_config/utils.test.ts — 14 tests, all pass.
    • src/voice/agent.test.ts — 18 tests, all pass.
  • pnpm api:check failure is pre-existing on main (unrelated export * as ___ syntax limitation in dist/index.d.ts).
  • No new unit tests added in this port. Python's three new tests (test_start_boundary_does_not_block_vad_interruption, test_backchannel_boundary_suppresses_start_boundary_interruption, test_backchannel_boundary_releases_end_boundary_transcript) lean heavily on Python-only test helpers (FakeActions, _TestRecognitionHooks, BaseEndpointing, internal AudioRecognition constructor with channel hand-injection) that don't have direct JS counterparts. Equivalent behavior is exercised end-to-end via Agent Playground and is left as a follow-up if a JS-native test harness lands.

Test plan

  • Verify in Agent Playground: start an agent, have it begin a long utterance, and within the first second say "Stop!" — confirm the agent is interrupted by VAD (start cooldown does not block VAD).
  • Same as above but with turnHandling: { interruption: { backchannelBoundary: [50, 0] } } configured, confirm that an adaptive-interruption event fired during the first 50 ms is dropped and that one fired after the cooldown expires is honored.
  • With backchannelBoundary: [0, 500] and adaptive interruption enabled, ask the agent a question, then begin a transcript that ends within 500 ms of the agent finishing — confirm the transcript is released as user input rather than being held.
  • Confirm backchannelBoundary: null disables both sides cleanly (no timer scheduled, no cooldown applied at end).
  • Restart a paused-then-resumed false-interrupted speech and confirm the start cooldown is re-armed (false-interruption resume path).
  • Confirm clean shutdown: session.aclose() does not leak the boundary timer.
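
For reference, the option values exercised in this test plan could be written as below. The `TurnHandlingSketch` interface is a local stand-in declared only for this example; the real option types live in agents-js and are not reproduced here.

```typescript
// Local stand-in types for illustration only.
type BackchannelBoundary = number | [number, number] | null;

interface TurnHandlingSketch {
  interruption: { backchannelBoundary: BackchannelBoundary };
}

// 50 ms start cooldown, no trailing cooldown.
const shortStartOnly: TurnHandlingSketch = {
  interruption: { backchannelBoundary: [50, 0] },
};

// No start cooldown, 500 ms trailing cooldown.
const trailingOnly: TurnHandlingSketch = {
  interruption: { backchannelBoundary: [0, 500] },
};

// Both sides disabled: no timer scheduled, no cooldown applied at the end.
const disabled: TurnHandlingSketch = {
  interruption: { backchannelBoundary: null },
};

// The stated defaults, in milliseconds.
const defaults: TurnHandlingSketch = {
  interruption: { backchannelBoundary: [1000, 3500] },
};

console.log(shortStartOnly, trailingOnly, disabled, defaults);
```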

https://claude.ai/code/session_01PR-port-5269


Generated by Claude Code

Adds a configurable backchannel boundary that keeps adaptive/VAD interruption
detection active for a short cooldown at the start of agent speech (default
1000 ms) and releases held user transcripts whose end time falls within a
trailing cooldown (default 3500 ms) when the agent finishes. This enables
quick corrections and premature answers to the agent's last-sentence
questions to flow through.

Configured via turnHandling.interruption.backchannelBoundary on the session
options (number, [start, end] tuple, or null to disable).

Ref: livekit/agents#5269
https://claude.ai/code/session_01PR-port-5269

CLAassistant commented May 1, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ toubatbrian
❌ claude


changeset-bot Bot commented May 1, 2026

🦋 Changeset detected

Latest commit: 78cc8f8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 29 packages
Name Type
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-liveavatar Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-minimax Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-mistralai Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugins-test Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch



@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

@toubatbrian toubatbrian requested review from a team and chenghao-mou May 3, 2026 08:08
@toubatbrian toubatbrian merged commit 943d4eb into main May 5, 2026
8 of 9 checks passed
@toubatbrian toubatbrian deleted the claude/quirky-galileo-9thhp branch May 5, 2026 05:37
@github-actions github-actions Bot mentioned this pull request May 4, 2026