feat(interruption): port barge-in cooldown window from python (#5269) by toubatbrian · Pull Request #1366 · livekit/agents-js

toubatbrian · 2026-05-01T10:09:05Z

Summary

Automated port of livekit/agents#5269 (feat(interruption): barge-in cooldown window for corrections) into agents-js.

cc @toubatbrian @livekit/agent-devs — please review.

This PR was created by an automated Claude Code Routine (currently in an experimental stage). It mirrors the Python backchannel_boundary interruption window into the JS voice pipeline.

The Python PR introduces a configurable cooldown around agent speech that:

- At the start of agent speech, keeps VAD-based interruption active for a short cooldown (default 1.0 s) before disabling it. Adaptive interruption stays enabled, but its results are dropped during the cooldown. This lets the user quickly correct themselves at the very start of the agent's turn ("Stop!", "No wait, …") instead of being shut out because the agent immediately suppresses VAD interruption.
- At the end of agent speech, releases held STT events whose end time falls within a trailing cooldown (default 3.5 s, higher than the start cooldown to absorb STT timestamp inaccuracy) by widening the _ignoreUserTranscriptUntil cutoff. This surfaces premature answers to a question in the agent's last sentence as normal user input instead of dropping them.
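The end-of-speech rule amounts to moving the "ignore before" cutoff earlier by the trailing cooldown. The sketch below illustrates that arithmetic under stated assumptions: `HeldTranscript` and `partitionHeld` are hypothetical names (not agents-js APIs), times are plain milliseconds, and a transcript is assumed to be released when it ends at or after the widened cutoff.

```typescript
// Hypothetical shape of a held STT result; times are in milliseconds.
interface HeldTranscript {
  text: string;
  endTime: number;
}

// Hypothetical helper. Without a trailing cooldown, anything that ended before
// `ignoreUntil` is dropped. Subtracting `endCooldownMs` moves the cutoff earlier,
// so answers that ended shortly before the agent finished are released instead.
function partitionHeld(
  held: HeldTranscript[],
  ignoreUntil: number,
  endCooldownMs: number,
): { released: HeldTranscript[]; dropped: HeldTranscript[] } {
  const cutoff = ignoreUntil - endCooldownMs;
  const released: HeldTranscript[] = [];
  const dropped: HeldTranscript[] = [];
  for (const t of held) {
    // Assumption: ending exactly at the widened cutoff counts as released.
    (t.endTime >= cutoff ? released : dropped).push(t);
  }
  return { released, dropped };
}

// Agent finished speaking at t = 10_000 ms; default trailing cooldown of 3_500 ms.
const held: HeldTranscript[] = [
  { text: 'um', endTime: 4_000 },
  { text: 'yes, Tuesday', endTime: 8_000 },
];
const { released, dropped } = partitionHeld(held, 10_000, 3_500);
console.log(released.map((t) => t.text)); // [ 'yes, Tuesday' ]
console.log(dropped.map((t) => t.text)); // [ 'um' ]
```

With the default 3.5 s cooldown the effective cutoff here is 6_500 ms, so the early answer ending at 8_000 ms surfaces as user input while the filler ending at 4_000 ms stays dropped.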

Ported Features

1. `turnHandling.interruption.backchannelBoundary`

agents/src/voice/turn_config/interruption.ts

- New InterruptionOptions.backchannelBoundary: number | [number, number] | null. A single number applies to both sides; a [start, end] tuple configures them separately; null disables.
- defaultInterruptionOptions.backchannelBoundary = [1000, 3500] (ms), matching Python's (1.0, 3.5) (s) per the time-unit unification rule in CLAUDE.md.
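A minimal sketch of how this option shape could be normalized into a `[start, end]` millisecond tuple with non-negative validation. `BackchannelBoundaryOption` and `normalizeBackchannelBoundary` are hypothetical names, not the agents-js implementation. Mapping `null` to `[0, 0]` and rejecting non-finite values are assumptions of the sketch; with a `start > 0` check on the timer, `[0, 0]` means no start timer and no end cooldown.

```typescript
// Hypothetical alias mirroring the option type described above (milliseconds).
type BackchannelBoundaryOption = number | [number, number] | null;

// Hypothetical helper, not the agents-js implementation. Sketch assumption:
// null maps to [0, 0], i.e. no start timer and no end cooldown.
function normalizeBackchannelBoundary(value: BackchannelBoundaryOption): [number, number] {
  if (value === null) return [0, 0];
  // A single number applies to both sides; a tuple configures them separately.
  const pair: [number, number] = typeof value === 'number' ? [value, value] : value;
  for (const v of pair) {
    // Assumption: NaN/Infinity are rejected along with negative values.
    if (!Number.isFinite(v) || v < 0) {
      throw new RangeError(`backchannelBoundary values must be non-negative, got ${v}`);
    }
  }
  return [pair[0], pair[1]];
}

console.log(normalizeBackchannelBoundary([1000, 3500])); // [ 1000, 3500 ]
console.log(normalizeBackchannelBoundary(500)); // [ 500, 500 ]
console.log(normalizeBackchannelBoundary(null)); // [ 0, 0 ]
```

Returning a fresh tuple (rather than the caller's array) keeps later mutation of the options object from silently changing the normalized values.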

2. Backchannel boundary timer in `AudioRecognition`

agents/src/voice/audio_recognition.ts

- New constructor option backchannelBoundary on AudioRecognitionOptions. It is normalized to a [start, end] (ms) tuple; values are validated as non-negative, and invalid input throws.
- New public surface:
  - backchannelBoundaryActive: boolean. Whether the start-side timer is currently running.
  - backchannelBoundaryCallback?: () => void. Fired exactly once when the timer expires naturally; agent_activity registers a callback that disables isInterruptionByAudioActivityEnabled only if the agent is still speaking.
  - cancelBackchannelBoundary(): clears the timer and the registered callback.
- onStartOfAgentSpeech() schedules setTimeout(onBackchannelBoundaryDone, startCooldown) when start > 0.
- onEndOfAgentSpeech(ignoreUserTranscriptUntil) cancels the timer, then subtracts endCooldown from the chosen ignoreUserTranscriptUntil cutoff before flushing held transcripts. The cooldown is also passed to flushHeldTranscripts(cooldown) for diagnostic logging (addedDelay).
- The second call site of flushHeldTranscripts, inside onSTTEvent (where a buffered transcript must be flushed before a new event is processed), now also passes the trailing cooldown.
- onOverlapSpeechEvent returns early while backchannelBoundaryActive is true, dropping adaptive-interruption events during the cooldown (matching the Python guard).
- Both disableInterruptionDetection() and close() cancel any pending timer to avoid leaking timer handles on shutdown.
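The start-side timer lifecycle above (arm on agent speech, drop overlap events while active, fire a registered callback once on natural expiry, clear everything on cancel) can be modelled in isolation. This is a hypothetical, trimmed-down class, not the agents-js `AudioRecognition`; member names follow the description, `backchannelBoundaryActive` is exposed as a getter here, and cancelling any previous timer when re-arming is a choice of the sketch.

```typescript
// Hypothetical, trimmed-down model of the start-side boundary timer.
// Member names follow the PR description; this is NOT the agents-js AudioRecognition class.
class BoundaryTimerSketch {
  backchannelBoundaryCallback?: () => void;
  private timer?: ReturnType<typeof setTimeout>;
  private readonly startCooldownMs: number;

  constructor(startCooldownMs: number) {
    this.startCooldownMs = startCooldownMs;
  }

  /** True while the start-side cooldown timer is running. */
  get backchannelBoundaryActive(): boolean {
    return this.timer !== undefined;
  }

  onStartOfAgentSpeech(): void {
    // Sketch choice: re-arming cancels any timer left over from a previous turn.
    this.cancelBackchannelBoundary();
    if (this.startCooldownMs > 0) {
      this.timer = setTimeout(() => this.onBackchannelBoundaryDone(), this.startCooldownMs);
    }
  }

  /** Adaptive-interruption (overlap) results are dropped while the cooldown runs. */
  onOverlapSpeechEvent(handle: () => void): void {
    if (this.backchannelBoundaryActive) return;
    handle();
  }

  /** Clears the timer and the registered callback (also what close/teardown would call). */
  cancelBackchannelBoundary(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    this.backchannelBoundaryCallback = undefined;
  }

  private onBackchannelBoundaryDone(): void {
    const cb = this.backchannelBoundaryCallback;
    this.timer = undefined;
    this.backchannelBoundaryCallback = undefined;
    cb?.(); // runs at most once, and only when the timer expires naturally
  }
}

// Similar to the [50, 0] test-plan scenario: early overlap is dropped, a later one is honored.
const rec = new BoundaryTimerSketch(50);
rec.onStartOfAgentSpeech();
rec.onOverlapSpeechEvent(() => console.log('early overlap handled')); // not printed
setTimeout(() => {
  rec.onOverlapSpeechEvent(() => console.log('late overlap handled')); // printed
}, 100);
```

Clearing the callback in both the expiry path and `cancelBackchannelBoundary()` is what makes "fired exactly once" hold: a cancelled timer never invokes a stale closure.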

3. `disableVadInterruptionSoon` in `AgentActivity`

agents/src/voice/agent_activity.ts

- Replaces the three direct this.isInterruptionByAudioActivityEnabled = false assignments in the agent-speech-started callbacks (TTS path, audio-output path, and false-interruption resume path) with this.disableVadInterruptionSoon().
- The new method registers a backchannelBoundaryCallback on the AudioRecognition instance when the boundary timer is active. The callback flips isInterruptionByAudioActivityEnabled to false only if the agent is still in the 'speaking' state when the timer fires (matching Python's "only disable it if the agent is still speaking" check).
- When the timer is not active (boundary disabled or already expired), the flag is set to false immediately, preserving the previous behavior.
- restoreInterruptionByAudioActivity now calls audioRecognition.cancelBackchannelBoundary() first, so a queued callback cannot fire after the default flag has been restored.
- The new option is passed to AudioRecognition at its construction site, new AudioRecognition({ backchannelBoundary: … }).
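The deferral logic of `disableVadInterruptionSoon` can be sketched against a minimal stand-in for the recognition side. `SketchAgentState`, `BoundaryHost`, and `ActivitySketch` are hypothetical names for illustration; only `backchannelBoundaryActive`, `backchannelBoundaryCallback`, `isInterruptionByAudioActivityEnabled`, and `disableVadInterruptionSoon` come from the description above, and the real `AgentActivity` has far more state.

```typescript
// Hypothetical subset of agent states; not the agents-js AgentState type.
type SketchAgentState = 'listening' | 'thinking' | 'speaking';

// The only recognition-side surface the helper needs (names from the PR description).
interface BoundaryHost {
  backchannelBoundaryActive: boolean;
  backchannelBoundaryCallback?: () => void;
}

class ActivitySketch {
  isInterruptionByAudioActivityEnabled = true;
  agentState: SketchAgentState = 'listening';
  private readonly recognition: BoundaryHost;

  constructor(recognition: BoundaryHost) {
    this.recognition = recognition;
  }

  /** Called where `isInterruptionByAudioActivityEnabled = false` used to be assigned directly. */
  disableVadInterruptionSoon(): void {
    if (this.recognition.backchannelBoundaryActive) {
      // Defer the decision until the start cooldown expires.
      this.recognition.backchannelBoundaryCallback = () => {
        // Only disable VAD interruption if the agent is still speaking at that point.
        if (this.agentState === 'speaking') {
          this.isInterruptionByAudioActivityEnabled = false;
        }
      };
      return;
    }
    // Boundary disabled or already expired: previous behavior, disable immediately.
    this.isInterruptionByAudioActivityEnabled = false;
  }
}

const host: BoundaryHost = { backchannelBoundaryActive: true };
const activity = new ActivitySketch(host);
activity.agentState = 'speaking';
activity.disableVadInterruptionSoon();
console.log(activity.isInterruptionByAudioActivityEnabled); // true (cooldown still running)
host.backchannelBoundaryCallback?.(); // simulate the timer expiring naturally
console.log(activity.isInterruptionByAudioActivityEnabled); // false
```

Keeping the flag on the activity and handing the recognition side a closure matches the call direction described in the implementation notes: the timer owner never needs a reference back to the activity.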

Implementation Notes (language-level differences)

Time units. Python uses seconds (float); JS uses milliseconds (number). All defaults and validation paths multiply by 1000 per CLAUDE.md §2. The setTimeout call site uses the ms value directly (no * 1000 at the call site).
asyncio.get_running_loop().call_later → setTimeout. Python schedules the boundary expiry via the event loop's call_later and stores the returned TimerHandle. JS uses setTimeout and stores the ReturnType<typeof setTimeout> handle; cancellation is clearTimeout. Semantics are equivalent — fire-once, cancellable.
No session.options.interruption lookup inside AudioRecognition. The Python AudioRecognition reads session.options.interruption.get("backchannel_boundary") directly off the AgentSession it was constructed with. The JS class does not hold a reference to AgentSession; instead the new backchannelBoundary is passed as an explicit constructor option and AgentActivity reads it from agentSession.sessionOptions.turnHandling.interruption.backchannelBoundary at construction time. Behavior is unchanged because that field is the source of truth on both sides.
_disable_vad_interruption_soon lives on AgentActivity, not AudioRecognition. Python's _disable_vad_interruption_soon reads self._interruption_by_audio_activity_enabled (a flag on AgentActivity) inside the timer callback. Putting the helper on AudioRecognition would have required reaching back into the activity. JS keeps the flag on AgentActivity (where it has always lived) and registers a closure as backchannelBoundaryCallback on the recognition instance. Same observable behavior, slightly different call direction.
Python's _flush_held_transcripts added_delay. Python computes added_delay and includes it in a logger.trace(...) call but never schedules an actual delay with it. The JS port mirrors this exactly (computed for the trace, not used to gate the emit).
Python's on_end_of_agent_speech early return. Python guards with if not self._interruption_enabled: return; JS guards with if (!this.isInterruptionEnabled), whose body also sets isAgentSpeaking = false. JS now calls this.cancelBackchannelBoundary() before this guard, so the timer is cleaned up whether or not the rest of the function runs (Python does the same).
_cancel_backchannel_boundary from update_interruption_detection. Python adds self._cancel_backchannel_boundary() inside update_interruption_detection when interruption detection is being torn down. The JS equivalent is disableInterruptionDetection(), which now ends with this.cancelBackchannelBoundary() to match.
Number of agent-speech-start callsites differs. Python had four _interruption_by_audio_activity_enabled = False callsites (two TTS paths, one realtime-LLM path, one false-interruption-resume path). agents-js has three (TTS, audio-output, false-interruption-resume) because the realtime-LLM speaking flow shares the audio-output path. All three were updated to call disableVadInterruptionSoon().
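The time-unit and call_later → setTimeout notes can be illustrated together. `callLater` below is a hypothetical helper that only shows the semantic mapping (fire-once, cancellable handle); the notes state that agents-js calls setTimeout/clearTimeout directly and stores the handle rather than wrapping it.

```typescript
// Hypothetical helper: mirrors asyncio's loop.call_later(delay, cb) -> TimerHandle,
// but takes milliseconds because JS converts seconds once at the config layer.
function callLater(delayMs: number, cb: () => void): { cancel: () => void } {
  const handle = setTimeout(cb, delayMs); // fire-once
  return { cancel: () => clearTimeout(handle) }; // like TimerHandle.cancel()
}

// Time-unit unification: Python's (1.0, 3.5) s default becomes [1000, 3500] ms.
const pythonDefaultSeconds: [number, number] = [1.0, 3.5];
const jsDefaultMs: [number, number] = [pythonDefaultSeconds[0] * 1000, pythonDefaultSeconds[1] * 1000];
console.log(jsDefaultMs); // [ 1000, 3500 ]

// The ms value goes straight to setTimeout, with no `* 1000` at the call site.
const pending = callLater(jsDefaultMs[0], () => console.log('start cooldown expired'));
pending.cancel(); // cancelled before expiry, so nothing is printed
```

Converting once at the config boundary avoids the classic bug of multiplying by 1000 twice (or not at all) at individual call sites.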

Tests

- pnpm build:agents passes.
- pnpm lint --filter=@livekit/agents passes (only pre-existing no-explicit-any and tsdoc/syntax warnings on unrelated files remain).
- pnpm format:check passes.
- Targeted vitest runs:
  - src/voice/agent_activity.test.ts — 8 tests, all pass.
  - src/voice/audio_recognition_handoff.test.ts — 6 tests, all pass.
  - src/voice/audio_recognition_span.test.ts — 4 tests, all pass.
  - src/voice/turn_config/utils.test.ts — 14 tests, all pass.
  - src/voice/agent.test.ts — 18 tests, all pass.
- pnpm api:check failure is pre-existing on main (unrelated export * as ___ syntax limitation in dist/index.d.ts).
- No new unit tests added in this port. Python's three new tests (test_start_boundary_does_not_block_vad_interruption, test_backchannel_boundary_suppresses_start_boundary_interruption, test_backchannel_boundary_releases_end_boundary_transcript) lean heavily on Python-only test helpers (FakeActions, _TestRecognitionHooks, BaseEndpointing, internal AudioRecognition constructor with channel hand-injection) that don't have direct JS counterparts. Equivalent behavior is exercised end-to-end via Agent Playground and is left as a follow-up if a JS-native test harness lands.

Test plan

- Verify in Agent Playground: start an agent, have it begin a long utterance, and within the first second say "Stop!". Confirm the agent is interrupted by VAD (the start cooldown does not block VAD).
- Same as above, but with turnHandling: { interruption: { backchannelBoundary: [50, 0] } } configured: confirm that an adaptive-interruption event fired during the first 50 ms is dropped, and that one fired after the cooldown expires is honored.
- With backchannelBoundary: [0, 500] and adaptive interruption enabled, ask the agent a question and answer early, so that your transcript ends within 500 ms of the agent finishing. Confirm the transcript is released as user input rather than being held.
- Confirm backchannelBoundary: null disables both sides cleanly (no timer scheduled, no cooldown applied at the end).
- Trigger a false interruption so the agent's speech pauses and then resumes, and confirm the start cooldown is re-armed on resume (false-interruption resume path).
- Confirm clean shutdown: session.aclose() does not leak the boundary timer.

https://claude.ai/code/session_01PR-port-5269

Generated by Claude Code

Adds a configurable backchannel boundary that keeps VAD interruption active for a short cooldown at the start of agent speech (default 1000 ms), dropping adaptive-interruption results during that window, and releases held user transcripts whose end time falls within a trailing cooldown (default 3500 ms) when the agent finishes. This lets quick corrections, and premature answers to a question in the agent's last sentence, flow through. Configured via turnHandling.interruption.backchannelBoundary on the session options (number, [start, end] tuple, or null to disable). Ref: livekit/agents#5269 https://claude.ai/code/session_01PR-port-5269

CLAassistant · 2026-05-01T10:09:12Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ toubatbrian
❌ claude

changeset-bot · 2026-05-01T10:09:12Z

🦋 Changeset detected

Latest commit: 78cc8f8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 29 packages

Name	Type
@livekit/agents	Patch
@livekit/agents-plugin-anam	Patch
@livekit/agents-plugin-assemblyai	Patch
@livekit/agents-plugin-baseten	Patch
@livekit/agents-plugin-bey	Patch
@livekit/agents-plugin-cartesia	Patch
@livekit/agents-plugin-cerebras	Patch
@livekit/agents-plugin-deepgram	Patch
@livekit/agents-plugin-elevenlabs	Patch
@livekit/agents-plugin-google	Patch
@livekit/agents-plugin-hedra	Patch
@livekit/agents-plugin-inworld	Patch
@livekit/agents-plugin-lemonslice	Patch
@livekit/agents-plugin-liveavatar	Patch
@livekit/agents-plugin-livekit	Patch
@livekit/agents-plugin-minimax	Patch
@livekit/agents-plugin-mistral	Patch
@livekit/agents-plugin-mistralai	Patch
@livekit/agents-plugin-neuphonic	Patch
@livekit/agents-plugin-openai	Patch
@livekit/agents-plugin-phonic	Patch
@livekit/agents-plugin-resemble	Patch
@livekit/agents-plugin-rime	Patch
@livekit/agents-plugin-runway	Patch
@livekit/agents-plugin-sarvam	Patch
@livekit/agents-plugin-silero	Patch
@livekit/agents-plugins-test	Patch
@livekit/agents-plugin-trugen	Patch
@livekit/agents-plugin-xai	Patch


devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

devin-ai-integration Bot reviewed May 1, 2026


toubatbrian requested review from a team and chenghao-mou May 3, 2026 08:08

toubatbrian added 3 commits May 4, 2026 18:03

Merge branch 'main' into claude/quirky-galileo-9thhp

da84646

changes verified

0bdc328

Create audio_recognition_backchannel.test.ts

78cc8f8

toubatbrian added the verified-port label May 4, 2026

theomonnom approved these changes May 4, 2026


toubatbrian merged commit 943d4eb into main May 5, 2026
8 of 9 checks passed

toubatbrian deleted the claude/quirky-galileo-9thhp branch May 5, 2026 05:37

github-actions Bot mentioned this pull request May 4, 2026

Version Packages #1380

Merged

toubatbrian merged 4 commits into main from claude/quirky-galileo-9thhp

4 participants