
Conversation


@SaHiL-Ez SaHiL-Ez commented Jan 18, 2026

Implemented a context-aware interruption handling system to solve the issue where the agent would abruptly stop speaking when the user gave passive acknowledgments (e.g., "yeah", "mhmm", "okay").

🛠️ Key Implementation Details
Runtime Monkeypatch: Implemented a runtime patch for `AgentActivity` to intercept the VAD (Voice Activity Detection) interruption signal before it pauses the audio stream.
Delta-Based Detection: Created logic to extract only the newly spoken words (the delta) from the accumulated transcript. This ensures that a user saying "mhmm" is correctly identified as a backchannel, while "mhmm wait stop" is treated as an interruption.
External Configuration: Added `filter_config.json` to allow easy modification of ignored words (`backchannel_words`) and interrupt triggers (`command_words`) without code changes.
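The delta-based detection described above can be sketched as follows (a minimal illustration; function names like `get_transcript_delta` are hypothetical, not this PR's exact API):

```python
import re

def extract_words(text: str) -> list[str]:
    """Lowercase, strip punctuation (keeping hyphens), and split into words."""
    normalized = text.lower().strip()
    return [w for w in re.sub(r"[^\w\s-]", " ", normalized).split() if w]

def get_transcript_delta(last_processed: str, full_transcript: str) -> str:
    """Return only the words spoken since the last processed transcript."""
    last_words = extract_words(last_processed)
    full_words = extract_words(full_transcript)
    if len(full_words) <= len(last_words):
        return ""  # nothing new since the last turn
    return " ".join(full_words[len(last_words):])
```

With this, a "mhmm" that later accumulates into "mhmm wait stop" yields the delta "wait stop", which is then checked against the command-word list instead of being dismissed as backchannel.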

📂 Files Added/Modified
`examples/voice_agents/backchannel_patch.py`: Contains the core logic for the monkeypatch and delta detection.
`examples/voice_agents/filter_config.json`: New configuration file for defining ignored word lists.
`examples/voice_agents/basic_agent.py`: Updated to apply the patch on startup.
`examples/voice_agents/README.md`: Updated with usage and configuration instructions.
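An illustrative shape for `filter_config.json` (word lists abridged and hypothetical; the actual file in the PR may differ):

```json
{
    "backchannel_words": ["yeah", "okay", "mhmm", "uh-huh", "right"],
    "command_words": ["stop", "wait", "no", "hold on"]
}
```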

✅ Test Results
Scenario 1 (Backchannel): User says "yeah/okay" while agent speaks -> Agent continues without pausing.
Scenario 2 (Interruption): User says "stop/wait" -> Agent interrupts immediately.
Scenario 3 (Mixed): User says "yeah but wait" -> Agent interrupts (detects command word).
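The three scenarios can be reproduced with a minimal decision function (a sketch under the word lists shown, not the shipped `InterruptionFilter`):

```python
BACKCHANNEL = {"yeah", "yea", "okay", "ok", "mhmm", "uh-huh"}
COMMANDS = {"stop", "wait", "no"}

def should_interrupt(delta_transcript: str) -> bool:
    """Decide whether the agent should stop speaking for this transcript delta."""
    words = delta_transcript.lower().split()
    if any(w in COMMANDS for w in words):
        return True   # Scenarios 2 and 3: any command word interrupts
    if words and all(w in BACKCHANNEL for w in words):
        return False  # Scenario 1: pure backchannel, agent keeps talking
    return bool(words)  # substantial content interrupts; silence does not

assert should_interrupt("yeah okay") is False      # Scenario 1
assert should_interrupt("stop") is True            # Scenario 2
assert should_interrupt("yeah but wait") is True   # Scenario 3
```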

A demo video is available in this Drive folder: https://drive.google.com/drive/folders/125VJhOGH4_xUBFxZsN2SOSiYlcwZthEU?usp=sharing

Summary by CodeRabbit

  • New Features

    • Added intelligent interruption handling for voice agents that distinguishes between backchannel acknowledgments (e.g., "mm-hmm") and actual user commands, preventing inadvertent interruptions.
    • Introduced configurable word filtering with separate backchannel and command word lists via a configuration file.
  • Documentation

    • Updated voice agent guide to focus on interruption handling workflows with concrete testing scenarios.
  • Tests

    • Added comprehensive test suite validating interruption filter behavior across various input scenarios.

✏️ Tip: You can customize this high-level summary in your review settings.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


SaHiL-Ez does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Already signed the CLA but the status is still pending? Let us recheck it.

@coderabbitai

coderabbitai bot commented Jan 18, 2026

📝 Walkthrough

Walkthrough

This PR implements intelligent interruption handling for LiveKit Agent by detecting and filtering "backchannel" words (acknowledgments like "uh-huh," "yeah") to prevent false interruptions during agent speech. It adds a new InterruptionFilter class, patches core agent activity logic, integrates filtering into a voice agent example, and includes comprehensive tests and configuration.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Core Agent Activity Module**<br>`livekit-agents/livekit/agents/voice/agent_activity.py` | Adds backchannel word sets (`BACKCHANNEL_WORDS`, `COMMAND_WORDS`) and delta-extraction logic (`_extract_words`, `_get_transcript_delta`, `_update_last_transcript`) to compute new words since the last processed turn. Integrates backchannel checks into `_interrupt_by_audio_activity` and `on_end_of_turn` to block interruptions when the delta is backchannel-only. |
| **Backchannel Patch Module**<br>`examples/voice_agents/backchannel_patch.py` | New module that loads configuration from `filter_config.json`, computes transcript deltas, and provides patched methods (`patched_interrupt_by_audio_activity`, `patched_on_end_of_turn`) that replace `AgentActivity` methods at runtime with backchannel filtering behavior. |
| **Interruption Filter Logic**<br>`examples/voice_agents/interruption_filter.py` | New class `InterruptionFilter` that determines whether user input should interrupt agent speech by checking the transcript against backchannel/command word sets, with configurable thresholds and runtime modification methods (add/remove word sets). |
| **Example Agent Integration**<br>`examples/voice_agents/basic_agent.py` | Integrates interruption filtering into the voice agent example via backchannel patch application, state tracking (`agent_is_speaking`, `pending_interruption_check`), and event handlers (`_on_agent_state_changed`, `_on_user_input_transcribed`) to manage interim and final transcript filtering. |
| **Configuration & Tests**<br>`examples/voice_agents/filter_config.json`, `examples/voice_agents/test_interruption_filter.py` | Configuration file defining `backchannel_words` (e.g., "um", "uh", "yeah") and `command_words` (e.g., "stop", "wait"). Comprehensive test suite covering backchannel detection during agent speech, command word handling, case-insensitivity, custom word sets, and dynamic word set modification. |
| **Documentation**<br>`examples/voice_agents/README.md` | Replaces the generic asset catalog with a focused narrative on intelligent interruption handling; reorganizes content into operational steps (Run, Connect client), explains the core problem and solution ("Guard at the Gate" interception), and includes a delta-detection explanation with concrete test scenarios. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Agent
    participant VoiceAgent as Voice Agent (basic_agent)
    participant TranscriptHandler as Transcript Handler
    participant InterruptionFilter
    participant AgentActivity
    
    User->>Agent: Speech input (interim transcript)
    Agent->>VoiceAgent: user_input_transcribed event
    VoiceAgent->>TranscriptHandler: _on_user_input_transcribed
    TranscriptHandler->>InterruptionFilter: should_interrupt(transcript, agent_is_speaking=true)
    InterruptionFilter->>InterruptionFilter: normalize & tokenize
    InterruptionFilter->>InterruptionFilter: check for command words
    alt command word detected
        InterruptionFilter-->>TranscriptHandler: return true
        TranscriptHandler->>Agent: allow interruption (log)
    else pure backchannel
        InterruptionFilter-->>TranscriptHandler: return false
        TranscriptHandler->>Agent: suppress interruption (log)
    else substantial content
        InterruptionFilter-->>TranscriptHandler: return true
        TranscriptHandler->>Agent: allow interruption
    end
    
    User->>Agent: Speech input (final transcript)
    Agent->>AgentActivity: on_end_of_turn event
    AgentActivity->>AgentActivity: _check_backchannel_delta
    AgentActivity->>AgentActivity: compute delta vs last transcript
    alt delta is backchannel-only
        AgentActivity->>AgentActivity: block end-of-turn
        AgentActivity-->>Agent: clear user turn (if agent speaking)
    else delta is substantial
        AgentActivity->>AgentActivity: proceed with turn completion
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The PR introduces heterogeneous changes across multiple modules with distinct responsibilities: core agent activity patching with delta-extraction logic, a new InterruptionFilter class with word-set management, runtime method patching, event handler integration, configuration parsing, and comprehensive test coverage. Logic density is moderate-to-high with multiple decision points and state tracking. The spread across both library code (agent_activity.py) and example code requires separate reasoning for integration correctness.

Possibly related PRs

  • livekit/agents#4536: Refactors interrupt handling in agent_activity.py; may conflict or require coordination with backchannel delta-check integration in _interrupt_by_audio_activity and on_end_of_turn.

Suggested reviewers

  • longcw

Poem

🐰 Hops with joy over backchannel care,
"Uh-huhs" and "yeahs" float through the air!
No more false stops when agents are chatting,
Smart filters guard—no interrupts patting. 🛡️

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.85%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title is vague and uses generic wording that does not clearly convey the main feature being implemented. | Consider a more descriptive title, such as "Add intelligent backchannel interruption filtering for voice agents" or "Implement context-aware interrupt handler to prevent unintended agent pausing". |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

🤖 Fix all issues with AI agents
In `@examples/voice_agents/backchannel_patch.py`:
- Around line 38-39: The global _last_processed_transcript is unsafe for
concurrent AgentActivity instances; change state to an instance attribute by
moving the variable into AgentActivity (e.g., self._bc_last_transcript) and
update any patched methods that read/write _last_processed_transcript to use
self._bc_last_transcript instead so each AgentActivity keeps its own
last-transcript state; ensure initialization (in __init__ or when the patch is
applied) and all references in the patched methods are updated accordingly.
- Around line 122-172: This monkeypatch reimplements AgentActivity internals
(patched_interrupt_by_audio_activity referencing
AgentActivity._interrupt_by_audio_activity and on_end_of_turn behavior) and is
fragile to upstream changes; add a fail-fast version check at module import that
reads livekit_agents.__version__ and asserts it matches the exact version (or
acceptable semver range) this patch targets, raise a clear RuntimeError with
guidance if the check fails, add a single-line comment documenting the specific
livekit-agents version the monkeypatch was written for, and tighten
requirements.txt (pin or add an upper bound for livekit-agents) so CI will catch
incompatibilities early; consider adding a TODO linking to upstream contribution
as future work.
- Around line 14-16: CONFIG_FILE is a relative path which will break when the
script is executed from a different working directory; change CONFIG_FILE to
compute an absolute path based on this module's location (using __file__) and
update load_config() to open that resolved path so the file is found regardless
of cwd — locate the CONFIG_FILE constant and the load_config function in
backchannel_patch.py and replace the relative path usage with a path join
(module directory + "filter_config.json") before loading BACKCHANNEL_WORDS and
COMMAND_WORDS.
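The per-instance state fix from the first item above can be sketched like this (`AgentActivityStub` and the simplified prefix-based delta are illustrative; `_bc_last_transcript` follows the naming suggested in the comment):

```python
class AgentActivityStub:
    """Stand-in for livekit's AgentActivity, for illustration only."""

def patched_delta(self: AgentActivityStub, full_transcript: str) -> str:
    # Read per-instance state, defaulting to "" on first use.
    last = getattr(self, "_bc_last_transcript", "")
    if full_transcript.startswith(last):
        delta = full_transcript[len(last):].strip()
    else:
        delta = full_transcript  # transcript was reset; treat it all as new
    self._bc_last_transcript = full_transcript  # mutate this instance only
    return delta

a, b = AgentActivityStub(), AgentActivityStub()
assert patched_delta(a, "yeah") == "yeah"
assert patched_delta(b, "wait stop") == "wait stop"  # b's state is independent
assert patched_delta(a, "yeah wait") == "wait"       # a continues from "yeah"
```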

In `@examples/voice_agents/basic_agent.py`:
- Around line 53-55: Remove the unused remnants by deleting the attributes
self.pending_interruption_check and self.interruption_check_delay from the class
in examples/voice_agents/basic_agent.py (or, if intended, implement their use by
wiring interruption checks into the STT/transcript flow); specifically remove
the assignments to pending_interruption_check and interruption_check_delay (or
replace with a documented, implemented interruption scheduling mechanism
referenced by those names) so there are no unused variables left in the agent
class.

In `@examples/voice_agents/interruption_filter.py`:
- Line 29: DEFAULT_BACKCHANNEL_WORDS contains multi-word entries like "got it"
but _is_pure_backchannel currently tokenizes input and checks individual tokens
against that set, so multi-word phrases never match; update _is_pure_backchannel
to detect multi-word backchannel phrases the same way _contains_command handles
multi-word commands (e.g., check contiguous token sequences or join tokens and
search for phrase matches), or alternatively remove multi-word entries from
DEFAULT_BACKCHANNEL_WORDS; modify the logic in _is_pure_backchannel to first
test multi-word phrases from DEFAULT_BACKCHANNEL_WORDS against the token
sequence before falling back to per-token membership checks.
- Around line 57-58: The instance assignments for backchannel_words and
command_words currently reference the class-level sets
(DEFAULT_BACKCHANNEL_WORDS, DEFAULT_COMMAND_WORDS), causing shared mutable
state; change the constructor to copy those defaults when None is passed (e.g.,
set self.backchannel_words = set(self.DEFAULT_BACKCHANNEL_WORDS) and
self.command_words = set(self.DEFAULT_COMMAND_WORDS)) so that methods like
add_backchannel_word() mutate only the instance's sets rather than the shared
class defaults.
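The multi-word matching suggested for `_is_pure_backchannel` can be sketched with a greedy phrase scan (word set abridged; the name mirrors the review comment, not necessarily the PR's final code):

```python
import re

BACKCHANNEL = {"yeah", "okay", "got it", "uh-huh"}

def is_pure_backchannel(text: str) -> bool:
    """True if every token is covered by a (possibly multi-word) backchannel entry."""
    tokens = [w for w in re.sub(r"[^\w\s-]", " ", text.lower()).split() if w]
    if not tokens:
        return False
    # Try longer phrases first so "got it" wins over any single-word match.
    phrases = sorted((p.split() for p in BACKCHANNEL), key=len, reverse=True)
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            if tokens[i:i + len(phrase)] == phrase:
                i += len(phrase)
                break
        else:
            return False  # a token sequence no backchannel entry covers
    return True

assert is_pure_backchannel("yeah got it") is True
assert is_pure_backchannel("got it but wait") is False
```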

In `@examples/voice_agents/test_interruption_filter.py`:
- Around line 89-93: The test test_substantial_input_interrupts is passing for
the wrong reason because the phrase "I don't understand" contains "don't", which
is in DEFAULT_COMMAND_WORDS; update the test to use a phrase with substantial
content that contains no command words (e.g. "I'm confused" or "that's
confusing") so the assertion exercises the substantial-input detection path in
should_interrupt rather than matching DEFAULT_COMMAND_WORDS; modify the call(s)
to filter.should_interrupt in test_substantial_input_interrupts accordingly.

In `@livekit-agents/livekit/agents/voice/agent_activity.py`:
- Around line 88-99: The fallback that returns the last 3 words when last_words
is empty or full_len <= last_len should be changed to return an empty string
(not the last 3 words) to avoid misclassifying legitimate speech as backchannel;
update the branches that handle "if not last_words" and "if full_len <=
last_len" to return "" and add a brief docstring/comment in the transcript-delta
function (referencing last_words and full_words) explaining that an empty string
means "no new speech delta" and why this avoids false backchannels.
- Around line 60-108: The module-level mutable _last_processed_transcript causes
cross-session contamination; move this state into the AgentActivity instance by
adding self._last_processed_transcript: str = "" in AgentActivity.__init__ and
refactor the helpers so they operate on instance state (make
_get_transcript_delta and _update_last_transcript instance methods or accept an
AgentActivity/self parameter) while keeping _extract_words as a pure helper;
ensure calls update and read self._last_processed_transcript instead of the
removed global.
🧹 Nitpick comments (10)
examples/voice_agents/README.md (1)

30-32: Minor markdown indentation nit.

Static analysis reports nested list items at lines 31-32 use 4-space indentation instead of the expected 2-space (MD007). This is a cosmetic issue.

♻️ Suggested fix
 *   **Decide:**
-    *   If the words are **Backchannel** (e.g., "yeah", "ok"): **BLOCK THE PAUSE**. The agent keeps speaking smoothly.
-    *   If the words are **Commands** (e.g., "stop", "wait") or **Content**: **ALLOW THE PAUSE**. The agent stops immediately.
+  *   If the words are **Backchannel** (e.g., "yeah", "ok"): **BLOCK THE PAUSE**. The agent keeps speaking smoothly.
+  *   If the words are **Commands** (e.g., "stop", "wait") or **Content**: **ALLOW THE PAUSE**. The agent stops immediately.
examples/voice_agents/backchannel_patch.py (3)

80-104: Semantic confusion: _is_backchannel_only returns True for empty input.

Returning True for empty/whitespace text (lines 82-87) is counterintuitive for a function named "is backchannel only." While the calling code at line 109 handles this correctly, this design is confusing for maintenance.

Consider either:

  1. Renaming to _should_block_interruption to reflect actual semantics, or
  2. Returning False for empty input and handling the "empty delta" case explicitly in the caller.

41-46: Code duplication: _extract_words is duplicated across multiple files.

This function is identical to InterruptionFilter._extract_words in interruption_filter.py and also appears in agent_activity.py. Consider extracting to a shared utility module to maintain consistency.


41-41: Missing type annotation for return type.

Per coding guidelines (mypy strict mode), the return type should be list[str] instead of bare list.

-def _extract_words(text: str) -> list:
+def _extract_words(text: str) -> list[str]:
examples/voice_agents/interruption_filter.py (1)

10-10: Consider using built-in set[str] instead of Set[str] from typing.

Python 3.9+ supports lowercase generic types directly. For consistency with list[str] used on line 124, consider using set[str].

-from typing import Set
 ...
-    DEFAULT_BACKCHANNEL_WORDS: Set[str] = {...}
+    DEFAULT_BACKCHANNEL_WORDS: set[str] = {...}
examples/voice_agents/test_interruption_filter.py (1)

8-8: Relative import may fail depending on test execution directory.

from interruption_filter import InterruptionFilter assumes the test is run from examples/voice_agents/. Consider using a relative import or adjusting sys.path for robustness.

# Option 1: Relative import (if package structure allows)
from .interruption_filter import InterruptionFilter

# Option 2: Add path manipulation at top of file
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent))
from interruption_filter import InterruptionFilter
examples/voice_agents/basic_agent.py (2)

28-30: Redundant filtering: Both the monkeypatch and event handlers attempt to filter backchannels.

backchannel_patch.apply_patch() intercepts interruptions at the AgentActivity level, while _on_user_input_transcribed also filters using InterruptionFilter. This creates ambiguity about which layer is blocking and may lead to maintenance confusion.

Consider:

  1. Using only the monkeypatch approach (remove event handler filtering), or
  2. Using only the event handler approach (remove the patch), or
  3. Documenting clearly that this is a defense-in-depth strategy with the patch as primary and event handler as backup.

1-1: Unused import: asyncio is imported but not directly used.

If asyncio isn't needed, remove it to keep imports clean. If it's used indirectly by the framework, consider adding a comment explaining why.

livekit-agents/livekit/agents/voice/agent_activity.py (2)

63-69: Move import re to module level and add proper type hint.

Importing inside the function incurs overhead on every call. Also, the return type hint should be list[str] per coding guidelines.

♻️ Proposed fix

Add import re near line 3 with other imports, then:

-def _extract_words(text: str) -> list:
-    """Extract words from text, removing punctuation."""
-    import re
+def _extract_words(text: str) -> list[str]:
+    """Extract words from text, removing punctuation."""
     if not text:
         return []
     normalized = text.lower().strip()
     return [w for w in re.sub(r'[^\w\s-]', ' ', normalized).split() if w]

153-159: Consider using structured logging instead of f-strings with emojis.

The emoji prefixes (🛡️, ✅) may not render correctly in all log aggregation systems. Consider using structured log fields instead.

♻️ Suggested improvement
     if is_bc:
-        logger.info(
-            f"🛡️ [DELTA FILTER] Transcript delta '{delta}' is backchannel - blocking turn"
-        )
+        logger.info(
+            "Transcript delta is backchannel - blocking turn",
+            extra={"delta": delta, "filter_type": "backchannel"}
+        )
     else:
-        logger.debug(
-            f"✅ [DELTA FILTER] Transcript delta '{delta}' is NOT backchannel - allowing turn"
-        )
+        logger.debug(
+            "Transcript delta is NOT backchannel - allowing turn",
+            extra={"delta": delta, "filter_type": "backchannel"}
+        )
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80f2e33 and 92a8faa.

📒 Files selected for processing (7)
  • examples/voice_agents/README.md
  • examples/voice_agents/backchannel_patch.py
  • examples/voice_agents/basic_agent.py
  • examples/voice_agents/filter_config.json
  • examples/voice_agents/interruption_filter.py
  • examples/voice_agents/test_interruption_filter.py
  • livekit-agents/livekit/agents/voice/agent_activity.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-agents/livekit/agents/voice/agent_activity.py
  • examples/voice_agents/test_interruption_filter.py
  • examples/voice_agents/backchannel_patch.py
  • examples/voice_agents/interruption_filter.py
  • examples/voice_agents/basic_agent.py
🧬 Code graph analysis (3)
examples/voice_agents/test_interruption_filter.py (1)
examples/voice_agents/interruption_filter.py (6)
  • InterruptionFilter (15-216)
  • should_interrupt (66-118)
  • add_backchannel_word (198-201)
  • add_command_word (203-206)
  • remove_backchannel_word (208-211)
  • is_backchannel_word (186-196)
examples/voice_agents/interruption_filter.py (2)
examples/voice_agents/backchannel_patch.py (1)
  • _extract_words (41-46)
livekit-agents/livekit/agents/voice/agent_activity.py (1)
  • _extract_words (63-69)
examples/voice_agents/basic_agent.py (2)
examples/voice_agents/interruption_filter.py (2)
  • InterruptionFilter (15-216)
  • should_interrupt (66-118)
examples/voice_agents/backchannel_patch.py (1)
  • apply_patch (219-223)
🪛 markdownlint-cli2 (0.18.1)
examples/voice_agents/README.md

31-31: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


32-32: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

🔇 Additional comments (11)
examples/voice_agents/README.md (1)

1-68: LGTM!

The documentation is well-structured and clearly explains the feature's purpose, implementation approach, configuration options, and testing scenarios. Good job providing actionable testing steps.

examples/voice_agents/interruption_filter.py (1)

66-118: LGTM!

The should_interrupt method has clear, well-documented logic with appropriate early returns. The decision flow (command check → agent state check → backchannel check) is intuitive.

examples/voice_agents/test_interruption_filter.py (1)

11-135: Good test coverage overall.

The test suite covers the main scenarios comprehensively: backchannel detection, command detection, mixed inputs, edge cases, and dynamic word management.

Consider adding tests for:

  1. Multi-word backchannel phrases like "got it" (currently broken per the issue in interruption_filter.py)
  2. Verifying that adding words to one filter instance doesn't affect others (to catch the mutable default bug)
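A test for item 2 (the shared mutable default) could look like this sketch, which also shows the constructor-copy fix (class name hypothetical):

```python
from typing import Optional

class InterruptionFilterSketch:
    """Illustrative stand-in for the PR's InterruptionFilter."""

    DEFAULT_BACKCHANNEL_WORDS = {"yeah", "okay", "mhmm"}

    def __init__(self, backchannel_words: Optional[set] = None) -> None:
        # set(...) copies, so instance mutation never touches the class default.
        self.backchannel_words = set(
            self.DEFAULT_BACKCHANNEL_WORDS if backchannel_words is None else backchannel_words
        )

    def add_backchannel_word(self, word: str) -> None:
        self.backchannel_words.add(word.lower())

a, b = InterruptionFilterSketch(), InterruptionFilterSketch()
a.add_backchannel_word("gotcha")
assert "gotcha" in a.backchannel_words
assert "gotcha" not in b.backchannel_words
assert "gotcha" not in InterruptionFilterSketch.DEFAULT_BACKCHANNEL_WORDS
```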
examples/voice_agents/basic_agent.py (3)

37-127: The overall integration approach is sound.

The agent properly tracks speaking state via events and applies filtering logic. The separation between state tracking (_on_agent_state_changed) and transcript evaluation (_on_user_input_transcribed) is clean.


117-121: No action required—clear_user_turn() is a valid AgentSession method.

The method is properly defined in livekit/agents/voice/agent_session.py (line 961) and is implemented to delegate to the underlying activity manager after validating the session state. No risk of AttributeError.

Likely an incorrect or invalid review comment.


71-72: No action needed—the state comparison is correct.

The comparison new_state == "speaking" is valid. The LiveKit agent framework defines AgentState as Literal["initializing", "idle", "listening", "thinking", "speaking"] in livekit-agents/livekit/agents/voice/events.py, and "speaking" is a documented and tested state value used throughout the framework.

examples/voice_agents/filter_config.json (1)

2-9: Duplicate entry and questionable word in backchannel list.

  1. "mhmm" appears twice (line 5 and line 8) - remove the duplicate.
  2. "hey" (line 8) is typically an attention-getter or greeting, not a passive acknowledgment. Including it may cause the agent to ignore legitimate attempts to get its attention.
♻️ Suggested fix
     "backchannel_words": [
         "yeah", "yea", "yes", "yep", "yup",
         "ok", "okay", "alright", "aight",
         "hmm", "hm", "mhm", "mmhmm", "uh-huh", "uhuh", "uh", "huh",
         "right", "sure", "gotcha",
         "aha", "ah", "oh", "ooh",
-        "mm", "mhmm", "mmm", "hey"
+        "mm", "mmm"
     ],

Likely an incorrect or invalid review comment.

livekit-agents/livekit/agents/voice/agent_activity.py (4)

46-58: Consider internationalization for backchannel words.

The hardcoded word lists work for English but won't cover other languages. If multi-language support is needed in the future, consider making these configurable or loading from a language-specific resource.

For now, this is acceptable for English-only deployments.


110-137: LGTM - Backchannel detection logic is sound.

The function correctly:

  • Prioritizes command words (always interrupt)
  • Handles hyphenated words like "uh-huh"
  • Treats empty input as backchannel (don't interrupt)

1309-1315: Integration placement is correct.

The backchannel check is appropriately placed after the min_interruption_words check, ensuring both filters work together. The early return prevents the interruption when backchannel is detected.

Note: This code will need updates once the global state issue is resolved (see earlier comment).


1510-1525: Backchannel filter integration in on_end_of_turn is well-placed.

The check correctly:

  • Only applies when agent is actively speaking (_current_speech is not None)
  • Cancels preemptive generation when blocking
  • Returns False to indicate the turn was not processed

The observation about _update_last_transcript being skipped for blocked backchannel turns is intentional. Per the docstring on _get_transcript_delta ("This is crucial for detecting backchannel in accumulated transcripts"), blocked backchannel words accumulate in the delta and are re-evaluated on the next turn. This allows the system to filter repeated backchannel sounds that occur while the agent is speaking, only advancing the transcript state when non-backchannel words are detected.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +14 to +16
CONFIG_FILE = "filter_config.json"
BACKCHANNEL_WORDS = set()
COMMAND_WORDS = set()

⚠️ Potential issue | 🟠 Major

Relative config path will fail when script is run from a different working directory.

CONFIG_FILE = "filter_config.json" uses a relative path. If the agent is started from a directory other than examples/voice_agents/, the config file won't be found.

🐛 Suggested fix using `__file__` for path resolution
+import pathlib
+
-CONFIG_FILE = "filter_config.json"
+CONFIG_FILE = pathlib.Path(__file__).parent / "filter_config.json"

Then update load_config():

-        if os.path.exists(CONFIG_FILE):
-            with open(CONFIG_FILE, "r") as f:
+        if CONFIG_FILE.exists():
+            with open(CONFIG_FILE, "r") as f:
🤖 Prompt for AI Agents
In `@examples/voice_agents/backchannel_patch.py` around lines 14 - 16, CONFIG_FILE
is a relative path which will break when the script is executed from a different
working directory; change CONFIG_FILE to compute an absolute path based on this
module's location (using __file__) and update load_config() to open that
resolved path so the file is found regardless of cwd — locate the CONFIG_FILE
constant and the load_config function in backchannel_patch.py and replace the
relative path usage with a path join (module directory + "filter_config.json")
before loading BACKCHANNEL_WORDS and COMMAND_WORDS.

Comment on lines +38 to +39
# Global tracking for delta calculation
_last_processed_transcript = ""

⚠️ Potential issue | 🟠 Major

Global transcript state is not thread-safe and will cause incorrect behavior with multiple concurrent sessions.

_last_processed_transcript is a module-level global. If multiple AgentActivity instances run concurrently (multiple rooms/sessions), they'll share and corrupt this state, leading to incorrect delta calculations.

Consider storing the last transcript as an instance attribute on AgentActivity (e.g., self._bc_last_transcript) within the patched methods instead of using a global.

🤖 Prompt for AI Agents
In `@examples/voice_agents/backchannel_patch.py` around lines 38 - 39, The global
_last_processed_transcript is unsafe for concurrent AgentActivity instances;
change state to an instance attribute by moving the variable into AgentActivity
(e.g., self._bc_last_transcript) and update any patched methods that read/write
_last_processed_transcript to use self._bc_last_transcript instead so each
AgentActivity keeps its own last-transcript state; ensure initialization (in
__init__ or when the patch is applied) and all references in the patched methods
are updated accordingly.

Comment on lines +122 to +172
# Store original methods to call them if needed, or we just reimplement logic
# Reimplementing logic is safer to inject checks in middle

def patched_interrupt_by_audio_activity(self) -> None:
    # Copied logic from AgentActivity._interrupt_by_audio_activity
    # with injected checks
    opt = self._session.options
    use_pause = opt.resume_false_interruption and opt.false_interruption_timeout is not None

    if hasattr(self.llm, "capabilities") and self.llm.capabilities.turn_detection:
        return

    if (
        self.stt is not None
        and opt.min_interruption_words > 0
        and self._audio_recognition is not None
    ):
        text = self._audio_recognition.current_transcript

        # Original count check
        if len(split_words(text, split_character=True)) < opt.min_interruption_words:
            return

        # --- PATCH START ---
        if _check_backchannel_delta(text):
            logger.debug(f"🚫 [PATCH FILTER] Blocking AUDIO interruption - backchannel delta")
            return
        # --- PATCH END ---

    if self._rt_session is not None:
        self._rt_session.start_user_activity()

    if (
        self._current_speech is not None
        and not self._current_speech.interrupted
        and self._current_speech.allow_interruptions
    ):
        self._paused_speech = self._current_speech

        if self._false_interruption_timer:
            self._false_interruption_timer.cancel()
            self._false_interruption_timer = None

        if use_pause and self._session.output.audio and self._session.output.audio.can_pause:
            # THIS IS THE CRITICAL LINE THAT WAS PAUSING
            self._session.output.audio.pause()
            self._session._update_agent_state("listening")
        else:
            if self._rt_session is not None:
                self._rt_session.interrupt()
            self._current_speech.interrupt()

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check livekit-agents version dependency
find . -type f \( -name "pyproject.toml" -o -name "setup.py" -o -name "requirements*.txt" \) -exec grep -H "livekit" {} \; 2>/dev/null | head -20

Repository: livekit/agents



🏁 Script executed:

#!/bin/bash
# Examine the backchannel_patch.py file to understand the implementation
wc -l examples/voice_agents/backchannel_patch.py 2>/dev/null || echo "File not found, searching..."
fd -t f "backchannel_patch.py"



🏁 Script executed:

#!/bin/bash
# Check the structure and actual implementation of the patched methods
head -50 examples/voice_agents/backchannel_patch.py



🏁 Script executed:

#!/bin/bash
# Read the specific lines mentioned in the review (122-172)
sed -n '122,172p' examples/voice_agents/backchannel_patch.py



🏁 Script executed:

#!/bin/bash
# Read more to understand the patching mechanism
sed -n '173,223p' examples/voice_agents/backchannel_patch.py



🏁 Script executed:

#!/bin/bash
# Find all pyproject.toml and requirements files in the repo
find . -name "pyproject.toml" -o -name "setup.py" -o -name "requirements*.txt" 2>/dev/null | head -10



🏁 Script executed:

#!/bin/bash
# Check livekit-agents version requirement
cat ./livekit-agents/pyproject.toml | grep -A 20 "^\[project\]" | head -30



🏁 Script executed:

#!/bin/bash
# Check examples voice_agents requirements
cat ./examples/voice_agents/requirements.txt



🏁 Script executed:

#!/bin/bash
# Search for any version checks or guards in the backchannel_patch.py file
grep -n -i "version\|assert\|check.*version" examples/voice_agents/backchannel_patch.py



🏁 Script executed:

#!/bin/bash
# Check if there's any README or documentation about version compatibility
find ./examples/voice_agents -name "README*" -o -name "*.md" | xargs cat 2>/dev/null | head -100



Fragile monkeypatch: Reimplemented methods create tight coupling to upstream AgentActivity implementation.

This code copies internal logic from AgentActivity._interrupt_by_audio_activity and AgentActivity.on_end_of_turn and injects checks via monkey-patching. Any upstream changes to these methods (renamed attributes, logic changes, new parameters) will silently break this patch. The version constraint in requirements.txt is loose (livekit-agents>=1.0), providing no protection against breaking changes.

Recommendations:

  1. Add a version assertion or check to fail fast if livekit-agents version changes incompatibly.
  2. Document the specific livekit-agents version this patch targets.
  3. If this feature proves valuable, consider contributing a proper extension point upstream rather than maintaining a monkeypatch.
🤖 Prompt for AI Agents
In `@examples/voice_agents/backchannel_patch.py` around lines 122 - 172, This
monkeypatch reimplements AgentActivity internals
(patched_interrupt_by_audio_activity referencing
AgentActivity._interrupt_by_audio_activity and on_end_of_turn behavior) and is
fragile to upstream changes; add a fail-fast version check at module import that
reads livekit_agents.__version__ and asserts it matches the exact version (or
acceptable semver range) this patch targets, raise a clear RuntimeError with
guidance if the check fails, add a single-line comment documenting the specific
livekit-agents version the monkeypatch was written for, and tighten
requirements.txt (pin or add an upper bound for livekit-agents) so CI will catch
incompatibilities early; consider adding a TODO linking to upstream contribution
as future work.
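Recommendation 1 could be sketched roughly as follows; the pinned version string is an illustrative placeholder, not a value taken from this PR:

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical: the exact version this monkeypatch was validated against.
TESTED_VERSION = "1.2.3"


def assert_patch_compatible(dist: str = "livekit-agents") -> None:
    """Fail fast when the installed package differs from the tested version."""
    try:
        installed = version(dist)
    except PackageNotFoundError as exc:
        raise RuntimeError(f"{dist} is not installed; cannot apply patch") from exc
    if installed != TESTED_VERSION:
        raise RuntimeError(
            f"backchannel_patch was written against {dist}=={TESTED_VERSION}, "
            f"but {installed} is installed; re-review the patched methods."
        )
```

Calling this at module import of `backchannel_patch.py` turns a silent behavioral break into a loud startup error.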

Comment on lines +53 to +55
        # Queue for pending interruption checks
        self.pending_interruption_check = None
        self.interruption_check_delay = 0.15  # 150ms buffer for STT to produce transcript

⚠️ Potential issue | 🟡 Minor

Unused variables: pending_interruption_check and interruption_check_delay are defined but never used.

These appear to be remnants of planned functionality that wasn't implemented.

-        # Queue for pending interruption checks
-        self.pending_interruption_check = None
-        self.interruption_check_delay = 0.15  # 150ms buffer for STT to produce transcript
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        # Queue for pending interruption checks
-        self.pending_interruption_check = None
-        self.interruption_check_delay = 0.15  # 150ms buffer for STT to produce transcript
🤖 Prompt for AI Agents
In `@examples/voice_agents/basic_agent.py` around lines 53 - 55, Remove the unused
remnants by deleting the attributes self.pending_interruption_check and
self.interruption_check_delay from the class in
examples/voice_agents/basic_agent.py (or, if intended, implement their use by
wiring interruption checks into the STT/transcript flow); specifically remove
the assignments to pending_interruption_check and interruption_check_delay (or
replace with a documented, implemented interruption scheduling mechanism
referenced by those names) so there are no unused variables left in the agent
class.

    "yeah", "yea", "yes", "yep", "yup",
    "ok", "okay", "alright", "aight",
    "hmm", "hm", "mhm", "mmhmm", "uh-huh", "uhuh",
    "right", "sure", "gotcha", "got it",

⚠️ Potential issue | 🟠 Major

Multi-word backchannel phrase "got it" will never match.

_is_pure_backchannel (lines 154-184) checks each word individually against backchannel_words. Since "got it" is tokenized into ["got", "it"], and neither "got" nor "it" is individually in the set, this phrase won't be recognized as backchannel.

Either:

  1. Remove multi-word phrases from DEFAULT_BACKCHANNEL_WORDS, or
  2. Add similar phrase-matching logic as _contains_command uses for multi-word commands.
🤖 Prompt for AI Agents
In `@examples/voice_agents/interruption_filter.py` at line 29,
DEFAULT_BACKCHANNEL_WORDS contains multi-word entries like "got it" but
_is_pure_backchannel currently tokenizes input and checks individual tokens
against that set, so multi-word phrases never match; update _is_pure_backchannel
to detect multi-word backchannel phrases the same way _contains_command handles
multi-word commands (e.g., check contiguous token sequences or join tokens and
search for phrase matches), or alternatively remove multi-word entries from
DEFAULT_BACKCHANNEL_WORDS; modify the logic in _is_pure_backchannel to first
test multi-word phrases from DEFAULT_BACKCHANNEL_WORDS against the token
sequence before falling back to per-token membership checks.
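One way to implement the phrase-first matching is shown below; this is a sketch with illustrative names, not the PR's actual `_is_pure_backchannel`:

```python
def is_pure_backchannel(text: str, backchannel: set) -> bool:
    """True only if every word or phrase in `text` is a known backchannel token."""
    words = text.lower().split()
    if not words:
        return False
    # Longest multi-word phrases first, so "got it" wins over single tokens.
    phrases = sorted(
        (p.split() for p in backchannel if " " in p),
        key=len,
        reverse=True,
    )
    i = 0
    while i < len(words):
        for ph in phrases:
            if words[i:i + len(ph)] == ph:
                i += len(ph)  # consume the whole phrase
                break
        else:
            if words[i] not in backchannel:
                return False  # substantial word found: not a pure backchannel
            i += 1
    return True
```

With `{"yeah", "okay", "got it"}`, both `"yeah got it"` and `"got it okay"` match, while `"got it but wait"` correctly fails on `"but"`.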

Comment on lines +57 to +58
        self.backchannel_words = backchannel_words or self.DEFAULT_BACKCHANNEL_WORDS
        self.command_words = command_words or self.DEFAULT_COMMAND_WORDS

⚠️ Potential issue | 🔴 Critical

Critical: Mutable default sets are shared across instances, causing unintended side effects.

When backchannel_words=None, line 57 assigns self.backchannel_words = self.DEFAULT_BACKCHANNEL_WORDS, which references the class-level set directly. Calling add_backchannel_word() on any instance will mutate the class default, affecting all other instances.

🐛 Recommended fix: Create copies of defaults
-        self.backchannel_words = backchannel_words or self.DEFAULT_BACKCHANNEL_WORDS
-        self.command_words = command_words or self.DEFAULT_COMMAND_WORDS
+        self.backchannel_words = set(backchannel_words) if backchannel_words else set(self.DEFAULT_BACKCHANNEL_WORDS)
+        self.command_words = set(command_words) if command_words else set(self.DEFAULT_COMMAND_WORDS)
📝 Committable suggestion


Suggested change
-        self.backchannel_words = backchannel_words or self.DEFAULT_BACKCHANNEL_WORDS
-        self.command_words = command_words or self.DEFAULT_COMMAND_WORDS
+        self.backchannel_words = set(backchannel_words) if backchannel_words else set(self.DEFAULT_BACKCHANNEL_WORDS)
+        self.command_words = set(command_words) if command_words else set(self.DEFAULT_COMMAND_WORDS)
🤖 Prompt for AI Agents
In `@examples/voice_agents/interruption_filter.py` around lines 57 - 58, The
instance assignments for backchannel_words and command_words currently reference
the class-level sets (DEFAULT_BACKCHANNEL_WORDS, DEFAULT_COMMAND_WORDS), causing
shared mutable state; change the constructor to copy those defaults when None is
passed (e.g., set self.backchannel_words = set(self.DEFAULT_BACKCHANNEL_WORDS)
and self.command_words = set(self.DEFAULT_COMMAND_WORDS)) so that methods like
add_backchannel_word() mutate only the instance's sets rather than the shared
class defaults.
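The aliasing bug is easy to reproduce in isolation. A minimal sketch (simplified class, not the PR's actual filter) showing why copying the defaults matters:

```python
class InterruptionFilterDemo:
    DEFAULT_BACKCHANNEL_WORDS = {"yeah", "okay", "mhm"}

    def __init__(self, backchannel_words=None):
        # Copy the default so instances never alias (and mutate) the class set.
        self.backchannel_words = (
            set(backchannel_words) if backchannel_words
            else set(self.DEFAULT_BACKCHANNEL_WORDS)
        )

    def add_backchannel_word(self, word: str) -> None:
        self.backchannel_words.add(word)


a = InterruptionFilterDemo()
b = InterruptionFilterDemo()
a.add_backchannel_word("uh-huh")
```

With the `or self.DEFAULT_BACKCHANNEL_WORDS` form, `a.add_backchannel_word` would have grown the class-level set and leaked into `b`; with the copy, each instance stays independent.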

Comment on lines +89 to +93
    def test_substantial_input_interrupts(self):
        """Substantial non-backchannel input should interrupt."""
        assert self.filter.should_interrupt("tell me about that", agent_is_speaking=True) == True
        assert self.filter.should_interrupt("can you explain", agent_is_speaking=True) == True
        assert self.filter.should_interrupt("I don't understand", agent_is_speaking=True) == True

⚠️ Potential issue | 🟡 Minor

Test passes for the wrong reason: "don't" is a command word.

"I don't understand" triggers interruption because "don't" is in DEFAULT_COMMAND_WORDS, not because of substantial content detection. This test doesn't actually verify the "substantial input" path.

Replace with a phrase that contains no command words:

-        assert self.filter.should_interrupt("I don't understand", agent_is_speaking=True) == True
+        assert self.filter.should_interrupt("what do you mean by that", agent_is_speaking=True) == True
📝 Committable suggestion


Suggested change
     def test_substantial_input_interrupts(self):
         """Substantial non-backchannel input should interrupt."""
         assert self.filter.should_interrupt("tell me about that", agent_is_speaking=True) == True
         assert self.filter.should_interrupt("can you explain", agent_is_speaking=True) == True
-        assert self.filter.should_interrupt("I don't understand", agent_is_speaking=True) == True
+        assert self.filter.should_interrupt("what do you mean by that", agent_is_speaking=True) == True
🤖 Prompt for AI Agents
In `@examples/voice_agents/test_interruption_filter.py` around lines 89 - 93, The
test test_substantial_input_interrupts is passing for the wrong reason because
the phrase "I don't understand" contains "don't", which is in
DEFAULT_COMMAND_WORDS; update the test to use a phrase with substantial content
that contains no command words (e.g. "I'm confused" or "that's confusing") so
the assertion exercises the substantial-input detection path in should_interrupt
rather than matching DEFAULT_COMMAND_WORDS; modify the call(s) to
filter.should_interrupt in test_substantial_input_interrupts accordingly.

Comment on lines +60 to +108
# Track the last processed transcript to extract delta
_last_processed_transcript: str = ""

def _extract_words(text: str) -> list:
    """Extract words from text, removing punctuation."""
    import re
    if not text:
        return []
    normalized = text.lower().strip()
    return [w for w in re.sub(r'[^\w\s-]', ' ', normalized).split() if w]

def _get_transcript_delta(full_transcript: str) -> str:
    """
    Get only the NEW portion of the transcript since the last processed turn.
    This is crucial for detecting backchannel in accumulated transcripts.
    """
    global _last_processed_transcript

    if not full_transcript:
        return ""

    # Normalize both transcripts
    full_words = _extract_words(full_transcript)
    last_words = _extract_words(_last_processed_transcript)

    if not full_words:
        return ""

    if not last_words:
        # No previous transcript, return last few words (typical backchannel length)
        return " ".join(full_words[-3:])

    # Find where the new words start
    # The new words are whatever comes AFTER the last processed words
    last_len = len(last_words)
    full_len = len(full_words)

    if full_len <= last_len:
        # Nothing new or same length, return last few words
        return " ".join(full_words[-3:])

    # Extract delta (new words only)
    delta_words = full_words[last_len:]
    return " ".join(delta_words)

def _update_last_transcript(transcript: str) -> None:
    """Update the last processed transcript."""
    global _last_processed_transcript
    _last_processed_transcript = transcript

⚠️ Potential issue | 🔴 Critical

Critical: Global mutable state causes cross-session contamination.

_last_processed_transcript is module-level global state shared across ALL AgentActivity instances. In multi-agent or multi-session deployments, this will cause:

  1. Data races: Concurrent sessions modifying the same variable without synchronization
  2. Logic errors: Session A's transcript affecting Session B's delta calculations

The state must be moved to instance-level within AgentActivity.

🔧 Suggested fix: Move state to AgentActivity instance

Remove the global variable and add instance state:

-# Track the last processed transcript to extract delta
-_last_processed_transcript: str = ""

In the AgentActivity.__init__ method, add:

self._last_processed_transcript: str = ""

Then update the helper functions to accept the instance or refactor them as methods:

-def _get_transcript_delta(full_transcript: str) -> str:
-    global _last_processed_transcript
+def _get_transcript_delta(full_transcript: str, last_processed: str) -> str:
     if not full_transcript:
         return ""
     
     full_words = _extract_words(full_transcript)
-    last_words = _extract_words(_last_processed_transcript)
+    last_words = _extract_words(last_processed)
     # ... rest of logic
🤖 Prompt for AI Agents
In `@livekit-agents/livekit/agents/voice/agent_activity.py` around lines 60 - 108,
The module-level mutable _last_processed_transcript causes cross-session
contamination; move this state into the AgentActivity instance by adding
self._last_processed_transcript: str = "" in AgentActivity.__init__ and refactor
the helpers so they operate on instance state (make _get_transcript_delta and
_update_last_transcript instance methods or accept an AgentActivity/self
parameter) while keeping _extract_words as a pure helper; ensure calls update
and read self._last_processed_transcript instead of the removed global.

Comment on lines +88 to +99
    if not last_words:
        # No previous transcript, return last few words (typical backchannel length)
        return " ".join(full_words[-3:])

    # Find where the new words start
    # The new words are whatever comes AFTER the last processed words
    last_len = len(last_words)
    full_len = len(full_words)

    if full_len <= last_len:
        # Nothing new or same length, return last few words
        return " ".join(full_words[-3:])

⚠️ Potential issue | 🟠 Major

Unclear fallback logic when returning last 3 words.

When there's no previous transcript or when the transcript hasn't grown, the function returns the last 3 words instead of the actual delta. This could incorrectly classify legitimate speech as backchannel if the last 3 words happen to be backchannel words.

Consider returning an empty string or the full transcript in these edge cases, with explicit documentation of the expected behavior.

🤖 Prompt for AI Agents
In `@livekit-agents/livekit/agents/voice/agent_activity.py` around lines 88 - 99,
The fallback that returns the last 3 words when last_words is empty or full_len
<= last_len should be changed to return an empty string (not the last 3 words)
to avoid misclassifying legitimate speech as backchannel; update the branches
that handle "if not last_words" and "if full_len <= last_len" to return "" and
add a brief docstring/comment in the transcript-delta function (referencing
last_words and full_words) explaining that an empty string means "no new speech
delta" and why this avoids false backchannels.

@SaHiL-Ez SaHiL-Ez closed this Jan 18, 2026