Feature/interrupt handler sahil #4550
Conversation
SaHiL-Ez seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
📝 Walkthrough
This PR implements intelligent interruption handling for LiveKit Agent by detecting and filtering "backchannel" words (acknowledgments like "uh-huh," "yeah") to prevent false interruptions during agent speech. It adds a new InterruptionFilter class, patches core agent activity logic, integrates filtering into a voice agent example, and includes comprehensive tests and configuration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant VoiceAgent as Voice Agent (basic_agent)
    participant TranscriptHandler as Transcript Handler
    participant InterruptionFilter
    participant AgentActivity
    User->>Agent: Speech input (interim transcript)
    Agent->>VoiceAgent: user_input_transcribed event
    VoiceAgent->>TranscriptHandler: _on_user_input_transcribed
    TranscriptHandler->>InterruptionFilter: should_interrupt(transcript, agent_is_speaking=true)
    InterruptionFilter->>InterruptionFilter: normalize & tokenize
    InterruptionFilter->>InterruptionFilter: check for command words
    alt command word detected
        InterruptionFilter-->>TranscriptHandler: return true
        TranscriptHandler->>Agent: allow interruption (log)
    else pure backchannel
        InterruptionFilter-->>TranscriptHandler: return false
        TranscriptHandler->>Agent: suppress interruption (log)
    else substantial content
        InterruptionFilter-->>TranscriptHandler: return true
        TranscriptHandler->>Agent: allow interruption
    end
    User->>Agent: Speech input (final transcript)
    Agent->>AgentActivity: on_end_of_turn event
    AgentActivity->>AgentActivity: _check_backchannel_delta
    AgentActivity->>AgentActivity: compute delta vs last transcript
    alt delta is backchannel-only
        AgentActivity->>AgentActivity: block end-of-turn
        AgentActivity-->>Agent: clear user turn (if agent speaking)
    else delta is substantial
        AgentActivity->>AgentActivity: proceed with turn completion
    end
```
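To make the decision flow above concrete, the filter's behavior can be exercised directly. This is a minimal sketch, assuming `interruption_filter.py` from this PR is importable (e.g. when run from `examples/voice_agents/`); the specific phrases mirror the PR's test cases and description.

```python
from interruption_filter import InterruptionFilter

f = InterruptionFilter()

# Pure backchannel while the agent is speaking: suppress the interruption.
assert f.should_interrupt("uh-huh", agent_is_speaking=True) == False

# A command word interrupts, even when mixed with backchannel ("yeah but wait").
assert f.should_interrupt("yeah but wait", agent_is_speaking=True) == True

# Substantial content interrupts as well.
assert f.should_interrupt("tell me about that", agent_is_speaking=True) == True
```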
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
The PR introduces heterogeneous changes across multiple modules with distinct responsibilities: core agent activity patching with delta-extraction logic, a new InterruptionFilter class with word-set management, runtime method patching, event handler integration, configuration parsing, and comprehensive test coverage. Logic density is moderate-to-high with multiple decision points and state tracking. The spread across both library code (agent_activity.py) and example code requires separate reasoning for integration correctness.
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 9
🤖 Fix all issues with AI agents
In `@examples/voice_agents/backchannel_patch.py`:
- Around line 38-39: The global _last_processed_transcript is unsafe for
concurrent AgentActivity instances; change state to an instance attribute by
moving the variable into AgentActivity (e.g., self._bc_last_transcript) and
update any patched methods that read/write _last_processed_transcript to use
self._bc_last_transcript instead so each AgentActivity keeps its own
last-transcript state; ensure initialization (in __init__ or when the patch is
applied) and all references in the patched methods are updated accordingly.
- Around line 122-172: This monkeypatch reimplements AgentActivity internals
(patched_interrupt_by_audio_activity referencing
AgentActivity._interrupt_by_audio_activity and on_end_of_turn behavior) and is
fragile to upstream changes; add a fail-fast version check at module import that
reads livekit_agents.__version__ and asserts it matches the exact version (or
acceptable semver range) this patch targets, raise a clear RuntimeError with
guidance if the check fails, add a single-line comment documenting the specific
livekit-agents version the monkeypatch was written for, and tighten
requirements.txt (pin or add an upper bound for livekit-agents) so CI will catch
incompatibilities early; consider adding a TODO linking to upstream contribution
as future work.
- Around line 14-16: CONFIG_FILE is a relative path which will break when the
script is executed from a different working directory; change CONFIG_FILE to
compute an absolute path based on this module's location (using __file__) and
update load_config() to open that resolved path so the file is found regardless
of cwd — locate the CONFIG_FILE constant and the load_config function in
backchannel_patch.py and replace the relative path usage with a path join
(module directory + "filter_config.json") before loading BACKCHANNEL_WORDS and
COMMAND_WORDS.
In `@examples/voice_agents/basic_agent.py`:
- Around line 53-55: Remove the unused remnants by deleting the attributes
self.pending_interruption_check and self.interruption_check_delay from the class
in examples/voice_agents/basic_agent.py (or, if intended, implement their use by
wiring interruption checks into the STT/transcript flow); specifically remove
the assignments to pending_interruption_check and interruption_check_delay (or
replace with a documented, implemented interruption scheduling mechanism
referenced by those names) so there are no unused variables left in the agent
class.
In `@examples/voice_agents/interruption_filter.py`:
- Line 29: DEFAULT_BACKCHANNEL_WORDS contains multi-word entries like "got it"
but _is_pure_backchannel currently tokenizes input and checks individual tokens
against that set, so multi-word phrases never match; update _is_pure_backchannel
to detect multi-word backchannel phrases the same way _contains_command handles
multi-word commands (e.g., check contiguous token sequences or join tokens and
search for phrase matches), or alternatively remove multi-word entries from
DEFAULT_BACKCHANNEL_WORDS; modify the logic in _is_pure_backchannel to first
test multi-word phrases from DEFAULT_BACKCHANNEL_WORDS against the token
sequence before falling back to per-token membership checks.
- Around line 57-58: The instance assignments for backchannel_words and
command_words currently reference the class-level sets
(DEFAULT_BACKCHANNEL_WORDS, DEFAULT_COMMAND_WORDS), causing shared mutable
state; change the constructor to copy those defaults when None is passed (e.g.,
set self.backchannel_words = set(self.DEFAULT_BACKCHANNEL_WORDS) and
self.command_words = set(self.DEFAULT_COMMAND_WORDS)) so that methods like
add_backchannel_word() mutate only the instance's sets rather than the shared
class defaults.
In `@examples/voice_agents/test_interruption_filter.py`:
- Around line 89-93: The test test_substantial_input_interrupts is passing for
the wrong reason because the phrase "I don't understand" contains "don't", which
is in DEFAULT_COMMAND_WORDS; update the test to use a phrase with substantial
content that contains no command words (e.g. "I'm confused" or "that's
confusing") so the assertion exercises the substantial-input detection path in
should_interrupt rather than matching DEFAULT_COMMAND_WORDS; modify the call(s)
to filter.should_interrupt in test_substantial_input_interrupts accordingly.
In `@livekit-agents/livekit/agents/voice/agent_activity.py`:
- Around line 88-99: The fallback that returns the last 3 words when last_words
is empty or full_len <= last_len should be changed to return an empty string
(not the last 3 words) to avoid misclassifying legitimate speech as backchannel;
update the branches that handle "if not last_words" and "if full_len <=
last_len" to return "" and add a brief docstring/comment in the transcript-delta
function (referencing last_words and full_words) explaining that an empty string
means "no new speech delta" and why this avoids false backchannels.
- Around line 60-108: The module-level mutable _last_processed_transcript causes
cross-session contamination; move this state into the AgentActivity instance by
adding self._last_processed_transcript: str = "" in AgentActivity.__init__ and
refactor the helpers so they operate on instance state (make
_get_transcript_delta and _update_last_transcript instance methods or accept an
AgentActivity/self parameter) while keeping _extract_words as a pure helper;
ensure calls update and read self._last_processed_transcript instead of the
removed global.
🧹 Nitpick comments (10)
examples/voice_agents/README.md (1)
30-32: Minor markdown indentation nit. Static analysis reports that nested list items at lines 31-32 use 4-space indentation instead of the expected 2 spaces (MD007). This is a cosmetic issue.
♻️ Suggested fix
```diff
 * **Decide:**
-    * If the words are **Backchannel** (e.g., "yeah", "ok"): **BLOCK THE PAUSE**. The agent keeps speaking smoothly.
-    * If the words are **Commands** (e.g., "stop", "wait") or **Content**: **ALLOW THE PAUSE**. The agent stops immediately.
+  * If the words are **Backchannel** (e.g., "yeah", "ok"): **BLOCK THE PAUSE**. The agent keeps speaking smoothly.
+  * If the words are **Commands** (e.g., "stop", "wait") or **Content**: **ALLOW THE PAUSE**. The agent stops immediately.
```

examples/voice_agents/backchannel_patch.py (3)
80-104: Semantic confusion: `_is_backchannel_only` returns `True` for empty input.
Returning `True` for empty/whitespace text (lines 82-87) is counterintuitive for a function named "is backchannel only." While the calling code at line 109 handles this correctly, this design is confusing for maintenance. Consider either:
- Renaming to `_should_block_interruption` to reflect the actual semantics, or
- Returning `False` for empty input and handling the "empty delta" case explicitly in the caller.
41-46: Code duplication: `_extract_words` is duplicated across multiple files.
This function is identical to `InterruptionFilter._extract_words` in `interruption_filter.py` and also appears in `agent_activity.py`. Consider extracting it to a shared utility module to maintain consistency.
41-41: Missing type annotation for return type.
Per coding guidelines (mypy strict mode), the return type should be `list[str]` instead of bare `list`.

```diff
-def _extract_words(text: str) -> list:
+def _extract_words(text: str) -> list[str]:
```
10-10: Consider using the built-in `set[str]` instead of `Set[str]` from typing.
Python 3.9+ supports lowercase generic types directly. For consistency with `list[str]` used on line 124, consider using `set[str]`.

```diff
-from typing import Set
 ...
-    DEFAULT_BACKCHANNEL_WORDS: Set[str] = {...}
+    DEFAULT_BACKCHANNEL_WORDS: set[str] = {...}
```

examples/voice_agents/test_interruption_filter.py (1)
8-8: Relative import may fail depending on test execution directory.
`from interruption_filter import InterruptionFilter` assumes the test is run from `examples/voice_agents/`. Consider using a relative import or adjusting `sys.path` for robustness.

```python
# Option 1: Relative import (if package structure allows)
from .interruption_filter import InterruptionFilter

# Option 2: Add path manipulation at top of file
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent))
from interruption_filter import InterruptionFilter
```

examples/voice_agents/basic_agent.py (2)
28-30: Redundant filtering: Both the monkeypatch and event handlers attempt to filter backchannels.
`backchannel_patch.apply_patch()` intercepts interruptions at the `AgentActivity` level, while `_on_user_input_transcribed` also filters using `InterruptionFilter`. This creates ambiguity about which layer is blocking and may lead to maintenance confusion. Consider:
- Using only the monkeypatch approach (remove event handler filtering), or
- Using only the event handler approach (remove the patch), or
- Documenting clearly that this is a defense-in-depth strategy with the patch as primary and event handler as backup.
1-1: Unused import: `asyncio` is imported but not directly used.
If `asyncio` isn't needed, remove it to keep imports clean. If it's used indirectly by the framework, consider adding a comment explaining why.
livekit-agents/livekit/agents/voice/agent_activity.py (2)
63-69: Move `import re` to module level and add a proper type hint.
Importing inside the function incurs overhead on every call. Also, the return type hint should be `list[str]` per coding guidelines.
♻️ Proposed fix
Add `import re` near line 3 with the other imports, then:

```diff
-def _extract_words(text: str) -> list:
-    """Extract words from text, removing punctuation."""
-    import re
+def _extract_words(text: str) -> list[str]:
+    """Extract words from text, removing punctuation."""
     if not text:
         return []
     normalized = text.lower().strip()
     return [w for w in re.sub(r'[^\w\s-]', ' ', normalized).split() if w]
```
153-159: Consider using structured logging instead of f-strings with emojis. The emoji prefixes (🛡️, ✅) may not render correctly in all log aggregation systems. Consider using structured log fields instead.
♻️ Suggested improvement
```diff
     if is_bc:
-        logger.info(
-            f"🛡️ [DELTA FILTER] Transcript delta '{delta}' is backchannel - blocking turn"
-        )
+        logger.info(
+            "Transcript delta is backchannel - blocking turn",
+            extra={"delta": delta, "filter_type": "backchannel"}
+        )
     else:
-        logger.debug(
-            f"✅ [DELTA FILTER] Transcript delta '{delta}' is NOT backchannel - allowing turn"
-        )
+        logger.debug(
+            "Transcript delta is NOT backchannel - allowing turn",
+            extra={"delta": delta, "filter_type": "backchannel"}
+        )
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- examples/voice_agents/README.md
- examples/voice_agents/backchannel_patch.py
- examples/voice_agents/basic_agent.py
- examples/voice_agents/filter_config.json
- examples/voice_agents/interruption_filter.py
- examples/voice_agents/test_interruption_filter.py
- livekit-agents/livekit/agents/voice/agent_activity.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
- livekit-agents/livekit/agents/voice/agent_activity.py
- examples/voice_agents/test_interruption_filter.py
- examples/voice_agents/backchannel_patch.py
- examples/voice_agents/interruption_filter.py
- examples/voice_agents/basic_agent.py
🧬 Code graph analysis (3)
examples/voice_agents/test_interruption_filter.py (1)
examples/voice_agents/interruption_filter.py (6)
- InterruptionFilter (15-216)
- should_interrupt (66-118)
- add_backchannel_word (198-201)
- add_command_word (203-206)
- remove_backchannel_word (208-211)
- is_backchannel_word (186-196)
examples/voice_agents/interruption_filter.py (2)
examples/voice_agents/backchannel_patch.py (1)
- _extract_words (41-46)

livekit-agents/livekit/agents/voice/agent_activity.py (1)
- _extract_words (63-69)
examples/voice_agents/basic_agent.py (2)
examples/voice_agents/interruption_filter.py (2)
- InterruptionFilter (15-216)
- should_interrupt (66-118)

examples/voice_agents/backchannel_patch.py (1)
- apply_patch (219-223)
🪛 markdownlint-cli2 (0.18.1)
examples/voice_agents/README.md
31-31: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
32-32: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
🔇 Additional comments (11)
examples/voice_agents/README.md (1)
1-68: LGTM! The documentation is well-structured and clearly explains the feature's purpose, implementation approach, configuration options, and testing scenarios. Good job providing actionable testing steps.
examples/voice_agents/interruption_filter.py (1)
66-118: LGTM! The `should_interrupt` method has clear, well-documented logic with appropriate early returns. The decision flow (command check → agent state check → backchannel check) is intuitive.
examples/voice_agents/test_interruption_filter.py (1)
11-135: Good test coverage overall. The test suite covers the main scenarios comprehensively: backchannel detection, command detection, mixed inputs, edge cases, and dynamic word management.
Consider adding tests for the following (sketched below):
- Multi-word backchannel phrases like "got it" (currently broken per the issue in `interruption_filter.py`)
- Verifying that adding words to one filter instance doesn't affect others (to catch the mutable default bug)
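A rough sketch of what those two additional tests could look like; the names and expectations are illustrative, and the multi-word assertion will only pass once phrase matching is actually implemented:

```python
def test_multiword_backchannel_phrase(self):
    """'got it' alone should be treated as backchannel once phrase matching exists."""
    assert self.filter.should_interrupt("got it", agent_is_speaking=True) == False

def test_instances_do_not_share_word_sets(self):
    """Adding a word to one filter must not leak into another instance."""
    a = InterruptionFilter()
    b = InterruptionFilter()
    a.add_backchannel_word("righto")
    assert a.is_backchannel_word("righto") == True
    assert b.is_backchannel_word("righto") == False
```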
examples/voice_agents/basic_agent.py (3)
37-127: The overall integration approach is sound. The agent properly tracks speaking state via events and applies filtering logic. The separation between state tracking (`_on_agent_state_changed`) and transcript evaluation (`_on_user_input_transcribed`) is clean.
117-121: No action required: `clear_user_turn()` is a valid `AgentSession` method. The method is properly defined in `livekit/agents/voice/agent_session.py` (line 961) and delegates to the underlying activity manager after validating the session state. No risk of `AttributeError`. Likely an incorrect or invalid review comment.
71-72: No action needed: the state comparison is correct. The comparison `new_state == "speaking"` is valid. The LiveKit agent framework defines `AgentState` as `Literal["initializing", "idle", "listening", "thinking", "speaking"]` in `livekit-agents/livekit/agents/voice/events.py`, and "speaking" is a documented and tested state value used throughout the framework.
2-9: Duplicate entry and questionable word in backchannel list.
"mhmm"appears twice (line 5 and line 8) - remove the duplicate."hey"(line 8) is typically an attention-getter or greeting, not a passive acknowledgment. Including it may cause the agent to ignore legitimate attempts to get its attention.♻️ Suggested fix
"backchannel_words": [ "yeah", "yea", "yes", "yep", "yup", "ok", "okay", "alright", "aight", "hmm", "hm", "mhm", "mmhmm", "uh-huh", "uhuh", "uh", "huh", "right", "sure", "gotcha", "aha", "ah", "oh", "ooh", - "mm", "mhmm", "mmm", "hey" + "mm", "mmm" ],Likely an incorrect or invalid review comment.
livekit-agents/livekit/agents/voice/agent_activity.py (4)
46-58: Consider internationalization for backchannel words. The hardcoded word lists work for English but won't cover other languages. If multi-language support is needed in the future, consider making these configurable or loading from a language-specific resource.
For now, this is acceptable for English-only deployments.
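If that ever becomes a requirement, a lightweight approach is to key the word sets by language code. The structure below is purely illustrative and not part of this PR; only the "en" entries reflect the defaults shipped here.

```python
# Hypothetical per-language word lists; non-English entries are illustrative only.
BACKCHANNEL_WORDS_BY_LANG: dict[str, set[str]] = {
    "en": {"yeah", "ok", "uh-huh", "mhm"},
    "de": {"ja", "mhm", "genau"},
    "es": {"sí", "vale", "ajá"},
}

def backchannel_words_for(lang: str) -> set[str]:
    """Fall back to English when the requested language is not configured."""
    return BACKCHANNEL_WORDS_BY_LANG.get(lang, BACKCHANNEL_WORDS_BY_LANG["en"])
```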
110-137: LGTM - Backchannel detection logic is sound. The function correctly:
- Prioritizes command words (always interrupt)
- Handles hyphenated words like "uh-huh"
- Treats empty input as backchannel (don't interrupt)
1309-1315: Integration placement is correct. The backchannel check is appropriately placed after the `min_interruption_words` check, ensuring both filters work together. The early return prevents the interruption when backchannel is detected.
Note: This code will need updates once the global state issue is resolved (see earlier comment).
1510-1525: Backchannel filter integration in `on_end_of_turn` is well-placed. The check correctly:
- Only applies when the agent is actively speaking (`_current_speech is not None`)
- Cancels preemptive generation when blocking
- Returns `False` to indicate the turn was not processed

The observation about `_update_last_transcript` being skipped for blocked backchannel turns is intentional. Per the docstring on `_get_transcript_delta` ("This is crucial for detecting backchannel in accumulated transcripts"), blocked backchannel words accumulate in the delta and are re-evaluated on the next turn. This allows the system to filter repeated backchannel sounds that occur while the agent is speaking, only advancing the transcript state when non-backchannel words are detected.
```python
CONFIG_FILE = "filter_config.json"
BACKCHANNEL_WORDS = set()
COMMAND_WORDS = set()
```
Relative config path will fail when script is run from a different working directory.
CONFIG_FILE = "filter_config.json" uses a relative path. If the agent is started from a directory other than examples/voice_agents/, the config file won't be found.
🐛 Suggested fix using `__file__` for path resolution
```diff
+import pathlib
+
-CONFIG_FILE = "filter_config.json"
+CONFIG_FILE = pathlib.Path(__file__).parent / "filter_config.json"
```

Then update `load_config()`:

```diff
-    if os.path.exists(CONFIG_FILE):
-        with open(CONFIG_FILE, "r") as f:
+    if CONFIG_FILE.exists():
+        with open(CONFIG_FILE, "r") as f:
```

🤖 Prompt for AI Agents
In `@examples/voice_agents/backchannel_patch.py` around lines 14 - 16, CONFIG_FILE
is a relative path which will break when the script is executed from a different
working directory; change CONFIG_FILE to compute an absolute path based on this
module's location (using __file__) and update load_config() to open that
resolved path so the file is found regardless of cwd — locate the CONFIG_FILE
constant and the load_config function in backchannel_patch.py and replace the
relative path usage with a path join (module directory + "filter_config.json")
before loading BACKCHANNEL_WORDS and COMMAND_WORDS.
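Pulled together, the resolved-path loading could look roughly like this. This is a sketch, assuming `load_config()` only needs to populate the two module-level sets shown in the excerpt above and that the JSON keys match `filter_config.json` ("backchannel_words", "command_words"):

```python
import json
import pathlib

# Resolve the config next to this module so the current working directory does not matter.
CONFIG_FILE = pathlib.Path(__file__).parent / "filter_config.json"
BACKCHANNEL_WORDS: set = set()
COMMAND_WORDS: set = set()

def load_config() -> None:
    """Load word lists from filter_config.json if it exists."""
    global BACKCHANNEL_WORDS, COMMAND_WORDS
    if CONFIG_FILE.exists():
        with open(CONFIG_FILE, "r") as f:
            data = json.load(f)
        BACKCHANNEL_WORDS = set(data.get("backchannel_words", []))
        COMMAND_WORDS = set(data.get("command_words", []))
```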
```python
# Global tracking for delta calculation
_last_processed_transcript = ""
```
Global transcript state is not thread-safe and will cause incorrect behavior with multiple concurrent sessions.
_last_processed_transcript is a module-level global. If multiple AgentActivity instances run concurrently (multiple rooms/sessions), they'll share and corrupt this state, leading to incorrect delta calculations.
Consider storing the last transcript as an instance attribute on AgentActivity (e.g., self._bc_last_transcript) within the patched methods instead of using a global.
🤖 Prompt for AI Agents
In `@examples/voice_agents/backchannel_patch.py` around lines 38 - 39, The global
_last_processed_transcript is unsafe for concurrent AgentActivity instances;
change state to an instance attribute by moving the variable into AgentActivity
(e.g., self._bc_last_transcript) and update any patched methods that read/write
_last_processed_transcript to use self._bc_last_transcript instead so each
AgentActivity keeps its own last-transcript state; ensure initialization (in
__init__ or when the patch is applied) and all references in the patched methods
are updated accordingly.
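One way to carry that out inside the patch, sketched under the assumption that the patched methods receive the `AgentActivity` instance as `self`; the helper names and the `_bc_last_transcript` attribute are illustrative (the attribute name follows the suggestion above), and `_extract_words` is the existing helper in `backchannel_patch.py`:

```python
def _get_delta_for_activity(self, full_transcript: str) -> str:
    """Compute the new-words delta using per-instance state instead of a module global."""
    # getattr keeps the patch working even when __init__ was not modified.
    last_words = _extract_words(getattr(self, "_bc_last_transcript", ""))
    full_words = _extract_words(full_transcript)
    return " ".join(full_words[len(last_words):])

def _remember_transcript(self, transcript: str) -> None:
    """Store the last processed transcript on this AgentActivity instance."""
    self._bc_last_transcript = transcript
```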
```python
# Store original methods to call them if needed, or we just reimplement logic
# Reimplementing logic is safer to inject checks in middle


def patched_interrupt_by_audio_activity(self) -> None:
    # Copied logic from AgentActivity._interrupt_by_audio_activity
    # with injected checks
    opt = self._session.options
    use_pause = opt.resume_false_interruption and opt.false_interruption_timeout is not None

    if hasattr(self.llm, "capabilities") and self.llm.capabilities.turn_detection:
        return

    if (
        self.stt is not None
        and opt.min_interruption_words > 0
        and self._audio_recognition is not None
    ):
        text = self._audio_recognition.current_transcript

        # Original count check
        if len(split_words(text, split_character=True)) < opt.min_interruption_words:
            return

        # --- PATCH START ---
        if _check_backchannel_delta(text):
            logger.debug(f"🚫 [PATCH FILTER] Blocking AUDIO interruption - backchannel delta")
            return
        # --- PATCH END ---

    if self._rt_session is not None:
        self._rt_session.start_user_activity()

    if (
        self._current_speech is not None
        and not self._current_speech.interrupted
        and self._current_speech.allow_interruptions
    ):
        self._paused_speech = self._current_speech

        if self._false_interruption_timer:
            self._false_interruption_timer.cancel()
            self._false_interruption_timer = None

        if use_pause and self._session.output.audio and self._session.output.audio.can_pause:
            # THIS IS THE CRITICAL LINE THAT WAS PAUSING
            self._session.output.audio.pause()
            self._session._update_agent_state("listening")
        else:
            if self._rt_session is not None:
                self._rt_session.interrupt()
            self._current_speech.interrupt()
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check livekit-agents version dependency
find . -type f \( -name "pyproject.toml" -o -name "setup.py" -o -name "requirements*.txt" \) -exec grep -H "livekit" {} \; 2>/dev/null | head -20
```
Repository: livekit/agents
Length of output: 172
🏁 Script executed:
```bash
#!/bin/bash
# Examine the backchannel_patch.py file to understand the implementation
wc -l examples/voice_agents/backchannel_patch.py 2>/dev/null || echo "File not found, searching..."
fd -type f "backchannel_patch.py"
```
Repository: livekit/agents
Length of output: 292
🏁 Script executed:
```bash
#!/bin/bash
# Check the structure and actual implementation of the patched methods
head -50 examples/voice_agents/backchannel_patch.py
```
Repository: livekit/agents
Length of output: 1888
🏁 Script executed:
```bash
#!/bin/bash
# Read the specific lines mentioned in the review (122-172)
sed -n '122,172p' examples/voice_agents/backchannel_patch.py
```
Repository: livekit/agents
Length of output: 1985
🏁 Script executed:
```bash
#!/bin/bash
# Read more to understand the patching mechanism
sed -n '173,223p' examples/voice_agents/backchannel_patch.py
```
Repository: livekit/agents
Length of output: 1859
🏁 Script executed:
```bash
#!/bin/bash
# Find all pyproject.toml and requirements files in the repo
find . -name "pyproject.toml" -o -name "setup.py" -o -name "requirements*.txt" 2>/dev/null | head -10
```
Repository: livekit/agents
Length of output: 540
🏁 Script executed:
```bash
#!/bin/bash
# Check livekit-agents version requirement
cat ./livekit-agents/pyproject.toml | grep -A 20 "^\[project\]" | head -30
```
Repository: livekit/agents
Length of output: 933
🏁 Script executed:
```bash
#!/bin/bash
# Check examples voice_agents requirements
cat ./examples/voice_agents/requirements.txt
```
Repository: livekit/agents
Length of output: 187
🏁 Script executed:
```bash
#!/bin/bash
# Search for any version checks or guards in the backchannel_patch.py file
grep -n -i "version\|assert\|check.*version" examples/voice_agents/backchannel_patch.py
```
Repository: livekit/agents
Length of output: 40
🏁 Script executed:
```bash
#!/bin/bash
# Check if there's any README or documentation about version compatibility
find ./examples/voice_agents -name "README*" -o -name "*.md" | xargs cat 2>/dev/null | head -100
```
Repository: livekit/agents
Length of output: 3822
Fragile monkeypatch: Reimplemented methods create tight coupling to upstream AgentActivity implementation.
This code copies internal logic from AgentActivity._interrupt_by_audio_activity and AgentActivity.on_end_of_turn and injects checks via monkey-patching. Any upstream changes to these methods (renamed attributes, logic changes, new parameters) will silently break this patch. The version constraint in requirements.txt is loose (livekit-agents>=1.0), providing no protection against breaking changes.
Recommendations:
- Add a version assertion or check to fail fast if the `livekit-agents` version changes incompatibly (see the sketch after this list).
- Document the specific `livekit-agents` version this patch targets.
- If this feature proves valuable, consider contributing a proper extension point upstream rather than maintaining a monkeypatch.
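A fail-fast guard of that kind could look roughly like this at the top of `backchannel_patch.py`. This is a sketch: it assumes the package exposes `__version__` (as the review prompt below suggests), and the pinned version string is a placeholder rather than the version this patch was actually written against.

```python
import livekit.agents as livekit_agents

# Version this monkeypatch was validated against (placeholder value, not verified here).
_PATCH_TARGET_VERSION = "1.x.y"

if getattr(livekit_agents, "__version__", None) != _PATCH_TARGET_VERSION:
    raise RuntimeError(
        f"backchannel_patch was written for livekit-agents {_PATCH_TARGET_VERSION}, "
        f"but {getattr(livekit_agents, '__version__', 'unknown')} is installed; "
        "re-verify the patched AgentActivity internals before using it."
    )
```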
🤖 Prompt for AI Agents
In `@examples/voice_agents/backchannel_patch.py` around lines 122 - 172, This
monkeypatch reimplements AgentActivity internals
(patched_interrupt_by_audio_activity referencing
AgentActivity._interrupt_by_audio_activity and on_end_of_turn behavior) and is
fragile to upstream changes; add a fail-fast version check at module import that
reads livekit_agents.__version__ and asserts it matches the exact version (or
acceptable semver range) this patch targets, raise a clear RuntimeError with
guidance if the check fails, add a single-line comment documenting the specific
livekit-agents version the monkeypatch was written for, and tighten
requirements.txt (pin or add an upper bound for livekit-agents) so CI will catch
incompatibilities early; consider adding a TODO linking to upstream contribution
as future work.
```python
        # Queue for pending interruption checks
        self.pending_interruption_check = None
        self.interruption_check_delay = 0.15  # 150ms buffer for STT to produce transcript
```
Unused variables: pending_interruption_check and interruption_check_delay are defined but never used.
These appear to be remnants of planned functionality that wasn't implemented.
```diff
-        # Queue for pending interruption checks
-        self.pending_interruption_check = None
-        self.interruption_check_delay = 0.15  # 150ms buffer for STT to produce transcript
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-        # Queue for pending interruption checks
-        self.pending_interruption_check = None
-        self.interruption_check_delay = 0.15  # 150ms buffer for STT to produce transcript
```
🤖 Prompt for AI Agents
In `@examples/voice_agents/basic_agent.py` around lines 53 - 55, Remove the unused
remnants by deleting the attributes self.pending_interruption_check and
self.interruption_check_delay from the class in
examples/voice_agents/basic_agent.py (or, if intended, implement their use by
wiring interruption checks into the STT/transcript flow); specifically remove
the assignments to pending_interruption_check and interruption_check_delay (or
replace with a documented, implemented interruption scheduling mechanism
referenced by those names) so there are no unused variables left in the agent
class.
| "yeah", "yea", "yes", "yep", "yup", | ||
| "ok", "okay", "alright", "aight", | ||
| "hmm", "hm", "mhm", "mmhmm", "uh-huh", "uhuh", | ||
| "right", "sure", "gotcha", "got it", |
Multi-word backchannel phrase "got it" will never match.
_is_pure_backchannel (lines 154-184) checks each word individually against backchannel_words. Since "got it" is tokenized into ["got", "it"], and neither "got" nor "it" is individually in the set, this phrase won't be recognized as backchannel.
Either:
- Remove multi-word phrases from `DEFAULT_BACKCHANNEL_WORDS`, or
- Add phrase-matching logic similar to what `_contains_command` uses for multi-word commands (see the sketch after this list).
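A sketch of the second option, reusing `_extract_words` tokenization as elsewhere in this PR. The method body and the phrase-stripping approach are illustrative, not the PR's actual implementation:

```python
def _is_pure_backchannel(self, text: str) -> bool:
    """Treat text as backchannel only if every token is covered by backchannel words or phrases."""
    tokens = _extract_words(text)
    if not tokens:
        return False
    normalized = " ".join(tokens)
    # Strip known multi-word phrases (e.g. "got it") before checking the remaining tokens.
    for phrase in (p for p in self.backchannel_words if " " in p):
        normalized = normalized.replace(phrase, " ")
    remaining = normalized.split()
    return all(tok in self.backchannel_words for tok in remaining)
```

With this, "got it" and "got it yeah" are classified as backchannel, while "got something" still counts as substantial input.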
🤖 Prompt for AI Agents
In `@examples/voice_agents/interruption_filter.py` at line 29,
DEFAULT_BACKCHANNEL_WORDS contains multi-word entries like "got it" but
_is_pure_backchannel currently tokenizes input and checks individual tokens
against that set, so multi-word phrases never match; update _is_pure_backchannel
to detect multi-word backchannel phrases the same way _contains_command handles
multi-word commands (e.g., check contiguous token sequences or join tokens and
search for phrase matches), or alternatively remove multi-word entries from
DEFAULT_BACKCHANNEL_WORDS; modify the logic in _is_pure_backchannel to first
test multi-word phrases from DEFAULT_BACKCHANNEL_WORDS against the token
sequence before falling back to per-token membership checks.
```python
        self.backchannel_words = backchannel_words or self.DEFAULT_BACKCHANNEL_WORDS
        self.command_words = command_words or self.DEFAULT_COMMAND_WORDS
```
Critical: Mutable default sets are shared across instances, causing unintended side effects.
When backchannel_words=None, line 57 assigns self.backchannel_words = self.DEFAULT_BACKCHANNEL_WORDS, which references the class-level set directly. Calling add_backchannel_word() on any instance will mutate the class default, affecting all other instances.
🐛 Recommended fix: Create copies of defaults
```diff
-        self.backchannel_words = backchannel_words or self.DEFAULT_BACKCHANNEL_WORDS
-        self.command_words = command_words or self.DEFAULT_COMMAND_WORDS
+        self.backchannel_words = set(backchannel_words) if backchannel_words else set(self.DEFAULT_BACKCHANNEL_WORDS)
+        self.command_words = set(command_words) if command_words else set(self.DEFAULT_COMMAND_WORDS)
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        self.backchannel_words = set(backchannel_words) if backchannel_words else set(self.DEFAULT_BACKCHANNEL_WORDS)
        self.command_words = set(command_words) if command_words else set(self.DEFAULT_COMMAND_WORDS)
```
🤖 Prompt for AI Agents
In `@examples/voice_agents/interruption_filter.py` around lines 57 - 58, The
instance assignments for backchannel_words and command_words currently reference
the class-level sets (DEFAULT_BACKCHANNEL_WORDS, DEFAULT_COMMAND_WORDS), causing
shared mutable state; change the constructor to copy those defaults when None is
passed (e.g., set self.backchannel_words = set(self.DEFAULT_BACKCHANNEL_WORDS)
and self.command_words = set(self.DEFAULT_COMMAND_WORDS)) so that methods like
add_backchannel_word() mutate only the instance's sets rather than the shared
class defaults.
```python
    def test_substantial_input_interrupts(self):
        """Substantial non-backchannel input should interrupt."""
        assert self.filter.should_interrupt("tell me about that", agent_is_speaking=True) == True
        assert self.filter.should_interrupt("can you explain", agent_is_speaking=True) == True
        assert self.filter.should_interrupt("I don't understand", agent_is_speaking=True) == True
```
Test passes for the wrong reason: "don't" is a command word.
"I don't understand" triggers interruption because "don't" is in DEFAULT_COMMAND_WORDS, not because of substantial content detection. This test doesn't actually verify the "substantial input" path.
Replace with a phrase that contains no command words:
```diff
-        assert self.filter.should_interrupt("I don't understand", agent_is_speaking=True) == True
+        assert self.filter.should_interrupt("what do you mean by that", agent_is_speaking=True) == True
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    def test_substantial_input_interrupts(self):
        """Substantial non-backchannel input should interrupt."""
        assert self.filter.should_interrupt("tell me about that", agent_is_speaking=True) == True
        assert self.filter.should_interrupt("can you explain", agent_is_speaking=True) == True
        assert self.filter.should_interrupt("what do you mean by that", agent_is_speaking=True) == True
```
🤖 Prompt for AI Agents
In `@examples/voice_agents/test_interruption_filter.py` around lines 89 - 93, The
test test_substantial_input_interrupts is passing for the wrong reason because
the phrase "I don't understand" contains "don't", which is in
DEFAULT_COMMAND_WORDS; update the test to use a phrase with substantial content
that contains no command words (e.g. "I'm confused" or "that's confusing") so
the assertion exercises the substantial-input detection path in should_interrupt
rather than matching DEFAULT_COMMAND_WORDS; modify the call(s) to
filter.should_interrupt in test_substantial_input_interrupts accordingly.
```python
# Track the last processed transcript to extract delta
_last_processed_transcript: str = ""


def _extract_words(text: str) -> list:
    """Extract words from text, removing punctuation."""
    import re
    if not text:
        return []
    normalized = text.lower().strip()
    return [w for w in re.sub(r'[^\w\s-]', ' ', normalized).split() if w]


def _get_transcript_delta(full_transcript: str) -> str:
    """
    Get only the NEW portion of the transcript since the last processed turn.
    This is crucial for detecting backchannel in accumulated transcripts.
    """
    global _last_processed_transcript

    if not full_transcript:
        return ""

    # Normalize both transcripts
    full_words = _extract_words(full_transcript)
    last_words = _extract_words(_last_processed_transcript)

    if not full_words:
        return ""

    if not last_words:
        # No previous transcript, return last few words (typical backchannel length)
        return " ".join(full_words[-3:])

    # Find where the new words start
    # The new words are whatever comes AFTER the last processed words
    last_len = len(last_words)
    full_len = len(full_words)

    if full_len <= last_len:
        # Nothing new or same length, return last few words
        return " ".join(full_words[-3:])

    # Extract delta (new words only)
    delta_words = full_words[last_len:]
    return " ".join(delta_words)


def _update_last_transcript(transcript: str) -> None:
    """Update the last processed transcript."""
    global _last_processed_transcript
    _last_processed_transcript = transcript
```
Critical: Global mutable state causes cross-session contamination.
_last_processed_transcript is module-level global state shared across ALL AgentActivity instances. In multi-agent or multi-session deployments, this will cause:
- Data races: Concurrent sessions modifying the same variable without synchronization
- Logic errors: Session A's transcript affecting Session B's delta calculations
The state must be moved to instance-level within AgentActivity.
🔧 Suggested fix: Move state to AgentActivity instance
Remove the global variable and add instance state:
```diff
-# Track the last processed transcript to extract delta
-_last_processed_transcript: str = ""
```

In the AgentActivity.__init__ method, add:

```python
self._last_processed_transcript: str = ""
```

Then update the helper functions to accept the instance or refactor them as methods:

```diff
-def _get_transcript_delta(full_transcript: str) -> str:
-    global _last_processed_transcript
+def _get_transcript_delta(full_transcript: str, last_processed: str) -> str:
     if not full_transcript:
         return ""
     full_words = _extract_words(full_transcript)
-    last_words = _extract_words(_last_processed_transcript)
+    last_words = _extract_words(last_processed)
     # ... rest of logic
```
In `@livekit-agents/livekit/agents/voice/agent_activity.py` around lines 60 - 108,
The module-level mutable _last_processed_transcript causes cross-session
contamination; move this state into the AgentActivity instance by adding
self._last_processed_transcript: str = "" in AgentActivity.__init__ and refactor
the helpers so they operate on instance state (make _get_transcript_delta and
_update_last_transcript instance methods or accept an AgentActivity/self
parameter) while keeping _extract_words as a pure helper; ensure calls update
and read self._last_processed_transcript instead of the removed global.
```python
    if not last_words:
        # No previous transcript, return last few words (typical backchannel length)
        return " ".join(full_words[-3:])

    # Find where the new words start
    # The new words are whatever comes AFTER the last processed words
    last_len = len(last_words)
    full_len = len(full_words)

    if full_len <= last_len:
        # Nothing new or same length, return last few words
        return " ".join(full_words[-3:])
```
Unclear fallback logic when returning last 3 words.
When there's no previous transcript or when the transcript hasn't grown, the function returns the last 3 words instead of the actual delta. This could incorrectly classify legitimate speech as backchannel if the last 3 words happen to be backchannel words.
Consider returning an empty string or the full transcript in these edge cases, with explicit documentation of the expected behavior.
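A sketch of the empty-string option against the snippet above; the comment wording is illustrative:

```diff
     if not last_words:
-        # No previous transcript, return last few words (typical backchannel length)
-        return " ".join(full_words[-3:])
+        # No previous transcript: there is no reliable delta, so report
+        # "no new speech" instead of guessing from trailing words.
+        return ""
 ...
     if full_len <= last_len:
-        # Nothing new or same length, return last few words
-        return " ".join(full_words[-3:])
+        # Transcript did not grow: no new speech delta to classify.
+        return ""
```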
🤖 Prompt for AI Agents
In `@livekit-agents/livekit/agents/voice/agent_activity.py` around lines 88 - 99,
The fallback that returns the last 3 words when last_words is empty or full_len
<= last_len should be changed to return an empty string (not the last 3 words)
to avoid misclassifying legitimate speech as backchannel; update the branches
that handle "if not last_words" and "if full_len <= last_len" to return "" and
add a brief docstring/comment in the transcript-delta function (referencing
last_words and full_words) explaining that an empty string means "no new speech
delta" and why this avoids false backchannels.
Implemented a context-aware interruption handling system to solve the issue where the agent would abruptly stop speaking when the user gave passive acknowledgments (e.g., "yeah", "mhmm", "okay").
🛠️ Key Implementation Details
- Runtime Monkeypatch: Implemented a runtime patch for `AgentActivity` to intercept the VAD (Voice Activity Detection) interruption signal before it pauses the audio stream.
- Delta-Based Detection: Created logic to extract only the newly spoken words (the delta) from the accumulated transcript. This ensures that a user saying "mhmm" is correctly identified as a backchannel, while "mhmm wait stop" is treated as an interruption (a condensed sketch follows this list).
- External Configuration: Added `filter_config.json` to allow easy modification of ignored words ("backchannel_words") and interrupt triggers ("command_words") without code changes.
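In essence, the delta logic compares the accumulated transcript against the last processed one and classifies only the new words. The sketch below condenses the idea; the names are simplified stand-ins for the actual helpers in `backchannel_patch.py` / `agent_activity.py`, and the word sets are abbreviated from `filter_config.json`:

```python
BACKCHANNEL = {"yeah", "mhmm", "okay", "uh-huh"}
COMMANDS = {"wait", "stop", "no"}

def new_words(last: str, full: str) -> list[str]:
    """Return only the words spoken since the last processed transcript."""
    last_words, full_words = last.lower().split(), full.lower().split()
    return full_words[len(last_words):]

def should_interrupt(last: str, full: str) -> bool:
    delta = new_words(last, full)
    if any(w in COMMANDS for w in delta):
        return True  # e.g. "mhmm wait stop" -> interrupt
    return not all(w in BACKCHANNEL for w in delta)  # pure "mhmm" -> keep speaking

# Examples mirroring the scenarios described above:
assert should_interrupt("", "mhmm") is False
assert should_interrupt("", "mhmm wait stop") is True
```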
📂 Files Added/Modified
- `examples/voice_agents/backchannel_patch.py`: Contains the core logic for the monkeypatch and delta detection.
- `examples/voice_agents/filter_config.json`: New configuration file for defining ignored word lists.
- `examples/voice_agents/basic_agent.py`: Updated to apply the patch on startup.
- `examples/voice_agents/README.md`: Updated with usage and configuration instructions.
✅ Test Results
Scenario 1 (Backchannel): User says "yeah/okay" while agent speaks -> Agent continues without pausing.
Scenario 2 (Interruption): User says "stop/wait" -> Agent interrupts immediately.
Scenario 3 (Mixed): User says "yeah but wait" -> Agent interrupts (detects command word).
Here is the attached drive link with video link - https://drive.google.com/drive/folders/125VJhOGH4_xUBFxZsN2SOSiYlcwZthEU?usp=sharing
Summary by CodeRabbit
New Features
Documentation
Tests