-
Notifications
You must be signed in to change notification settings - Fork 2.8k
interrupt handling assignment solved #4561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
interrupt handling assignment solved #4561
Conversation
|
|
📝 WalkthroughWalkthroughThis PR introduces backchannel-aware interruption handling for an AI agent, adding a new documentation guide and modifying the interruption logic in Changes
Sequence DiagramsequenceDiagram
participant User
participant AudioActivity as Audio Activity<br/>(VAD/Transcript)
participant Agent
participant BackchannelConfig as Backchannel<br/>Config
participant InterruptLogic as Interrupt<br/>Decision Logic
User->>AudioActivity: Speaks (with input)
AudioActivity->>AudioActivity: Generate transcript
AudioActivity->>InterruptLogic: _interrupt_by_audio_activity(transcript)
InterruptLogic->>Agent: Check is_agent_speaking?
alt Agent is Speaking
InterruptLogic->>BackchannelConfig: should_interrupt(transcript)?
BackchannelConfig->>BackchannelConfig: is_backchannel_only()?
alt Backchannel Only (e.g., "yeah", "ok")
BackchannelConfig-->>InterruptLogic: False (skip interrupt)
InterruptLogic->>InterruptLogic: Log, skip interrupt
else Contains Interrupt Word (e.g., "stop", "wait")
BackchannelConfig-->>InterruptLogic: True (allow interrupt)
InterruptLogic->>Agent: Trigger interrupt
end
else Agent Not Speaking
InterruptLogic->>Agent: Trigger interrupt
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/voice/agent_activity.py`:
- Around line 122-128: The contains_interrupt_word method is falsely matching
substrings (e.g., "butter" matching "but"); change its logic in
contains_interrupt_word to perform word-boundary matching instead of simple
substring checks: after normalizing text = transcript.lower().strip(), either
tokenize the text (split on whitespace/punctuation) and check membership against
self.interrupt_words, or use regex with word boundaries (e.g.,
re.search(rf'\b{re.escape(word)}\b', text)) for each word in
self.interrupt_words; ensure you escape interrupt words and keep existing
lowercasing so matches are exact whole-word matches.
🧹 Nitpick comments (6)
livekit-agents/livekit/agents/voice/agent_activity.py (3)
79-80: Import placement and consolidation.Standard library imports (
re) should be grouped with other standard library imports near the top of the file. Thefieldimport should be combined with the existingdataclassimport on line 9.Suggested fix
On line 9, update the import:
from dataclasses import dataclass, fieldThen move
import reto be grouped with other standard library imports near the top (e.g., after line 7 withtime).Remove lines 79-81 entirely.
207-208: Consider exposing BackchannelConfig as a configurable parameter.The backchannel configuration is currently hardcoded. For production use, consider allowing users to customize the word lists via constructor parameters or session options.
1468-1479: Line length exceeds 100 characters on line 1473.Per coding guidelines, maintain a maximum line length of 100 characters.
Suggested fix
if ( self._current_speech is not None and not self._current_speech.interrupted and self._current_speech.allow_interruptions - and not self._backchannel_config.should_interrupt(info.new_transcript, agent_is_speaking=True) + and not self._backchannel_config.should_interrupt( + info.new_transcript, agent_is_speaking=True + ) ):Interruption_Handling.md (3)
43-54: Add language specifiers to fenced code blocks.Per markdownlint, fenced code blocks should have a language specified. For word lists, you can use
textorplaintext.Suggested fix
-``` +```text yeah, yes, yep, yup, ok, okay, hmm, hm, mm, mhm, uh-huh, right, sure, got it, ah, uh, um, oh, i see, alright, all rightWords That Always Interrupt
-
+text
stop, wait, hold on, pause, no, cancel, quit,
exit, excuse me, actually, but, however
83-94: Add language specifier to flowchart code block.The flowchart diagram should have a language specifier. Use
textfor ASCII diagrams.Suggested fix
-``` +```text User speaks "yeah" ↓ Is agent talking? ↓ YES → Is it just filler words? ↓ YES → IGNORE (agent keeps talking) NO → INTERRUPT (agent stops) ↓ NO → RESPOND normally</details> --- `107-110`: **Typo: "skipped" should clarify the behavior.** Line 109 mentions "skipped the interruptions" but the logs show "Ignoring backchannel" messages. Consider rewording for clarity: "Look for `Ignoring backchannel while agent speaking` in the logs to see when filler words were detected and ignored." </blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used**: Organization UI **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 07223717cb78f843c5408dbfaca474e38aa38c8e and eb25dd6e77bea6465962730226d311dfbeff04d7. </details> <details> <summary>📒 Files selected for processing (2)</summary> * `Interruption_Handling.md` * `livekit-agents/livekit/agents/voice/agent_activity.py` </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>📓 Path-based instructions (1)</summary> <details> <summary>**/*.py</summary> **📄 CodeRabbit inference engine (AGENTS.md)** > `**/*.py`: Format code with ruff > Run ruff linter and auto-fix issues > Run mypy type checker in strict mode > Maintain line length of 100 characters maximum > Ensure Python 3.9+ compatibility > Use Google-style docstrings Files: - `livekit-agents/livekit/agents/voice/agent_activity.py` </details> </details><details> <summary>🧬 Code graph analysis (1)</summary> <details> <summary>livekit-agents/livekit/agents/voice/agent_activity.py (3)</summary><blockquote> <details> <summary>livekit-agents/livekit/agents/voice/speech_handle.py (4)</summary> * `interrupted` (65-66) * `done` (97-98) * `allow_interruptions` (69-70) * `allow_interruptions` (73-91) </details> <details> <summary>livekit-agents/livekit/agents/voice/audio_recognition.py (1)</summary> * `current_transcript` (324-330) </details> <details> <summary>livekit-agents/livekit/agents/voice/events.py (1)</summary> * `UserInputTranscribedEvent` (115-121) </details> </blockquote></details> </details><details> <summary>🪛 markdownlint-cli2 (0.18.1)</summary> <details> <summary>Interruption_Handling.md</summary> 43-43: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 51-51: Fenced code blocks should have a language specified (MD040, fenced-code-language) --- 83-83: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> </details> <details> <summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)</summary> * GitHub Check: unit-tests * GitHub Check: type-check (3.9) * GitHub Check: type-check (3.13) </details> <details> <summary>🔇 Additional comments (5)</summary><blockquote> <details> <summary>livekit-agents/livekit/agents/voice/agent_activity.py (5)</summary><blockquote> `335-342`: **LGTM!** The `is_agent_speaking` property correctly encapsulates the speaking state check with appropriate null-safety. --- `1254-1264`: **Transcript logging may include user speech content.** The debug log at line 1262 includes the full transcript. While this is at DEBUG level, ensure this aligns with your privacy/compliance requirements for logging user speech. --- `1332-1335`: **Disabling VAD-based interruption may impact responsiveness.** Completely bypassing VAD interruption while the agent is speaking means interruptions are now dependent on STT transcript availability. This could introduce noticeable latency for urgent user commands if STT is slow. Consider whether a hybrid approach might be better—e.g., allowing VAD-based interruption after a speech duration threshold, even before transcript is available. --- `1345-1361`: **LGTM!** The transcript extraction and passing to backchannel-aware interruption logic is correctly implemented. --- `1376-1395`: **LGTM!** Consistent with the interim transcript handling pattern. </blockquote></details> </blockquote></details> <sub>✏️ Tip: You can disable this entire section by setting `review_details` to `false` in your review settings.</sub> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
| def contains_interrupt_word(self, transcript: str) -> bool: | ||
| """Check if transcript contains any interrupt command.""" | ||
| text = transcript.lower().strip() | ||
| for word in self.interrupt_words: | ||
| if word in text: | ||
| return True | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Substring matching causes false positives for interrupt detection.
The word in text check uses substring matching, which will incorrectly trigger interruptions. For example:
- "butter" contains "but"
- "snow" contains "no"
- "pausing" contains "pause"
Suggested fix using word boundary matching
def contains_interrupt_word(self, transcript: str) -> bool:
"""Check if transcript contains any interrupt command."""
text = transcript.lower().strip()
for word in self.interrupt_words:
- if word in text:
+ # Use word boundary matching to avoid partial matches
+ pattern = r'\b' + re.escape(word) + r'\b'
+ if re.search(pattern, text):
return True
return False🤖 Prompt for AI Agents
In `@livekit-agents/livekit/agents/voice/agent_activity.py` around lines 122 -
128, The contains_interrupt_word method is falsely matching substrings (e.g.,
"butter" matching "but"); change its logic in contains_interrupt_word to perform
word-boundary matching instead of simple substring checks: after normalizing
text = transcript.lower().strip(), either tokenize the text (split on
whitespace/punctuation) and check membership against self.interrupt_words, or
use regex with word boundaries (e.g., re.search(rf'\b{re.escape(word)}\b',
text)) for each word in self.interrupt_words; ensure you escape interrupt words
and keep existing lowercasing so matches are exact whole-word matches.
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.