
Conversation


@gautam2905 gautam2905 commented Jan 20, 2026

Summary by CodeRabbit

Release Notes

  • New Features
    • Added intelligent backchannel handling to improve interruption detection. The system now distinguishes between acknowledgment words (e.g., "yeah", "ok") and explicit interruption commands (e.g., "stop", "wait"), preventing the agent from being interrupted by simple user acknowledgments.

✏️ Tip: You can customize this high-level summary in your review settings.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@coderabbitai
Contributor

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

Walkthrough

This PR introduces backchannel-aware interruption handling for an AI agent, adding a new documentation guide and modifying the interruption logic in agent_activity.py. The implementation includes a BackchannelConfig class to filter filler/backchannel words, an is_agent_speaking property, and enhanced transcript-aware decision logic across multiple interrupt pathways.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation**<br/>`Interruption_Handling.md` | New guide documenting the intelligent interruption handling system, including behavior for affirmative and interrupt words, configuration examples, decision flow, and test scenarios. |
| **Core Implementation**<br/>`livekit-agents/livekit/agents/voice/agent_activity.py` | Adds `BackchannelConfig` dataclass with configurable ignore/interrupt word lists and decision methods; introduces `is_agent_speaking` property; modifies `_interrupt_by_audio_activity` to accept an optional transcript parameter and consult the backchannel config; updates `on_interim_transcript`, `on_final_transcript`, `on_end_of_turn`, and `on_vad_inference_done` to respect backchannel filtering and agent speaking state. |

Sequence Diagram

sequenceDiagram
    participant User
    participant AudioActivity as Audio Activity<br/>(VAD/Transcript)
    participant Agent
    participant BackchannelConfig as Backchannel<br/>Config
    participant InterruptLogic as Interrupt<br/>Decision Logic

    User->>AudioActivity: Speaks (with input)
    AudioActivity->>AudioActivity: Generate transcript
    AudioActivity->>InterruptLogic: _interrupt_by_audio_activity(transcript)
    InterruptLogic->>Agent: Check is_agent_speaking?
    
    alt Agent is Speaking
        InterruptLogic->>BackchannelConfig: should_interrupt(transcript)?
        BackchannelConfig->>BackchannelConfig: is_backchannel_only()?
        
        alt Backchannel Only (e.g., "yeah", "ok")
            BackchannelConfig-->>InterruptLogic: False (skip interrupt)
            InterruptLogic->>InterruptLogic: Log, skip interrupt
        else Contains Interrupt Word (e.g., "stop", "wait")
            BackchannelConfig-->>InterruptLogic: True (allow interrupt)
            InterruptLogic->>Agent: Trigger interrupt
        end
    else Agent Not Speaking
        InterruptLogic->>Agent: Trigger interrupt
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • longcw

Poem

🐰 A backchannel whisper floats through the air,
"Yeah" and "okay" need gentle care,
While "stop" and "wait" cut swift and clear,
The agent listens, aware and sincere,
Smart interruptions, no false alarms here! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.77%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'interrupt handling assignment solved' is vague and uses informal language. While it relates to the interruption handling feature added, it doesn't clearly describe what was actually implemented or changed. | Use a more descriptive title that clearly explains the change, such as 'Add backchannel interruption handling with configurable word filtering' or 'Implement intelligent interruption handler for agent voice activity'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@gautam2905 gautam2905 closed this Jan 20, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/voice/agent_activity.py`:
- Around line 122-128: The contains_interrupt_word method is falsely matching
substrings (e.g., "butter" matching "but"); change its logic in
contains_interrupt_word to perform word-boundary matching instead of simple
substring checks: after normalizing text = transcript.lower().strip(), either
tokenize the text (split on whitespace/punctuation) and check membership against
self.interrupt_words, or use regex with word boundaries (e.g.,
re.search(rf'\b{re.escape(word)}\b', text)) for each word in
self.interrupt_words; ensure you escape interrupt words and keep existing
lowercasing so matches are exact whole-word matches.
🧹 Nitpick comments (6)
livekit-agents/livekit/agents/voice/agent_activity.py (3)

79-80: Import placement and consolidation.

Standard library imports (re) should be grouped with other standard library imports near the top of the file. The field import should be combined with the existing dataclass import on line 9.

Suggested fix

On line 9, update the import:

from dataclasses import dataclass, field

Then move import re to be grouped with other standard library imports near the top (e.g., after line 7 with time).

Remove lines 79-81 entirely.


207-208: Consider exposing BackchannelConfig as a configurable parameter.

The backchannel configuration is currently hardcoded. For production use, consider allowing users to customize the word lists via constructor parameters or session options.
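One way this could look, as a rough sketch (the `AgentActivity` constructor parameter and field names here are hypothetical, not the current API):

```python
# Hypothetical sketch: accepting a custom BackchannelConfig instead of
# hardcoding it. Names are assumptions for illustration only.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BackchannelConfig:
    ignore_words: set[str] = field(default_factory=lambda: {"yeah", "ok", "hmm"})
    interrupt_words: set[str] = field(default_factory=lambda: {"stop", "wait"})


class AgentActivity:
    def __init__(self, backchannel_config: Optional[BackchannelConfig] = None) -> None:
        # Fall back to the built-in defaults when the caller does not customize.
        self._backchannel_config = backchannel_config or BackchannelConfig()


# Callers could then tune the word lists for their domain:
activity = AgentActivity(
    backchannel_config=BackchannelConfig(interrupt_words={"stop", "cancel", "abort"})
)
```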


1468-1479: Line length exceeds 100 characters on line 1473.

Per coding guidelines, maintain a maximum line length of 100 characters.

Suggested fix
         if (
             self._current_speech is not None
             and not self._current_speech.interrupted
             and self._current_speech.allow_interruptions
-            and not self._backchannel_config.should_interrupt(info.new_transcript, agent_is_speaking=True)
+            and not self._backchannel_config.should_interrupt(
+                info.new_transcript, agent_is_speaking=True
+            )
         ):
Interruption_Handling.md (3)

43-54: Add language specifiers to fenced code blocks.

Per markdownlint, fenced code blocks should have a language specified. For word lists, you can use text or plaintext.

Suggested fix
-```
+```text
 yeah, yes, yep, yup, ok, okay, hmm, hm, mm, mhm, 
 uh-huh, right, sure, got it, ah, uh, um, oh, 
 i see, alright, all right

Words That Always Interrupt

-```
+```text
stop, wait, hold on, pause, no, cancel, quit,
exit, excuse me, actually, but, however


83-94: Add language specifier to flowchart code block.

The flowchart diagram should have a language specifier. Use text for ASCII diagrams.

Suggested fix
-```
+```text
 User speaks "yeah"
       ↓
 Is agent talking? 
       ↓
   YES → Is it just filler words?
             ↓
         YES → IGNORE (agent keeps talking)
         NO  → INTERRUPT (agent stops)
       ↓
   NO  → RESPOND normally
107-110: Typo: "skipped" should clarify the behavior.

Line 109 mentions "skipped the interruptions" but the logs show "Ignoring backchannel" messages. Consider rewording for clarity: "Look for `Ignoring backchannel while agent speaking` in the logs to see when filler words were detected and ignored."

<details>
<summary>📜 Review details</summary>

**Configuration used**: Organization UI

**Review profile**: CHILL

**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 07223717cb78f843c5408dbfaca474e38aa38c8e and eb25dd6e77bea6465962730226d311dfbeff04d7.

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `Interruption_Handling.md`
* `livekit-agents/livekit/agents/voice/agent_activity.py`

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>📓 Path-based instructions (1)</summary>

<details>
<summary>**/*.py</summary>


**📄 CodeRabbit inference engine (AGENTS.md)**

> `**/*.py`: Format code with ruff
> Run ruff linter and auto-fix issues
> Run mypy type checker in strict mode
> Maintain line length of 100 characters maximum
> Ensure Python 3.9+ compatibility
> Use Google-style docstrings

Files:
- `livekit-agents/livekit/agents/voice/agent_activity.py`

</details>

</details><details>
<summary>🧬 Code graph analysis (1)</summary>

<details>
<summary>livekit-agents/livekit/agents/voice/agent_activity.py (3)</summary><blockquote>

<details>
<summary>livekit-agents/livekit/agents/voice/speech_handle.py (4)</summary>

* `interrupted` (65-66)
* `done` (97-98)
* `allow_interruptions` (69-70)
* `allow_interruptions` (73-91)

</details>
<details>
<summary>livekit-agents/livekit/agents/voice/audio_recognition.py (1)</summary>

* `current_transcript` (324-330)

</details>
<details>
<summary>livekit-agents/livekit/agents/voice/events.py (1)</summary>

* `UserInputTranscribedEvent` (115-121)

</details>

</blockquote></details>

</details><details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>

<details>
<summary>Interruption_Handling.md</summary>

43-43: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

51-51: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

83-83: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)</summary>

* GitHub Check: unit-tests
* GitHub Check: type-check (3.9)
* GitHub Check: type-check (3.13)

</details>

<details>
<summary>🔇 Additional comments (5)</summary><blockquote>

<details>
<summary>livekit-agents/livekit/agents/voice/agent_activity.py (5)</summary><blockquote>

`335-342`: **LGTM!**

The `is_agent_speaking` property correctly encapsulates the speaking state check with appropriate null-safety.
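For context, a null-safe property of this shape might look like the following sketch. The `SpeechHandle` stub and attribute names here are assumptions; the actual class in `speech_handle.py` has more state:

```python
# Hypothetical sketch of a null-safe is_agent_speaking property; the real
# attribute and class internals in agent_activity.py may differ.
from typing import Optional


class SpeechHandle:
    def __init__(self, done: bool = False) -> None:
        self._done = done

    def done(self) -> bool:
        return self._done


class AgentActivity:
    def __init__(self) -> None:
        self._current_speech: Optional[SpeechHandle] = None

    @property
    def is_agent_speaking(self) -> bool:
        # Null-safe: no current speech handle means the agent is silent.
        return self._current_speech is not None and not self._current_speech.done()
```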

---

`1254-1264`: **Transcript logging may include user speech content.**

The debug log at line 1262 includes the full transcript. While this is at DEBUG level, ensure this aligns with your privacy/compliance requirements for logging user speech.

---

`1332-1335`: **Disabling VAD-based interruption may impact responsiveness.**

Completely bypassing VAD interruption while the agent is speaking means interruptions are now dependent on STT transcript availability. This could introduce noticeable latency for urgent user commands if STT is slow.

Consider whether a hybrid approach might be better—e.g., allowing VAD-based interruption after a speech duration threshold, even before transcript is available.
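A sketch of that hybrid idea, assuming a hypothetical helper and a 1.5 s threshold (both the function name and the threshold value are illustrative, not part of the PR):

```python
# Hypothetical sketch of the hybrid approach suggested above: allow a pure
# VAD interrupt once user speech exceeds a duration threshold, even before
# any STT transcript arrives. Names and the threshold are assumptions.
VAD_INTERRUPT_THRESHOLD_S = 1.5


def should_vad_interrupt(
    agent_is_speaking: bool,
    speech_duration_s: float,
    transcript: str,
) -> bool:
    if not agent_is_speaking:
        return True
    if transcript:
        # Defer to the transcript-based (backchannel-aware) logic once text exists.
        return False
    # No transcript yet: sustained speech is unlikely to be a short backchannel,
    # so fall back to VAD once the user has spoken long enough.
    return speech_duration_s >= VAD_INTERRUPT_THRESHOLD_S
```

This bounds the worst-case latency for urgent commands by the threshold rather than by STT turnaround time.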

---

`1345-1361`: **LGTM!**

The transcript extraction and passing to backchannel-aware interruption logic is correctly implemented.

---

`1376-1395`: **LGTM!**

Consistent with the interim transcript handling pattern.

</blockquote></details>

</blockquote></details>

<sub>✏️ Tip: You can disable this entire section by setting `review_details` to `false` in your review settings.</sub>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment on lines +122 to +128
def contains_interrupt_word(self, transcript: str) -> bool:
    """Check if transcript contains any interrupt command."""
    text = transcript.lower().strip()
    for word in self.interrupt_words:
        if word in text:
            return True
    return False
Contributor

⚠️ Potential issue | 🟠 Major

Substring matching causes false positives for interrupt detection.

The word in text check uses substring matching, which will incorrectly trigger interruptions. For example:

  • "butter" contains "but"
  • "snow" contains "no"
  • "pausing" contains "pause"
Suggested fix using word boundary matching
 def contains_interrupt_word(self, transcript: str) -> bool:
     """Check if transcript contains any interrupt command."""
     text = transcript.lower().strip()
     for word in self.interrupt_words:
-        if word in text:
+        # Use word boundary matching to avoid partial matches
+        pattern = r'\b' + re.escape(word) + r'\b'
+        if re.search(pattern, text):
             return True
     return False
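A standalone demonstration of the false positive and the fix (the word list here is a small subset chosen to illustrate the problem):

```python
# Demonstration of the substring false positive described above, contrasting
# plain substring matching with the suggested word-boundary regex.
import re

interrupt_words = ["but", "no", "pause"]


def substring_match(text: str) -> bool:
    # Current logic: "butter" triggers on "but", "snow" on "no".
    text = text.lower().strip()
    return any(word in text for word in interrupt_words)


def boundary_match(text: str) -> bool:
    # Fixed logic: only whole-word occurrences count.
    text = text.lower().strip()
    return any(re.search(rf"\b{re.escape(word)}\b", text) for word in interrupt_words)


assert substring_match("pass the butter") is True   # false positive
assert boundary_match("pass the butter") is False   # fixed
assert boundary_match("no, stop") is True           # genuine command still matches
```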
🤖 Prompt for AI Agents
In `@livekit-agents/livekit/agents/voice/agent_activity.py` around lines 122 -
128, The contains_interrupt_word method is falsely matching substrings (e.g.,
"butter" matching "but"); change its logic in contains_interrupt_word to perform
word-boundary matching instead of simple substring checks: after normalizing
text = transcript.lower().strip(), either tokenize the text (split on
whitespace/punctuation) and check membership against self.interrupt_words, or
use regex with word boundaries (e.g., re.search(rf'\b{re.escape(word)}\b',
text)) for each word in self.interrupt_words; ensure you escape interrupt words
and keep existing lowercasing so matches are exact whole-word matches.
