Stale transcripts get added to the next user turn when using min_interruption_words #4498

@0xnktd

Bug Description

When min_interruption_words is used to stop short backchannel utterances from interrupting the agent, the transcripts of those utterances stay buffered and are incorrectly included in the next user turn, which confuses the LLM.

Problem Details

  • Agent is speaking and user says a backchannel (e.g., "uh-huh", "okay")
  • If the utterance has fewer words than min_interruption_words, the agent continues speaking
  • The transcript gets stored in AudioRecognition buffers (_audio_transcript, _audio_interim_transcript)
  • When the agent finishes speaking, these buffered transcripts appear in the next user turn
  • The LLM receives stale backchannel transcripts as if they were new user input
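
The sequence above can be illustrated with a small toy model. This is only a sketch of the described behavior, not the livekit-agents code: `ToyRecognizer`, `on_final_transcript`, and `commit_user_turn` are hypothetical names, and the real AudioRecognition class may buffer and commit transcripts differently.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecognizer:
    """Hypothetical stand-in for the buffering described above (not the real class)."""

    min_interruption_words: int = 3
    agent_speaking: bool = False
    buffer: list[str] = field(default_factory=list)

    def on_final_transcript(self, text: str) -> bool:
        """Buffer the transcript; return True if it should interrupt the agent."""
        self.buffer.append(text)  # buffered whether or not it interrupts
        if not self.agent_speaking:
            return False  # nothing to interrupt
        return len(text.split()) >= self.min_interruption_words

    def commit_user_turn(self) -> str:
        """Join everything buffered so far into the text handed to the LLM."""
        turn = " ".join(self.buffer)
        self.buffer.clear()
        return turn


rec = ToyRecognizer(agent_speaking=True)
interrupted = rec.on_final_transcript("uh-huh")  # below threshold, agent keeps talking
rec.agent_speaking = False                       # agent finishes speaking
rec.on_final_transcript("what is my balance")    # the actual next user turn
turn_text = rec.commit_user_turn()
print(turn_text)  # uh-huh what is my balance
```

In this model the backchannel never interrupts the agent, yet it is still prepended to the next committed turn, which matches the symptom reported here.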

Expected Behavior

Backchannel utterances that don't meet the interruption threshold should be discarded and not included in subsequent user turns.

Reproduction Steps

1. 
2.
3.
...
- Sample code snippet, or a GitHub Gist link -

Operating System

Linux

Models Used

No response

Package Versions

livekit-agents==1.3.9

Session/Room/Call IDs

No response

Proposed Solution

Implement a method in the AudioRecognition class that clears buffered transcripts older than a threshold age.
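
A rough sketch of what age-based clearing could look like, written as a standalone buffer rather than a patch to AudioRecognition. Everything here is an assumption for illustration: `TimedTranscriptBuffer`, `max_age_s`, and `clear_stale` are hypothetical names, and the real class stores its transcripts in `_audio_transcript` and `_audio_interim_transcript`, whose exact shape is not shown in this issue.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TimedTranscriptBuffer:
    """Hypothetical buffer that remembers when each transcript segment arrived."""

    max_age_s: float = 2.0  # assumed threshold; would need tuning in practice
    _segments: list[tuple[float, str]] = field(default_factory=list)

    def add(self, text: str, now: Optional[float] = None) -> None:
        """Store a segment with its arrival time (monotonic clock by default)."""
        ts = time.monotonic() if now is None else now
        self._segments.append((ts, text))

    def clear_stale(self, now: Optional[float] = None) -> int:
        """Drop segments older than max_age_s and return how many were removed."""
        current = time.monotonic() if now is None else now
        cutoff = current - self.max_age_s
        fresh = [(ts, text) for ts, text in self._segments if ts >= cutoff]
        removed = len(self._segments) - len(fresh)
        self._segments = fresh
        return removed

    def text(self) -> str:
        """Return the remaining segments joined into one transcript."""
        return " ".join(text for _, text in self._segments)


buf = TimedTranscriptBuffer(max_age_s=2.0)
buf.add("uh-huh", now=0.0)              # backchannel while the agent is speaking
buf.add("what is my balance", now=5.0)  # real input after the agent finishes
removed = buf.clear_stale(now=5.5)      # called before committing the user turn
print(removed, buf.text())  # 1 what is my balance
```

One design question this leaves open is where the call belongs: clearing when the agent finishes speaking, or just before the user turn is committed, would both fit the description above.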

Additional Context

No response

Screenshots and Recordings

No response

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
