Phantom VAD activity that kills ongoing llm_node when using a STT model with endpointing #4243

@yiphei

Bug Description

I have the following STT and turn detection setup:

        stt=deepgram.STTv2(),
        turn_detection="stt",

This worked fine in livekit agents 1.2.8. After upgrading to 1.3, it no longer works reliably: it often produces phantom VAD activity that interrupts the agent even though the user said nothing at all.

Expected Behavior

No phantom VAD activity

Reproduction Steps

  1. Run the following agent code in console mode:
import asyncio
from dotenv import load_dotenv

from livekit import agents, rtc
from livekit.agents import AgentServer, AgentSession, Agent, room_io
from livekit.plugins import deepgram
from livekit.plugins import noise_cancellation, silero

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant.
            You eagerly assist users with their questions by providing information from your extensive knowledge.
            Your responses are concise, to the point, and without any complex formatting or punctuation including emojis, asterisks, or other symbols.
            You are curious, friendly, and have a sense of humor.""",
        )

    async def llm_node(self, chat_ctx, tools, model_settings):
        try:
            async for chunk in Agent.default.llm_node(
                self, chat_ctx, tools, model_settings
            ):
                yield chunk
        except asyncio.CancelledError:
            print("CancelledError in llm_node")
            raise


server = AgentServer()


@server.rtc_session()
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        stt=deepgram.STTv2(),
        llm="openai/gpt-4.1-mini",
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        vad=silero.VAD.load(min_silence_duration=0.3),
        turn_detection="stt",
        allow_interruptions=True,
        min_endpointing_delay=0.1,
        max_endpointing_delay=1.5,
        min_interruption_duration=0.3,
        user_away_timeout=4,
    )

    @session.on("user_state_changed")
    def _on_user_state_changed(ev) -> None:
        print(f"{'@' * 50} user_state_changed: {ev.new_state} {'@' * 50}")

    @session.on("agent_state_changed")
    def _on_agent_state_changed(ev) -> None:
        print(f"{'*' * 50} agent_state_changed: {ev.new_state} {'*' * 50}")

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                else noise_cancellation.BVC(),
            ),
        ),
    )


if __name__ == "__main__":
    agents.cli.run_app(server)

  2. When the agent session starts, say a simple "Hello".
  3. The bug should occur.

Note that these steps don't reproduce the bug 100% of the time, but I'm able to reproduce it about 30% of the time. When the bug is reproduced, this is what the console logs should look like:

➜  daikon git:(repro-bug) uv run quickstart.py console
    Agents   Starting console mode 🚀

    14:06:53.701 DEBUG  asyncio            Using selector: KqueueSelector
    14:06:53.702 INFO   livekit.agents     starting worker {"version": "1.3.6", "rtc-version": "1.0.20"}
    14:06:53.703 INFO   livekit.agents     HTTP server listening on :60009
    14:06:53.717 INFO   livekit.agents     initializing job runner {"tid": 51230842}
                 DEBUG  asyncio            Using selector: KqueueSelector
                 INFO   livekit.agents     job runner initialized {"tid": 51230842, "elapsed_time": 0.0}
    14:06:53.766 DEBUG  livekit.agents     http_session(): creating a new httpclient ctx
************************************************** agent_state_changed: listening **************************************************
    14:06:53.767 DEBUG  livekit.agents     using audio io: `Console` -> `AgentSession` -> `Console`
                 WARNI… livekit.agents     resume_false_interruption is enabled but audio output does not support pause, it will be ignored {"audio_output": "Console"}
    14:06:53.768 DEBUG  livekit.agents     using transcript io: `AgentSession` -> (none)
    14:06:53.991 DEBUG  livekit.….deepgram Established new Deepgram STT WebSocket connection:
                                           {"headers": {"dg-project-id": "9e619461-68cf-4601-b7f1-1349880caf93", "dg-request-id": "f518be90-739c-4ecf-8b1c-00cc6f7edf5f", "Date": "Fri, 12
Dec 2025 22:06:53 GMT"}}
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ user_state_changed: speaking @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    14:06:55.988 DEBUG  livekit.agents     received user transcript {"user_transcript": "Hello?", "language": "en", "transcript_delay": 0.045729875564575195}
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ user_state_changed: listening @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
************************************************** agent_state_changed: thinking **************************************************
CancelledError in llm_node
************************************************** agent_state_changed: listening **************************************************
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ user_state_changed: away @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    14:07:05.648 INFO   livekit.agents     shutting down worker {"id": "unregistered"}
    14:07:05.652 DEBUG  livekit.agents     session closed {"reason": "user_initiated", "error": null}
    14:07:05.654 DEBUG  livekit.agents     shutting down job task {"reason": "", "user_initiated": false}
    14:07:05.655 DEBUG  livekit.agents     job exiting {"reason": "", "tid": 51230842, "job_id": "fake-job-33f171799214", "room_id": "FAKE_RM_b79cb0e0d1f9"}
    14:07:05.657 DEBUG  livekit.agents     http_session(): closing the httpclient ctx

As you can observe in these logs, the agent state changed from thinking to listening without the user state changing to speaking in between. Yet the llm_node got cancelled, as shown by the log line "CancelledError in llm_node".
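
To make this pattern easier to spot across many runs, here is a minimal sketch that scans captured console output for it. It assumes the only state lines are the ones printed by the two event handlers in the repro script above. The helper name find_phantom_cancellations is hypothetical and not part of livekit-agents, and a direct thinking -> listening transition may have other causes, so hits are candidates to inspect rather than proof of the bug.

```python
import re

# Matches the lines printed by the repro script's event handlers, e.g.
# "@@@ user_state_changed: speaking @@@" or "*** agent_state_changed: thinking ***".
STATE_RE = re.compile(r"(user|agent)_state_changed: (\w+)")


def find_phantom_cancellations(lines):
    """Return indices of lines where the agent goes thinking -> listening
    although the user never entered 'speaking' while the agent was thinking."""
    suspicious = []
    agent_state = None
    user_spoke_while_thinking = False

    for index, line in enumerate(lines):
        match = STATE_RE.search(line)
        if match is None:
            continue  # ordinary log line, not a state change
        who, state = match.groups()

        if who == "user":
            # Only user speech that starts during 'thinking' explains an interruption.
            if state == "speaking" and agent_state == "thinking":
                user_spoke_while_thinking = True
            continue

        # Agent transition: flag a direct thinking -> listening with no user speech.
        if (
            agent_state == "thinking"
            and state == "listening"
            and not user_spoke_while_thinking
        ):
            suspicious.append(index)
        if state == "thinking":
            user_spoke_while_thinking = False  # start a fresh observation window
        agent_state = state

    return suspicious


if __name__ == "__main__":
    # State lines taken from the log above (separator characters shortened).
    log = [
        "*** agent_state_changed: listening ***",
        "@@@ user_state_changed: speaking @@@",
        "@@@ user_state_changed: listening @@@",
        "*** agent_state_changed: thinking ***",
        "CancelledError in llm_node",
        "*** agent_state_changed: listening ***",
        "@@@ user_state_changed: away @@@",
    ]
    print(find_phantom_cancellations(log))
```

Run against the state lines from the log above, this flags the second "agent_state_changed: listening" line, which is the transition that coincides with the cancelled llm_node.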

Operating System

macOS Sequoia

Models Used

No response

Package Versions

"livekit-agents[cartesia,deepgram,openai,silero,turn-detector,google,elevenlabs,hume,inworld,assemblyai,mistralai,speechmatics]>=1.3.6"
    "livekit-plugins-noise-cancellation>=0.2.5"

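Because the requirements above are lower bounds (>=), the versions actually resolved in the environment may be newer. A small sketch for reporting what is installed, using only the standard library's importlib.metadata. The helper name installed_versions is hypothetical, and the list of distribution names is an assumption about which packages matter for this issue.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in dist_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed to be the distributions relevant to this report.
    names = [
        "livekit-agents",
        "livekit-plugins-deepgram",
        "livekit-plugins-silero",
        "livekit-plugins-noise-cancellation",
    ]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```
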
Session/Room/Call IDs

No response

Proposed Solution

Additional Context

No response

Screenshots and Recordings

No response


Labels: bug
