feat: answering machine detection #4906
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
    def _on_first_audio(self) -> None:
        """Start AMD on the first audio frame and pause speech authorization."""
        if self._classifier is None or self._classifier.started:
            return
        self._classifier.start()
        if self._session is not None and self._session._activity is not None:
            self._session._activity._pause_authorization()
🟡 AMD authorization pause not applied to new AgentActivity created during agent handoff
When AMD pauses authorization via _pause_authorization() on the current AgentActivity, and then an agent handoff occurs (e.g., via update_agent), a new AgentActivity is created with _authorization_allowed initialized as set (livekit-agents/livekit/agents/voice/agent_activity.py:155-156). The AMD's _on_first_audio only fires once (it checks self._classifier.started at detector.py:139 and returns), so _pause_authorization() is never called on the new activity. This means the new activity's speech will bypass AMD's authorization gate, defeating the purpose of holding speech until AMD resolves.
Scenario
- AMD starts, calls _pause_authorization() on current activity
- Agent handoff occurs (e.g. user calls session.update_agent()) while AMD is still pending
- New AgentActivity is created with _authorization_allowed already set
- Speech on the new activity proceeds without waiting for AMD result
Prompt for agents
The AMD detector pauses authorization on the current AgentActivity, but if an agent handoff creates a new AgentActivity while AMD is still pending, the new activity won't have authorization paused. To fix this, either: (1) propagate the AMD authorization pause state to new AgentActivity instances when they are created in _update_activity (in agent_session.py), e.g. by checking if session._amd is pending and calling _pause_authorization() on the new activity; or (2) have the AMD store a reference to the session rather than the activity and apply the pause on whatever is the current activity at any given time, checking this in the activity's authorization wait path.
    result = self._result

    if result.is_machine and self._interrupt_on_machine:
        await self._session.interrupt(force=True)

    if result.category == AMDCategory.MACHINE_IVR and self._ivr_detection:
        await self._session._start_ivr_detection(transcript=result.transcript)

    # eagerly resume so agent can speak immediately to a human
    if self._session._activity is not None:
        self._session._activity._resume_authorization()

    return result
🟡 execute() does not resume authorization if an exception occurs before _resume_authorization()
In execute(), if self._session.interrupt(force=True) (line 105) or self._session._start_ivr_detection(...) (line 108) raises an exception, the _resume_authorization() call at line 112 is skipped. When execute() is used inside the async with AMD(...) context manager, __aexit__ → aclose() will resume authorization as a fallback. However, if execute() is called directly (without the context manager), authorization remains permanently paused, deadlocking all subsequent speech generation.
-   result = self._result
-   if result.is_machine and self._interrupt_on_machine:
-       await self._session.interrupt(force=True)
-   if result.category == AMDCategory.MACHINE_IVR and self._ivr_detection:
-       await self._session._start_ivr_detection(transcript=result.transcript)
-   # eagerly resume so agent can speak immediately to a human
-   if self._session._activity is not None:
-       self._session._activity._resume_authorization()
-   return result
+   result = self._result
+   try:
+       if result.is_machine and self._interrupt_on_machine:
+           await self._session.interrupt(force=True)
+       if result.category == AMDCategory.MACHINE_IVR and self._ivr_detection:
+           await self._session._start_ivr_detection(transcript=result.transcript)
+   finally:
+       # eagerly resume so agent can speak immediately to a human
+       if self._session._activity is not None:
+           self._session._activity._resume_authorization()
+   return result
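The difference between the two versions can be reproduced with a toy model. This sketch assumes nothing about livekit-agents beyond the `_resume_authorization()` name: `FakeActivity`, `failing_interrupt`, and both `execute_*` functions are hypothetical, and the failure is simulated with a plain `RuntimeError`.

```python
import asyncio


class FakeActivity:
    """Hypothetical activity whose authorization AMD has already paused."""

    def __init__(self) -> None:
        self.paused = True

    def _resume_authorization(self) -> None:
        self.paused = False


async def failing_interrupt() -> None:
    # Simulates session.interrupt(force=True) raising.
    raise RuntimeError("interrupt failed")


async def execute_unguarded(activity: FakeActivity) -> None:
    await failing_interrupt()
    activity._resume_authorization()  # never reached when the await raises


async def execute_guarded(activity: FakeActivity) -> None:
    try:
        await failing_interrupt()
    finally:
        activity._resume_authorization()  # runs even though the await raised


async def still_paused(execute) -> bool:
    """Run one variant, swallow the simulated error, report the gate state."""
    activity = FakeActivity()
    try:
        await execute(activity)
    except RuntimeError:
        pass
    return activity.paused


def demo() -> tuple[bool, bool]:
    return (
        asyncio.run(still_paused(execute_unguarded)),
        asyncio.run(still_paused(execute_guarded)),
    )
```

`demo()` returns `(True, False)`: the unguarded variant leaves authorization paused after the error, while the `try`/`finally` variant resumes it and still lets the exception propagate to the caller.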
- AMDResult with categories: human, machine-ivr, machine-vm, machine-unavailable, and uncertain
- amd.execute() API for agents to await detection results
- examples/telephony/amd.py
Usage