Skip to content

fix(voice): allow awaiting speech handles inside function tools#1266

Merged
u9g merged 7 commits intomainfrom
jl/speech-handle-awaitable-tool-fix
Apr 16, 2026
Merged

fix(voice): allow awaiting speech handles inside function tools#1266
u9g merged 7 commits intomainfrom
jl/speech-handle-awaitable-tool-fix

Conversation

@u9g
Copy link
Copy Markdown
Contributor

@u9g u9g commented Apr 16, 2026

Description

Fixes a deadlock/throw-on-valid-use in the voice pipeline: before this change, SpeechHandle.waitForPlayout() threw a "circular wait" error whenever any function tool was on the async stack — even when the awaited handle was a different one scheduled from inside the tool. That blocked the common "announce-then-work" pattern:

execute: async ({ location }, { ctx }) => {
  await ctx.session.say(`Checking ${location}...`).waitForPlayout();
  // ... do the work ...
},

The throw was caught defensively on purpose — but the check was too broad. Python (livekit-agents) narrows it to the SpeechHandle that owns the active tool, which correctly rejects the true circular wait while allowing nested handles. This PR ports that narrower check to JS and additionally makes SpeechHandle directly awaitable so await session.generateReply() (and session.say()) match Python's idiom.

Changes Made

  • speech_handle.ts: narrow the waitForPlayout() throw to the owning SpeechHandle only (port of livekit-agents/livekit/agents/voice/speech_handle.py:156-182).
  • agent.ts / generation.ts: extend functionCallStorage to carry the owning SpeechHandle so the owner check can discriminate.
  • speech_handle.ts: add .then() so SpeechHandle is awaitable (port of Python's __await__). Uses a shadow-and-restore trick on .then to let Promise assimilation fulfill with this without infinite recursion; adds a ResolvedSpeechHandle = Omit<SpeechHandle, 'then'> alias to keep TS's Awaited<T> unwrap from hitting TS1062.
  • agent_activity.ts: add _backgroundSpeeches tracking for handles whose generation has finished but whose tool execution is still running; propagate interrupt() to them (port of Python's _background_speeches; closes TODO(AJS-273)).
  • Regression tests in speech_handle.test.ts covering the owner check, the awaitable protocol, .then restoration, and a tool-context scenario that used to deadlock.

Pre-Review Checklist

  • Build passes: All builds (lint, typecheck, tests) pass locally
  • AI-generated code reviewed: Removed unnecessary comments and ensured code quality
  • Changes explained: All changes are properly documented and justified above
  • Scope appropriate: All changes relate to the PR title, or explanations provided for why they're included

Testing

  • Automated tests added/updated (if applicable) — new speech_handle.test.ts (9 tests) covering the 4 behaviors
  • All tests pass (the speech_handle.test.ts suite passes; other failures seen locally were unrelated env issues — missing API keys in plugin tests)
  • Make sure both restaurant_agent.ts and realtime_agent.ts work properly (for major changes)

Additional manual verification: linked the local @livekit/agents into agent-starter-node and exercised a tool that calls await ctx.session.say(...).waitForPlayout() — previously threw, now plays the announcement, waits, and returns normally.

Additional Notes

The Python reference for each structural change is called out inline via // Ref: python ... comments. The .then implementation comments describe the shadow-vs-delete mechanism used to avoid .then recursion during Promise assimilation.

Breaking changes: none at the API level. await session.generateReply() previously returned the handle immediately without waiting; callers that depended on that fire-and-forget behavior will now block until playout completes. Easy workaround: drop the await. This matches Python's behavior.


Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Apr 16, 2026

🦋 Changeset detected

Latest commit: da946e6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 25 packages
Name Type
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugins-test Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@theomonnom theomonnom requested a review from toubatbrian April 16, 2026 17:59
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Open in Devin Review

u9g added 5 commits April 16, 2026 16:58
SpeechHandle.waitForPlayout() previously threw whenever any function
tool was on the async stack, which blocked the valid pattern of
awaiting a new handle (e.g. session.generateReply().waitForPlayout())
from inside a tool. Narrow the check to the owning handle only, so the
real circular-wait is still caught but nested speech works — matching
Python's behavior in speech_handle.py:156-182.

Supporting changes:
- functionCallStorage now carries the owning SpeechHandle so the owner
  check is possible (agent.ts, generation.ts).
- Track handles whose generation has finished but whose tool execution
  is still running in AgentActivity._backgroundSpeeches, and propagate
  interrupt() to them — mirrors Python's _background_speeches and
  closes TODO(AJS-273).
Add a `.then` method so `await session.generateReply()` and
`await session.say('...')` resolve to the SpeechHandle once playout
finishes — matching Python's `__await__` behavior.

Avoids the Promise-assimilation recursion that would otherwise occur
from returning `this` (a thenable) by shadowing `.then` with an
own-property `undefined` for the duration of the synchronous
Resolve(this) check, then deleting the own property so the prototype
method is exposed again for subsequent direct `.then(cb)` calls and
re-awaits.
Narrow the callback parameter and default return type of SpeechHandle.then
to ResolvedSpeechHandle (= Omit<SpeechHandle, 'then'>). Without this,
TypeScript's Awaited<T> unwrap recurses through SpeechHandle's own .then
callback forever, emitting TS1062 at every `await handle` site. Omitting
`then` from the structural view terminates the unwrap because the
pattern object & { then(...) } no longer matches.
Regression tests verifying:
- waitForPlayout() throws only when called on the handle that owns the
  active tool; awaiting a different (child) handle from inside a tool
  no longer throws.
- SpeechHandle is awaitable and `await handle` resolves to the handle.
- The prototype .then is restored after an await (direct .then(cb) calls
  still work; no leftover own-property shadow).
- The previously-broken pattern of awaiting a child handle from inside
  a tool handler completes without deadlocking.
@u9g u9g force-pushed the jl/speech-handle-awaitable-tool-fix branch from 1bcfe0c to 19bab5b Compare April 16, 2026 20:59
devin-ai-integration[bot]

This comment was marked as resolved.

Comment thread agents/src/voice/speech_handle.ts
Comment thread agents/src/voice/speech_handle.ts Outdated
Comment thread agents/src/voice/speech_handle.ts
Copy link
Copy Markdown
Contributor

@toubatbrian toubatbrian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments, otherwise LGTM!

Add SpeechHandleCircularWaitError and change waitForPlayout's return
type to ThrowsPromise<void, SpeechHandleCircularWaitError>. Callers
that .catch() the promise now get the error typed specifically instead
of as a bare Error. Uses dedent for the multi-line error message.

To let the ThrowsPromise narrow its rejection type to exactly
SpeechHandleCircularWaitError, change doneFut to Future<void, never>
(it is never rejected — only _markDone() resolves it).

Also drop a handful of now-stale `// Ref: python ...` comments that
were carried over from the initial port — the relationship to the
Python source is documented on the methods' JSDoc already.
Copy link
Copy Markdown
Contributor

@toubatbrian toubatbrian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Revert waitForPlayout() back to async Promise<void> and throw
SpeechHandleCircularWaitError directly. Also revert doneFut to
Future<void> — the Future<void, never> narrowing was only useful to
feed the ThrowsPromise rejection-type narrowing on waitForPlayout,
which we're no longer doing.

The SpeechHandleCircularWaitError class itself stays: callers can
still catch it by type (via instanceof), we just don't propagate the
error type through the promise signature.
@u9g u9g merged commit df8d0ca into main Apr 16, 2026
8 of 9 checks passed
@u9g u9g deleted the jl/speech-handle-awaitable-tool-fix branch April 16, 2026 23:04
@github-actions github-actions Bot mentioned this pull request Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants