fix(voice): allow awaiting speech handles inside function tools#1266
Merged
🦋 Changeset detected
Latest commit: da946e6
The changes in this PR will be included in the next version bump. This PR includes changesets to release 25 packages.
SpeechHandle.waitForPlayout() previously threw whenever any function tool was on the async stack, which blocked the valid pattern of awaiting a new handle (e.g. session.generateReply().waitForPlayout()) from inside a tool. Narrow the check to the owning handle only, so the real circular wait is still caught but nested speech works, matching Python's behavior in speech_handle.py:156-182.

Supporting changes:
- functionCallStorage now carries the owning SpeechHandle so the owner check is possible (agent.ts, generation.ts).
- Track handles whose generation has finished but whose tool execution is still running in AgentActivity._backgroundSpeeches, and propagate interrupt() to them. This mirrors Python's _background_speeches and closes TODO(AJS-273).
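The narrowed owner check can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: `Handle`, `CircularWaitError`, `runTool`, and the module-level `activeToolOwner` variable are not the library's names, and the plain variable replaces the real `functionCallStorage` only to keep the example synchronous and self-contained.

```typescript
// Simplified, hypothetical stand-ins; not the real @livekit/agents classes.
class CircularWaitError extends Error {}

// Stand-in for functionCallStorage: records which handle owns the running tool.
let activeToolOwner: Handle | undefined;

class Handle {
  readonly id: string;

  constructor(id: string) {
    this.id = id;
  }

  // Throws only if the currently running tool belongs to this handle.
  // Any other handle (for example one created inside the tool) may be awaited.
  assertNotCircular(): void {
    if (activeToolOwner === this) {
      throw new CircularWaitError(`handle ${this.id} would wait on its own tool`);
    }
  }
}

// Runs `body` as if it were a function tool owned by `owner`,
// restoring the previous owner afterwards.
function runTool<T>(owner: Handle, body: () => T): T {
  const previous = activeToolOwner;
  activeToolOwner = owner;
  try {
    return body();
  } finally {
    activeToolOwner = previous;
  }
}
```

Inside `runTool(owner, ...)`, calling `owner.assertNotCircular()` throws, while calling it on a second handle does not. The check described as too broad would correspond to throwing whenever `activeToolOwner` is set at all.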
Add a `.then` method so `await session.generateReply()` and
`await session.say('...')` resolve to the SpeechHandle once playout
finishes — matching Python's `__await__` behavior.
Avoids the Promise-assimilation recursion that would otherwise occur
from returning `this` (a thenable) by shadowing `.then` with an
own-property `undefined` for the duration of the synchronous
Resolve(this) check, then deleting the own property so the prototype
method is exposed again for subsequent direct `.then(cb)` calls and
re-awaits.
Narrow the callback parameter and default return type of SpeechHandle.then
to ResolvedSpeechHandle (= Omit<SpeechHandle, 'then'>). Without this,
TypeScript's Awaited<T> unwrap recurses through SpeechHandle's own .then
callback forever, emitting TS1062 at every `await handle` site. Omitting
`then` from the structural view terminates the unwrap because the
pattern object & { then(...) } no longer matches.
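The two commit messages above (the `.then` shadowing and the `Omit`-based type) can be sketched together. This is one possible reading of the mechanism, not the PR's actual code: `AwaitableHandle`, `ResolvedHandle`, and `markDone` are hypothetical names, and the sketch assumes the resolve callback is invoked while the own-property shadow is in place.

```typescript
// Hypothetical stand-in for SpeechHandle; only the awaitable protocol is modeled.
// Omitting `then` from the resolved type keeps the await unwrap from recursing.
type ResolvedHandle = Omit<AwaitableHandle, 'then'>;

class AwaitableHandle {
  private readonly done: Promise<void>;
  private resolveDone: () => void = () => {};

  constructor() {
    this.done = new Promise<void>((resolve) => {
      this.resolveDone = resolve;
    });
  }

  // Stand-in for the moment playout finishes.
  markDone(): void {
    this.resolveDone();
  }

  then<R1 = ResolvedHandle, R2 = never>(
    onFulfilled?: ((value: ResolvedHandle) => R1 | PromiseLike<R1>) | null,
    onRejected?: ((reason: unknown) => R2 | PromiseLike<R2>) | null,
  ): Promise<R1 | R2> {
    return this.done.then(() => {
      // Shadow the prototype method with an own property set to undefined, so
      // that a resolve function receiving `this` does not see a thenable.
      Object.defineProperty(this, 'then', { value: undefined, configurable: true });
      try {
        return onFulfilled ? onFulfilled(this) : (this as unknown as R1);
      } finally {
        // Remove the shadow so the prototype `then` is visible again.
        delete (this as { then?: unknown }).then;
      }
    }, onRejected);
  }
}
```

With this shape, `await handle` resolves to the handle itself once `markDone()` has run, and later `handle.then(cb)` calls still reach the prototype method because the own property has been deleted.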
Regression tests verifying:
- waitForPlayout() throws only when called on the handle that owns the active tool; awaiting a different (child) handle from inside a tool no longer throws.
- SpeechHandle is awaitable and `await handle` resolves to the handle.
- The prototype .then is restored after an await (direct .then(cb) calls still work; no leftover own-property shadow).
- The previously-broken pattern of awaiting a child handle from inside a tool handler completes without deadlocking.
1bcfe0c to
19bab5b
toubatbrian
reviewed
Apr 16, 2026
Contributor
toubatbrian
left a comment
Added some comments, otherwise LGTM!
Add SpeechHandleCircularWaitError and change waitForPlayout's return type to ThrowsPromise<void, SpeechHandleCircularWaitError>. Callers that .catch() the promise now get the error typed specifically instead of as a bare Error. Uses dedent for the multi-line error message. To let the ThrowsPromise narrow its rejection type to exactly SpeechHandleCircularWaitError, change doneFut to Future<void, never> (it is never rejected — only _markDone() resolves it). Also drop a handful of now-stale `// Ref: python ...` comments that were carried over from the initial port — the relationship to the Python source is documented on the methods' JSDoc already.
Revert waitForPlayout() back to async Promise<void> and throw SpeechHandleCircularWaitError directly. Also revert doneFut to Future<void> — the Future<void, never> narrowing was only useful to feed the ThrowsPromise rejection-type narrowing on waitForPlayout, which we're no longer doing. The SpeechHandleCircularWaitError class itself stays: callers can still catch it by type (via instanceof), we just don't propagate the error type through the promise signature.
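After the revert, callers identify the error with `instanceof` rather than through the promise's type. A minimal sketch of that pattern follows. The class name comes from the commit message, but its constructor signature, the `waitForPlayoutStub` function, and the `waitOrExplain` helper are assumptions made for illustration.

```typescript
// The class name is from the PR; the constructor signature is assumed.
class SpeechHandleCircularWaitError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'SpeechHandleCircularWaitError';
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical stand-in for waitForPlayout(): rejects when `circular` is true.
async function waitForPlayoutStub(circular: boolean): Promise<void> {
  if (circular) {
    throw new SpeechHandleCircularWaitError(
      'cannot wait for playout from the tool that owns this handle',
    );
  }
}

// Catches the specific error by type and rethrows anything else.
async function waitOrExplain(circular: boolean): Promise<string> {
  try {
    await waitForPlayoutStub(circular);
    return 'played';
  } catch (err) {
    if (err instanceof SpeechHandleCircularWaitError) {
      return `skipped: ${err.name}`;
    }
    throw err;
  }
}
```

The promise signature stays a plain `Promise<void>`, so the narrowing happens at the catch site instead of in the return type.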
Description
Fixes a deadlock/throw-on-valid-use in the voice pipeline: before this change, `SpeechHandle.waitForPlayout()` threw a "circular wait" error whenever any function tool was on the async stack, even when the awaited handle was a different one scheduled from inside the tool. That blocked the common "announce-then-work" pattern.

The throw was a deliberate defensive guard, but the check was too broad. Python (`livekit-agents`) narrows it to the SpeechHandle that owns the active tool, which correctly rejects the true circular wait while allowing nested handles. This PR ports that narrower check to JS and additionally makes `SpeechHandle` directly awaitable so `await session.generateReply()` (and `session.say()`) match Python's idiom.

Changes Made
- `speech_handle.ts`: narrow the `waitForPlayout()` throw to the owning SpeechHandle only (port of `livekit-agents/livekit/agents/voice/speech_handle.py:156-182`).
- `agent.ts` / `generation.ts`: extend `functionCallStorage` to carry the owning `SpeechHandle` so the owner check can discriminate.
- `speech_handle.ts`: add `.then()` so `SpeechHandle` is awaitable (port of Python's `__await__`). Uses a shadow-and-restore trick on `.then` to let Promise assimilation fulfill with `this` without infinite recursion; adds a `ResolvedSpeechHandle = Omit<SpeechHandle, 'then'>` alias to keep TS's `Awaited<T>` unwrap from hitting TS1062.
- `agent_activity.ts`: add `_backgroundSpeeches` tracking for handles whose generation has finished but whose tool execution is still running; propagate `interrupt()` to them (port of Python's `_background_speeches`; closes `TODO(AJS-273)`).
- `speech_handle.test.ts` covering the owner check, the awaitable protocol, `.then` restoration, and a tool-context scenario that used to deadlock.

Pre-Review Checklist

Testing
- `speech_handle.test.ts` (9 tests) covering the 4 behaviors
- `speech_handle.test.ts` suite passes; other failures seen locally were unrelated env issues (missing API keys in plugin tests)
- `restaurant_agent.ts` and `realtime_agent.ts` work properly (for major changes)

Additional manual verification: linked the local `@livekit/agents` into `agent-starter-node` and exercised a tool that calls `await ctx.session.say(...).waitForPlayout()`. It previously threw; it now plays the announcement, waits, and returns normally.

Additional Notes
The Python reference for each structural change is called out inline via `// Ref: python ...` comments. The `.then` implementation comments describe the shadow-vs-delete mechanism used to avoid `.then` recursion during Promise assimilation.

Breaking changes: none at the API level. `await session.generateReply()` previously returned the handle immediately without waiting; callers that depended on that fire-and-forget behavior will now block until playout completes. Easy workaround: drop the `await`. This matches Python's behavior.

Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.
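To make the behavior change concrete, here is a small sketch contrasting the announce-then-work pattern with the fire-and-forget workaround. `StubSession`, `StubHandle`, and both tool functions are hypothetical stubs, not the `@livekit/agents` API. In the stub, awaiting `waitForPlayout()` plays the role of awaiting the handle, and playout is simulated with a timer.

```typescript
// Hypothetical stubs; the real session and handle live in @livekit/agents.
class StubHandle {
  private readonly played: Promise<void>;

  constructor(text: string, log: string[]) {
    // Simulated playout: finishes on a later timer tick.
    this.played = new Promise<void>((resolve) => {
      setTimeout(() => {
        log.push(`played: ${text}`);
        resolve();
      }, 0);
    });
  }

  waitForPlayout(): Promise<void> {
    return this.played;
  }
}

class StubSession {
  log: string[] = [];

  say(text: string): StubHandle {
    return new StubHandle(text, this.log);
  }
}

// Announce-then-work: the tool blocks until the announcement has played.
async function announceThenWork(session: StubSession): Promise<string> {
  await session.say('One moment while I look that up.').waitForPlayout();
  session.log.push('work started');
  return 'lookup result';
}

// Fire-and-forget: no await, so the work starts before playout finishes.
async function fireAndForget(session: StubSession): Promise<string> {
  session.say('Working on it.');
  session.log.push('work started');
  return 'lookup result';
}
```

In the first function the log records the announcement before the work; in the second the order is reversed, which is the behavior callers keep by dropping the `await`.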