Fix Workers AI voice STT turn edge cases #1458
Merged
🦋 Changeset detected. Latest commit: 8637f6d. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
threepointone added a commit that referenced this pull request on May 11, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>
threepointone added a commit that referenced this pull request on May 11, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>
threepointone added a commit that referenced this pull request on May 11, 2026
* fix(voice): harden Workers AI STT turn handling
  Follow up on PR #1458 by preserving Flux turn transcripts across lifecycle events and using model-detected speech start for low-latency barge-in.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(chat): harden stream resume negotiation close races
  Follow-up to PR #1463: route stream-resume negotiation sends through close-safe helpers so WebSocket close races do not crash resume handling in think and ai-chat.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(voice): parse raw NDJSON text streams
  Follow-up to PR #1462: make the voice text stream parser honor its documented NDJSON support while preserving SSE parsing for AI text streams.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(agents): defer recovered agent-tool finish hooks (#1476)
  Co-authored-by: Cursor <cursoragent@cursor.com>
* test(voice): cover useVoiceAgent enabled lifecycle (#1478)
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(ai-chat): close resumed streams on disconnect (#1487)
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(voice): invalidate playback on client interrupt (#1458)
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(voice): invalidate playback when ending calls (#1458)
  Co-authored-by: Cursor <cursoragent@cursor.com>
* Run deferred finish hooks after successful startup
  Ensure recovered agent-tool finish hooks are only executed after a successful user onStart. Await _runDeferredAgentToolFinishHooks inside the onStart flow so deferred finishes are skipped when startup fails. Add a test and helper (reconcileCompletedChildWithFailedStartupForTest) to verify finish hooks are not run on failed startup and to cover lifecycle ordering and event emission.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
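The ordering rule in the last squashed commit (recovered agent-tool finish hooks run only after a successful user `onStart`) can be illustrated with a small sketch. This is not the `agents` implementation: `StartupSketch`, `deferFinishHook`, and `start` are hypothetical names standing in for the real flow around `_runDeferredAgentToolFinishHooks`, and the sketch assumes hooks are simple async callbacks.

```typescript
// Hypothetical hook type; the real agent-tool finish hooks carry more context.
type FinishHook = () => Promise<void> | void;

class StartupSketch {
  #deferredFinishHooks: FinishHook[] = [];
  #userOnStart: () => Promise<void>;

  constructor(userOnStart: () => Promise<void>) {
    this.#userOnStart = userOnStart;
  }

  // Recovered finish hooks are queued instead of running immediately.
  deferFinishHook(hook: FinishHook): void {
    this.#deferredFinishHooks.push(hook);
  }

  async start(): Promise<void> {
    // If the user's onStart rejects, the rejection propagates here and
    // the queued finish hooks are never reached.
    await this.#userOnStart();
    await this.#runDeferredFinishHooks();
  }

  async #runDeferredFinishHooks(): Promise<void> {
    // Drain the queue first so each hook runs at most once.
    const hooks = this.#deferredFinishHooks.splice(0);
    for (const hook of hooks) {
      await hook();
    }
  }
}
```

Awaiting the hooks inside `start` (rather than scheduling them independently) is what makes a failed startup skip them: control never reaches the drain step.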
Summary
Fixes two Workers AI STT edge cases reported in #1454 for `@cloudflare/voice`:

- `FluxSession` can receive the last transcript in `Update`/`EagerEndOfTurn` and then get an `EndOfTurn` with an empty `transcript`. We now preserve the latest non-empty interim transcript for the active turn and use it as the final utterance fallback.
- `Nova3Session` can receive late `Results` messages around abnormal close/teardown. We now defensively normalize finalized segment state before reading it, so late messages cannot throw while handling `Results`.

Root cause
- `FluxSession` only called `onUtterance` when the `EndOfTurn` event itself carried text. In real Flux streams, the last transcript may already have arrived via `Update` or `EagerEndOfTurn`, with `EndOfTurn.transcript === ""`, so the SDK silently dropped the user utterance and the voice pipeline never invoked `onTurn`.
- `Nova3Session` assumed `#finalizedSegments` was always initialized when handling `Results`. The field is initialized during normal construction, but the reported stale-message/teardown path can observe it as unavailable, causing `.join()`/`.length` reads to throw from the message handler.

Changes
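A minimal sketch of the Flux turn handling, assuming a simplified event shape. `FluxTurnTracker` and `FluxTurnEvent` are hypothetical stand-ins, not the `FluxSession` code or the Workers AI message types. The sketch keeps the latest non-empty interim text, falls back to it when `EndOfTurn` arrives empty, and resets on `StartOfTurn` and `TurnResumed`.

```typescript
// Hypothetical, simplified Flux event; real field names and payloads may differ.
type FluxEventType =
  | "StartOfTurn"
  | "Update"
  | "EagerEndOfTurn"
  | "TurnResumed"
  | "EndOfTurn";

interface FluxTurnEvent {
  event: FluxEventType;
  transcript?: string;
}

class FluxTurnTracker {
  // Latest non-empty transcript seen for the active turn.
  #currentTranscript = "";
  #onUtterance: (text: string) => void;

  constructor(onUtterance: (text: string) => void) {
    this.#onUtterance = onUtterance;
  }

  handle(msg: FluxTurnEvent): void {
    const text = msg.transcript ?? "";
    switch (msg.event) {
      case "StartOfTurn":
      case "TurnResumed":
        // A new or resumed turn must not inherit text from an earlier one.
        this.#currentTranscript = "";
        break;
      case "Update":
      case "EagerEndOfTurn":
        // Only non-empty interim text replaces the stored transcript.
        if (text) this.#currentTranscript = text;
        break;
      case "EndOfTurn": {
        // Prefer EndOfTurn's own text, else fall back to the interim text.
        const utterance = text || this.#currentTranscript;
        this.#currentTranscript = "";
        if (utterance) this.#onUtterance(utterance);
        break;
      }
    }
  }
}
```

For example, the sequence `StartOfTurn`, `Update("book a table")`, `EndOfTurn("")` yields one utterance, `"book a table"`, where the old behavior yielded none.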
- Preserve `FluxSession`'s current turn transcript from non-empty `Update` and `EagerEndOfTurn` events.
- On `EndOfTurn`, emit `EndOfTurn.transcript || currentTranscript` and clear turn state.
- Reset turn state on `StartOfTurn`, completed `EndOfTurn`, and `TurnResumed` to avoid stale utterances.
- Add a changeset for `@cloudflare/voice`.

Testing
- `npx vitest --run packages/voice/src/tests/workers-ai-providers.test.ts --config packages/voice/src/tests/vitest.config.ts`
- `npm run build -w @cloudflare/voice`
- `npx oxlint packages/voice/src/workers-ai-providers.ts packages/voice/src/tests/workers-ai-providers.test.ts`
- `npx oxfmt --check packages/voice/src/workers-ai-providers.ts packages/voice/src/tests/workers-ai-providers.test.ts .changeset/fix-voice-stt-turns.md`

Also attempted
`npm run check`; it fails before lint/typecheck on existing export-check issues unrelated to this PR:
- `agents` is missing several `dist/*` export files
- `@cloudflare/think` is missing several `dist/*` export files

Also attempted
`npm test -w @cloudflare/voice`; package worker tests passed.

Fixes #1454
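For the `Nova3Session` half of the fix, a minimal sketch of normalizing finalized segment state before reading it. `Nova3SegmentsSketch`, `Nova3Result`, and `teardown` are hypothetical; the sketch only models a private `#finalizedSegments` field that a teardown path leaves unset, and it is not the `@cloudflare/voice` code or the real `Results` message shape.

```typescript
// Hypothetical, simplified Nova 3 result; the real message shape may differ.
interface Nova3Result {
  isFinal: boolean;
  transcript: string;
}

class Nova3SegmentsSketch {
  // Typed as possibly undefined to model the stale-message/teardown path.
  #finalizedSegments: string[] | undefined = [];

  // Normalize before every read so .join()/.length never hit an unset field.
  #segments(): string[] {
    const current = this.#finalizedSegments;
    const segments: string[] = Array.isArray(current) ? current : [];
    this.#finalizedSegments = segments;
    return segments;
  }

  // Returns the finalized text so far; a late message no longer throws.
  handleResults(msg: Nova3Result): string {
    const segments = this.#segments();
    if (msg.isFinal && msg.transcript.length > 0) {
      segments.push(msg.transcript);
    }
    return segments.join(" ");
  }

  // Simulates an abnormal close/teardown that leaves the field unset.
  teardown(): void {
    this.#finalizedSegments = undefined;
  }
}
```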