
Fix Workers AI voice STT turn edge cases #1458

Merged
whoiskatrin merged 1 commit into main from fix/voice-workers-ai-stt-turns on May 4, 2026

@whoiskatrin whoiskatrin commented May 4, 2026

Summary

Fixes two Workers AI STT edge cases reported in #1454 for @cloudflare/voice:

  • Flux can send the final recognized text in Update / EagerEndOfTurn, then send EndOfTurn with an empty transcript. We now preserve the latest non-empty interim transcript for the active turn and use it as the final utterance fallback.
  • Nova 3 can deliver stale Results messages around abnormal close/teardown. We now defensively normalize finalized segment state before reading it so late messages cannot throw while handling Results.

Root cause

FluxSession only called onUtterance when the EndOfTurn event itself carried text. In real Flux streams, the last transcript may already have arrived via Update or EagerEndOfTurn, with EndOfTurn.transcript === "", so the SDK silently dropped the user utterance and the voice pipeline never invoked onTurn.
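The empty-string case is why the fallback described in this PR uses `||` rather than `??`. The following is a minimal sketch of that difference only; the helper names are hypothetical and are not taken from the SDK.

```typescript
// Hypothetical helpers for illustration; only the `||` fallback expression
// mirrors what the PR description states.

// `||` treats "" as missing, so the interim transcript is used instead.
function pickWithOr(endOfTurnTranscript: string | undefined, interim: string): string {
  return endOfTurnTranscript || interim;
}

// `??` only falls back on null/undefined, so an empty string would still win
// and the utterance would be dropped downstream.
function pickWithNullish(endOfTurnTranscript: string | undefined, interim: string): string {
  return endOfTurnTranscript ?? interim;
}

console.log(pickWithOr("", "turn on the lights"));      // "turn on the lights"
console.log(pickWithNullish("", "turn on the lights")); // ""
```

When EndOfTurn does carry text, both forms return that text, so the fallback only changes behavior for the empty-transcript case.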

Nova3Session assumed #finalizedSegments was always initialized when handling Results. The field is initialized in normal construction, but the reported stale-message/teardown path can observe it as unavailable, causing .join() / .length reads to throw from the message handler.
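One way to picture the defensive normalization is sketched below. `SegmentBuffer` and its methods are hypothetical stand-ins rather than the SDK's `Nova3Session`, and modelling the "unavailable" state as `undefined` is an assumption made for illustration.

```typescript
// Hypothetical stand-in for finalized-segment bookkeeping; not the SDK's class.
class SegmentBuffer {
  // Assumption: the teardown path is modelled as the field becoming undefined.
  segments: string[] | undefined = [];

  // Re-initialise the array if it is missing, then hand back a usable array.
  private normalized(): string[] {
    if (!Array.isArray(this.segments)) {
      this.segments = [];
    }
    return this.segments;
  }

  // Record a finalized segment; empty text is ignored.
  addFinal(text: string): void {
    if (text) this.normalized().push(text);
  }

  // Safe to call even after teardown: never reads .join() on undefined.
  joined(): string {
    return this.normalized().join(" ");
  }

  // Simulates the abnormal close that leaves the state unavailable.
  teardown(): void {
    this.segments = undefined;
  }
}

const buffer = new SegmentBuffer();
buffer.addFinal("hello");
buffer.teardown();
buffer.addFinal("late");      // a stale message arrives; this does not throw
console.log(buffer.joined()); // "late"
```

In this sketch a late segment is still accepted after teardown; whether the real handler keeps or discards such messages is not stated in this PR.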

Changes

  • Track FluxSession's current turn transcript from non-empty Update and EagerEndOfTurn events.
  • On Flux EndOfTurn, emit EndOfTurn.transcript || currentTranscript and clear turn state.
  • Clear Flux turn state on StartOfTurn, completed EndOfTurn, and TurnResumed to avoid stale utterances.
  • Guard Nova 3 finalized segment reads with defensive re-initialization/nullish reads.
  • Add focused Workers AI provider tests using mock WebSockets.
  • Add a patch changeset for @cloudflare/voice.
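The Flux items in this list can be sketched as a small state tracker driven by a fake socket. The event names come from the description above; the message shape, `FluxTurnTracker`, and `MockSocket` are assumptions for illustration and do not reflect the SDK's internals or its test helpers.

```typescript
// Event names are from the PR description; everything else is hypothetical.
type FluxEventName = "StartOfTurn" | "Update" | "EagerEndOfTurn" | "TurnResumed" | "EndOfTurn";

interface FluxMessage {
  event: FluxEventName;
  transcript?: string;
}

class FluxTurnTracker {
  private currentTranscript = "";
  private onUtterance: (text: string) => void;

  constructor(onUtterance: (text: string) => void) {
    this.onUtterance = onUtterance;
  }

  handle(message: FluxMessage): void {
    switch (message.event) {
      case "StartOfTurn":
      case "TurnResumed":
        // Clear turn state so text from an earlier turn cannot leak forward.
        this.currentTranscript = "";
        break;
      case "Update":
      case "EagerEndOfTurn":
        // Only non-empty interim text replaces the stored transcript.
        if (message.transcript) this.currentTranscript = message.transcript;
        break;
      case "EndOfTurn": {
        // Prefer the event's own text, else the latest interim transcript.
        const finalText = message.transcript || this.currentTranscript;
        this.currentTranscript = "";
        if (finalText) this.onUtterance(finalText);
        break;
      }
    }
  }
}

// Minimal stand-in for a mock WebSocket that delivers JSON text frames.
class MockSocket {
  private listener: ((data: string) => void) | null = null;

  onMessage(listener: (data: string) => void): void {
    this.listener = listener;
  }

  emit(message: FluxMessage): void {
    if (this.listener) this.listener(JSON.stringify(message));
  }
}

const utterances: string[] = [];
const tracker = new FluxTurnTracker((text) => {
  utterances.push(text);
});
const socket = new MockSocket();
socket.onMessage((data) => tracker.handle(JSON.parse(data) as FluxMessage));

socket.emit({ event: "StartOfTurn" });
socket.emit({ event: "Update", transcript: "turn on" });
socket.emit({ event: "EagerEndOfTurn", transcript: "turn on the lights" });
socket.emit({ event: "EndOfTurn", transcript: "" }); // falls back to interim text
socket.emit({ event: "EndOfTurn", transcript: "" }); // state already cleared, no emit
console.log(utterances); // ["turn on the lights"]
```

Clearing the stored transcript on every completed EndOfTurn is what keeps the second empty EndOfTurn from re-emitting the same utterance.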

Testing

  • npx vitest --run packages/voice/src/tests/workers-ai-providers.test.ts --config packages/voice/src/tests/vitest.config.ts
  • npm run build -w @cloudflare/voice
  • npx oxlint packages/voice/src/workers-ai-providers.ts packages/voice/src/tests/workers-ai-providers.test.ts
  • npx oxfmt --check packages/voice/src/workers-ai-providers.ts packages/voice/src/tests/workers-ai-providers.test.ts .changeset/fix-voice-stt-turns.md

Also attempted npm run check; it fails before lint/typecheck on existing export-check issues unrelated to this PR:

  • agents missing several dist/* export files
  • @cloudflare/think missing several dist/* export files

Also ran npm test -w @cloudflare/voice; the package's worker tests passed.

Fixes #1454


changeset-bot Bot commented May 4, 2026

🦋 Changeset detected

Latest commit: 8637f6d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:

Name               Type
@cloudflare/voice  Patch



pkg-pr-new Bot commented May 4, 2026


agents

npm i https://pkg.pr.new/agents@1458

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1458

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1458

hono-agents

npm i https://pkg.pr.new/hono-agents@1458

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1458

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1458

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1458

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1458

commit: 8637f6d


@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


@whoiskatrin whoiskatrin marked this pull request as ready for review May 4, 2026 19:57
@whoiskatrin whoiskatrin requested a review from mattzcarey May 4, 2026 19:57
@whoiskatrin whoiskatrin merged commit 84cb429 into main May 4, 2026
2 checks passed
@whoiskatrin whoiskatrin deleted the fix/voice-workers-ai-stt-turns branch May 4, 2026 20:17
@github-actions github-actions Bot mentioned this pull request May 4, 2026
threepointone added a commit that referenced this pull request May 11, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>
threepointone added a commit that referenced this pull request May 11, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>
threepointone added a commit that referenced this pull request May 11, 2026
* fix(voice): harden Workers AI STT turn handling

Follow up on PR #1458 by preserving Flux turn transcripts across lifecycle events and using model-detected speech start for low-latency barge-in.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(chat): harden stream resume negotiation close races

Follow-up to PR #1463: route stream-resume negotiation sends through close-safe helpers so WebSocket close races do not crash resume handling in think and ai-chat.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(voice): parse raw NDJSON text streams

Follow-up to PR #1462: make the voice text stream parser honor its documented NDJSON support while preserving SSE parsing for AI text streams.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(agents): defer recovered agent-tool finish hooks (#1476)

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(voice): cover useVoiceAgent enabled lifecycle (#1478)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(ai-chat): close resumed streams on disconnect (#1487)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(voice): invalidate playback on client interrupt (#1458)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(voice): invalidate playback when ending calls (#1458)

Co-authored-by: Cursor <cursoragent@cursor.com>

* Run deferred finish hooks after successful startup

Ensure recovered agent-tool finish hooks are only executed after a successful user onStart. Await _runDeferredAgentToolFinishHooks inside the onStart flow so deferred finishes are skipped when startup fails. Add a test and helper (reconcileCompletedChildWithFailedStartupForTest) to verify finish hooks are not run on failed startup and to cover lifecycle ordering and event emission.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

Development

Successfully merging this pull request may close these issues.

fix: @cloudflare/voice Nova3Session & FluxSession drop empty-transcript utterances (breaks real-world talkback)
