
feat(stt): add diarization capabilities and speaker_id support#1267

Merged
toubatbrian merged 8 commits into main from claude/practical-archimedes-muwA8
Apr 17, 2026


@toubatbrian
Contributor

Summary

Port of livekit/agents#5438 — adds STT diarization capability detection and speakerId passthrough from the Python agents framework.

What's ported

  • speakerId on TimedString (voice/io.ts): Added optional speakerId?: string | null field to the TimedString interface and createTimedString() factory, matching Python's TimedString.speaker_id.

  • speakerId on SpeechData (stt/stt.ts): Added optional speakerId?: string | null field to the SpeechData interface, so transcript events can carry speaker identity at the utterance level.

  • diarization on STTCapabilities (stt/stt.ts): Added optional diarization?: boolean capability flag. Also added a protected updateCapabilities() method on the base STT class so subclasses (like inference STT) can dynamically toggle capabilities after construction.

  • Diarization capability detection (inference/stt.ts): Ported the _DIARIZATION_EXTRA_KEYS / _diarization_enabled() pattern from Python. The inference STT now infers diarization: true from provider-specific modelOptions keys (diarize for Deepgram/xAI, speaker_labels for AssemblyAI) both at construction and when updateOptions() is called.

  • xAI STT model type in inference (inference/stt.ts): Added XaiSTTModels ('xai/stt-1') and XaiOptions interface with diarize, endpointing, format, and interim_results fields. Updated the STTOptions conditional type to resolve XaiOptions for xAI models.

  • speaker_id in wire protocol (inference/api_protos.ts): Added optional speaker_id field to the Zod schemas for sttWordSchema, sttInterimTranscriptEventSchema, and sttFinalTranscriptEventSchema.

  • speaker_id passthrough in processTranscript() (inference/stt.ts): The speech data builder now extracts speaker_id from both event-level and word-level server responses and maps them to speakerId on SpeechData and TimedString.

  • speaker_labels on AssemblyAIOptions: Added the missing speaker_labels boolean option.
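The capability-detection bullet above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `DIARIZATION_KEYS`, `diarizationEnabled`, and `SketchInferenceSTT` are hypothetical names that mirror Python's `_DIARIZATION_EXTRA_KEYS` / `_diarization_enabled()`. The sketch assumes `modelOptions` is a plain record, that only the two keys named here (`diarize`, `speaker_labels`) matter, and that only an explicit `true` enables diarization.

```typescript
// Hypothetical mirror of Python's _DIARIZATION_EXTRA_KEYS: provider-specific
// modelOptions keys that turn speaker diarization on
// (diarize for Deepgram/xAI, speaker_labels for AssemblyAI).
const DIARIZATION_KEYS = ["diarize", "speaker_labels"] as const;

type ModelOptions = Record<string, unknown>;

// Assumption: only an explicit `true` counts as enabled.
function diarizationEnabled(modelOptions: ModelOptions | undefined): boolean {
  if (!modelOptions) return false;
  return DIARIZATION_KEYS.some((key) => modelOptions[key] === true);
}

// Simplified stand-in for the inference STT (not the real class).
// The flag is derived at construction and re-derived on every options update.
class SketchInferenceSTT {
  capabilities: { streaming: boolean; diarization: boolean };
  private modelOptions: ModelOptions;

  constructor(modelOptions: ModelOptions = {}) {
    this.modelOptions = { ...modelOptions };
    this.capabilities = {
      streaming: true,
      diarization: diarizationEnabled(this.modelOptions),
    };
  }

  updateOptions(opts: { modelOptions?: ModelOptions }): void {
    if (!opts.modelOptions) return;
    // Merge so keys absent from this update keep their previous values.
    this.modelOptions = { ...this.modelOptions, ...opts.modelOptions };
    this.capabilities = {
      ...this.capabilities,
      diarization: diarizationEnabled(this.modelOptions),
    };
  }
}
```

With this shape, `new SketchInferenceSTT({ diarize: true })` reports `diarization: true`. An unrelated update such as `{ language: "en" }` leaves the flag on, and `{ diarize: false }` turns it off.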

Implementation nuances (Python → TypeScript)

| Aspect | Python | TypeScript (this PR) |
| --- | --- | --- |
| `TimedString` | `speaker_id: str \| None` attribute on a `str` subclass | `speakerId?: string \| null` field on the `TimedString` interface |
| Capability mutation | `self._capabilities = replace(self._capabilities, diarization=...)` using `dataclasses.replace()` | `this.updateCapabilities({ diarization: ... })`: a new protected method, since `#capabilities` is a true private field inaccessible to subclasses |
| Type-safe options | `@overload` per provider with `TypedDict` | Conditional type `STTOptions<TModel>` that resolves per model |
| Options merge in `updateOptions` | `self._opts.extra_kwargs.update(extra)` (dict merge) | `{ ...this.opts.modelOptions, ...opts.modelOptions }` (spread merge) |
| Naming convention | `speaker_id` (snake_case) | `speakerId` (camelCase) on public interfaces; `speaker_id` preserved in the wire-protocol Zod schemas |
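The capability-mutation row is the only one that needed a new API. The sketch below shows why, assuming a base class that stores capabilities in an ECMAScript private field (`#capabilities`) as the table describes. `BaseSTT`, `DiarizingSTT`, and `setDiarization` are illustrative names, and the capability fields are a simplified assumption rather than the package's exact interface.

```typescript
// Simplified capability shape; `diarization` is the new optional flag.
interface STTCapabilities {
  streaming: boolean;
  interimResults: boolean;
  diarization?: boolean;
}

abstract class BaseSTT {
  // True private field: unreachable from subclasses, even through `this`.
  #capabilities: STTCapabilities;

  constructor(capabilities: STTCapabilities) {
    this.#capabilities = { ...capabilities };
  }

  get capabilities(): STTCapabilities {
    return this.#capabilities;
  }

  // The escape hatch for subclasses. Like dataclasses.replace(), it builds a
  // new object from the old one instead of mutating fields in place.
  protected updateCapabilities(update: Partial<STTCapabilities>): void {
    this.#capabilities = { ...this.#capabilities, ...update };
  }
}

class DiarizingSTT extends BaseSTT {
  constructor() {
    super({ streaming: true, interimResults: true, diarization: false });
  }

  setDiarization(enabled: boolean): void {
    // `this.#capabilities = ...` would not compile here; the protected
    // method is the only way in.
    this.updateCapabilities({ diarization: enabled });
  }
}
```

A `protected` TypeScript property would also have worked. A `#` field, however, keeps the capabilities private at runtime as well as at compile time, so the explicit method becomes the single mutation path.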

What's NOT ported (Python-specific)

  • uv.lock changes (Python lockfile / new plugin registrations)
  • Python-specific @overload type stubs (TypeScript uses conditional types instead)
  • aiohttp.ClientSession typing (Node.js uses different HTTP primitives)

Test plan

  • All 42 inference STT tests pass (7 new diarization capability tests added)
  • API proto tests pass with updated Zod schemas
  • Build succeeds (pnpm build:agents)
  • ESLint passes on all changed files (0 errors)
  • Verify with a live xAI STT session that speakerId is populated when diarize: true
  • Verify with Deepgram that speakerId flows through when diarize: true
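Both live-verification items depend on the `speaker_id` passthrough in `processTranscript()`. Below is a minimal sketch of that mapping, assuming simplified wire and public shapes. Only `speaker_id` / `speakerId` come from this PR; the other field names (`transcript`, `words`, `start`, `end`, `startTime`, `endTime`) and the `*Sketch` type names are hypothetical. The PR does not say which level wins when both are present, so the rule here (event-level `speaker_id` first, then the first word that carries one) is an assumption.

```typescript
// Simplified wire shapes (snake_case, as in the Zod schemas).
interface WireWord {
  word: string;
  start: number;
  end: number;
  speaker_id?: string | null;
}

interface WireTranscriptEvent {
  transcript: string;
  words: WireWord[];
  speaker_id?: string | null;
}

// Simplified public shapes (camelCase); not the package's exact interfaces.
interface TimedStringSketch {
  text: string;
  startTime: number;
  endTime: number;
  speakerId?: string | null;
}

interface SpeechDataSketch {
  text: string;
  words: TimedStringSketch[];
  speakerId?: string | null;
}

function toSpeechData(event: WireTranscriptEvent): SpeechDataSketch {
  // Word level: copy each word's speaker_id through as speakerId.
  const words: TimedStringSketch[] = event.words.map((w) => ({
    text: w.word,
    startTime: w.start,
    endTime: w.end,
    speakerId: w.speaker_id ?? null,
  }));
  // Utterance level (assumed precedence): event id, else first word id, else null.
  const speakerId =
    event.speaker_id ??
    words.find((w) => w.speakerId != null)?.speakerId ??
    null;
  return { text: event.transcript, words, speakerId };
}
```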

https://claude.ai/code/session_01VtE2b4qcjcN21cvDhsdcFo

Port of livekit/agents#5438 — adds STT diarization capability detection
and speaker_id passthrough from the Python agents framework.

https://claude.ai/code/session_01VtE2b4qcjcN21cvDhsdcFo
@changeset-bot

changeset-bot Bot commented Apr 16, 2026

🦋 Changeset detected

Latest commit: 6398aef

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 25 packages
| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-assemblyai | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-cerebras | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-hedra | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-lemonslice | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-mistral | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-phonic | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-sarvam | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-trugen | Patch |
| @livekit/agents-plugin-xai | Patch |


@CLAassistant

CLAassistant commented Apr 16, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ toubatbrian
❌ claude

Contributor Author

cc @toubatbrian @livekit/agent-devs for review — this is a port of livekit/agents#5438 (STT diarization capabilities + speaker_id on TimedString/SpeechData).


Generated by Claude Code

devin-ai-integration[bot]

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

- Add `diarize?: boolean` to DeepgramOptions so typed users of
  STT<'deepgram/nova-3'> can enable diarization without type casts.
- Fix SpeechStream.updateOptions to merge modelOptions instead of
  overwriting the stream's local state, preserving prior values when
  callers update only a subset of keys.

https://claude.ai/code/session_01VtE2b4qcjcN21cvDhsdcFo
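The two fixes in this commit work together: typed per-model options, and a merge (not an overwrite) when a stream's options change. Below is a minimal sketch, assuming a stream that keeps its own copy of `modelOptions`. `ModelOptionsFor` is a simplified stand-in for `STTOptions<TModel>`, `SpeechStreamSketch` is a hypothetical class, and `smart_format` is a hypothetical extra key used only to show that unrelated keys survive an update.

```typescript
// Reduced option shapes; `diarize` is the key this commit adds for Deepgram.
interface DeepgramOptions {
  diarize?: boolean;
  smart_format?: boolean; // hypothetical extra key for illustration
}

interface XaiOptions {
  diarize?: boolean;
  interim_results?: boolean;
}

type DeepgramModel = `deepgram/${string}`;
type XaiModel = "xai/stt-1";

// Simplified stand-in for STTOptions<TModel>: choose the option shape by model.
type ModelOptionsFor<TModel extends string> = TModel extends DeepgramModel
  ? DeepgramOptions
  : TModel extends XaiModel
    ? XaiOptions
    : Record<string, unknown>;

class SpeechStreamSketch<TModel extends string> {
  private modelOptions: ModelOptionsFor<TModel>;

  constructor(readonly model: TModel, modelOptions: ModelOptionsFor<TModel>) {
    this.modelOptions = { ...modelOptions };
  }

  // Merge, do not overwrite: keys absent from `update` keep their old values.
  updateOptions(update: Partial<ModelOptionsFor<TModel>>): void {
    this.modelOptions = { ...this.modelOptions, ...update };
  }

  get options(): Readonly<ModelOptionsFor<TModel>> {
    return this.modelOptions;
  }
}
```

With `new SpeechStreamSketch("deepgram/nova-3", { diarize: true, smart_format: true })`, the literal model name selects `DeepgramOptions`, so `diarize` type-checks without a cast. A later `updateOptions({ smart_format: false })` leaves `diarize: true` in place. An overwrite would have dropped it.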
devin-ai-integration[bot]

This comment was marked as resolved.

@toubatbrian toubatbrian merged commit 3e67a90 into main Apr 17, 2026
8 of 9 checks passed
@toubatbrian toubatbrian deleted the claude/practical-archimedes-muwA8 branch April 17, 2026 04:11
@github-actions github-actions Bot mentioned this pull request Apr 17, 2026

4 participants