feat(stt): add diarization capabilities and speaker_id support by toubatbrian · Pull Request #1267 · livekit/agents-js

toubatbrian · 2026-04-16T21:01:33Z

Summary

Port of livekit/agents#5438 — adds STT diarization capability detection and speakerId passthrough from the Python agents framework.

What's ported

speakerId on TimedString (voice/io.ts): Added optional speakerId?: string | null field to the TimedString interface and createTimedString() factory, matching Python's TimedString.speaker_id.
speakerId on SpeechData (stt/stt.ts): Added optional speakerId?: string | null field to the SpeechData interface, so transcript events can carry speaker identity at the utterance level.
diarization on STTCapabilities (stt/stt.ts): Added optional diarization?: boolean capability flag. Also added a protected updateCapabilities() method on the base STT class so subclasses (like inference STT) can dynamically toggle capabilities after construction.
Diarization capability detection (inference/stt.ts): Ported the _DIARIZATION_EXTRA_KEYS / _diarization_enabled() pattern from Python. The inference STT now infers diarization: true from provider-specific modelOptions keys (diarize for Deepgram/xAI, speaker_labels for AssemblyAI) both at construction and when updateOptions() is called.
xAI STT model type in inference (inference/stt.ts): Added XaiSTTModels ('xai/stt-1') and XaiOptions interface with diarize, endpointing, format, and interim_results fields. Updated the STTOptions conditional type to resolve XaiOptions for xAI models.
speaker_id in wire protocol (inference/api_protos.ts): Added optional speaker_id field to the Zod schemas for sttWordSchema, sttInterimTranscriptEventSchema, and sttFinalTranscriptEventSchema.
speaker_id passthrough in processTranscript() (inference/stt.ts): The speech data builder now extracts speaker_id from both event-level and word-level server responses and maps them to speakerId on SpeechData and TimedString.
speaker_labels on AssemblyAIOptions: Added the missing speaker_labels boolean option.
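Below is a minimal sketch of how these pieces fit together. The base class keeps its capabilities in a true private field and exposes a protected updateCapabilities(). A subclass infers diarization from modelOptions at construction and again in updateOptions(). BaseSTT, SketchInferenceSTT, and diarizationEnabled are hypothetical, simplified stand-ins for the classes in stt/stt.ts and inference/stt.ts. Treating only a strict `true` as "enabled" is also an assumption.

```typescript
// Hypothetical, simplified shapes; not the actual @livekit/agents types.
interface STTCapabilities {
  streaming: boolean;
  interimResults: boolean;
  diarization?: boolean;
}

type ModelOptions = Record<string, unknown>;

// Provider-specific keys that switch on diarization, as named in this PR:
// `diarize` (Deepgram/xAI) and `speaker_labels` (AssemblyAI).
const DIARIZATION_EXTRA_KEYS = ['diarize', 'speaker_labels'] as const;

// Assumption: only an explicit `true` counts as enabled.
function diarizationEnabled(modelOptions: ModelOptions = {}): boolean {
  return DIARIZATION_EXTRA_KEYS.some((key) => modelOptions[key] === true);
}

abstract class BaseSTT {
  // True private field: subclasses cannot read or assign it directly.
  #capabilities: STTCapabilities;

  constructor(capabilities: STTCapabilities) {
    this.#capabilities = { ...capabilities };
  }

  get capabilities(): STTCapabilities {
    return this.#capabilities;
  }

  // Protected hook so subclasses can toggle flags after construction.
  protected updateCapabilities(updates: Partial<STTCapabilities>): void {
    this.#capabilities = { ...this.#capabilities, ...updates };
  }
}

class SketchInferenceSTT extends BaseSTT {
  private modelOptions: ModelOptions;

  constructor(modelOptions: ModelOptions = {}) {
    super({
      streaming: true,
      interimResults: true,
      diarization: diarizationEnabled(modelOptions),
    });
    this.modelOptions = { ...modelOptions };
  }

  updateOptions(opts: { modelOptions?: ModelOptions }): void {
    if (opts.modelOptions) {
      // Merge rather than overwrite, then re-derive the capability flag.
      this.modelOptions = { ...this.modelOptions, ...opts.modelOptions };
      this.updateCapabilities({ diarization: diarizationEnabled(this.modelOptions) });
    }
  }
}

const stt = new SketchInferenceSTT();
console.log(stt.capabilities.diarization); // false
stt.updateOptions({ modelOptions: { speaker_labels: true } });
console.log(stt.capabilities.diarization); // true
```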

Implementation nuances (Python → TypeScript)

Aspect	Python	TypeScript (this PR)
TimedString	`speaker_id: str | None` attribute on a `str` subclass	`speakerId?: string | null` field on the `TimedString` interface
Capability mutation	`self._capabilities = replace(self._capabilities, diarization=...)` using `dataclasses.replace()`	`this.updateCapabilities({ diarization: ... })`, a new protected method, since `#capabilities` is a true private field inaccessible to subclasses
Type-safe options	Python `@overload` per provider with `TypedDict`	TypeScript conditional type `STTOptions<TModel>` that resolves per model
Options merge in `updateOptions`	`self._opts.extra_kwargs.update(extra)` (dict merge)	`{ ...this.opts.modelOptions, ...opts.modelOptions }` (spread merge)
Naming convention	`speaker_id` (snake_case)	`speakerId` (camelCase) on public interfaces; `speaker_id` preserved in wire protocol Zod schemas
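A small mapping function can illustrate the naming split (snake_case on the wire, camelCase on public interfaces) and the speaker_id passthrough in processTranscript(). Event-level speaker_id maps to SpeechData.speakerId, and word-level speaker_id maps to each TimedString.speakerId. The wire and public shapes below are hypothetical and trimmed. Only speaker_id and speakerId come from this PR; the other field names are assumptions.

```typescript
// Hypothetical wire shapes; the real Zod schemas in inference/api_protos.ts have more fields.
interface WireWord {
  word: string;
  start: number;
  end: number;
  speaker_id?: string | null;
}

interface WireTranscriptEvent {
  transcript: string;
  language: string;
  confidence: number;
  speaker_id?: string | null;
  words?: WireWord[];
}

// Hypothetical, simplified public shapes using camelCase names.
interface TimedString {
  text: string;
  startTime: number;
  endTime: number;
  speakerId?: string | null;
}

interface SpeechData {
  text: string;
  language: string;
  confidence: number;
  speakerId?: string | null;
  words?: TimedString[];
}

function toSpeechData(event: WireTranscriptEvent): SpeechData {
  // Word-level speaker_id is carried onto each TimedString.
  const words: TimedString[] = (event.words ?? []).map((w) => ({
    text: w.word,
    startTime: w.start,
    endTime: w.end,
    speakerId: w.speaker_id ?? null,
  }));
  return {
    text: event.transcript,
    language: event.language,
    confidence: event.confidence,
    // Event-level speaker_id is carried onto the utterance.
    speakerId: event.speaker_id ?? null,
    words,
  };
}
```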

What's NOT ported (Python-specific)

uv.lock changes (Python lockfile / new plugin registrations)
Python-specific @overload type stubs (TypeScript uses conditional types instead)
aiohttp.ClientSession typing (Node.js uses different HTTP primitives)
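In place of the Python @overload stubs, a conditional type can resolve the options type from the model string. This is a trimmed sketch, not the actual STTOptions<TModel> definition. Only 'deepgram/nova-3' and 'xai/stt-1' are modeled. XaiOptions omits endpointing and format because their value types are not stated here. makeConfig is a hypothetical helper.

```typescript
// Trimmed model unions; the PR names 'deepgram/nova-3' and 'xai/stt-1'.
type DeepgramModels = 'deepgram/nova-3';
type XaiSTTModels = 'xai/stt-1';

interface DeepgramOptions {
  diarize?: boolean;
}

interface XaiOptions {
  diarize?: boolean;
  interim_results?: boolean;
  // endpointing and format omitted: their value types are not stated here.
}

// Resolve the provider-specific options type from the model literal.
type STTOptions<TModel extends string> = TModel extends DeepgramModels
  ? DeepgramOptions
  : TModel extends XaiSTTModels
    ? XaiOptions
    : Record<string, unknown>;

// Hypothetical helper: TModel is inferred from `model`, then fixes the options type.
function makeConfig<TModel extends string>(model: TModel, modelOptions: STTOptions<TModel>) {
  return { model, modelOptions };
}

const xai = makeConfig('xai/stt-1', { diarize: true, interim_results: false });
const dg = makeConfig('deepgram/nova-3', { diarize: true });
// makeConfig('xai/stt-1', { diarise: true }); // should be rejected by the type checker
```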

Test plan

All 42 inference STT tests pass (7 new diarization capability tests added)
API proto tests pass with updated Zod schemas
Build succeeds (pnpm build:agents)
ESLint passes on all changed files (0 errors)
Verify with a live xAI STT session that speakerId is populated when diarize: true is set
Verify with Deepgram that speakerId flows through when diarize: true is set

https://claude.ai/code/session_01VtE2b4qcjcN21cvDhsdcFo


changeset-bot · 2026-04-16T21:01:38Z

🦋 Changeset detected

Latest commit: 6398aef

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 25 packages

Name	Type
@livekit/agents	Patch
@livekit/agents-plugin-anam	Patch
@livekit/agents-plugin-assemblyai	Patch
@livekit/agents-plugin-baseten	Patch
@livekit/agents-plugin-bey	Patch
@livekit/agents-plugin-cartesia	Patch
@livekit/agents-plugin-cerebras	Patch
@livekit/agents-plugin-deepgram	Patch
@livekit/agents-plugin-elevenlabs	Patch
@livekit/agents-plugin-google	Patch
@livekit/agents-plugin-hedra	Patch
@livekit/agents-plugin-inworld	Patch
@livekit/agents-plugin-lemonslice	Patch
@livekit/agents-plugin-livekit	Patch
@livekit/agents-plugin-mistral	Patch
@livekit/agents-plugin-neuphonic	Patch
@livekit/agents-plugin-openai	Patch
@livekit/agents-plugin-phonic	Patch
@livekit/agents-plugin-resemble	Patch
@livekit/agents-plugin-rime	Patch
@livekit/agents-plugin-sarvam	Patch
@livekit/agents-plugin-silero	Patch
@livekit/agents-plugins-test	Patch
@livekit/agents-plugin-trugen	Patch
@livekit/agents-plugin-xai	Patch


CLAassistant · 2026-04-16T21:01:41Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ toubatbrian
❌ claude

toubatbrian · 2026-04-16T21:01:49Z

cc @toubatbrian @livekit/agent-devs for review — this is a port of livekit/agents#5438 (STT diarization capabilities + speaker_id on TimedString/SpeechData).

Generated by Claude Code

- Add `diarize?: boolean` to DeepgramOptions so typed users of STT<'deepgram/nova-3'> can enable diarization without type casts.
- Fix SpeechStream.updateOptions to merge modelOptions instead of overwriting the stream's local state, preserving prior values when callers update only a subset of keys.

https://claude.ai/code/session_01VtE2b4qcjcN21cvDhsdcFo
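The stream-level fix can be sketched with a hypothetical SketchSpeechStream class (not the actual SpeechStream). The class keeps its own copy of modelOptions. A partial update changes only the keys it names, and every other key keeps its previous value.

```typescript
type ModelOptions = Record<string, unknown>;

// Hypothetical stand-in for a stream holding its own modelOptions state.
class SketchSpeechStream {
  private modelOptions: ModelOptions;

  constructor(modelOptions: ModelOptions = {}) {
    this.modelOptions = { ...modelOptions };
  }

  // Overwriting (this.modelOptions = opts.modelOptions) would drop every key the
  // caller did not repeat; spreading the new keys last keeps prior values and lets new ones win.
  updateOptions(opts: { modelOptions?: ModelOptions }): void {
    if (opts.modelOptions) {
      this.modelOptions = { ...this.modelOptions, ...opts.modelOptions };
    }
  }

  get currentModelOptions(): ModelOptions {
    return { ...this.modelOptions };
  }
}

const stream = new SketchSpeechStream({ diarize: true, interim_results: true });
stream.updateOptions({ modelOptions: { interim_results: false } });
console.log(stream.currentModelOptions); // { diarize: true, interim_results: false }
```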


feat(stt): add diarization capabilities and speaker_id support

6aa99cc

Port of livekit/agents#5438 — adds STT diarization capability detection and speaker_id passthrough from the Python agents framework. https://claude.ai/code/session_01VtE2b4qcjcN21cvDhsdcFo


claude and others added 6 commits April 16, 2026 23:35

style: add missing // Ref comments for Python port traceability

86b763d

Per CLAUDE.md porting rules, every JS change corresponding to a Python change must carry an inline // Ref comment. https://claude.ai/code/session_01VtE2b4qcjcN21cvDhsdcFo

Update stt.ts

7da67d6

remove ref comments

95cbfea

Create seven-fans-exist.md

e5d45af

record speakerId

ed7ab3f

Merge branch 'main' into claude/practical-archimedes-muwA8

6398aef

theomonnom approved these changes Apr 17, 2026


toubatbrian merged commit 3e67a90 into main Apr 17, 2026
8 of 9 checks passed

toubatbrian deleted the claude/practical-archimedes-muwA8 branch April 17, 2026 04:11

github-actions Bot mentioned this pull request Apr 17, 2026

Version Packages #1271

Merged

toubatbrian merged 8 commits into main from claude/practical-archimedes-muwA8

4 participants
