
feat(stt): add speaker diarization support to STT interface and proxy #5283

Closed
russellmartin-livekit wants to merge 0 commits into main from claude/slack-support-diarization-stt-providers-cWpcE

Conversation

Contributor

@russellmartin-livekit russellmartin-livekit commented Mar 30, 2026

Related gateway change: https://github.com/livekit/agent-gateway/pull/557

Changes in agents

  • Options: Updates DeepgramOptions and AssemblyaiOptions to explicitly type diarization flags (diarize and speaker_labels).
  • Capabilities: Modifies the STT inference wrapper to dynamically set the diarization capability based on the provided extra_kwargs during initialization and update_options.
  • Data Models: Adds speaker_id to TimedString and populates it from the inference proxy's response.
  • Testing: Adds tests to verify diarization capability detection.
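
As a rough illustration of the capability detection described in the list above, the following sketch derives a diarization flag from provider-specific `extra_kwargs`. Only the key names `diarize` and `speaker_labels` come from this PR; `DIARIZATION_KEYS`, `diarization_enabled`, and `SketchSTT` are hypothetical names, and merging `update_options` values into the existing kwargs is an assumption rather than the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional

# Key names taken from the PR description: Deepgram uses "diarize",
# AssemblyAI uses "speaker_labels". The constant name is hypothetical.
DIARIZATION_KEYS = ("diarize", "speaker_labels")


def diarization_enabled(extra_kwargs: Optional[Mapping[str, Any]]) -> bool:
    """Return True if any known diarization flag is set to a truthy value."""
    if not extra_kwargs:
        return False
    return any(bool(extra_kwargs.get(key)) for key in DIARIZATION_KEYS)


@dataclass
class SketchSTT:
    """Hypothetical stand-in for the inference STT wrapper, not the real class."""

    extra_kwargs: Dict[str, Any] = field(default_factory=dict)

    @property
    def diarization(self) -> bool:
        # Recomputed on every access so it always reflects the current options.
        return diarization_enabled(self.extra_kwargs)

    def update_options(self, extra: Mapping[str, Any]) -> None:
        # Assumption: new values are merged over the existing extra kwargs.
        self.extra_kwargs.update(extra)
```

With this sketch, `SketchSTT(extra_kwargs={"diarize": True}).diarization` is `True`, and a later `update_options({"diarize": False})` turns it back off.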

Fixes AGT-2608

Slack thread: https://live-kit.slack.com/archives/C06TN33TV44/p1772573869144129?thread_ts=1771977322.899519&cid=C06TN33TV44

https://claude.ai/code/session_01VRKQuBXiq8BHKr9AiJ6uEw

CLAassistant commented Mar 30, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
8 out of 9 committers have signed the CLA.

✅ Panmax
✅ rililinx
✅ adrian-cowham
✅ realgarik
✅ longcw
✅ piyush-gambhir
✅ russellmartin-livekit
✅ dhruvladia-sarvam
❌ claude

devin-ai-integration[bot]

This comment was marked as resolved.

Comment thread tests/test_inference_stt_fallback.py Outdated
Comment on lines +262 to +307
class TestSTTDiarizationCapabilities:
    """Tests for STT diarization capability detection from extra_kwargs."""

    def test_no_diarization_by_default(self):
        """Without diarization params, capabilities.diarization is False."""
        stt = _make_stt()
        assert stt.capabilities.diarization is False

    def test_diarization_enabled_with_deepgram_diarize(self):
        """Deepgram's 'diarize' param enables diarization capability."""
        stt = _make_stt(extra_kwargs={"diarize": True})
        assert stt.capabilities.diarization is True

    def test_diarization_disabled_with_diarize_false(self):
        """Deepgram's 'diarize: False' keeps diarization capability False."""
        stt = _make_stt(extra_kwargs={"diarize": False})
        assert stt.capabilities.diarization is False

    def test_diarization_enabled_with_assemblyai_speaker_labels(self):
        """AssemblyAI's 'speaker_labels' param enables diarization capability."""
        stt = _make_stt(model="assemblyai/universal-streaming", extra_kwargs={"speaker_labels": True})
        assert stt.capabilities.diarization is True

    def test_diarization_disabled_with_speaker_labels_false(self):
        """AssemblyAI's 'speaker_labels: False' keeps diarization capability False."""
        stt = _make_stt(model="assemblyai/universal-streaming", extra_kwargs={"speaker_labels": False})
        assert stt.capabilities.diarization is False

    def test_diarization_with_other_extra_kwargs(self):
        """Diarization works alongside other extra_kwargs."""
        stt = _make_stt(extra_kwargs={"diarize": True, "punctuate": True, "smart_format": True})
        assert stt.capabilities.diarization is True

    def test_update_options_enables_diarization(self):
        """update_options with diarization params enables diarization capability."""
        stt = _make_stt()
        assert stt.capabilities.diarization is False
        stt.update_options(extra={"diarize": True})
        assert stt.capabilities.diarization is True

    def test_update_options_disables_diarization(self):
        """update_options can disable diarization by setting params to False."""
        stt = _make_stt(extra_kwargs={"diarize": True})
        assert stt.capabilities.diarization is True
        stt.update_options(extra={"diarize": False})
        assert stt.capabilities.diarization is False
Member

Those tests aren't really useful?

Contributor Author

Yeah, it's overkill; removed most of them.

end_time=self.start_time_offset + data.get("start", 0) + data.get("duration", 0),
confidence=data.get("confidence", 1.0),
text=text,
speaker_id=f"S{speaker}" if speaker is not None else None,
Member

What's tricky with diarization in our inference API is that we need a way for the speaker_id to be consistent across every provider.

Maybe some of them aren't even int, but str?

Contributor Author

I could standardize on the inference side, or I could just pass through whatever we get from the provider.

Member

I think the gateway should standardize it.
It would be OK if it was extra, but everything inside the core "inference" API should be the same across every provider

Contributor Author

Agreed, my latest changes standardize it on the gateway side.

Contributor Author

Standardized on the gateway side to return integers as strings, so it stays compatible with plugins that require strings. Also added support for word-level diarization.
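
To make the normalization idea in this thread concrete, here is a minimal sketch of mapping provider-specific speaker labels (ints, letters, or other strings) to stable integer strings. The gateway change itself lives in a separate PR and is not shown here, so the class name `SpeakerIdNormalizer` and the choice to number speakers in order of first appearance are assumptions for illustration only.

```python
from typing import Dict, Hashable, Optional


class SpeakerIdNormalizer:
    """Hypothetical helper: maps raw provider labels to "0", "1", "2", ..."""

    def __init__(self) -> None:
        # Raw provider label -> normalized integer string.
        self._ids: Dict[Hashable, str] = {}

    def normalize(self, raw: Optional[Hashable]) -> Optional[str]:
        # No speaker information from the provider: keep it as None.
        if raw is None:
            return None
        # Assumption: speakers are numbered in order of first appearance.
        if raw not in self._ids:
            self._ids[raw] = str(len(self._ids))
        return self._ids[raw]
```

A normalizer like this would be kept per stream, so that the same raw label always maps to the same `speaker_id` for the duration of a session.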

devin-ai-integration[bot]

This comment was marked as resolved.

@russellmartin-livekit russellmartin-livekit changed the title from "fix(inference): set STT capabilities.diarization from extra_kwargs" to "feat(stt): add speaker diarization support to STT interface and proxy" on Apr 4, 2026
@russellmartin-livekit russellmartin-livekit requested review from a team and theomonnom April 7, 2026 16:45
@russellmartin-livekit russellmartin-livekit force-pushed the claude/slack-support-diarization-stt-providers-cWpcE branch from df7684b to b991e5a on April 13, 2026 22:50
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 new potential issues.

View 18 additional findings in Devin Review.

3 participants