feat(deepgram): port flux-general-multi STTv2 support from python (#1275)
tinalenguyen merged 3 commits into main
Port of livekit/agents#5486.

- Add `flux-general-multi` to Deepgram `V2Models`.
- Add `languageHint` option on STTv2 to bias the multi-language model (ignored with a warning when used with a non-multi model).
- Propagate detected languages from Deepgram responses into `SpeechData.sourceLanguages`; set the dominant detected language as the primary `language` on each transcript alternative.
- Add `sourceLanguages?: LanguageCode[]` to core `SpeechData`, mirroring the Python `source_languages` field used by translation-capable and multi-language-detection STT providers.
🦋 Changeset detected. Latest commit: 11dd913. The changes in this PR will be included in the next version bump. This PR includes changesets to release 26 packages.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e18df61759
Summary
Ports livekit/agents#5486 ("(deepgram sttv2): add flux-general-multi support") from the Python `livekit-agents` repo into `agents-js`. This PR adds support for Deepgram's new `flux-general-multi` STTv2 model, which performs multi-language detection over an utterance. When this model is active, Deepgram returns a `languages` array on each transcript message, ordered by prevalence. The most prevalent language is exposed as the primary `language`, and the full list is exposed on a new `sourceLanguages` field.

cc @toubatbrian @livekit/agent-devs for review.
Ported features

1. `flux-general-multi` model (`plugins/deepgram/src/models.ts`): extended the `V2Models` union with the new model name.
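A sketch of what the extended union could look like. This is illustrative only: `flux-general-en` stands in for the existing members, and the runtime helper is not part of the PR.

```typescript
// Hypothetical sketch of the V2Models union after the change.
// 'flux-general-en' is an illustrative existing member, not the
// plugin's full model list.
type V2Models = 'flux-general-en' | 'flux-general-multi';

// A runtime companion list, handy for validating user-supplied model names.
const V2_MODEL_LIST: readonly V2Models[] = ['flux-general-en', 'flux-general-multi'];

function isV2Model(model: string): model is V2Models {
  return (V2_MODEL_LIST as readonly string[]).includes(model);
}
```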
2. `languageHint` option on STTv2 (`plugins/deepgram/src/stt_v2.ts`): new optional `languageHint: string[]` option on `STTv2Options`, serialized into the Deepgram websocket URL as the `language_hint` query parameter. This option is only meaningful with `flux-general-multi`; setting it on any other model logs a warning (matching Python's `logger.warning(...)` behavior). The same warning path is wired into `updateOptions` so late model/hint changes still surface mismatches.
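A minimal sketch of the serialization and warning behavior described above. The names (`STTv2OptionsSketch`, `buildWsUrl`) are hypothetical, and repeating the `language_hint` key per hint is an assumption about the encoding, not confirmed against Deepgram's API.

```typescript
// Hypothetical sketch; the actual stt_v2.ts implementation and Deepgram's
// exact query-string encoding may differ.
interface STTv2OptionsSketch {
  model: string;
  languageHint?: string[];
}

function buildWsUrl(base: string, opts: STTv2OptionsSketch): string {
  const params = new URLSearchParams({ model: opts.model });
  if (opts.languageHint?.length) {
    if (opts.model !== 'flux-general-multi') {
      // mirrors the ported warning: the hint is ignored on non-multi models
      console.warn(
        `languageHint is only supported by flux-general-multi (got ${opts.model}); ignoring`,
      );
    } else {
      // wire parameter stays snake_case: language_hint
      for (const hint of opts.languageHint) {
        params.append('language_hint', hint);
      }
    }
  }
  return `${base}?${params.toString()}`;
}
```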
3. `source_languages` → `sourceLanguages` on `SpeechData` (`agents/src/stt/stt.ts`): new optional field on the core `SpeechData` interface. The docstring mirrors the updated Python docstring: the field is populated either by translation-capable STT services (where `language` is the target and `sourceLanguages` carries the original spoken language(s)) or by multi-language detection services (where `language` is the dominant detected language and `sourceLanguages` carries all detected languages sorted by prevalence).
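A trimmed-down sketch of the resulting shape. Only the fields relevant to this PR are shown, and `LanguageCode` is simplified here to a plain string alias; the real interface has more fields.

```typescript
// Hypothetical, reduced shape of SpeechData after the change.
type LanguageCode = string;

interface SpeechData {
  /** Target language (translation STT) or dominant detected language. */
  language: LanguageCode;
  /**
   * Populated by translation-capable STT (the original spoken language(s))
   * or by multi-language detection STT (all detected languages, sorted by
   * prevalence, most prevalent first).
   */
  sourceLanguages?: LanguageCode[];
  text: string;
}
```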
4. `parseTranscription` updates: the Deepgram STTv2 parser now reads `data.languages`, runs each entry through `normalizeLanguage`, and:
   - sets `SpeechData.language` to the dominant (first) detected language when the array is non-empty, falling back to the stream's configured `language` otherwise;
   - sets `SpeechData.sourceLanguages` to the full normalized list (or leaves it `undefined` when the field is absent).

Implementation nuances vs Python

- Naming: Python's `language_hint` (snake_case) maps to JS's `languageHint` (camelCase); the wire parameter stays `language_hint`. Same for `source_languages` → `sourceLanguages`.
- Normalization: Python surfaces the `LanguageCode(...)` values from Deepgram directly. The JS plugin has always run language strings through `normalizeLanguage(...)` before putting them on `SpeechData.language` (this matches the existing pattern used elsewhere in the Deepgram plugin and the wider codebase), so the port normalizes each entry of the `languages` array as well.
- Logging: Python's warning uses `%s` positional formatting; JS uses pino-style structured logging (`{ model: ... }, 'message'`). The warning content is preserved verbatim.
- `source_texts`: the Python diff updates the docstring for `SpeechData.source_texts` too, but the JS `SpeechData` has never had that field (it was never ported with the translation feature), so there is nothing to change here. This is noted so we don't silently drift; we can port `sourceTexts` in a follow-up if/when translation STT lands in JS.
- `stt.py`: the Python PR also tweaks the `source_languages` docstring in core `stt.py`. Those clarifications are inlined into the JSDoc on the new JS field.
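Taken together, the detection logic in item 4 plus the normalization nuance can be sketched as follows. All names here are hypothetical, and `normalizeLanguage` is a simplified stand-in for the plugin's real helper.

```typescript
// Hypothetical sketch of the language-detection step in parseTranscription.
function normalizeLanguage(lang: string): string {
  // stand-in only: the real helper does more than lowercase the tag
  return lang.trim().toLowerCase();
}

interface TranscriptMessageSketch {
  languages?: string[]; // present for flux-general-multi, ordered by prevalence
}

function resolveLanguages(
  data: TranscriptMessageSketch,
  configuredLanguage: string,
): { language: string; sourceLanguages?: string[] } {
  const normalized = data.languages?.map(normalizeLanguage);
  if (normalized && normalized.length > 0) {
    // dominant (first) detected language becomes the primary language
    return { language: normalized[0]!, sourceLanguages: normalized };
  }
  // field absent or empty: fall back to the stream's configured language,
  // leaving sourceLanguages undefined
  return { language: configuredLanguage };
}
```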
Files changed

- `agents/src/stt/stt.ts`: add `sourceLanguages?: LanguageCode[]` to `SpeechData` with a docstring matching Python
- `plugins/deepgram/src/models.ts`: add `flux-general-multi` to `V2Models`
- `plugins/deepgram/src/stt_v2.ts`: add the `languageHint` option, wire it into the websocket URL, warn on model mismatch in the constructor and `updateOptions`, populate `sourceLanguages`, and override the primary `language` from Deepgram's `languages` array
- `.changeset/deepgram-flux-general-multi.md`: `minor` changeset for `@livekit/agents-plugin-deepgram` and `@livekit/agents`
Test plan

- `pnpm build:agents` succeeds
- `pnpm --filter "@livekit/agents-plugin-deepgram..." build` succeeds
- `pnpm format:check` passes
- Manual: run `restaurant_agent.ts` with `new STTv2({ model: 'flux-general-multi', languageHint: ['en', 'es'] })` and verify `sourceLanguages` is populated on `FINAL_TRANSCRIPT` events when switching languages mid-utterance
- Verify that with `model: 'flux-general-en'` + `languageHint` the warning is emitted and the stream still succeeds (hint ignored)

This PR was created by an automated Claude Code Routine maintained by @toubatbrian. The routine is currently in the experimentation stage.