feat(voice): add voice provider registry with third-party STT/TTS support#2586
Conversation
…port Mirror the LLM inference provider pattern for voice: unified registry, slug:model routing grammar, per-provider API keys via auth-profiles.json, and a chip-toggle + modal UI matching AIPanel. Rust core: - VoiceProviderCreds, VoiceCapability, SttApiStyle/TtsApiStyle types - voice_providers, stt_provider, tts_provider fields on Config - ExternalSttProvider (OpenAI-compat + Deepgram) / ExternalTtsProvider (OpenAI-compat + ElevenLabs) with slug:model factory dispatch - effective_stt_provider / effective_tts_provider with legacy fallback - voice_update_provider_settings, voice_list_models, voice_test_provider RPCs - 240 voice tests passing (16 new) Frontend: - voiceSettingsApi.ts: parse/serialize, load/save, list/test, key management - VoicePanel: chip toggles (Cloud/Whisper/Piper/Deepgram/ElevenLabs/OpenAI), modal dialogs for API keys + local installs, routing dropdowns - 31 TS round-trip tests Built-in slugs: deepgram (STT), elevenlabs (TTS), openai (both).
- Add "Test STT" and "Test TTS" buttons in the routing section with inline result display - Fix ElevenLabs test: validate API key via GET /voices instead of synthesizing audio (which requires a valid voice ID in the URL) - Fix empty endpoint URL when adding built-in providers (deepgram, elevenlabs, openai) — populate from metadata - Swap Cancel/Test button positions in the provider modal - Rename chip labels to "Whisper (Local)" and "Piper (Local)"
Remove Whisper/Piper install buttons from the Voice Routing section — installation is now handled in the modal that opens when toggling a local provider on. Keep model/voice pickers in routing for all providers.
ElevenLabs requires a valid voice ID in the API path. Set the built-in default to George (JBFqnCBsd6RMkjVDRZzb) so test and synthesis calls work without the user manually picking a voice first.
The config_get_client_config RPC was missing voice_providers, stt_provider, and tts_provider fields. loadVoiceSettings() read them as undefined, then the 2s poll loop overwrote local state — causing the chip to toggle off immediately after saving. Also add debug logging to load/save paths for diagnosis.
Root cause: the 2s poll seeded stt/tts routing from voice_status (which reads legacy local_ai.* fields) and mapped any non-whisper/piper value to 'cloud' — wiping external providers like elevenlabs. Fixes: - Seed routing dropdowns from loadVoiceSettings() (reads top-level config fields via client_config) instead of voice_status - Sync voice_update_provider_settings writes to local_ai.stt_provider / local_ai.tts_provider so voice_status and dispatch RPCs stay in sync - Fall back to voice_status seeding only when loadVoiceSettings fails (older cores without the new RPC)
Update capability from tts-only to both. Default STT model: scribe_v1.
Show a dropdown of curated ElevenLabs voices (from elevenlabsVoicePresets.ts) in the TTS routing section when ElevenLabs is selected, plus an "Other…" free-text input for custom voice IDs. Also widen stt/tts provider state types to accept any slug string (not just cloud/whisper/piper).
GET /voices requires voices_read scope which restricted API keys lack. Switch to GET /user/subscription which only needs basic auth.
Add validate_only flag to voice_test_provider RPC: - Modal "Test Key" button: validate_only=true → hits /user/subscription (ElevenLabs) or /models (OpenAI) to verify credentials - Routing "Test TTS" button: validate_only=false → actually synthesizes "Hello" with the selected voice ID (e.g. elevenlabs:JBFqnCBsd6RMkjVDRZzb)
When Piper fails with a dylib load error for libespeak-ng, surface a clear message: "Run: brew install espeak-ng" instead of the raw dyld error.
…sages User-facing test results now show clean messages like "piper requires espeak-ng which is not installed. Run: brew install espeak-ng" instead of "[voice-tts] piper requires espeak-ng..."
…stered If config.stt_provider or tts_provider points to a slug that was removed from voice_providers, the routing dropdown now falls back to 'cloud' instead of keeping the stale slug (which would cause "no voice provider with slug 'X' found" on test/dispatch).
The previous seed-once pattern (prev || sttStr) left stale slugs in state when a provider was removed. Now the poll always overwrites the routing dropdown state from the core config (unless the provider modal is open), so removed providers immediately revert to cloud.
Chips are visible but disabled with opacity-60 and a "(coming soon)" label. Only ElevenLabs is fully enabled for now.
The 2s poll was overwriting user routing changes before they could persist. Increase to 15s (users rarely change config externally) and switch routing sync back to seed-once (prev || value) so user edits aren't clobbered.
📝 WalkthroughWalkthroughThis PR implements a configurable voice provider registry enabling users to register and route STT/TTS workloads to external providers (Deepgram, ElevenLabs, OpenAI) or local engines (Whisper, Piper). It adds backend schema, RPC handlers, provider dispatch logic, and a redesigned VoicePanel UI with provider chips and routing controls. ChangesVoice Provider Registry & Routing
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
Routing changes (STT/TTS dropdown) are now local until the user clicks Save. The button disables when there are no unsaved changes. Fixes the "no voice provider with slug 'X' found" error that occurred when the dropdown was changed but never persisted.
The Save button called persistProviders which used the legacy voice_set_providers RPC. That handler only wrote to config.local_ai.stt/tts_provider but not the top-level config.stt/tts_provider — so the config wasn't actually persisted for the new read path. Now both handlers sync both fields.
The i18n coverage test requires new keys in en.ts to also appear in en-1.ts and all locale chunk files. Added 47 voice provider keys (chip, modal, routing, externalProviders) with English fallback values.
Remove tests for removed dictation section (hotkey, save/start/stop). Add mocks for voiceSettingsApi. Update provider change tests to use the new Save button flow. Add tests for chip rendering, coming-soon state, API key modal, and save button enable/disable.
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/openhuman/voice/factory.rs (1)
878-895:⚠️ Potential issue | 🟠 Major | ⚡ Quick winProvider-specific defaults are being overridden by local defaults.
External slug routes inherit
DEFAULT_WHISPER_MODEL/DEFAULT_PIPER_VOICEwhenmodel/voiceis empty, so provider registry defaults (or empty for factory fallback) are never used for bare slugs.Proposed fix
pub fn create_stt_provider( provider: &str, model: &str, config: &Config, ) -> anyhow::Result<Box<dyn SttProvider>> { debug!("{LOG_PREFIX} create_stt_provider provider={provider} model={model}"); - let model = if model.trim().is_empty() { - DEFAULT_WHISPER_MODEL - } else { - model - }; match provider.trim() { "cloud" | "openhuman" => Ok(Box::new(CloudSttProvider::new( super::cloud_transcribe_default_model(), ))), - "whisper" => Ok(Box::new(WhisperSttProvider::new(model))), + "whisper" => { + let whisper_model = if model.trim().is_empty() { DEFAULT_WHISPER_MODEL } else { model }; + Ok(Box::new(WhisperSttProvider::new(whisper_model))) + } other => { let (slug, slug_model) = split_slug_model(other); - let effective_model = if slug_model.is_empty() { - model - } else { - slug_model - }; + let effective_model = if slug_model.is_empty() { model } else { slug_model }; create_stt_provider_by_slug(slug, effective_model, config) } } } @@ pub fn create_tts_provider( provider: &str, voice: &str, config: &Config, ) -> anyhow::Result<Box<dyn TtsProvider>> { debug!("{LOG_PREFIX} create_tts_provider provider={provider} voice={voice}"); - let voice = if voice.trim().is_empty() { - DEFAULT_PIPER_VOICE - } else { - voice - }; match provider.trim() { - "cloud" | "openhuman" => Ok(Box::new(CloudTtsProvider::new(if voice.is_empty() { + "cloud" | "openhuman" => Ok(Box::new(CloudTtsProvider::new(if voice.trim().is_empty() { None } else { Some(voice.to_string()) }))), - "piper" => Ok(Box::new(PiperTtsProvider::new(voice))), + "piper" => { + let piper_voice = if voice.trim().is_empty() { DEFAULT_PIPER_VOICE } else { voice }; + Ok(Box::new(PiperTtsProvider::new(piper_voice))) + } other => { let (slug, slug_voice) = split_slug_model(other); - let effective_voice = if slug_voice.is_empty() { - voice - } else { - slug_voice - }; + let effective_voice = if slug_voice.is_empty() { voice } else { slug_voice }; create_tts_provider_by_slug(slug, effective_voice, config) } } }Also applies to: 917-936
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice/factory.rs` around lines 878 - 895, The factory currently forces local defaults (e.g., DEFAULT_WHISPER_MODEL / DEFAULT_PIPER_VOICE) whenever the incoming model/voice string is empty, which overrides provider-specific defaults for bare slugs; update the slug-handling flow in the provider match arms (the branch using split_slug_model and create_stt_provider_by_slug for STT and the analogous TTS branch) so that when you split a slug+model (using split_slug_model) you treat an empty incoming model/voice as "no override" (i.e., prefer slug_model if present, otherwise pass an empty string or None through to create_stt_provider_by_slug / create_tts_provider_by_slug so the provider registry can apply its own default), rather than substituting DEFAULT_WHISPER_MODEL or DEFAULT_PIPER_VOICE early. Ensure you only apply DEFAULT_WHISPER_MODEL / DEFAULT_PIPER_VOICE for direct provider identifiers like "whisper"/"piper", not for bare slug routes.app/src/components/settings/panels/VoicePanel.tsx (2)
247-276:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRouting save flow treats failed RPC writes as success.
persistProvidersswallows RPC errors, sosaveRoutingstill updatessavedSttProvider/savedTtsProviderand shows success even when persistence failed.🔧 Suggested fix
const persistProviders = async ( @@ ) => { @@ } catch (err) { const message = err instanceof Error ? err.message : t('voice.providers.failedToSave'); setError(message); + throw err; } finally { setIsSavingProviders(false); } }; @@ const saveRouting = useCallback(async () => { @@ try { await persistProviders({ stt_provider: sttProvider, tts_provider: ttsProvider }); setSavedSttProvider(sttProvider); setSavedTtsProvider(ttsProvider); setNotice(t('voice.providers.saved')); - void loadData(true); } catch (err) { setError(err instanceof Error ? err.message : t('voice.providers.failedToSave')); } finally { setIsSavingRouting(false); } }, [sttProvider, ttsProvider, persistProviders, t]);Also applies to: 294-307
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 247 - 276, persistProviders currently catches RPC errors and sets local state as if save succeeded, letting saveRouting update savedSttProvider/savedTtsProvider and show success even when openhumanVoiceSetProviders failed; modify persistProviders to surface failures by rethrowing the caught error (or returning a failure flag) after setError so callers like saveRouting can detect failure, and update saveRouting to only update savedSttProvider/savedTtsProvider and show success when persistProviders (which calls openhumanVoiceSetProviders) actually resolves successfully; reference functions/variables: persistProviders, saveRouting, openhumanVoiceSetProviders, savedSttProvider, savedTtsProvider, and loadData to ensure loadData remains conditional on a successful save.
87-1238: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftSplit this panel into focused modules before merge.
This file now far exceeds the repo’s size target; extracting modal/chip/routing sections and provider helpers will reduce regression risk and simplify testing.
As per coding guidelines
**/*.{ts,tsx,rs}: file size should not exceed approximately 500 lines.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 87 - 1238, The file is too large — split VoicePanel into focused modules to meet the ~500-line guideline. Extract the chip row UI and logic into a VoiceProviderChips component, the API-key / local-provider modal into a ProviderKeyModal component, and the routing controls into a VoiceRoutingPanel component; move provider-related helpers and RPC wrappers (persistProviders, handleEnableExternalProvider, handleRemoveProvider, loadData-related calls, installButtonLabel, handleInstallWhisper, handleInstallPiper, and any testVoiceProvider/ setVoiceProviderKey/clearVoiceProviderKey usage) into a shared hook or utils (e.g., useVoiceSettings) so state and side-effects can be reused. Make VoicePanel a thin coordinator that imports these components/hooks, passes down state (voiceSettings, sttProvider, ttsProvider, sttModel, ttsVoice, elevenlabsVoiceId, isSaving* flags, test results, set* callbacks) and preserves existing data-testid and public function names so behavior and tests continue to work.
🧹 Nitpick comments (5)
src/openhuman/inference/voice/local_speech.rs (1)
198-211: ⚡ Quick winAdd debug logs in error branches for better diagnostics.
The new specialized error detection at lines 202-207 and the generic error path at lines 208-211 lack debug logging. Per coding guidelines, new flows should log branches and errors with verbose diagnostics to aid troubleshooting.
📊 Proposed debug logging additions
if detail.contains("libespeak-ng") || detail.contains("Library not loaded") { + debug!("{LOG_PREFIX} detected missing libespeak-ng dependency in stderr"); return Err(format!( "{LOG_PREFIX} piper requires espeak-ng which is not installed. \ Run: brew install espeak-ng" )); } + debug!("{LOG_PREFIX} piper subprocess failed, stderr: {detail}"); return Err(format!( "{LOG_PREFIX} piper failed (exit={:?}): {detail}",Based on learnings from coding guidelines: Debug logging in Rust core must log branches and errors with stable grep-friendly prefixes for diagnosis.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/inference/voice/local_speech.rs` around lines 198 - 211, Add debug logging before both early-return error branches in the piper invocation block: log a stable, grep-friendly prefix (e.g. "DEBUG: local_speech:piper:error:") and include the variables detail, exit_code (if set), and out_path to aid diagnostics. Specifically, just before the espeak-ng detection branch that returns Err (uses variables stderr/detail and LOG_PREFIX) and just before the generic piper failure return (uses exit_code/detail/LOG_PREFIX), emit a debug log with the branch identifier, the full stderr/detail, the exit code, and the path removed so operators can correlate failures to files.app/src/services/api/voiceSettingsApi.ts (2)
11-16: ⚡ Quick winSplit
AuthProfileSummaryinto a standaloneimport type.The repo rule here is the explicit
import typeform, not a mixed value/type import list.♻️ Suggested cleanup
-import { - authListProviderCredentials, - type AuthProfileSummary, - authRemoveProviderCredentials, - authStoreProviderCredentials, -} from '../../utils/tauriCommands/auth'; +import { + authListProviderCredentials, + authRemoveProviderCredentials, + authStoreProviderCredentials, +} from '../../utils/tauriCommands/auth'; +import type { AuthProfileSummary } from '../../utils/tauriCommands/auth';As per coding guidelines,
**/*.{ts,tsx}: Use staticimport typefor TypeScript type imports instead ofimportto enable tree-shaking and faster type checking.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/services/api/voiceSettingsApi.ts` around lines 11 - 16, The import currently mixes values and a type (AuthProfileSummary) in one import list; change it so AuthProfileSummary is imported using a standalone "import type" statement and leave authListProviderCredentials, authRemoveProviderCredentials, and authStoreProviderCredentials in the regular import. Update the import(s) in voiceSettingsApi.ts to split out AuthProfileSummary into its own import type line to follow the repo rule and enable proper tree-shaking/type-only imports.
161-168: ⚡ Quick winUse the shared
debug()logger for these diagnostics.New app-side diagnostics should follow the repo's namespaced
debugpattern instead ofconsole.debug.♻️ Suggested pattern
+import createDebug from 'debug'; +const log = createDebug('openhuman:voice-settings-api'); ... - if (process.env.NODE_ENV !== 'production') { - console.debug('[voiceSettingsApi] loaded', { - providerCount: voiceProviders.length, - slugs: voiceProviders.map(p => p.slug), - stt: configResult.stt_provider, - tts: configResult.tts_provider, - }); - } + log('loaded %O', { + providerCount: voiceProviders.length, + slugs: voiceProviders.map(p => p.slug), + stt: configResult.stt_provider, + tts: configResult.tts_provider, + }); ... - if (process.env.NODE_ENV !== 'production') { - console.debug('[voiceSettingsApi] saving patch', patch); - } + log('saving patch %O', patch);As per coding guidelines,
app/src/**/*.{ts,tsx}: Debug logging in the app should be namespaced usingdebug()from a named logger instance and include dev-only detail.Also applies to: 216-218
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/services/api/voiceSettingsApi.ts` around lines 161 - 168, Replace the direct console.debug call in voiceSettingsApi with the repo's shared namespaced debug logger: import the shared debug() factory, create a logger namespace for this module (e.g., const log = debug('app:voiceSettingsApi')), and call log(...) instead of console.debug, passing the same diagnostic object (voiceProviders.length, voiceProviders.map(p=>p.slug), configResult.stt_provider, configResult.tts_provider). Apply the same change to the other console.debug usage later in the file (the block around the second occurrence) so all development-only diagnostics use the shared debug logger and consistent namespace.app/src/services/api/voiceSettingsApi.test.ts (1)
3-7: ⚡ Quick winSplit
VoiceProviderRefinto a standaloneimport type.This file is also under the repo-wide TypeScript import-style rule.
♻️ Suggested cleanup
import { parseVoiceProviderString, serializeVoiceProviderRef, - type VoiceProviderRef, } from './voiceSettingsApi'; +import type { VoiceProviderRef } from './voiceSettingsApi';As per coding guidelines,
**/*.{ts,tsx}: Use staticimport typefor TypeScript type imports instead ofimportto enable tree-shaking and faster type checking.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/services/api/voiceSettingsApi.test.ts` around lines 3 - 7, Update the import so the TypeScript type VoiceProviderRef is imported using a standalone `import type` while keeping the value imports for parseVoiceProviderString and serializeVoiceProviderRef; specifically, replace the combined import from './voiceSettingsApi' with a value import for parseVoiceProviderString and serializeVoiceProviderRef and a separate `import type { VoiceProviderRef }` to comply with the repo's static type-import rule.app/src/components/settings/panels/VoicePanel.tsx (1)
157-165: ⚡ Quick winUse namespaced
debug()logging instead of directconsole.*in app code.These changed diagnostics are useful, but they should follow the app’s
debuglogger pattern for consistent filtering and dev-only verbosity.As per coding guidelines
app/src/**/*.{ts,tsx}: debug logging in the app should be namespaced usingdebug()with dev-only detail.Also applies to: 221-223, 264-266, 363-365, 431-432, 455-456, 1101-1104, 1127-1130
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 157 - 165, Replace direct console.debug calls inside the whisperInstallStatus().catch and piperInstallStatus().catch handlers with the app's namespaced debug logger (e.g., import/obtain a debug instance and call debug('voice-install:whisper', err) and debug('voice-install:piper', err)); if a debug instance isn't already present in VoicePanel.tsx, add a namespaced logger (e.g., const debug = createDebug('app:voice-install') or follow the existing app pattern) and use it for all the other similar occurrences mentioned (the other catch/log sites) so dev-only diagnostics use the standard namespaced debug API instead of console.*
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 882-883: Replace hard-coded user-visible strings in VoicePanel.tsx
(e.g., the 'Test failed' fallback and the ElevenLabs placeholder used in UI
paths) with localized messages obtained via useT(); add corresponding
translation keys to your locale files (app/src/lib/i18n/en.ts and other locale
chunks) and reference those keys from the component (useT('voice.<key>')).
Update all occurrences called out in the review (around the existing detail: err
instanceof Error ? err.message : 'Test failed' and the ElevenLabs placeholder
locations) so that the component reads from useT() instead of inline literals
and ensure the new keys are added to en.ts and locale chunks.
In `@app/src/services/api/voiceSettingsApi.ts`:
- Around line 268-269: clearVoiceProviderKey currently only removes the prefixed
credential key (authKeyForSlug(slug)) which leaves the legacy bare-slug
credential and causes loadVoiceSettings to still detect has_api_key; update
clearVoiceProviderKey to also remove the bare slug alias by calling
authRemoveProviderCredentials for both authKeyForSlug(slug) and the raw slug
(ensuring both removals are awaited/completed) so migrated/shared credentials
are fully cleared before returning.
- Around line 106-114: serializeVoiceProviderRef currently outputs a bare
providerSlug for kind === 'external' when ref.model is empty, which later causes
parseVoiceProviderString('deepgram') to be treated as { kind: 'cloud' } and drop
the external selection; update serializeVoiceProviderRef (the external branch)
to never return a bare slug: either throw/return an error when ref.model is
missing or populate a fallback model before serializing (e.g., call your
default-model lookup like getDefaultModelForProvider(ref.providerSlug) or use
ref.defaultModel) and return `${ref.providerSlug}:${defaultModel}`; keep
references to serializeVoiceProviderRef and parseVoiceProviderString so
reviewers can verify behavior.
In `@src/openhuman/config/schema/voice_providers.rs`:
- Around line 50-51: Adjust the serde discriminators on the STT/TTS enums so
wire values match the update handlers: keep the existing #[serde(rename_all =
"lowercase")] on the enums but add per-variant renames—on SttApiStyle add
#[serde(rename = "openai_audio")] to the OpenaiAudio variant; on TtsApiStyle add
#[serde(rename = "elevenlabs")] to the ElevenLabs variant and #[serde(rename =
"openai_audio")] to the OpenaiAudio variant—this ensures serialized names match
the expected wire values while leaving other variants unchanged.
In `@src/openhuman/inference/voice/local_speech.rs`:
- Around line 202-207: The error message in the detail check (the if block using
detail.contains and returning Err with LOG_PREFIX) hardcodes "brew install
espeak-ng"; change this to a platform-aware or generic message: detect OS using
Rust cfg!(target_os = "...") (e.g., "macos", "linux", "windows") and return a
tailored suggestion ("brew install espeak-ng" for macOS, "apt install espeak-ng"
or "dnf install espeak-ng" for common Linux variants, or a Windows note) or
replace the hardcoded command with a concise generic instruction directing users
to install espeak-ng or consult the project's documentation; update the Err
returned by the same return Err(...) in local_speech.rs so the log includes
LOG_PREFIX plus the new platform-aware/generic guidance.
In `@src/openhuman/voice/factory.rs`:
- Around line 420-427: The outbound reqwest calls currently use
reqwest::Client::new() and then call client.post(&url)... .send().await without
any explicit timeout; change these to create a client with an explicit timeout
(e.g., via reqwest::Client::builder().timeout(Duration::from_secs(...)).build()
) or wrap the .send() future with a per-request tokio::time::timeout, and
replace instances at the shown call sites (the client variable created via
reqwest::Client::new(), the client.post(&url)... .send().await sequence and
equivalent TTS/STT request sites) so that each external STT/TTS request fails
fast on upstream stalls; apply the same fix to the other occurrences called out
(the similar blocks at the other ranges).
In `@src/openhuman/voice/schemas.rs`:
- Around line 1242-1249: The validator currently returns Ok for any non-empty or
colon-containing string, allowing malformed providers; update the check around
the variable other to only accept "cloud" or "whisper" or a well-formed
slug:model by: if other == "cloud" || other == "whisper" return Ok, else if
other.contains(':') split_once(':') and ensure both left (slug) and right
(model) are non-empty and the slug matches the expected pattern (e.g.,
alphanumeric, dashes/underscores) before returning Ok, otherwise return Err with
the existing message; apply the same stricter validation logic to the second
occurrence noted (lines ~1259-1264).
- Around line 862-944: The update loop allows duplicate slugs in the incoming
providers list which makes provider resolution nondeterministic; fix by
validating uniqueness of the normalized slug before accepting entries: when
iterating providers in the block that constructs new_entries (the loop creating
slug and building VoiceProviderCreds), maintain a HashSet<String> (or similar)
of seen slugs and if you encounter a slug already in the set return Err with a
clear message about duplicate slug; add the slug to the set only after it passes
the reserved check and normalization; ensure this check references the same slug
variable and affects the config.voice_providers assignment so duplicate entries
are rejected rather than stored.
- Around line 1178-1198: The call in validate_tts_provider_key uses
reqwest::Client::new() and then sends req without a timeout; change to set an
explicit timeout (e.g. a few seconds) either by constructing the client with
Client::builder().timeout(Duration::from_secs(...)).build()? or by applying
RequestBuilder::timeout(...) on req before .send(), and import
std::time::Duration; ensure the timeout is applied to the request that creates
resp so the .send().await cannot hang indefinitely.
---
Outside diff comments:
In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 247-276: persistProviders currently catches RPC errors and sets
local state as if save succeeded, letting saveRouting update
savedSttProvider/savedTtsProvider and show success even when
openhumanVoiceSetProviders failed; modify persistProviders to surface failures
by rethrowing the caught error (or returning a failure flag) after setError so
callers like saveRouting can detect failure, and update saveRouting to only
update savedSttProvider/savedTtsProvider and show success when persistProviders
(which calls openhumanVoiceSetProviders) actually resolves successfully;
reference functions/variables: persistProviders, saveRouting,
openhumanVoiceSetProviders, savedSttProvider, savedTtsProvider, and loadData to
ensure loadData remains conditional on a successful save.
- Around line 87-1238: The file is too large — split VoicePanel into focused
modules to meet the ~500-line guideline. Extract the chip row UI and logic into
a VoiceProviderChips component, the API-key / local-provider modal into a
ProviderKeyModal component, and the routing controls into a VoiceRoutingPanel
component; move provider-related helpers and RPC wrappers (persistProviders,
handleEnableExternalProvider, handleRemoveProvider, loadData-related calls,
installButtonLabel, handleInstallWhisper, handleInstallPiper, and any
testVoiceProvider/ setVoiceProviderKey/clearVoiceProviderKey usage) into a
shared hook or utils (e.g., useVoiceSettings) so state and side-effects can be
reused. Make VoicePanel a thin coordinator that imports these components/hooks,
passes down state (voiceSettings, sttProvider, ttsProvider, sttModel, ttsVoice,
elevenlabsVoiceId, isSaving* flags, test results, set* callbacks) and preserves
existing data-testid and public function names so behavior and tests continue to
work.
In `@src/openhuman/voice/factory.rs`:
- Around line 878-895: The factory currently forces local defaults (e.g.,
DEFAULT_WHISPER_MODEL / DEFAULT_PIPER_VOICE) whenever the incoming model/voice
string is empty, which overrides provider-specific defaults for bare slugs;
update the slug-handling flow in the provider match arms (the branch using
split_slug_model and create_stt_provider_by_slug for STT and the analogous TTS
branch) so that when you split a slug+model (using split_slug_model) you treat
an empty incoming model/voice as "no override" (i.e., prefer slug_model if
present, otherwise pass an empty string or None through to
create_stt_provider_by_slug / create_tts_provider_by_slug so the provider
registry can apply its own default), rather than substituting
DEFAULT_WHISPER_MODEL or DEFAULT_PIPER_VOICE early. Ensure you only apply
DEFAULT_WHISPER_MODEL / DEFAULT_PIPER_VOICE for direct provider identifiers like
"whisper"/"piper", not for bare slug routes.
---
Nitpick comments:
In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 157-165: Replace direct console.debug calls inside the
whisperInstallStatus().catch and piperInstallStatus().catch handlers with the
app's namespaced debug logger (e.g., import/obtain a debug instance and call
debug('voice-install:whisper', err) and debug('voice-install:piper', err)); if a
debug instance isn't already present in VoicePanel.tsx, add a namespaced logger
(e.g., const debug = createDebug('app:voice-install') or follow the existing app
pattern) and use it for all the other similar occurrences mentioned (the other
catch/log sites) so dev-only diagnostics use the standard namespaced debug API
instead of console.*
In `@app/src/services/api/voiceSettingsApi.test.ts`:
- Around line 3-7: Update the import so the TypeScript type VoiceProviderRef is
imported using a standalone `import type` while keeping the value imports for
parseVoiceProviderString and serializeVoiceProviderRef; specifically, replace
the combined import from './voiceSettingsApi' with a value import for
parseVoiceProviderString and serializeVoiceProviderRef and a separate `import
type { VoiceProviderRef }` to comply with the repo's static type-import rule.
In `@app/src/services/api/voiceSettingsApi.ts`:
- Around line 11-16: The import currently mixes values and a type
(AuthProfileSummary) in one import list; change it so AuthProfileSummary is
imported using a standalone "import type" statement and leave
authListProviderCredentials, authRemoveProviderCredentials, and
authStoreProviderCredentials in the regular import. Update the import(s) in
voiceSettingsApi.ts to split out AuthProfileSummary into its own import type
line to follow the repo rule and enable proper tree-shaking/type-only imports.
- Around line 161-168: Replace the direct console.debug call in voiceSettingsApi
with the repo's shared namespaced debug logger: import the shared debug()
factory, create a logger namespace for this module (e.g., const log =
debug('app:voiceSettingsApi')), and call log(...) instead of console.debug,
passing the same diagnostic object (voiceProviders.length,
voiceProviders.map(p=>p.slug), configResult.stt_provider,
configResult.tts_provider). Apply the same change to the other console.debug
usage later in the file (the block around the second occurrence) so all
development-only diagnostics use the shared debug logger and consistent
namespace.
In `@src/openhuman/inference/voice/local_speech.rs`:
- Around line 198-211: Add debug logging before both early-return error branches
in the piper invocation block: log a stable, grep-friendly prefix (e.g. "DEBUG:
local_speech:piper:error:") and include the variables detail, exit_code (if
set), and out_path to aid diagnostics. Specifically, just before the espeak-ng
detection branch that returns Err (uses variables stderr/detail and LOG_PREFIX)
and just before the generic piper failure return (uses
exit_code/detail/LOG_PREFIX), emit a debug log with the branch identifier, the
full stderr/detail, the exit code, and the path removed so operators can
correlate failures to files.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: eff00ca7-c689-432b-a4a6-68e42ebcac71
📒 Files selected for processing (28)
app/src/components/settings/panels/VoicePanel.tsxapp/src/components/settings/panels/__tests__/VoicePanel.test.tsxapp/src/lib/i18n/chunks/ar-1.tsapp/src/lib/i18n/chunks/bn-1.tsapp/src/lib/i18n/chunks/de-1.tsapp/src/lib/i18n/chunks/en-1.tsapp/src/lib/i18n/chunks/es-1.tsapp/src/lib/i18n/chunks/fr-1.tsapp/src/lib/i18n/chunks/hi-1.tsapp/src/lib/i18n/chunks/id-1.tsapp/src/lib/i18n/chunks/it-1.tsapp/src/lib/i18n/chunks/ko-1.tsapp/src/lib/i18n/chunks/pt-1.tsapp/src/lib/i18n/chunks/ru-1.tsapp/src/lib/i18n/chunks/zh-CN-1.tsapp/src/lib/i18n/en.tsapp/src/services/api/voiceSettingsApi.test.tsapp/src/services/api/voiceSettingsApi.tsapp/src/utils/tauriCommands/voice.tssrc/openhuman/config/ops.rssrc/openhuman/config/schema/mod.rssrc/openhuman/config/schema/types.rssrc/openhuman/config/schema/voice_providers.rssrc/openhuman/inference/voice/local_speech.rssrc/openhuman/voice/factory.rssrc/openhuman/voice/mod.rssrc/openhuman/voice/schemas.rssrc/openhuman/voice/schemas_tests.rs
| detail: err instanceof Error ? err.message : 'Test failed', | ||
| }); |
There was a problem hiding this comment.
User-visible fallback text is hard-coded instead of localized.
'Test failed' and the ElevenLabs placeholder literal are rendered in UI paths but bypass useT().
🌐 Suggested fix
- detail: err instanceof Error ? err.message : 'Test failed',
+ detail: err instanceof Error ? err.message : t('voice.modal.testFailed'),- placeholder="JBFqnCBsd6RMkjVDRZzb"
+ placeholder={t('voice.routing.elevenlabsVoicePlaceholder')}Add corresponding keys in app/src/lib/i18n/en.ts and locale chunks.
As per coding guidelines **/*.{ts,tsx}: all user-visible UI strings must be internationalized via useT() and hard-coded literals in JSX/props are not permitted.
Also applies to: 963-964, 1061-1062, 1174-1174
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 882 - 883,
Replace hard-coded user-visible strings in VoicePanel.tsx (e.g., the 'Test
failed' fallback and the ElevenLabs placeholder used in UI paths) with localized
messages obtained via useT(); add corresponding translation keys to your locale
files (app/src/lib/i18n/en.ts and other locale chunks) and reference those keys
from the component (useT('voice.<key>')). Update all occurrences called out in
the review (around the existing detail: err instanceof Error ? err.message :
'Test failed' and the ElevenLabs placeholder locations) so that the component
reads from useT() instead of inline literals and ensure the new keys are added
to en.ts and locale chunks.
| export function serializeVoiceProviderRef(ref: VoiceProviderRef): string { | ||
| switch (ref.kind) { | ||
| case 'cloud': | ||
| return 'cloud'; | ||
| case 'local': | ||
| return ref.model ? `${ref.engine}:${ref.model}` : ref.engine; | ||
| case 'external': | ||
| return ref.model ? `${ref.providerSlug}:${ref.model}` : ref.providerSlug; | ||
| } |
There was a problem hiding this comment.
Don't serialize external providers as bare slugs.
parseVoiceProviderString('deepgram') falls back to { kind: 'cloud' }, so this branch can drop an external provider selection on reload whenever model is empty. Please either reject empty external models here or populate a default model/voice before serializing.
🐛 Minimal local fix
case 'external':
- return ref.model ? `${ref.providerSlug}:${ref.model}` : ref.providerSlug;
+ if (!ref.model) {
+ throw new Error(`External voice provider "${ref.providerSlug}" is missing a model/voice`);
+ }
+ return `${ref.providerSlug}:${ref.model}`;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export function serializeVoiceProviderRef(ref: VoiceProviderRef): string { | |
| switch (ref.kind) { | |
| case 'cloud': | |
| return 'cloud'; | |
| case 'local': | |
| return ref.model ? `${ref.engine}:${ref.model}` : ref.engine; | |
| case 'external': | |
| return ref.model ? `${ref.providerSlug}:${ref.model}` : ref.providerSlug; | |
| } | |
| export function serializeVoiceProviderRef(ref: VoiceProviderRef): string { | |
| switch (ref.kind) { | |
| case 'cloud': | |
| return 'cloud'; | |
| case 'local': | |
| return ref.model ? `${ref.engine}:${ref.model}` : ref.engine; | |
| case 'external': | |
| if (!ref.model) { | |
| throw new Error(`External voice provider "${ref.providerSlug}" is missing a model/voice`); | |
| } | |
| return `${ref.providerSlug}:${ref.model}`; | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/src/services/api/voiceSettingsApi.ts` around lines 106 - 114,
serializeVoiceProviderRef currently outputs a bare providerSlug for kind ===
'external' when ref.model is empty, which later causes
parseVoiceProviderString('deepgram') to be treated as { kind: 'cloud' } and drop
the external selection; update serializeVoiceProviderRef (the external branch)
to never return a bare slug: either throw/return an error when ref.model is
missing or populate a fallback model before serializing (e.g., call your
default-model lookup like getDefaultModelForProvider(ref.providerSlug) or use
ref.defaultModel) and return `${ref.providerSlug}:${defaultModel}`; keep
references to serializeVoiceProviderRef and parseVoiceProviderString so
reviewers can verify behavior.
| export async function clearVoiceProviderKey(slug: string): Promise<void> { | ||
| await authRemoveProviderCredentials({ provider: authKeyForSlug(slug) }); |
There was a problem hiding this comment.
Clear the legacy bare-slug credential alias too.
loadVoiceSettings() treats both provider:${slug} and the bare slug as valid matches. Removing only the prefixed key leaves migrated/shared credentials behind, so has_api_key comes back true after the next refresh.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/src/services/api/voiceSettingsApi.ts` around lines 268 - 269,
clearVoiceProviderKey currently only removes the prefixed credential key
(authKeyForSlug(slug)) which leaves the legacy bare-slug credential and causes
loadVoiceSettings to still detect has_api_key; update clearVoiceProviderKey to
also remove the bare slug alias by calling authRemoveProviderCredentials for
both authKeyForSlug(slug) and the raw slug (ensuring both removals are
awaited/completed) so migrated/shared credentials are fully cleared before
returning.
| #[serde(rename_all = "lowercase")] | ||
| pub enum SttApiStyle { |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify enum serialization strategy vs accepted wire values
rg -n 'rename_all = "lowercase"|enum SttApiStyle|enum TtsApiStyle|openai_audio|stt_api_style|tts_api_style' \
src/openhuman/config/schema/voice_providers.rs \
src/openhuman/voice/schemas.rs \
src/openhuman/config/ops.rsRepository: tinyhumansai/openhuman
Length of output: 3376
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the enum variant definitions
sed -n '1,220p' src/openhuman/config/schema/voice_providers.rs | nl -ba | sed -n '1,220p'
# Show the update parsing branches that define accepted wire values
sed -n '820,1020p' src/openhuman/voice/schemas.rs | nl -ba | sed -n '820,1020p'
# (Optional) show how stt_api_style / tts_api_style get emitted from provider configs
sed -n '940,1010p' src/openhuman/voice/schemas.rs | nl -ba | sed -n '940,1010p'Repository: tinyhumansai/openhuman
Length of output: 109
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the enum variant definitions (small range)
sed -n '1,220p' src/openhuman/config/schema/voice_providers.rs | cat -n | sed -n '1,220p'
# Show the update parsing branches that define accepted wire values
sed -n '820,1020p' src/openhuman/voice/schemas.rs | cat -n | sed -n '1,220p'
# Show emission/serialization from provider configs (small range)
sed -n '940,1010p' src/openhuman/voice/schemas.rs | cat -n | sed -n '1,120p'Repository: tinyhumansai/openhuman
Length of output: 21804
Fix serde discriminators for STT/TTS api-style enums to match the wire values
SttApiStylecurrently serializesOpenaiAudioasopenaiaudio(rename_all = "lowercase"), but update handlers acceptopenai_audio—so snapshot → UI → update won’t round-trip.TtsApiStylecan’t be switched torename_all = "snake_case"wholesale: it would serializeElevenLabsaseleven_labs, while the update handlers acceptelevenlabs. Use a per-variant#[serde(rename = "elevenlabs")]forElevenLabs(and snake_case forOpenaiAudio).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/config/schema/voice_providers.rs` around lines 50 - 51, Adjust
the serde discriminators on the STT/TTS enums so wire values match the update
handlers: keep the existing #[serde(rename_all = "lowercase")] on the enums but
add per-variant renames—on SttApiStyle add #[serde(rename = "openai_audio")] to
the OpenaiAudio variant; on TtsApiStyle add #[serde(rename = "elevenlabs")] to
the ElevenLabs variant and #[serde(rename = "openai_audio")] to the OpenaiAudio
variant—this ensures serialized names match the expected wire values while
leaving other variants unchanged.
| if detail.contains("libespeak-ng") || detail.contains("Library not loaded") { | ||
| return Err(format!( | ||
| "{LOG_PREFIX} piper requires espeak-ng which is not installed. \ | ||
| Run: brew install espeak-ng" | ||
| )); | ||
| } |
There was a problem hiding this comment.
Make installation instructions platform-aware or generic.
The error message hardcodes brew install espeak-ng, which only helps macOS users. Linux users need apt install espeak-ng (Debian/Ubuntu), dnf install espeak-ng (Fedora), or equivalent; Windows requires a different installation method.
🛠️ Proposed fix with platform detection
if detail.contains("libespeak-ng") || detail.contains("Library not loaded") {
+ let install_hint = if cfg!(target_os = "macos") {
+ "brew install espeak-ng"
+ } else if cfg!(target_os = "linux") {
+ "apt install espeak-ng (Debian/Ubuntu) or dnf install espeak-ng (Fedora)"
+ } else if cfg!(target_os = "windows") {
+ "download and install espeak-ng from https://github.com/espeak-ng/espeak-ng/releases"
+ } else {
+ "install espeak-ng for your platform"
+ };
return Err(format!(
"{LOG_PREFIX} piper requires espeak-ng which is not installed. \
- Run: brew install espeak-ng"
+ {install_hint}"
));
}Alternatively, use a generic message pointing to documentation:
return Err(format!(
"{LOG_PREFIX} piper requires espeak-ng which is not installed. \
- Run: brew install espeak-ng"
+ Install espeak-ng using your platform's package manager."
));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if detail.contains("libespeak-ng") || detail.contains("Library not loaded") { | |
| return Err(format!( | |
| "{LOG_PREFIX} piper requires espeak-ng which is not installed. \ | |
| Run: brew install espeak-ng" | |
| )); | |
| } | |
| if detail.contains("libespeak-ng") || detail.contains("Library not loaded") { | |
| let install_hint = if cfg!(target_os = "macos") { | |
| "brew install espeak-ng" | |
| } else if cfg!(target_os = "linux") { | |
| "apt install espeak-ng (Debian/Ubuntu) or dnf install espeak-ng (Fedora)" | |
| } else if cfg!(target_os = "windows") { | |
| "download and install espeak-ng from https://github.com/espeak-ng/espeak-ng/releases" | |
| } else { | |
| "install espeak-ng for your platform" | |
| }; | |
| return Err(format!( | |
| "{LOG_PREFIX} piper requires espeak-ng which is not installed. \ | |
| {install_hint}" | |
| )); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/inference/voice/local_speech.rs` around lines 202 - 207, The
error message in the detail check (the if block using detail.contains and
returning Err with LOG_PREFIX) hardcodes "brew install espeak-ng"; change this
to a platform-aware or generic message: detect OS using Rust cfg!(target_os =
"...") (e.g., "macos", "linux", "windows") and return a tailored suggestion
("brew install espeak-ng" for macOS, "apt install espeak-ng" or "dnf install
espeak-ng" for common Linux variants, or a Windows note) or replace the
hardcoded command with a concise generic instruction directing users to install
espeak-ng or consult the project's documentation; update the Err returned by the
same return Err(...) in local_speech.rs so the log includes LOG_PREFIX plus the
new platform-aware/generic guidance.
| let client = reqwest::Client::new(); | ||
| let resp = client | ||
| .post(&url) | ||
| .header("Authorization", format!("Bearer {}", self.api_key)) | ||
| .multipart(form) | ||
| .send() | ||
| .await | ||
| .map_err(|e| format!("[voice-stt] external STT request failed: {e}"))?; |
There was a problem hiding this comment.
External provider HTTP calls are missing explicit timeouts.
All new outbound STT/TTS requests use default reqwest client timeouts, which can hang request paths during upstream stalls and degrade core responsiveness.
Proposed fix pattern
- let client = reqwest::Client::new();
+ let client = reqwest::Client::builder()
+ .timeout(std::time::Duration::from_secs(20))
+ .build()
+ .map_err(|e| format!("[voice-factory] http client init failed: {e}"))?;Also applies to: 461-469, 601-608, 648-655
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/voice/factory.rs` around lines 420 - 427, The outbound reqwest
calls currently use reqwest::Client::new() and then call client.post(&url)...
.send().await without any explicit timeout; change these to create a client with
an explicit timeout (e.g., via
reqwest::Client::builder().timeout(Duration::from_secs(...)).build() ) or wrap
the .send() future with a per-request tokio::time::timeout, and replace
instances at the shown call sites (the client variable created via
reqwest::Client::new(), the client.post(&url)... .send().await sequence and
equivalent TTS/STT request sites) so that each external STT/TTS request fails
fast on upstream stalls; apply the same fix to the other occurrences called out
(the similar blocks at the other ranges).
| if let Some(providers) = p.voice_providers { | ||
| let mut new_entries = Vec::with_capacity(providers.len()); | ||
| for update in providers { | ||
| let slug = update.slug.trim().to_lowercase(); | ||
| if is_voice_slug_reserved(&slug) { | ||
| return Err(format!( | ||
| "slug '{}' is reserved and cannot be used for a voice provider", | ||
| slug | ||
| )); | ||
| } | ||
|
|
||
| let capability = match update.capability.as_deref() { | ||
| Some("stt") => VoiceCapability::Stt, | ||
| Some("tts") => VoiceCapability::Tts, | ||
| Some("both") | None => VoiceCapability::Both, | ||
| Some(other) => { | ||
| return Err(format!( | ||
| "invalid capability '{other}' (valid: 'stt', 'tts', 'both')" | ||
| )) | ||
| } | ||
| }; | ||
|
|
||
| let auth_style = match update.auth_style.as_deref() { | ||
| Some("bearer") | None => crate::openhuman::config::schema::AuthStyle::Bearer, | ||
| Some("none") => crate::openhuman::config::schema::AuthStyle::None, | ||
| Some(other) => { | ||
| return Err(format!( | ||
| "invalid auth_style '{other}' for voice provider (valid: 'bearer', 'none')" | ||
| )) | ||
| } | ||
| }; | ||
|
|
||
| let stt_api_style = match update.stt_api_style.as_deref() { | ||
| Some("deepgram") => SttApiStyle::Deepgram, | ||
| Some("openai_audio") | None => SttApiStyle::OpenaiAudio, | ||
| Some(other) => { | ||
| return Err(format!( | ||
| "invalid stt_api_style '{other}' (valid: 'openai_audio', 'deepgram')" | ||
| )) | ||
| } | ||
| }; | ||
|
|
||
| let tts_api_style = match update.tts_api_style.as_deref() { | ||
| Some("elevenlabs") => TtsApiStyle::ElevenLabs, | ||
| Some("openai_audio") | None => TtsApiStyle::OpenaiAudio, | ||
| Some(other) => { | ||
| return Err(format!( | ||
| "invalid tts_api_style '{other}' (valid: 'openai_audio', 'elevenlabs')" | ||
| )) | ||
| } | ||
| }; | ||
|
|
||
| let id = update | ||
| .id | ||
| .filter(|id| !id.trim().is_empty()) | ||
| .or_else(|| { | ||
| config | ||
| .voice_providers | ||
| .iter() | ||
| .find(|e| e.slug == slug) | ||
| .map(|e| e.id.clone()) | ||
| }) | ||
| .unwrap_or_else(|| generate_voice_provider_id(&slug)); | ||
|
|
||
| let label = update.label.unwrap_or_else(|| slug.clone()); | ||
|
|
||
| let endpoint = update.endpoint.unwrap_or_default(); | ||
|
|
||
| new_entries.push(VoiceProviderCreds { | ||
| id, | ||
| slug, | ||
| label, | ||
| endpoint, | ||
| auth_style, | ||
| capability, | ||
| stt_api_style, | ||
| tts_api_style, | ||
| default_stt_model: update.default_stt_model, | ||
| default_tts_voice: update.default_tts_voice, | ||
| }); | ||
| } | ||
| config.voice_providers = new_entries; | ||
| } |
There was a problem hiding this comment.
Reject duplicate voice-provider slugs during registry updates.
The update path currently allows duplicate slug entries. Provider resolution uses first-match lookup, so duplicates make routing nondeterministic and fragile.
Proposed fix
if let Some(providers) = p.voice_providers {
+ let mut seen_slugs = std::collections::HashSet::new();
let mut new_entries = Vec::with_capacity(providers.len());
for update in providers {
let slug = update.slug.trim().to_lowercase();
+ if !seen_slugs.insert(slug.clone()) {
+ return Err(format!("duplicate voice provider slug '{slug}'"));
+ }
if is_voice_slug_reserved(&slug) {
return Err(format!(
"slug '{}' is reserved and cannot be used for a voice provider",
slug
));
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if let Some(providers) = p.voice_providers { | |
| let mut new_entries = Vec::with_capacity(providers.len()); | |
| for update in providers { | |
| let slug = update.slug.trim().to_lowercase(); | |
| if is_voice_slug_reserved(&slug) { | |
| return Err(format!( | |
| "slug '{}' is reserved and cannot be used for a voice provider", | |
| slug | |
| )); | |
| } | |
| let capability = match update.capability.as_deref() { | |
| Some("stt") => VoiceCapability::Stt, | |
| Some("tts") => VoiceCapability::Tts, | |
| Some("both") | None => VoiceCapability::Both, | |
| Some(other) => { | |
| return Err(format!( | |
| "invalid capability '{other}' (valid: 'stt', 'tts', 'both')" | |
| )) | |
| } | |
| }; | |
| let auth_style = match update.auth_style.as_deref() { | |
| Some("bearer") | None => crate::openhuman::config::schema::AuthStyle::Bearer, | |
| Some("none") => crate::openhuman::config::schema::AuthStyle::None, | |
| Some(other) => { | |
| return Err(format!( | |
| "invalid auth_style '{other}' for voice provider (valid: 'bearer', 'none')" | |
| )) | |
| } | |
| }; | |
| let stt_api_style = match update.stt_api_style.as_deref() { | |
| Some("deepgram") => SttApiStyle::Deepgram, | |
| Some("openai_audio") | None => SttApiStyle::OpenaiAudio, | |
| Some(other) => { | |
| return Err(format!( | |
| "invalid stt_api_style '{other}' (valid: 'openai_audio', 'deepgram')" | |
| )) | |
| } | |
| }; | |
| let tts_api_style = match update.tts_api_style.as_deref() { | |
| Some("elevenlabs") => TtsApiStyle::ElevenLabs, | |
| Some("openai_audio") | None => TtsApiStyle::OpenaiAudio, | |
| Some(other) => { | |
| return Err(format!( | |
| "invalid tts_api_style '{other}' (valid: 'openai_audio', 'elevenlabs')" | |
| )) | |
| } | |
| }; | |
| let id = update | |
| .id | |
| .filter(|id| !id.trim().is_empty()) | |
| .or_else(|| { | |
| config | |
| .voice_providers | |
| .iter() | |
| .find(|e| e.slug == slug) | |
| .map(|e| e.id.clone()) | |
| }) | |
| .unwrap_or_else(|| generate_voice_provider_id(&slug)); | |
| let label = update.label.unwrap_or_else(|| slug.clone()); | |
| let endpoint = update.endpoint.unwrap_or_default(); | |
| new_entries.push(VoiceProviderCreds { | |
| id, | |
| slug, | |
| label, | |
| endpoint, | |
| auth_style, | |
| capability, | |
| stt_api_style, | |
| tts_api_style, | |
| default_stt_model: update.default_stt_model, | |
| default_tts_voice: update.default_tts_voice, | |
| }); | |
| } | |
| config.voice_providers = new_entries; | |
| } | |
| if let Some(providers) = p.voice_providers { | |
| let mut seen_slugs = std::collections::HashSet::new(); | |
| let mut new_entries = Vec::with_capacity(providers.len()); | |
| for update in providers { | |
| let slug = update.slug.trim().to_lowercase(); | |
| if !seen_slugs.insert(slug.clone()) { | |
| return Err(format!("duplicate voice provider slug '{slug}'")); | |
| } | |
| if is_voice_slug_reserved(&slug) { | |
| return Err(format!( | |
| "slug '{}' is reserved and cannot be used for a voice provider", | |
| slug | |
| )); | |
| } | |
| let capability = match update.capability.as_deref() { | |
| Some("stt") => VoiceCapability::Stt, | |
| Some("tts") => VoiceCapability::Tts, | |
| Some("both") | None => VoiceCapability::Both, | |
| Some(other) => { | |
| return Err(format!( | |
| "invalid capability '{other}' (valid: 'stt', 'tts', 'both')" | |
| )) | |
| } | |
| }; | |
| let auth_style = match update.auth_style.as_deref() { | |
| Some("bearer") | None => crate::openhuman::config::schema::AuthStyle::Bearer, | |
| Some("none") => crate::openhuman::config::schema::AuthStyle::None, | |
| Some(other) => { | |
| return Err(format!( | |
| "invalid auth_style '{other}' for voice provider (valid: 'bearer', 'none')" | |
| )) | |
| } | |
| }; | |
| let stt_api_style = match update.stt_api_style.as_deref() { | |
| Some("deepgram") => SttApiStyle::Deepgram, | |
| Some("openai_audio") | None => SttApiStyle::OpenaiAudio, | |
| Some(other) => { | |
| return Err(format!( | |
| "invalid stt_api_style '{other}' (valid: 'openai_audio', 'deepgram')" | |
| )) | |
| } | |
| }; | |
| let tts_api_style = match update.tts_api_style.as_deref() { | |
| Some("elevenlabs") => TtsApiStyle::ElevenLabs, | |
| Some("openai_audio") | None => TtsApiStyle::OpenaiAudio, | |
| Some(other) => { | |
| return Err(format!( | |
| "invalid tts_api_style '{other}' (valid: 'openai_audio', 'elevenlabs')" | |
| )) | |
| } | |
| }; | |
| let id = update | |
| .id | |
| .filter(|id| !id.trim().is_empty()) | |
| .or_else(|| { | |
| config | |
| .voice_providers | |
| .iter() | |
| .find(|e| e.slug == slug) | |
| .map(|e| e.id.clone()) | |
| }) | |
| .unwrap_or_else(|| generate_voice_provider_id(&slug)); | |
| let label = update.label.unwrap_or_else(|| slug.clone()); | |
| let endpoint = update.endpoint.unwrap_or_default(); | |
| new_entries.push(VoiceProviderCreds { | |
| id, | |
| slug, | |
| label, | |
| endpoint, | |
| auth_style, | |
| capability, | |
| stt_api_style, | |
| tts_api_style, | |
| default_stt_model: update.default_stt_model, | |
| default_tts_voice: update.default_tts_voice, | |
| }); | |
| } | |
| config.voice_providers = new_entries; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/voice/schemas.rs` around lines 862 - 944, The update loop
allows duplicate slugs in the incoming providers list which makes provider
resolution nondeterministic; fix by validating uniqueness of the normalized slug
before accepting entries: when iterating providers in the block that constructs
new_entries (the loop creating slug and building VoiceProviderCreds), maintain a
HashSet<String> (or similar) of seen slugs and if you encounter a slug already
in the set return Err with a clear message about duplicate slug; add the slug to
the set only after it passes the reserved check and normalization; ensure this
check references the same slug variable and affects the config.voice_providers
assignment so duplicate entries are rejected rather than stored.
| let client = reqwest::Client::new(); | ||
|
|
||
| // ElevenLabs: GET /user/subscription requires only basic auth (no | ||
| // extra scopes like voices_read). OpenAI / generic: GET /models. | ||
| let url = if slug == "elevenlabs" { | ||
| format!("{endpoint}/user/subscription") | ||
| } else { | ||
| format!("{endpoint}/models") | ||
| }; | ||
|
|
||
| let mut req = client.get(&url); | ||
| if slug == "elevenlabs" { | ||
| req = req.header("xi-api-key", &api_key); | ||
| } else { | ||
| req = req.header("Authorization", format!("Bearer {api_key}")); | ||
| } | ||
|
|
||
| let resp = req | ||
| .send() | ||
| .await | ||
| .map_err(|e| format!("request failed: {e}"))?; |
There was a problem hiding this comment.
Key-validation HTTP call should have a timeout.
validate_tts_provider_key performs an external call without an explicit timeout, which can stall the RPC path under upstream latency/outage.
Proposed fix pattern
- let client = reqwest::Client::new();
+ let client = reqwest::Client::builder()
+ .timeout(std::time::Duration::from_secs(10))
+ .build()
+ .map_err(|e| format!("http client init failed: {e}"))?;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let client = reqwest::Client::new(); | |
| // ElevenLabs: GET /user/subscription requires only basic auth (no | |
| // extra scopes like voices_read). OpenAI / generic: GET /models. | |
| let url = if slug == "elevenlabs" { | |
| format!("{endpoint}/user/subscription") | |
| } else { | |
| format!("{endpoint}/models") | |
| }; | |
| let mut req = client.get(&url); | |
| if slug == "elevenlabs" { | |
| req = req.header("xi-api-key", &api_key); | |
| } else { | |
| req = req.header("Authorization", format!("Bearer {api_key}")); | |
| } | |
| let resp = req | |
| .send() | |
| .await | |
| .map_err(|e| format!("request failed: {e}"))?; | |
| let client = reqwest::Client::builder() | |
| .timeout(std::time::Duration::from_secs(10)) | |
| .build() | |
| .map_err(|e| format!("http client init failed: {e}"))?; | |
| // ElevenLabs: GET /user/subscription requires only basic auth (no | |
| // extra scopes like voices_read). OpenAI / generic: GET /models. | |
| let url = if slug == "elevenlabs" { | |
| format!("{endpoint}/user/subscription") | |
| } else { | |
| format!("{endpoint}/models") | |
| }; | |
| let mut req = client.get(&url); | |
| if slug == "elevenlabs" { | |
| req = req.header("xi-api-key", &api_key); | |
| } else { | |
| req = req.header("Authorization", format!("Bearer {api_key}")); | |
| } | |
| let resp = req | |
| .send() | |
| .await | |
| .map_err(|e| format!("request failed: {e}"))?; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/voice/schemas.rs` around lines 1178 - 1198, The call in
validate_tts_provider_key uses reqwest::Client::new() and then sends req without
a timeout; change to set an explicit timeout (e.g. a few seconds) either by
constructing the client with
Client::builder().timeout(Duration::from_secs(...)).build()? or by applying
RequestBuilder::timeout(...) on req before .send(), and import
std::time::Duration; ensure the timeout is applied to the request that creates
resp so the .send().await cannot hang indefinitely.
| // Accept slug:model grammar or bare slug — the factory will | ||
| // validate against the voice_providers registry at dispatch time. | ||
| if other.contains(':') || !other.is_empty() { | ||
| Ok(()) | ||
| } else { | ||
| Err(format!( | ||
| "invalid stt_provider '{other}' (valid: 'cloud', 'whisper', or '<slug>:<model>')" | ||
| )) |
There was a problem hiding this comment.
Routing validation accepts malformed provider strings.
Current checks allow any non-empty value (including malformed forms), so invalid provider strings can be persisted and fail later at dispatch time.
Proposed fix
fn validate_stt_provider(provider: &str) -> Result<(), String> {
- match provider {
+ let provider = provider.trim();
+ match provider {
"cloud" | "openhuman" | "whisper" => Ok(()),
other => {
- // Accept slug:model grammar or bare slug — the factory will
- // validate against the voice_providers registry at dispatch time.
- if other.contains(':') || !other.is_empty() {
- Ok(())
- } else {
- Err(format!(
- "invalid stt_provider '{other}' (valid: 'cloud', 'whisper', or '<slug>:<model>')"
- ))
- }
+ let (slug, model) = other.split_once(':').unwrap_or((other, ""));
+ if slug.trim().is_empty() || (other.contains(':') && model.trim().is_empty()) {
+ Err(format!(
+ "invalid stt_provider '{other}' (valid: 'cloud', 'whisper', or '<slug>:<model>')"
+ ))
+ } else {
+ Ok(())
+ }
}
}
}Also applies to: 1259-1264
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/voice/schemas.rs` around lines 1242 - 1249, The validator
currently returns Ok for any non-empty or colon-containing string, allowing
malformed providers; update the check around the variable other to only accept
"cloud" or "whisper" or a well-formed slug:model by: if other == "cloud" ||
other == "whisper" return Ok, else if other.contains(':') split_once(':') and
ensure both left (slug) and right (model) are non-empty and the slug matches the
expected pattern (e.g., alphanumeric, dashes/underscores) before returning Ok,
otherwise return Err with the existing message; apply the same stricter
validation logic to the second occurrence noted (lines ~1259-1264).
Summary
voice_providersconfig, slug:model routing grammar, shared auth-profiles.json keys.VoiceProviderCreds,ExternalSttProvider(OpenAI-compat + Deepgram),ExternalTtsProvider(OpenAI-compat + ElevenLabs), 3 new RPCs, 240+ tests.Problem
Solution
cloud_providers+ per-workload routing pattern for voice.voice_providers: Vec<VoiceProviderCreds>+stt_provider/tts_providerrouting strings onConfig.create_stt_provider/create_tts_providerwith<slug>:<model>grammar and slug lookup.voice_update_provider_settings,voice_list_models,voice_test_provider(withvalidate_onlyflag).voiceSettingsApi.tsfacade,VoicePanelredesigned with chip toggles + modal.effective_stt_provider/effective_tts_providerfall back to legacylocal_ai.*fields; no migration needed.Submission Checklist
Impact
None/empty, existing configs work via legacy fallback.openaikey works for both LLM and voice.Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
pnpm --filter openhuman-app format:checkpnpm typecheckValidation Blocked
command:N/Aerror:N/Aimpact:N/ABehavior Changes
Parity Contract
effective_stt_provider/effective_tts_providerfallback chain.voice_update_provider_settingssyncs both top-level and legacylocal_ai.*fields.Duplicate / Superseded PR Handling
Summary by CodeRabbit
Release Notes