tts: Improve multilingual phonemization with explicit --lang support
Problem
hyperframes tts currently provides no way to explicitly set the phonemization language from the CLI.
For multilingual use cases, this can lead to non-English text sounding English-like even when a non-English voice ID is selected.
From current source behavior:
packages/cli/src/commands/tts.ts exposes --voice and --speed, but no --lang option.
packages/cli/src/tts/synthesize.ts calls Kokoro like:
model.create(text, voice=voice, speed=speed)
No explicit language is forwarded in the synth call.
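To make the gap concrete, here is a minimal sketch of the current call shape. All names (SynthOptions, buildSynthArgs) are hypothetical stand-ins for the real code in synthesize.ts; the point is only that no language field exists in what gets forwarded:

```typescript
// Hypothetical sketch (assumed names) of the options the CLI forwards today:
// voice and speed only, so phonemization language stays implicit.
interface SynthOptions {
  voice: string;
  speed?: number;
}

function buildSynthArgs(text: string, opts: SynthOptions) {
  // What reaches the model today: no `lang` key is ever present.
  return { text, voice: opts.voice, speed: opts.speed ?? 1.0 };
}
```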
This makes multilingual output less predictable and increases confusion for users who expect voice locale + text locale to produce language-correct phonemization.
Proposed solution
Add explicit language control to CLI TTS and pass it through to synthesis.
Proposed changes
- Add --lang <code> to hyperframes tts.
- Extend synth options to include lang.
- Pass lang through to the Kokoro synthesis call.
- Preserve backward compatibility when --lang is omitted.
- (Optional) Add a warning when --voice and --lang appear mismatched.
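The pass-through and backward-compatibility points above can be sketched as follows. This is a sketch under assumed names (SynthOptions, buildSynthArgs stand in for the real synthesize.ts internals), not the actual implementation:

```typescript
// Hypothetical sketch of the proposed change: `lang` is optional and is
// only forwarded when the user set it, so omitting --lang keeps today's
// behavior byte-for-byte identical.
interface SynthOptions {
  voice: string;
  speed?: number;
  lang?: string; // new, optional: explicit phonemization language code
}

function buildSynthArgs(text: string, opts: SynthOptions) {
  const args: Record<string, unknown> = {
    text,
    voice: opts.voice,
    speed: opts.speed ?? 1.0,
  };
  // Backward compatibility: no `lang` key unless explicitly requested.
  if (opts.lang !== undefined) args.lang = opts.lang;
  return args;
}
```

The optional mismatch warning could live in the same function by comparing the voice ID's locale hint against the requested language before returning.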
Example usage
npx hyperframes tts "La reunión empieza a las nueve y media." --voice ef_dora --lang es --output speech-es.wav
Why this is valuable
- Improves multilingual output quality and predictability.
- Reduces support confusion around “voice exists but pronunciation sounds wrong”.
- Keeps current English-first workflows intact.
Alternatives considered
- Documentation-only workaround: ask users to bypass the CLI and call kokoro_onnx directly with explicit language.
- Voice-prefix auto-inference only (no --lang): infer language from voice IDs automatically.
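The auto-inference alternative could look like the sketch below. The prefix-to-language mapping is an assumption for illustration (e.g. the leading letter of a voice ID such as ef_dora hinting at Spanish) and would need to be verified against the real Kokoro voice list; this is also why inference alone is weaker than an explicit --lang flag:

```typescript
// Sketch of voice-prefix language inference. The mapping below is
// hypothetical and must be checked against the actual voice catalog.
const PREFIX_TO_LANG: Record<string, string> = {
  a: "en-us", // assumed: American English voices
  b: "en-gb", // assumed: British English voices
  e: "es",    // assumed: Spanish voices, e.g. ef_dora
  f: "fr",    // assumed: French voices
};

function inferLangFromVoice(voiceId: string): string | undefined {
  // Unknown prefixes return undefined so the caller can fall back
  // to the current default behavior.
  return PREFIX_TO_LANG[voiceId.charAt(0)];
}
```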
Additional context
- This request is intentionally backward-compatible: --lang can be optional.
- The issue is about explicit multilingual control, not replacing current defaults.
- Related but different: existing TTS discussions around external providers (e.g., ElevenLabs) do not address this CLI language-control gap.