tts: Improve multilingual phonemization with explicit --lang support #349

@BaltasarOrtiz

Problem

hyperframes tts currently provides no way to set the phonemization language explicitly from the CLI.

For multilingual use cases, this can lead to non-English text sounding English-like even when a non-English voice ID is selected.

From current source behavior:

  • packages/cli/src/commands/tts.ts exposes --voice and --speed, but no --lang option.
  • packages/cli/src/tts/synthesize.ts calls Kokoro like:
model.create(text, voice=voice, speed=speed)

No explicit language is forwarded in the synth call.

This makes multilingual output less predictable and increases confusion for users who expect voice locale + text locale to produce language-correct phonemization.


Proposed solution

Add explicit language control to CLI TTS and pass it through to synthesis.

Proposed changes

  1. Add --lang <code> to hyperframes tts.
  2. Extend synth options to include lang.
  3. Pass lang through to the Kokoro synthesis call.
  4. Preserve backward compatibility when --lang is omitted.
  5. (Optional) Add a warning when --voice and --lang appear mismatched.
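
A minimal sketch of changes 2 to 4, written in Python to match the synthesis call quoted above. The helper names `build_create_kwargs` and `synthesize` are hypothetical, and the sketch assumes that `model.create` accepts a `lang` keyword next to `voice` and `speed`; check this against the installed kokoro_onnx version before relying on it.

```python
from typing import Any, Optional


def build_create_kwargs(
    voice: str, speed: float, lang: Optional[str] = None
) -> dict[str, Any]:
    """Build keyword arguments for the synthesis call (hypothetical helper).

    `lang` is only included when the caller supplied one, so omitting
    --lang leaves the call identical to the current behavior.
    """
    kwargs: dict[str, Any] = {"voice": voice, "speed": speed}
    if lang is not None:
        kwargs["lang"] = lang
    return kwargs


def synthesize(
    model: Any,
    text: str,
    voice: str,
    speed: float = 1.0,
    lang: Optional[str] = None,
) -> Any:
    # Assumption: model.create accepts a `lang` keyword alongside
    # `voice` and `speed`; this is not confirmed by the issue text.
    return model.create(text, **build_create_kwargs(voice, speed, lang))
```

Leaving `lang` out of the keyword arguments when it is not set, rather than passing a default, means the library's own default keeps applying for existing English-first workflows.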

Example usage

npx hyperframes tts "La reunión empieza a las nueve y media." --voice ef_dora --lang es --output speech-es.wav

Why this is valuable

  • Improves multilingual output quality and predictability.
  • Reduces support confusion around “voice exists but pronunciation sounds wrong”.
  • Keeps current English-first workflows intact.

Alternatives considered

  1. Documentation-only workaround

    • Ask users to bypass the CLI and call kokoro_onnx directly with an explicit language.

  2. Voice-prefix auto-inference only (no --lang)

    • Infer the language from voice IDs automatically.
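
For comparison, voice-prefix inference could look roughly like the sketch below, which also covers the optional voice/lang mismatch warning from the proposed changes. It assumes that the first letter of a voice ID encodes the language (as in `ef_dora` for Spanish); the mapping, the language codes, and the function names are illustrative assumptions, not the project's or Kokoro's actual table.

```python
from typing import Optional

# Assumption: the first letter of a voice ID encodes the language.
# Illustrative and incomplete; the codes are placeholders.
VOICE_PREFIX_TO_LANG = {
    "a": "en-us",
    "b": "en-gb",
    "e": "es",
    "f": "fr-fr",
}


def infer_lang_from_voice(voice: str) -> Optional[str]:
    """Guess a language code from a voice ID shaped like 'ef_dora'."""
    # Only trust IDs of the form <lang letter><gender letter>_<name>.
    if len(voice) < 3 or voice[2] != "_":
        return None
    return VOICE_PREFIX_TO_LANG.get(voice[0].lower())


def mismatch_warning(voice: str, lang: Optional[str]) -> Optional[str]:
    """Return a warning string when voice and lang look inconsistent."""
    inferred = infer_lang_from_voice(voice)
    if lang is None or inferred is None:
        return None  # nothing to compare against
    # Compare primary subtags so "en" matches "en-us".
    if lang.lower().split("-")[0] == inferred.split("-")[0]:
        return None
    return (
        f"warning: voice '{voice}' looks like '{inferred}' "
        f"but --lang is '{lang}'"
    )
```

Inference alone cannot handle voice IDs outside the assumed naming scheme (the sketch returns `None` for them), which is one reason to keep an explicit --lang and use inference only for the warning.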

Additional context

  • This request is intentionally backward-compatible: --lang can be optional.
  • The issue is about explicit multilingual control, not replacing current defaults.
  • Related but different: existing TTS discussions around external providers (e.g., ElevenLabs) do not address this CLI language-control gap.
