From 4275cfe25d0a43be4739c3c3c1b382d1f59f93d0 Mon Sep 17 00:00:00 2001
From: Clarence Etnel
Date: Thu, 23 Apr 2026 07:47:12 +0200
Subject: [PATCH] docs: add ElevenLabs TTS integration guide

Adds documentation for using ElevenLabs as an alternative TTS provider,
including setup, voice cloning, and a comparison table with Kokoro.

Closes #337
---
 skills/hyperframes/references/tts.md | 83 ++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/skills/hyperframes/references/tts.md b/skills/hyperframes/references/tts.md
index c403564d8..6fafd8f03 100644
--- a/skills/hyperframes/references/tts.md
+++ b/skills/hyperframes/references/tts.md
@@ -69,7 +69,90 @@ npx hyperframes tts script.txt --voice af_heart --output narration.wav
 npx hyperframes transcribe narration.wav
 # → transcript.json with word-level timestamps
 ```
+## Alternative: ElevenLabs API
+
+For production-quality voices with custom voice cloning, use [ElevenLabs](https://elevenlabs.io) as an external TTS provider.
+
+### Setup
+
+1. Create an ElevenLabs account and get your API key from [elevenlabs.io/app/settings](https://elevenlabs.io/app/settings)
+2. Set the environment variable:
+
+```bash
+export ELEVENLABS_API_KEY=your_api_key_here
+```
+
+3. Install the ElevenLabs Python SDK:
+
+```bash
+pip install elevenlabs
+```
+
+### Generate Speech
+
+```bash
+# List available voices
+elevenlabs voices list
+
+# Generate speech with a specific voice
+elevenlabs text-to-speech "Your narration script here" --voice Rachel --output narration.wav
+```
+
+### Voice Cloning
+
+ElevenLabs supports instant voice cloning from a 30-second audio sample:
+
+```bash
+# Clone a voice from an audio sample
+elevenlabs voices add --name "MyVoice" --file reference_audio.wav
+
+# Use the cloned voice
+python3 -c "
+from elevenlabs import generate, play
+audio = generate(text='Hello world', voice='MyVoice')
+play(audio)
+"
+```
+
+### Integration with HyperFrames
+
+Generate the narration with ElevenLabs, then use it in your composition:
+
+```bash
+# Step 1: Generate narration
+elevenlabs text-to-speech -f script.txt --voice Rachel --output narration.wav
+
+# Step 2: Transcribe for captions
+npx hyperframes transcribe narration.wav # → transcript.json
+
+# Step 3: Use in your composition
+```
+
+```html
+
+```
+
+### Kokoro vs ElevenLabs
+
+| Feature | Kokoro (local) | ElevenLabs (API) |
+|---------|---------------|-------------------|
+| Cost | Free | $5+/month |
+| Latency | ~2s | ~0.5s |
+| Voice quality | Good | Excellent |
+| Voice cloning | No | Yes |
+| Languages | 8 | 29+ |
+| Offline | Yes | No |
+| Setup | pip install | API key |
+
 ## Requirements
 
 - Python 3.8+ with `kokoro-onnx` and `soundfile`
 - Model downloads on first use (~311 MB + ~27 MB voices, cached in `~/.cache/hyperframes/tts/`)
+- For ElevenLabs: `pip install elevenlabs` and `ELEVENLABS_API_KEY`
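
The comparison table and the requirements list in this patch imply a simple decision rule: Kokoro runs locally and offline but cannot clone voices, while ElevenLabs needs `ELEVENLABS_API_KEY` and network access. A minimal sketch of that rule follows. The helper name `choose_tts_provider` and its parameters are hypothetical and not part of HyperFrames or the ElevenLabs SDK, and preferring ElevenLabs whenever a key is present is an assumed policy, not something the patch specifies.

```python
import os
from typing import Mapping, Optional


def choose_tts_provider(
    need_voice_cloning: bool = False,
    must_work_offline: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return "kokoro" or "elevenlabs".

    Hypothetical helper for illustration only; it encodes the trade-offs
    from the Kokoro vs ElevenLabs table, nothing more.
    """
    # Allow a custom mapping so the rule can be exercised without touching
    # the real process environment.
    env = os.environ if env is None else env
    has_key = bool(env.get("ELEVENLABS_API_KEY", "").strip())

    # Per the table, only ElevenLabs clones voices, and it is API-only.
    if need_voice_cloning and must_work_offline:
        raise ValueError("voice cloning needs ElevenLabs, which does not work offline")
    if need_voice_cloning and not has_key:
        raise ValueError("voice cloning needs ELEVENLABS_API_KEY to be set")

    # Kokoro is the local, free fallback.
    if must_work_offline or not has_key:
        return "kokoro"

    # Assumed policy: prefer the hosted voices when a key is configured.
    return "elevenlabs"
```

Calling `choose_tts_provider()` with no arguments reads the real environment, so a machine without `ELEVENLABS_API_KEY` falls back to Kokoro. Raising on impossible combinations, instead of silently downgrading, keeps a missing key from producing narration in the wrong voice.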