83 changes: 83 additions & 0 deletions skills/hyperframes/references/tts.md
@@ -69,7 +69,90 @@ npx hyperframes tts script.txt --voice af_heart --output narration.wav
npx hyperframes transcribe narration.wav # → transcript.json with word-level timestamps
```

## Alternative: ElevenLabs API

For production-quality voices or custom voice cloning, use [ElevenLabs](https://elevenlabs.io) as an external TTS provider.

### Setup

1. Create an ElevenLabs account and get your API key from [elevenlabs.io/app/settings](https://elevenlabs.io/app/settings)
2. Set the environment variable:

```bash
export ELEVENLABS_API_KEY=your_api_key_here
```

3. Install the ElevenLabs Python SDK:

```bash
pip install elevenlabs
```

### Generate Speech

```bash
# List available voices (assumes an `elevenlabs` CLI is on your PATH; verify that your SDK version provides one)
elevenlabs voices list

# Generate speech with a specific voice
elevenlabs text-to-speech "Your narration script here" --voice Rachel --output narration.wav
```
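
If no `elevenlabs` command is available in your environment, the same step can be sketched with the Python SDK. This is a minimal sketch under several assumptions: SDK 1.0 or later (the `ElevenLabs` client and `text_to_speech.convert`), the `pcm_24000` output format returning headerless 16-bit mono PCM, and a placeholder voice ID. `pcm_to_wav` and `synthesize_to_wav` are hypothetical helper names, not part of HyperFrames or the SDK. The WAV wrapping uses only the standard library, so `narration.wav` ends up as a real WAV file rather than other audio bytes with a `.wav` extension.

```python
import wave


def pcm_to_wav(pcm: bytes, path: str, sample_rate: int = 24000) -> float:
    """Wrap headerless 16-bit mono PCM in a WAV container; return duration in seconds."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples (2 bytes each)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return len(pcm) / (2 * sample_rate)


def synthesize_to_wav(text: str, voice_id: str, path: str = "narration.wav") -> float:
    """Request PCM audio from ElevenLabs and save it as a WAV file."""
    # Imported lazily so pcm_to_wav stays usable without the SDK installed.
    from elevenlabs.client import ElevenLabs

    client = ElevenLabs()  # assumed to pick up ELEVENLABS_API_KEY from the environment
    chunks = client.text_to_speech.convert(
        voice_id=voice_id,          # placeholder: use a voice ID from your account
        text=text,
        output_format="pcm_24000",  # assumed: raw 16-bit mono PCM at 24 kHz
    )
    return pcm_to_wav(b"".join(chunks), path, sample_rate=24000)
```

A call such as `synthesize_to_wav(open("script.txt").read(), "YOUR_VOICE_ID")` would then produce a `narration.wav` that can be passed to `npx hyperframes transcribe`.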

### Voice Cloning

ElevenLabs supports instant voice cloning from a 30-second audio sample:

```bash
# Clone a voice from an audio sample
elevenlabs voices add --name "MyVoice" --file reference_audio.wav

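# Use the cloned voice (assumes SDK 1.0+; replace YOUR_VOICE_ID with the cloned voice's ID)
python3 -c "
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play
client = ElevenLabs()  # by default reads ELEVENLABS_API_KEY from the environment
audio = client.text_to_speech.convert(voice_id='YOUR_VOICE_ID', text='Hello world')
play(audio)
"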
```
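
Before uploading a reference sample, it can help to confirm that it meets the 30-second length mentioned above. The following is a minimal sketch using only the standard library. `wav_duration_seconds` and `is_long_enough` are hypothetical helper names, the 30-second default is taken from the text rather than from any ElevenLabs limit, and the check only works for uncompressed PCM WAV files that Python's `wave` module can read.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, min_seconds: float = 30.0) -> bool:
    """Check that a reference sample is at least min_seconds long."""
    return wav_duration_seconds(path) >= min_seconds
```

For example, `is_long_enough("reference_audio.wav")` returns `False` for a clip shorter than 30 seconds, which is a cue to record a longer sample before cloning.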

### Integration with HyperFrames

Generate the narration with ElevenLabs, then use it in your composition:

```bash
# Step 1: Generate narration
elevenlabs text-to-speech -f script.txt --voice Rachel --output narration.wav

# Step 2: Transcribe for captions
npx hyperframes transcribe narration.wav # → transcript.json

# Step 3: Reference narration.wav from an <audio> element in your composition
```

```html
<audio
  id="narration"
  data-start="0"
  data-duration="auto"
  data-track-index="2"
  src="narration.wav"
  data-volume="1"
></audio>
```

### Kokoro vs ElevenLabs

| Feature       | Kokoro (local)                   | ElevenLabs (API)            |
|---------------|----------------------------------|-----------------------------|
| Cost          | Free                             | $5+/month                   |
| Latency       | ~2s                              | ~0.5s                       |
| Voice quality | Good                             | Excellent                   |
| Voice cloning | No                               | Yes                         |
| Languages     | 8                                | 29+                         |
| Offline       | Yes                              | No                          |
| Setup         | `pip install` + model download   | `pip install` + API key     |

## Requirements

- Python 3.8+ with `kokoro-onnx` and `soundfile`
- Model downloads on first use (~311 MB + ~27 MB voices, cached in `~/.cache/hyperframes/tts/`)
- For ElevenLabs: `pip install elevenlabs` and `ELEVENLABS_API_KEY`
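
The requirements above suggest a simple preflight check: use ElevenLabs only when both the SDK and the API key are present, and fall back to local Kokoro otherwise. The following is a minimal sketch of that decision. `pick_tts_provider` is a hypothetical helper, not part of HyperFrames, and the fallback policy itself is an assumption about how a script might choose between the two providers.

```python
import importlib.util
import os
from typing import Mapping, Optional


def pick_tts_provider(
    env: Optional[Mapping[str, str]] = None,
    sdk_installed: Optional[bool] = None,
) -> str:
    """Return "elevenlabs" if the SDK and API key are available, else "kokoro"."""
    env = os.environ if env is None else env
    if sdk_installed is None:
        # find_spec returns None when the top-level package is not importable.
        sdk_installed = importlib.util.find_spec("elevenlabs") is not None
    has_key = bool(env.get("ELEVENLABS_API_KEY", "").strip())
    return "elevenlabs" if (has_key and sdk_installed) else "kokoro"
```

Calling `pick_tts_provider()` with no arguments inspects the current environment. The parameters exist so the decision can be exercised in tests without touching real credentials.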