A Rust library for text-to-speech synthesis using the Kokoro neural TTS model via ONNX inference.
- Kokoro TTS engine: natural-sounding neural speech via ONNX Runtime
- Multiple voices: 26 voices across 9 languages (US and UK English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, Mandarin Chinese)
- Streaming synthesis: audio playback can begin before the full text has been synthesized
- CPU-only: no GPU required; inference runs on the CPU through ONNX Runtime
- Three precision levels: f32, f16, and int8 model variants
```toml
[dependencies]
tts-rs = { version = "2026.2.1", features = ["kokoro"] }
```

| Feature | Description | Dependencies |
|---|---|---|
| `kokoro` | Kokoro neural TTS (ONNX) | `ort`, `ndarray`, `zip` |
No features are enabled by default. You must opt in explicitly.
Download the following files from the taylorchu/kokoro-onnx v0.2.0 release:
| File | Size | Description |
|---|---|---|
| `kokoro-v1.0.onnx` | 310 MB | Full precision (f32) |
| `kokoro-v1.0.fp16.onnx` | 169 MB | Half precision (f16) |
| `kokoro-v1.0.int8.onnx` | 88 MB | Quantized (int8), recommended |
| `voices-v1.0.bin` | - | Style vectors for all 26 voices (required) |
The `voices-v1.0.bin` file is required regardless of which model variant you use. Place all downloaded files in the same directory and pass that directory's path to `load_model`.
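Because `load_model` takes a directory rather than a single file, a quick pre-flight check can catch a missing download early. The sketch below is illustrative only: `Precision` and `missing_files` are hypothetical helpers, not part of tts-rs, and they assume the files keep the release names from the table above. It does not describe how `load_model` itself picks a variant when several are present.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not a tts-rs API): the three model variants
/// from the release table, mapped to their file names.
#[derive(Clone, Copy, Debug)]
enum Precision {
    F32,
    F16,
    Int8,
}

impl Precision {
    fn model_file(self) -> &'static str {
        match self {
            Precision::F32 => "kokoro-v1.0.onnx",
            Precision::F16 => "kokoro-v1.0.fp16.onnx",
            Precision::Int8 => "kokoro-v1.0.int8.onnx",
        }
    }
}

/// Returns the required file names that are not present in `dir`.
/// voices-v1.0.bin is needed no matter which precision is chosen.
fn missing_files(dir: &Path, precision: Precision) -> Vec<&'static str> {
    [precision.model_file(), "voices-v1.0.bin"]
        .iter()
        .copied()
        .filter(|name| !dir.join(*name).is_file())
        .collect()
}

fn main() {
    let dir = PathBuf::from("models/kokoro");
    let missing = missing_files(&dir, Precision::Int8);
    if missing.is_empty() {
        println!("all required files present in {}", dir.display());
    } else {
        eprintln!("missing from {}: {:?}", dir.display(), missing);
    }
}
```

Returning the list of missing names, rather than a bool, lets the caller print exactly which download is absent.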
```rust
use std::path::PathBuf;
use tts_rs::engines::kokoro::KokoroEngine;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut engine = KokoroEngine::new();
    // Directory holding the .onnx model file and voices-v1.0.bin
    engine.load_model(&PathBuf::from("models/kokoro"))?;
    let audio = engine.synthesize("Hello, world!", Some("af_heart"), None)?;
    // audio is a Vec<f32> of PCM samples at 24 kHz
    println!("{} samples", audio.len());
    Ok(())
}
```

To run the bundled example:

```bash
cargo run --example kokoro --features kokoro
```

This library is derived from transcribe-rs by CJ Pais, which was itself built as the inference backend for the Handy project. The original library supported multiple speech-to-text (ASR) engines; this fork removes those entirely and repurposes the codebase to focus exclusively on Kokoro TTS synthesis.
ONNX model files are provided by taylorchu/kokoro-onnx. Additional reference and inspiration from thewh1teagle/kokoro-onnx. The underlying TTS model is Kokoro-82M by hexgrad.
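The usage example returns audio as a `Vec<f32>` of PCM samples at 24 kHz. One way to save that output is to encode it as a standard 16-bit PCM WAV file. The sketch below uses only the standard library; `write_wav` is a hypothetical helper, not part of tts-rs, and a synthetic sine tone stands in for real engine output so it runs without the model files. A dedicated crate such as `hound` is a reasonable alternative.

```rust
use std::io::{self, Write};

/// Hypothetical helper (not a tts-rs API): writes mono f32 samples as a
/// 16-bit PCM WAV stream. Samples are clamped to [-1.0, 1.0] before scaling.
fn write_wav<W: Write>(out: &mut W, samples: &[f32], sample_rate: u32) -> io::Result<()> {
    let channels: u16 = 1;
    let bits_per_sample: u16 = 16;
    let block_align = channels * bits_per_sample / 8; // bytes per frame
    let byte_rate = sample_rate * block_align as u32;
    let data_len = (samples.len() * block_align as usize) as u32;

    // RIFF header: total size excludes the first 8 bytes
    out.write_all(b"RIFF")?;
    out.write_all(&(36 + data_len).to_le_bytes())?;
    out.write_all(b"WAVE")?;
    // fmt chunk describing integer PCM
    out.write_all(b"fmt ")?;
    out.write_all(&16u32.to_le_bytes())?; // fmt chunk size
    out.write_all(&1u16.to_le_bytes())?; // audio format 1 = PCM
    out.write_all(&channels.to_le_bytes())?;
    out.write_all(&sample_rate.to_le_bytes())?;
    out.write_all(&byte_rate.to_le_bytes())?;
    out.write_all(&block_align.to_le_bytes())?;
    out.write_all(&bits_per_sample.to_le_bytes())?;
    // data chunk: little-endian i16 samples
    out.write_all(b"data")?;
    out.write_all(&data_len.to_le_bytes())?;
    for &s in samples {
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        out.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Stand-in for the Vec<f32> from engine.synthesize(...):
    // 0.5 s of a 440 Hz sine at 24 kHz.
    let sample_rate: u32 = 24_000;
    let audio: Vec<f32> = (0..sample_rate / 2)
        .map(|i| (2.0 * std::f32::consts::PI * 440.0 * i as f32 / sample_rate as f32).sin() * 0.3)
        .collect();

    let mut file = io::BufWriter::new(std::fs::File::create("hello.wav")?);
    write_wav(&mut file, &audio, sample_rate)?;
    file.flush()?; // surface write errors instead of losing them on drop
    Ok(())
}
```

Taking a generic `Write` keeps the encoder independent of the destination, so the same function can target a file, an in-memory buffer, or a network stream.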