rishiskhare/tts-rs


tts-rs

A Rust library for text-to-speech synthesis using the Kokoro neural TTS model via ONNX inference.

Features

  • Kokoro TTS engine — natural-sounding neural speech via ONNX Runtime
  • Multiple voices — 26 voices across 9 languages (English US & UK, Spanish, French, Hindi, Italian, Japanese, Portuguese Brazilian, Chinese Mandarin)
  • Streaming synthesis — audio playback begins before the full text is synthesized
  • CPU-only — no GPU required; runs efficiently on any modern CPU
  • Three precision levels — f32, f16, and int8 model variants

Installation

[dependencies]
tts-rs = { version = "2026.2.1", features = ["kokoro"] }

Available Features

Feature | Description              | Dependencies
kokoro  | Kokoro neural TTS (ONNX) | ort, ndarray, zip

No features are enabled by default. You must opt in explicitly.

Model Files

Download the following files from the taylorchu/kokoro-onnx v0.2.0 release:

File                  | Size   | Description
kokoro-v1.0.onnx      | 310 MB | Full precision (f32)
kokoro-v1.0.fp16.onnx | 169 MB | Half precision (f16)
kokoro-v1.0.int8.onnx | 88 MB  | Quantized (int8), recommended
voices-v1.0.bin       |        | Style vectors for all 26 voices (required)

The voices-v1.0.bin file is required regardless of which model variant you use. Place all downloaded files in the same directory and pass that path to load_model.
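Before calling load_model, it can help to confirm that the directory actually contains the files for the variant you downloaded. The following is a minimal sketch using only the standard library. The Precision enum, VOICES_FILE constant, and missing_files helper are hypothetical names introduced for illustration and are not part of tts-rs. How load_model chooses between variants when several are present is not stated here, so the sketch only checks for files.

```rust
use std::path::{Path, PathBuf};

/// The three model variants listed in the table above (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    F32,
    F16,
    Int8,
}

impl Precision {
    /// File name of the ONNX model for this variant, as named in the release.
    pub fn model_file(self) -> &'static str {
        match self {
            Precision::F32 => "kokoro-v1.0.onnx",
            Precision::F16 => "kokoro-v1.0.fp16.onnx",
            Precision::Int8 => "kokoro-v1.0.int8.onnx",
        }
    }
}

/// Style vectors file, required for every variant.
pub const VOICES_FILE: &str = "voices-v1.0.bin";

/// Returns the required files that are absent from `dir` (empty if all are present).
pub fn missing_files(dir: &Path, precision: Precision) -> Vec<PathBuf> {
    [precision.model_file(), VOICES_FILE]
        .iter()
        .map(|name| dir.join(name))
        .filter(|path| !path.is_file())
        .collect()
}
```

Calling missing_files(Path::new("models/kokoro"), Precision::Int8) and reporting any returned paths gives a clearer error than a failed model load.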

Usage

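use std::path::PathBuf;

use tts_rs::engines::kokoro::KokoroEngine;

// Assumes the crate's error types convert into Box<dyn std::error::Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Directory holding the .onnx model and voices-v1.0.bin
    let mut engine = KokoroEngine::new();
    engine.load_model(&PathBuf::from("models/kokoro"))?;

    // Synthesize with the "af_heart" voice; the optional third argument is None
    let audio = engine.synthesize("Hello, world!", Some("af_heart"), None)?;
    // audio is a Vec<f32> of PCM samples at 24 kHz
    Ok(())
}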
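To listen to the result, the samples need to be written in a playable format. The sketch below encodes a slice of f32 samples as a 16-bit PCM WAV file in memory, using only the standard library. It assumes the audio is mono and that samples lie in the range [-1.0, 1.0]; neither is stated above, so treat both as assumptions. The names encode_wav, duration_secs, and SAMPLE_RATE are hypothetical helpers, not part of tts-rs.

```rust
/// Sample rate of the synthesized audio, per the comment in the usage snippet.
pub const SAMPLE_RATE: u32 = 24_000;

/// Encodes mono f32 samples as a 16-bit PCM WAV file held in memory.
/// Assumption: samples are in [-1.0, 1.0]; values outside are clamped.
pub fn encode_wav(samples: &[f32], sample_rate: u32) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32; // 2 bytes per 16-bit sample
    let mut out = Vec::with_capacity(44 + samples.len() * 2);

    // RIFF container header
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");

    // "fmt " chunk describing the sample layout
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag: PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // channels: mono (assumed)
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // bytes per second
    out.extend_from_slice(&2u16.to_le_bytes()); // block align
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample

    // "data" chunk with the converted samples
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for &s in samples {
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Playback duration in seconds for a given number of samples.
pub fn duration_secs(num_samples: usize, sample_rate: u32) -> f32 {
    num_samples as f32 / sample_rate as f32
}
```

With the audio vector from the usage snippet, std::fs::write("hello.wav", encode_wav(&audio, SAMPLE_RATE)) would save a file that most players can open. A dedicated WAV crate is a reasonable alternative if more formats or streaming writes are needed.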

Running the Example

cargo run --example kokoro --features kokoro

Acknowledgements

This library is derived from transcribe-rs by CJ Pais, which was itself built as the inference backend for the Handy project. The original library supported multiple speech-to-text (ASR) engines; this fork removes those entirely and repurposes the codebase to focus exclusively on Kokoro TTS synthesis.

ONNX model files are provided by taylorchu/kokoro-onnx. Additional reference and inspiration came from thewh1teagle/kokoro-onnx. The underlying TTS model is Kokoro-82M by hexgrad.

License

MIT
