API Reference

This page lists the main public APIs. The TypeScript types in the package are the source of truth.

KittenTTS.create(options?, onProgress?)

Creates and initializes a TTS instance. It resolves config, prepares the phonemizer, loads or downloads model assets, reads voices.npz, and creates the ONNX Runtime session.

const tts = await KittenTTS.create({
  model: KittenModel.NanoInt8,
  defaultVoice: KittenVoice.Luna,
  speed: 1.1,
  player: createExpoAudioPlayer(ExpoAudio),
});

The progress callback receives a number from 0 to 1. The optional second argument describes the current stage:

const tts = await KittenTTS.create(options, (progress, info) => {
  if (info?.stage === 'cached') {
    console.log('model is already downloaded');
  }

  console.log(Math.round(progress * 100));
});
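The callback arguments can be turned into a status label for a loading screen. The following is a minimal sketch: `formatProgress` and the `ProgressInfo` shape are hypothetical helpers, not part of the package, and only the `'cached'` stage value comes from this page. Any other stage name is passed through unchanged.

```typescript
// Hypothetical shape for the callback's second argument. The package's own
// TypeScript types are the source of truth.
type ProgressInfo = { stage?: string };

// Converts a 0..1 progress value into a short label for a loading indicator.
function formatProgress(progress: number, info?: ProgressInfo): string {
  // Guard against NaN and out-of-range values before rounding.
  const safe = Number.isFinite(progress) ? Math.min(1, Math.max(0, progress)) : 0;
  const percent = Math.round(safe * 100);

  if (info?.stage === 'cached') {
    return 'Model already downloaded';
  }
  // Prefix the stage name when one is provided.
  return info?.stage ? `${info.stage}: ${percent}%` : `${percent}%`;
}
```

It could be called from the progress callback, for example `(progress, info) => setLabel(formatProgress(progress, info))`, where `setLabel` stands for whatever state setter the app uses.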

Common Options

| Option | Default | Description |
| --- | --- | --- |
| model | KittenModel.Nano | Model variant |
| defaultVoice | KittenVoice.Bella | Voice used when omitted |
| speed | 1.0 | Speech speed from 0.5 to 2.0 |
| storageDirectory | Document directory | Custom model cache root |
| modelBaseURL | Hugging Face URL | Custom mirror/self-hosted model directory |
| modelFiles | none | Local ONNX and voices.npz paths/data |
| downloadRetries | 4 | Total download attempts per model file |
| ortNumThreads | 4 | ONNX Runtime thread count |
| maxTokensPerChunk | 400 | Long-text chunk size |
| trimTrailingSilence | true | Trim near-silent audio at chunk ends |
| silenceThreshold | 0.005 | Amplitude threshold for silence trimming |
| maxSilenceTrimMs | 250 | Maximum trailing silence removed per chunk |
| phonemizer | CEPhonemizer | Custom text-to-IPA converter |
| forceRedownload | false | Redownload model files before creating |
| player | none | Required for speak() and play() |
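To illustrate how the three silence options might interact, here is a sketch of trailing-silence trimming. It is not the library's implementation: `trimTrailingSilence` below is a hypothetical standalone function, and it assumes the threshold is compared against the absolute sample value and that the cap is converted from milliseconds to samples using the sample rate.

```typescript
// Sketch only: drops near-silent samples from the end of a chunk, but never
// more than maxSilenceTrimMs worth of audio.
function trimTrailingSilence(
  samples: Float32Array,
  sampleRate: number,
  silenceThreshold = 0.005,
  maxSilenceTrimMs = 250,
): Float32Array {
  // Largest number of samples that may be removed.
  const maxTrim = Math.floor((maxSilenceTrimMs * sampleRate) / 1000);
  // Index below which trimming must stop.
  const floor = Math.max(0, samples.length - maxTrim);

  let end = samples.length;
  while (end > floor && Math.abs(samples[end - 1]) < silenceThreshold) {
    end--;
  }
  // subarray returns a view over the same buffer, so nothing is copied.
  return samples.subarray(0, end);
}
```

Under these assumptions, the defaults at a 24000 Hz sample rate would remove at most 6000 samples from the end of each chunk.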

tts.generate(text, voice?, speed?)

Synthesizes speech and returns a KittenTTSResult without playing it.

const result = await tts.generate('Save this as audio.', KittenVoice.Jasper);

wordTimings may be empty when duration output is unavailable or when the text is split across multiple model chunks.

tts.generateStreaming(text, voice?, speed?)

Synthesizes long text sentence by sentence.

for await (const chunk of tts.generateStreaming(longText, KittenVoice.Luna)) {
  await tts.play(chunk);
}
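When the streamed audio should also be kept as a single buffer, for example to save it after playback, the chunks can be joined. This sketch assumes each streamed chunk exposes `samples` and `sampleRate` like `KittenTTSResult`; `AudioChunk` and `concatChunks` are hypothetical names, not package exports.

```typescript
// Only the two fields this sketch needs from each streamed chunk.
type AudioChunk = { samples: Float32Array; sampleRate: number };

// Joins chunks in order into one contiguous buffer.
function concatChunks(chunks: AudioChunk[]): AudioChunk {
  const total = chunks.reduce((count, chunk) => count + chunk.samples.length, 0);
  const samples = new Float32Array(total);

  let offset = 0;
  for (const chunk of chunks) {
    samples.set(chunk.samples, offset);
    offset += chunk.samples.length;
  }
  // Falls back to 24000, the documented sample rate, when no chunks were given.
  return { samples, sampleRate: chunks[0]?.sampleRate ?? 24000 };
}
```

Pushing each chunk into an array inside the `for await` loop and calling `concatChunks` afterwards keeps playback incremental while still producing one buffer at the end.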

tts.speak(text, voice?, speed?)

Synthesizes speech and plays it through the configured player.

await tts.speak('Play this sentence.', KittenVoice.Rosie, 1.1);

tts.play(result, options?)

Plays a previously generated result.

const result = await tts.generate('Highlight words while this plays.');

await tts.play(result, {
  onPlaybackStart: () => startWordHighlighting(result.wordTimings),
});
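`startWordHighlighting` in the example above is application code. A sketch of the lookup such a function might perform on each timer tick is shown here; `findActiveWord` is a hypothetical helper, and it assumes `startTime` and `endTime` are in seconds measured from the start of the audio.

```typescript
// Field names follow the wordTimings entry shape listed under KittenTTSResult.
type WordTiming = {
  wordIndex: number;
  word: string;
  startTime: number;
  endTime: number;
};

// Returns the wordIndex of the word being spoken at `time`, or -1 when no
// word is active (before the first word, in a pause, or with empty timings).
function findActiveWord(timings: WordTiming[], time: number): number {
  for (const timing of timings) {
    if (time >= timing.startTime && time < timing.endTime) {
      return timing.wordIndex;
    }
  }
  return -1;
}
```

Because `wordTimings` may be empty, treating -1 as "highlight nothing" lets the same code path handle results without timing data.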

KittenTTSResult

| Property or method | Description |
| --- | --- |
| samples | Raw mono Float32Array PCM |
| sampleRate | Always 24000 |
| duration | Audio duration in seconds |
| voice | Voice used for generation |
| effectiveSpeed | Speed after model-specific adjustments |
| inputText | Input text that was synthesized |
| wordTimings | Per-word { wordIndex, word, startTime, endTime }[] |
| wavData() | Complete 16-bit PCM WAV as Uint8Array |
| wavBase64() | Complete WAV as a base64 string |
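For readers who want to see what a "complete 16-bit PCM WAV" contains, the sketch below builds one from mono float samples using the standard 44-byte RIFF/WAVE header. It is an illustration of the file layout, not the library's `wavData()` implementation; `encodeWav16` is a hypothetical name, and the float-to-integer scaling shown is one common convention.

```typescript
// Encodes mono float samples (-1..1) as a 16-bit PCM WAV file.
function encodeWav16(samples: Float32Array, sampleRate: number): Uint8Array {
  const dataBytes = samples.length * 2; // 2 bytes per 16-bit sample
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataBytes, true); // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2
  view.setUint16(32, 2, true); // block align = channels * 2
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataBytes, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return new Uint8Array(buffer);
}
```

In application code, prefer `result.wavData()` or `result.wavBase64()`; the sketch is mainly useful for checking file sizes (44 bytes plus two bytes per sample) or for debugging a file that will not play.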

Cache Methods

| Method | Description |
| --- | --- |
| KittenTTS.isModelCached(config?) | Checks whether model files exist locally |
| KittenTTS.isModelDownloaded(config?) | App-facing alias for model cache checks |
| KittenTTS.getModelCacheInfo(config?) | Returns cache paths and file existence |
| KittenTTS.predownload(config?, onProgress?) | Downloads model and phonemizer assets |
| KittenTTS.prewarm(config?, onProgress?) | Deprecated alias for predownload() |
| KittenTTS.redownloadModel(config?, onProgress?) | Deletes and downloads the selected model |
| KittenTTS.clearModelCache(config?) | Deletes cached files for the selected model |

Bundled Asset Helper

createBundledAssetConfig(manifest, options) creates a KittenTTSConfig from the manifest generated by:

npx @kittentts/react-native bundle-assets

Use it with the Expo config plugin or pass basePath when files are available as normal filesystem paths.

const config = await createBundledAssetConfig(manifest, {
  model: KittenModel.NanoInt8,
});

See bundled offline assets.

Errors

SDK errors are surfaced as KittenTTSError when possible.

import {
  KittenTTSErrorCode,
  isKittenTTSError,
} from '@kittentts/react-native';

try {
  await tts.speak('Hello.');
} catch (error) {
  if (isKittenTTSError(error)) {
    console.log(error.code, error.message);
  }
}

| Code | Meaning |
| --- | --- |
| EMPTY_INPUT | Text was empty |
| DOWNLOAD_FAILED | Model or phonemizer download failed |
| INVALID_MODEL_DATA | Cached model data could not be parsed |
| PHONEMIZER_FAILED | Text-to-phoneme conversion failed |
| INFERENCE_FAILED | ONNX Runtime setup or inference failed |
| PLAYBACK_FAILED | Audio playback failed |
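An app will usually want to map these codes to user-facing text. The sketch below assumes `error.code` has the same runtime string value as the name in the Code column; if `KittenTTSErrorCode` uses different runtime values, key the map by the enum members instead. `describeErrorCode` and the message wording are hypothetical.

```typescript
// Keys are the code names from the table above; messages are placeholders.
const ERROR_MESSAGES = new Map<string, string>([
  ['EMPTY_INPUT', 'Enter some text to speak.'],
  ['DOWNLOAD_FAILED', 'Could not download the voice model. Check your connection and try again.'],
  ['INVALID_MODEL_DATA', 'The downloaded voice model appears to be corrupted.'],
  ['PHONEMIZER_FAILED', 'This text could not be converted to speech sounds.'],
  ['INFERENCE_FAILED', 'Speech generation failed on this device.'],
  ['PLAYBACK_FAILED', 'Audio playback failed.'],
]);

// Returns a user-facing message, with a generic fallback for unknown codes.
function describeErrorCode(code: string): string {
  return ERROR_MESSAGES.get(code) ?? 'Speech synthesis failed.';
}
```

Inside the `catch` block shown earlier, `describeErrorCode(String(error.code))` could replace the `console.log` call. For `INVALID_MODEL_DATA`, offering to call `KittenTTS.redownloadModel()` is one plausible recovery path.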