nillebco/stts

Voice

Text-to-speech, speech-to-text, and voice cloning CLI for Apple Silicon, built on mlx-audio.

Requirements

  • macOS with Apple Silicon (M-series)
  • uv
  • ffmpeg (for non-WAV audio conversion in the API server)

Setup

uv sync

CLI Usage

Text-to-Speech

# Use default model (Kokoro) and voice
uv run voice.py say "Hello world!"

# Choose a model and voice
uv run voice.py say "Bonjour !" -m voxtral -v fr_male

# Save without playing
uv run voice.py say "Hello" -o greeting.wav --no-play
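When driving the CLI from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only the flags shown above (`-m`, `-v`, `-o`, `--no-play`); the helper name `build_say_command` is hypothetical and not part of voice.py.

```python
import shlex


def build_say_command(text, model=None, voice=None, output=None, play=True):
    """Return the argv list for `uv run voice.py say ...` (hypothetical helper)."""
    cmd = ["uv", "run", "voice.py", "say", text]
    if model:
        cmd += ["-m", model]      # model shortcut or full model ID
    if voice:
        cmd += ["-v", voice]      # voice name, e.g. "fr_male"
    if output:
        cmd += ["-o", output]     # write audio to this file
    if not play:
        cmd.append("--no-play")   # save without playing
    return cmd


if __name__ == "__main__":
    cmd = build_say_command("Hello", output="greeting.wav", play=False)
    # Print a shell-safe rendering of the command for inspection.
    print(shlex.join(cmd))
    # To execute it, run subprocess.run(cmd, check=True) from the repository directory.
```

Passing the arguments as a list avoids shell quoting problems with text such as "Bonjour !".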

Voice Cloning

uv run voice.py clone "Text to speak" reference.wav
uv run voice.py clone "Text to speak" reference.wav -m voxtral

Speech-to-Text

uv run voice.py transcribe audio.wav
uv run voice.py transcribe audio.wav --stream

List Voices and Models

uv run voice.py voices              # all voices
uv run voice.py voices -m kokoro    # kokoro voices only
uv run voice.py models              # available model shortcuts

Available Models

Shortcut                        Model ID
kokoro, kokoro-tts (default)    mlx-community/Kokoro-82M-bf16
voxtral, voxtral-tts            mlx-community/Voxtral-4B-TTS-2603-mlx-4bit

You can also pass any full Hugging Face model ID with -m.
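The shortcut behaviour described here can be pictured as a simple lookup with a pass-through fallback. This is an illustrative sketch only: the names `MODEL_SHORTCUTS`, `DEFAULT_MODEL`, and `resolve_model` are assumptions, and voice.py may implement the mapping differently.

```python
# Shortcut table copied from the "Available Models" list above.
MODEL_SHORTCUTS = {
    "kokoro": "mlx-community/Kokoro-82M-bf16",
    "kokoro-tts": "mlx-community/Kokoro-82M-bf16",
    "voxtral": "mlx-community/Voxtral-4B-TTS-2603-mlx-4bit",
    "voxtral-tts": "mlx-community/Voxtral-4B-TTS-2603-mlx-4bit",
}

# Kokoro is documented as the default model.
DEFAULT_MODEL = "kokoro"


def resolve_model(name=None):
    """Map a shortcut to its model ID; pass any other value through unchanged."""
    if name is None:
        name = DEFAULT_MODEL
    # Unknown names are assumed to be full Hugging Face model IDs.
    return MODEL_SHORTCUTS.get(name, name)
```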

API Server

Starts an OpenAI-compatible transcription API, listening on port 4444 by default.

uv run voice.py serve
uv run voice.py serve -p 8080    # custom port
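A script that launches the server and then sends requests may want to wait until the port accepts connections. The sketch below polls with a plain TCP connect; `wait_for_port` is a hypothetical helper, and a successful connect only shows that something is listening, not that a model has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=4444, timeout=10.0, interval=0.25):
    """Return True once a TCP connection succeeds, False if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener is bound to the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    print("server reachable:", wait_for_port(timeout=5.0))
```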

Endpoint

POST /v1/audio/transcriptions

Parameter   Type         Default     Description
file        UploadFile   required    Audio file (WAV, WebM, MP3, MP4, OGG, FLAC, AAC)
model       string       "base"      Model identifier
language    string       "en"        Language code

Response:

{"text": "transcribed text"}

Example:

curl -X POST http://localhost:4444/v1/audio/transcriptions \
  -F "file=@recording.webm" \
  -F "language=en"
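The same request can be made from Python with only the standard library. This is a minimal sketch assuming the server is running locally on the default port and returns the JSON shape shown above; `encode_multipart` and `transcribe` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields, file_field, filename, file_bytes):
    """Build a multipart/form-data body; returns (body_bytes, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, language="en",
               url="http://localhost:4444/v1/audio/transcriptions"):
    """POST an audio file to the transcription endpoint and return the text."""
    audio = Path(path)
    body, content_type = encode_multipart(
        {"language": language}, "file", audio.name, audio.read_bytes()
    )
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]


if __name__ == "__main__":
    print(transcribe("recording.webm"))
```

Because the endpoint is described as OpenAI-compatible, an OpenAI client library pointed at this base URL may also work, though that is not confirmed here.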

macOS Service

Use ./cli to manage a persistent background service via launchd.

./cli install     # install and start on login
./cli status      # check if running
./cli logs        # tail logs
./cli restart     # restart the service
./cli stop        # stop the service
./cli uninstall   # stop and remove the service

Logs are written to /tmp/voice-tts.log and /tmp/voice-tts.err.

About

Speech-to-text (STT) and text-to-speech (TTS) CLI and server, optimized for macOS.
