A local-first desktop app for voice cloning. It works out of the box: high-quality speech synthesis and voice cloning run entirely on your machine.
- Fully Offline — After model download, all inference runs locally with no network required and no privacy concerns
- Zero Configuration — First launch automatically handles environment detection, runtime download, model download & warm-up
- High-Quality Voice Cloning — Powered by the VoxCPM engine, supporting bilingual (Chinese & English) speech synthesis and voice cloning
- Fine-Grained Control — Adjustable CFG guidance scale, inference steps, seed, text normalization, post-processing denoising, and more
- Extreme Clone Mode — Uses reference audio transcription to further improve voice fidelity
- Built-in ASR — Automatically transcribes reference audio with the SenseVoice ASR engine, with manual editing support
- Dual Model Sources — Download models from Hugging Face or ModelScope, with automatic source recommendation
- Bilingual UI — Chinese and English interface
| Item | Requirement |
|---|---|
| OS | macOS 14.0 (Sonoma) or later |
| Chip | Apple Silicon (M1/M2/M3/M4) |
| Disk Space | ~6 GB (app + models) |
- Go to the Releases page and download the latest `.dmg` file
- Open the DMG and drag Voca into the Applications folder
- On first launch, follow the guided setup to download models and start using the app
About App Signing & Notarization
Voca is signed with an Apple Developer ID and has been successfully notarized by Apple, so it is safe to run on macOS.
If you still hit a Gatekeeper warning on first launch (e.g. "Voca" cannot be opened, "Voca is damaged and can't be opened", or "cannot verify the developer"), it's usually because macOS has attached a quarantine attribute to files downloaded via the browser. You can remove the quarantine flag by running the following command in Terminal:

    sudo xattr -dr com.apple.quarantine /Applications/Voca.app

Then reopen Voca. Alternatively, open System Settings → Privacy & Security and click Open Anyway.
Voca includes a complete onboarding flow:
Environment Check → Runtime Download → Model Download & Verification → Model Warm-up → Ready to Use
Just follow the on-screen instructions — no manual configuration needed.
Enter text, select a model and voice, and generate high-quality speech with one click. A task queue lets you submit multiple generation requests without waiting for earlier ones to finish.
Adjustable generation parameters:
| Parameter | Description |
|---|---|
| CFG Scale | Controls generation guidance strength |
| Inference Steps | Balance between quality and speed |
| Seed | Fix seed for reproducible results, or randomize |
| Text Normalization | Automatically handles numbers, abbreviations, etc. |
| Post-Processing Denoise | Removes background noise after generation |
| Extreme Clone Mode | Uses reference audio transcription to improve voice cloning fidelity |
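
To make the seed behavior in the table concrete, here is a minimal sketch of how such parameters could be modeled. The class name, field names, and boolean defaults are all hypothetical and are not taken from Voca's code; the only behavior borrowed from the table is "fix the seed for reproducible results, or randomize".

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenerationParams:
    """Hypothetical container mirroring the parameter table; not Voca's real schema."""
    cfg_scale: float                # guidance strength
    inference_steps: int            # quality vs. speed trade-off
    seed: Optional[int] = None      # None means "randomize"
    normalize_text: bool = True     # assumed default
    denoise: bool = False           # assumed default
    extreme_clone: bool = False     # assumed default


def resolve_seed(params: GenerationParams) -> int:
    """Return the fixed seed if one is set, otherwise draw a random 32-bit seed."""
    if params.seed is not None:
        return params.seed
    return random.randrange(2**32)
```

Reusing the resolved seed together with the same text, voice, and settings is what makes a result repeatable; leaving `seed` unset gives a different draw on each run.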
Manage preset and custom voices. When creating a custom voice, upload reference audio; the built-in SenseVoice ASR engine transcribes it automatically, and you can edit the transcript manually.
View all task statuses (queued / generating / completed / failed / cancelled). Completed tasks can be played back and exported as audio files.
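
The five statuses above can be read as a small state machine. The sketch below is illustrative only: the status names come from this README, but the allowed transitions are an assumption (the real app may, for example, allow retrying a failed task).

```python
from enum import Enum


class TaskStatus(Enum):
    QUEUED = "queued"
    GENERATING = "generating"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed transitions, not confirmed by the project.
ALLOWED_TRANSITIONS = {
    TaskStatus.QUEUED: {TaskStatus.GENERATING, TaskStatus.CANCELLED},
    TaskStatus.GENERATING: {
        TaskStatus.COMPLETED,
        TaskStatus.FAILED,
        TaskStatus.CANCELLED,
    },
    TaskStatus.COMPLETED: set(),
    TaskStatus.FAILED: set(),
    TaskStatus.CANCELLED: set(),
}


def can_transition(current: TaskStatus, target: TaskStatus) -> bool:
    """Check whether moving from `current` to `target` is allowed in this sketch."""
    return target in ALLOWED_TRANSITIONS[current]


def can_export(status: TaskStatus) -> bool:
    """Only completed tasks can be played back and exported."""
    return status is TaskStatus.COMPLETED
```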
Built-in model catalog with support for downloading from Hugging Face or ModelScope, with automatic recommendation of the optimal source based on your network. Manage TTS models and auxiliary models (ASR, audio enhancement).
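
This README does not say how the recommended source is chosen. One plausible approach, sketched here under that assumption, is to probe each source and pick the one with the lowest measured latency. The function name and the source keys are hypothetical.

```python
from typing import Mapping, Optional


def recommend_source(latencies_ms: Mapping[str, Optional[float]]) -> Optional[str]:
    """Pick the reachable source with the lowest latency.

    `latencies_ms` maps a source name to its probe latency in milliseconds,
    or to None if the probe failed. Returns None when nothing is reachable.
    """
    reachable = {name: ms for name, ms in latencies_ms.items() if ms is not None}
    if not reachable:
        return None
    return min(reachable, key=reachable.get)
```

For example, `recommend_source({"huggingface": 800.0, "modelscope": 120.0})` would select `"modelscope"`, and a source whose probe failed is never recommended.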
Check for new versions in Settings. When an update is available, the app opens the corresponding Release page for download.
| Layer | Technology |
|---|---|
| Desktop Framework | Tauri 2 (Rust) |
| Frontend | React 19 + TypeScript + Vite |
| Inference Service | Python (FastAPI + Uvicorn) sidecar |
| Speech Engine | VoxCPM |
| Runtime | Python 3.11+ |
| Platform | macOS 14.0+ (Apple Silicon) |
Planned development directions are listed below. Priorities may shift based on community feedback.
- Lighter inference backend — Migrate ASR from PyTorch/FunASR to ONNX Runtime, significantly reducing app size and model download size
- Quantized model support — INT8 and other quantized inference to lower memory and disk usage
- Richer TTS capabilities — Support for more TTS models and expanded speech synthesis features
- Windows support
Have ideas or suggestions? Let us know via Issues.
Note: Voca is still in its early stages. The engineering experience (build process, developer docs, code structure, etc.) may not be fully polished yet. If you run into any issues while using or developing, we'd love for you to open an Issue or contribute directly — let's make it better together.
Ways to get involved:
- Submit bug reports or feature requests → Issues
- Submit code improvements → Pull Request
- Improve documentation or translations
- Currently macOS (Apple Silicon) only; Windows support is planned
- First launch requires an internet connection to download models (~1–2 GB); fully offline after that
- Voice cloning quality depends heavily on reference audio quality — clean audio with no background noise is recommended
- VoxCPM — Speech synthesis engine
- Tauri — Desktop application framework
- SenseVoice — Speech recognition model
- Model: Claude Opus 4.6 & GPT-5.4
This project is licensed under the Apache License 2.0.


