Real-time AI voice translation with on-device speech-to-text, LLM translation, and neural text-to-speech — running entirely in your browser via WebAssembly. No server, no API key, full privacy.
| Feature | Details |
|---|---|
| 🎙️ Live voice recording | Tap the mic, speak — VAD auto-detects when you've finished |
| 🧠 On-device STT | Whisper Tiny EN (sherpa-onnx WASM) |
| 🤖 On-device LLM translation | LFM2 350M (llama.cpp WASM, Liquid AI) |
| 🔊 On-device TTS | Piper TTS EN Lessac (sherpa-onnx WASM) |
| 🌍 Language selector | Source + Target language (8 languages) |
| 🎭 Voice clone toggle | UI ready (pipeline hook available) |
| 📊 Live metrics | Real latency, accuracy, words/sec from each turn |
| 🌙 Dark / Light mode | System-aware toggle |
| 📱 PWA | Installable on mobile & desktop |
| ⚡ Vite + Bun | Sub-200ms builds, fast HMR |

```text
User speaks
     │
     ▼
AudioCapture (16 kHz mic)
     │
     ▼
VAD: Silero VAD v5 (detects speech end)
     │  ← popSpeechSegment()
     ▼
VoicePipeline.processTurn(audioSamples)
     │
     ├── STT ──→ Whisper Tiny EN ──→ transcript text
     │
     ├── LLM ──→ LFM2 350M ──→ translation (streamed token by token)
     │
     └── TTS ──→ Piper EN Lessac ──→ Float32Array audio → AudioPlayback
```

All 4 models run in-browser using WebAssembly (llama.cpp + sherpa-onnx). Zero network calls after the one-time model download.
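
As a rough illustration of one turn, the flow above can be written as plain async JavaScript. This is a hypothetical sketch, not the `VoicePipeline.processTurn()` implementation: `runTurn`, `transcribe`, `translateStream` and `synthesize` are placeholder names, and the LLM stage is assumed to yield tokens as an async iterable.

```javascript
// Hypothetical sketch of one voice turn: STT -> streamed LLM translation -> TTS.
// transcribe/translateStream/synthesize are placeholder stage functions,
// not RunAnywhere SDK APIs.
async function runTurn(audioSamples, stages, onToken = () => {}) {
  const { transcribe, translateStream, synthesize } = stages;

  // 1. Speech-to-text on the speech segment returned by the VAD.
  const transcript = await transcribe(audioSamples);

  // 2. Translation, consumed token by token so the UI can render it live.
  let translation = '';
  for await (const token of translateStream(transcript)) {
    translation += token;
    onToken(token, translation);
  }
  translation = translation.trim();

  // 3. Text-to-speech on the finished translation (e.g. a Float32Array).
  const audio = await synthesize(translation);
  return { transcript, translation, audio };
}
```

Injecting the stages keeps the sketch testable with stubs; in the app, the equivalent work goes through the RunAnywhere SDK.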
- Bun ≥ 1.x
- Chrome 96+ or Edge 96+ (96 is the first version supporting `Cross-Origin-Embedder-Policy: credentialless`, which the app relies on for `SharedArrayBuffer`)

```bash
# Clone / enter the project
cd WebVoiceAgent

# Install dependencies
bun install

# Start dev server (with COOP/COEP headers for SharedArrayBuffer)
bun run dev
```

Open http://localhost:5173 and tap the microphone button.

⚠️ First run: the app downloads ~425 MB of AI models (VAD 5 MB + STT 105 MB + LLM 250 MB + TTS 65 MB). They are stored in your browser's OPFS and persist across sessions, so later starts load them from the local cache instead of downloading again.

```text
WebVoiceAgent/
├── public/                       # Static assets (favicon, PWA icons)
├── src/
│   ├── components/               # Reusable UI components
│   │   ├── Header.jsx            # App bar, theme, and status
│   │   ├── RecordingHub.jsx      # Mic & Upload interaction center
│   │   ├── ConfigPanel.jsx       # Language & download settings
│   │   ├── SourceTranscript.jsx  # User speech transcription
│   │   ├── TranslatedAudio.jsx   # Translation playback & text
│   │   ├── MetricsDisplay.jsx    # Latency & speed metrics
│   │   └── CustomAudioPlayer.jsx # Premium player with progress
│   ├── utils/                    # Logic helpers
│   │   ├── audio.js              # WAV encoding utilities
│   │   └── time.js               # Timestamp formatting
│   ├── App.jsx                   # Main orchestrator (state & pipeline)
│   ├── runanywhere.js            # SDK init & model registration
│   ├── constants.js              # Language lists & stage labels
│   ├── index.css                 # Tailwind + custom brand styles
│   └── main.jsx                  # React entry point
├── .env                          # Model IDs & LLM settings
├── vite.config.js                # Vite + PWA + WASM config
└── RunanywhereGuide.md           # SDK reference docs
```

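
The tree lists `src/utils/audio.js` as holding WAV encoding utilities. Its code is not shown here, so the following is an assumed, minimal equivalent: `encodeWav` (a hypothetical name) turns mono `Float32Array` samples in [-1, 1] into a 16-bit PCM WAV file, e.g. to play back a recorded 16 kHz segment.

```javascript
// Hypothetical WAV encoder (not the project's audio.js): mono Float32 samples
// in [-1, 1] -> 44-byte RIFF/WAVE header + 16-bit little-endian PCM data.
function encodeWav(samples, sampleRate = 16000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clip out-of-range samples
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In the browser, `URL.createObjectURL(new Blob([encodeWav(samples, 16000)], { type: 'audio/wav' }))` gives a URL an `<audio>` element can play. For TTS output, pass the voice model's actual sample rate instead of 16000.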
All model IDs and LLM settings live in `.env`. Change these to swap models without touching code.

```bash
# RunAnywhere Model IDs
VITE_MODEL_LLM_ID=lfm2-350m-q4_k_m
VITE_MODEL_STT_ID=sherpa-onnx-whisper-tiny.en
VITE_MODEL_TTS_ID=vits-piper-en_US-lessac-medium
VITE_MODEL_VAD_ID=silero-vad-v5

# LLM generation settings
VITE_LLM_MAX_TOKENS=80
VITE_LLM_TEMPERATURE=0.7
```

| Slot | Current | Upgrade option |
|---|---|---|
| LLM | `lfm2-350m-q4_k_m` | `lfm2-1.2b-tool-q4_k_m` (~800 MB, better quality) |
| STT | `sherpa-onnx-whisper-tiny.en` | Whisper Base EN (larger, more accurate) |
| TTS | `vits-piper-en_US-lessac-medium` | Any Piper voice |
| VAD | `silero-vad-v5` | Same (no alternatives needed) |

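Vite exposes `VITE_`-prefixed `.env` entries on `import.meta.env`, always as strings, so the numeric LLM settings need converting. A hypothetical helper (`readLlmSettings` is not part of the project as far as this README shows) that parses and validates them, falling back to the values in the `.env` above:

```javascript
// Hypothetical helper: parse LLM settings from a Vite-style env object.
// Pass import.meta.env in the app; the fallbacks mirror the .env shown above.
function readLlmSettings(env) {
  const maxTokens = Number.parseInt(env.VITE_LLM_MAX_TOKENS ?? '', 10);
  const temperature = Number.parseFloat(env.VITE_LLM_TEMPERATURE ?? '');
  return {
    modelId: env.VITE_MODEL_LLM_ID || 'lfm2-350m-q4_k_m',
    // Env values are strings: convert them and guard against typos.
    maxTokens: Number.isInteger(maxTokens) && maxTokens > 0 ? maxTokens : 80,
    temperature: Number.isFinite(temperature) && temperature >= 0 ? temperature : 0.7,
  };
}
```

In the app this would be called as `const llm = readLlmSettings(import.meta.env);`.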
`src/runanywhere.js` initialises the RunAnywhere SDK once (idempotent cached-promise pattern) and registers all 4 AI models pulled from `.env`.

```js
await initSDK(); // safe to call from multiple components; runs only once
```

It exports:

- `initSDK()`: async, idempotent SDK init
- `ModelManager`: download/load/query models
- `ModelCategory`: enum for VAD, STT, LLM, TTS
- `MODEL_IDS`: model IDs read from `.env`

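The idempotent cached-promise pattern can be sketched as follows. This is an assumed shape, not the code in `src/runanywhere.js`: `createInitOnce` and `startBackends` are hypothetical names, and clearing the cached promise on failure (so a later call can retry) is a design choice added here.

```javascript
// Sketch of the cached-promise pattern (not the actual SDK code).
// `startBackends` stands in for whatever one-time setup the real initSDK() does.
function createInitOnce(startBackends) {
  let initPromise = null;
  return function initSDK() {
    if (!initPromise) {
      initPromise = startBackends().catch((err) => {
        // Forget the failed attempt so a later call can retry.
        initPromise = null;
        throw err;
      });
    }
    // Every caller shares the same promise, so setup runs only once.
    return initPromise;
  };
}
```

Concurrent callers (for example, two components mounting at once) all await the same promise, which is what makes repeated `initSDK()` calls cheap.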
The main `App` component (`src/App.jsx`) acts as the central state machine and orchestrator, delegating the UI to the focused components in `src/components` to keep it modular. It moves through these stages:

| Stage | Meaning |
|---|---|
| `idle` | Waiting for user to tap mic |
| `sdk_init` | Initialising WASM backends |
| `downloading` | Downloading model files (cached in OPFS) |
| `loading` | Loading models into memory |
| `listening` | Mic active, VAD processing audio |
| `processing` | Speech segment detected, running STT |
| `generating` | LLM generating translation |
| `speaking` | TTS audio playing |
| `error` | Error state shown in RecordingHub |

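The stages behave like a small finite-state machine. A hypothetical transition table is sketched below; which transitions `App.jsx` actually allows is an assumption here (for example, `idle` going straight to `listening` once models are loaded), with any stage allowed to fall into `error`.

```javascript
// Hypothetical transition table for the stages above; the real App.jsx
// logic may differ. Any stage may move to 'error'.
const TRANSITIONS = {
  idle: ['sdk_init', 'listening'],
  sdk_init: ['downloading', 'loading'],
  downloading: ['loading'],
  loading: ['listening'],
  listening: ['processing', 'idle'],
  processing: ['generating', 'listening'], // back to listening if STT finds no text
  generating: ['speaking'],
  speaking: ['listening', 'idle'],
  error: ['idle'],
};

function nextStage(current, requested) {
  if (requested === 'error') return 'error';
  const allowed = TRANSITIONS[current] ?? [];
  if (!allowed.includes(requested)) {
    throw new Error(`Invalid stage transition: ${current} -> ${requested}`);
  }
  return requested;
}
```

Rejecting unexpected transitions surfaces ordering bugs (such as `speaking` before `generating`) early instead of leaving the UI in an inconsistent state.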
The UI is broken down into small, focused components:

- `RecordingHub`: handles microphone capture, file uploads, and status animations.
- `SourceTranscript`: manages the list of source speech segments with quick playback.
- `TranslatedAudio`: displays the translated text and plays the target audio with a progress bar.
- `ConfigPanel`: manages source/target languages and voice-cloning preferences.
- `CustomAudioPlayer`: a premium audio player used for both source and target playback.
- `MetricsDisplay`: displays latency, accuracy, and speed at a glance.

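`MetricsDisplay` shows per-turn latency and speed. One hypothetical way to derive those numbers from timestamps recorded at each stage boundary is sketched below; the field names are illustrative, and accuracy is not covered because the README does not say how it is measured.

```javascript
// Hypothetical per-turn metrics from millisecond timestamps taken at stage
// boundaries (e.g. with performance.now()). Field names are illustrative.
function turnMetrics({ speechEndMs, sttDoneMs, llmDoneMs, ttsStartMs, translation }) {
  // Count whitespace-separated words in the translated text.
  const words = translation.trim().split(/\s+/).filter(Boolean).length;
  const llmSeconds = (llmDoneMs - sttDoneMs) / 1000;
  return {
    sttLatencyMs: sttDoneMs - speechEndMs,
    llmLatencyMs: llmDoneMs - sttDoneMs,
    // End of the user's speech until the translated audio starts playing.
    endToEndMs: ttsStartMs - speechEndMs,
    wordsPerSecond: llmSeconds > 0 ? words / llmSeconds : 0,
  };
}
```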
Critical settings in `vite.config.js` for RunAnywhere to work:

```js
// vite.config.js (excerpt; the React, PWA and copyWasmPlugin() plugins are omitted)
import { defineConfig } from 'vite';

export default defineConfig({
  server: {
    // Required for SharedArrayBuffer (multi-threaded WASM)
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'credentialless',
    },
  },
  optimizeDeps: {
    // Critical: prevents Vite from pre-bundling the WASM packages,
    // so import.meta.url resolves to the correct WASM file paths
    exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
  },
  // Required for VLM web workers (future use)
  worker: { format: 'es' },
});
```

The `copyWasmPlugin()` copies WASM binaries from `node_modules` into `dist/assets/` at build time so they're served correctly in production.
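
The build step performed by `copyWasmPlugin()` could look roughly like the Vite plugin below. This is a hypothetical sketch, since the project's plugin is not shown: `sourceDirs` (the `node_modules` folders holding the `.wasm` files) and the default `outDir` are assumptions, while `apply: 'build'` and the `closeBundle` hook are standard Vite/Rollup plugin features. It uses CommonJS `require` so it runs standalone under Node; inside an ESM `vite.config.js`, switch to `import`.

```javascript
// Hypothetical copy-WASM plugin; the project's real copyWasmPlugin() may differ.
const fs = require('node:fs');
const path = require('node:path');

function copyWasmPlugin({ sourceDirs, outDir = 'dist/assets' }) {
  return {
    name: 'copy-wasm',
    apply: 'build', // only runs for `vite build`, not the dev server
    closeBundle() {
      fs.mkdirSync(outDir, { recursive: true });
      for (const dir of sourceDirs) {
        // Copy every .wasm file found directly inside each source directory.
        for (const file of fs.readdirSync(dir)) {
          if (file.endsWith('.wasm')) {
            fs.copyFileSync(path.join(dir, file), path.join(outDir, file));
          }
        }
      }
    },
  };
}
```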

```bash
bun run dev      # Start dev server at http://localhost:5173 (with COOP/COEP headers)
bun run build    # Production build → dist/
bun run preview  # Serve the built dist/ at http://localhost:4173
```

The `dist/` folder is a self-contained static site. Deploy it anywhere that supports custom HTTP headers.
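
If you host `dist/` yourself, a small Node server can add the required headers. This is a minimal sketch, not part of the project: `headersFor` and `createStaticServer` are hypothetical names, the MIME map is deliberately short, and it falls back to `index.html` only for extension-less routes so a missing `.wasm` file returns 404 instead of HTML.

```javascript
// Minimal static server sketch for a built dist/ folder (not part of the
// project). Sends COOP/COEP on every response and serves .wasm as application/wasm.
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');

// Deliberately short MIME map; extend it for other asset types you ship.
const MIME = {
  '.html': 'text/html; charset=utf-8',
  '.js': 'text/javascript',
  '.css': 'text/css',
  '.json': 'application/json',
  '.webmanifest': 'application/manifest+json',
  '.wasm': 'application/wasm',
};

function headersFor(filePath) {
  const headers = {
    'Cross-Origin-Opener-Policy': 'same-origin',
    'Cross-Origin-Embedder-Policy': 'credentialless',
    'Content-Type': MIME[path.extname(filePath)] || 'application/octet-stream',
  };
  if (filePath.endsWith('.wasm')) {
    headers['Cache-Control'] = 'public, max-age=31536000, immutable';
  }
  return headers;
}

function createStaticServer(root) {
  const base = path.resolve(root);
  return http.createServer((req, res) => {
    const { pathname } = new URL(req.url, 'http://localhost');
    let filePath = path.join(base, pathname);
    // Refuse anything that resolves outside the root directory.
    if (filePath !== base && !filePath.startsWith(base + path.sep)) {
      res.writeHead(403);
      return res.end();
    }
    const isFile = fs.existsSync(filePath) && fs.statSync(filePath).isFile();
    if (!isFile) {
      // Missing asset (has an extension): 404 rather than HTML, which would
      // otherwise surface as a WASM "expected magic word" error.
      if (path.extname(filePath)) {
        res.writeHead(404);
        return res.end();
      }
      filePath = path.join(base, 'index.html'); // client-side route fallback
    }
    res.writeHead(200, headersFor(filePath));
    const stream = fs.createReadStream(filePath);
    stream.on('error', () => res.destroy());
    stream.pipe(res);
  });
}
```

Start it with `createStaticServer('dist').listen(8080)`; the port is arbitrary.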

For Vercel (`vercel.json`):

```json
{
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "Cross-Origin-Opener-Policy", "value": "same-origin" },
        { "key": "Cross-Origin-Embedder-Policy", "value": "credentialless" }
      ]
    },
    {
      "source": "/assets/(.*).wasm",
      "headers": [
        { "key": "Content-Type", "value": "application/wasm" },
        { "key": "Cache-Control", "value": "public, max-age=31536000, immutable" }
      ]
    }
  ]
}
```

For Netlify (`netlify.toml`):

```toml
[[headers]]
  for = "/*"
  [headers.values]
    Cross-Origin-Opener-Policy = "same-origin"
    Cross-Origin-Embedder-Policy = "credentialless"
```

⚠️ The COOP/COEP headers are required. Without them, `SharedArrayBuffer` is unavailable and WASM falls back to single-threaded mode (significantly slower).
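
To confirm at runtime that the headers took effect, the page can check the standard `crossOriginIsolated` global, which is `true` only when COOP/COEP are applied. A small sketch; `checkIsolation` is a hypothetical helper, and the `scope` parameter exists only so it can be exercised outside a browser.

```javascript
// Sketch: is the page cross-origin isolated, so SharedArrayBuffer and
// multi-threaded WASM are available? `crossOriginIsolated` is a standard
// browser global; `scope` defaults to globalThis.
function checkIsolation(scope = globalThis) {
  const isolated = scope.crossOriginIsolated === true;
  const hasSab = typeof scope.SharedArrayBuffer === 'function';
  return {
    isolated,
    threadsAvailable: isolated && hasSab,
    hint: isolated ? null : 'Missing COOP/COEP headers: expect single-threaded WASM.',
  };
}
```

Logging `checkIsolation().hint` once at startup (for example in `main.jsx`) makes a misconfigured host obvious before any model loads.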

| Package | Purpose |
|---|---|
| `react` + `react-dom` | UI framework |
| `@runanywhere/web` | Core SDK: `RunAnywhere`, `ModelManager`, `VoicePipeline`, `AudioCapture`, `EventBus` |
| `@runanywhere/web-llamacpp` | LLM/VLM backend: llama.cpp compiled to WASM |
| `@runanywhere/web-onnx` | STT/TTS/VAD backend: sherpa-onnx compiled to WASM |
| `vite` | Build tool |
| `vite-plugin-pwa` | Progressive Web App support |
| `tailwindcss` | Styling |

All models are downloaded from Hugging Face on first use and cached locally in the browser's OPFS.

| Model | Hugging Face repo | Size | Role |
|---|---|---|---|
| LFM2 350M Q4_K_M | `LiquidAI/LFM2-350M-GGUF` | ~250 MB | Translation LLM |
| Whisper Tiny EN | `runanywhere/sherpa-onnx-whisper-tiny.en` | ~105 MB | Speech-to-text |
| Piper TTS Lessac | `runanywhere/vits-piper-en_US-lessac-medium` | ~65 MB | Text-to-speech |
| Silero VAD v5 | `runanywhere/silero-vad-v5` | ~5 MB | Voice activity detection |

Total download: ~425 MB (one-time; later starts load from the OPFS cache).
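
Because the first run writes roughly 425 MB (5 + 105 + 250 + 65 MB) to OPFS, it can help to check the storage quota first and request persistent storage. The sketch below uses the standard `StorageManager` methods `estimate()` and `persist()`; `prepareModelStorage` is a hypothetical helper, the sizes are the approximate figures from the table, and the storage object is passed in so it can be stubbed.

```javascript
// Sketch: check free quota and request persistence before the model download.
// estimate() and persist() are standard StorageManager methods.
const MODEL_SIZES_MB = { vad: 5, stt: 105, llm: 250, tts: 65 }; // approximate
const REQUIRED_BYTES =
  Object.values(MODEL_SIZES_MB).reduce((a, b) => a + b, 0) * 1_000_000;

async function prepareModelStorage(storage) {
  const { quota = 0, usage = 0 } = await storage.estimate();
  const availableBytes = quota - usage;
  // persist() may resolve false; the data is then kept on a best-effort basis.
  const persisted = storage.persist ? await storage.persist() : false;
  return { enoughSpace: availableBytes >= REQUIRED_BYTES, availableBytes, persisted };
}
```

In the browser, call `await prepareModelStorage(navigator.storage)` before triggering the downloads. If `persisted` is `false`, the browser may evict the cached models under storage pressure, forcing a re-download.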
To add text-to-speech for another language:

- Find a Piper voice model for your language.
- Add the model entry to `src/runanywhere.js`:
  ```js
  {
    id: 'vits-piper-de_DE-thorsten-medium',
    name: 'Piper TTS German',
    url: 'https://huggingface.co/runanywhere/vits-piper-de_DE-thorsten-medium/resolve/main/vits-piper-de_DE-thorsten-medium.tar.gz',
    framework: LLMFramework.ONNX,
    modality: ModelCategory.SpeechSynthesis,
    memoryRequirement: 65_000_000,
    artifactType: 'archive',
  }
  ```
- Update `.env`:
  ```bash
  VITE_MODEL_TTS_ID=vits-piper-de_DE-thorsten-medium
  ```

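If you add several voices, a small helper can produce entries in the same shape as the German example. This is a hypothetical sketch: `piperTtsEntry` is not part of the SDK, the URL pattern is inferred from that single example (check that the archive exists on Hugging Face before relying on it), and `framework`/`modality` are passed in so the sketch needs no SDK import.

```javascript
// Hypothetical helper building a Piper TTS entry shaped like the example above.
// The URL pattern is inferred from one example and may not hold for every voice.
function piperTtsEntry(id, name, { framework, modality, memoryRequirement = 65_000_000 }) {
  return {
    id,
    name,
    url: `https://huggingface.co/runanywhere/${id}/resolve/main/${id}.tar.gz`,
    framework, // e.g. LLMFramework.ONNX
    modality, // e.g. ModelCategory.SpeechSynthesis
    memoryRequirement,
    artifactType: 'archive',
  };
}
```

For example, `piperTtsEntry('vits-piper-de_DE-thorsten-medium', 'Piper TTS German', { framework: LLMFramework.ONNX, modality: ModelCategory.SpeechSynthesis })` reproduces the German entry.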
| Problem | Fix |
|---|---|
| WASM fails to load | Ensure COOP/COEP headers are set in the dev server |
| Tab crashes on download | Close other browser tabs; use Chrome/Edge |
| Safari issues | Use Chrome or Edge; Safari has limited OPFS support |
| `SharedArrayBuffer is not defined` | Missing COOP/COEP headers; check `vite.config.js` |
| Models not found after page refresh | Normal: they load from the OPFS cache automatically |
| `WASM expected magic word` error | Static files are being intercepted (e.g. a fallback route returning `index.html` for `.wasm` requests); check server routing |

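For the "expected magic word" error, it helps to inspect the bytes the server really returns for a `.wasm` URL. Every WebAssembly binary begins with `00 61 73 6d` (`\0asm`); a routing fallback that serves `index.html` returns `<!doctype html>` instead. A small diagnostic sketch with hypothetical helper names (`isWasmBinary`, `diagnoseWasmUrl`):

```javascript
// Sketch: check whether a response body is really a WebAssembly binary.
// All .wasm files start with the magic bytes 00 61 73 6d ("\0asm").
function isWasmBinary(bytes) {
  const b = new Uint8Array(bytes);
  return b.length >= 4 && b[0] === 0x00 && b[1] === 0x61 && b[2] === 0x73 && b[3] === 0x6d;
}

// Fetch a .wasm URL and report status, content type and whether it is real WASM.
async function diagnoseWasmUrl(url, fetchFn = fetch) {
  const res = await fetchFn(url);
  const bytes = await res.arrayBuffer();
  return {
    status: res.status,
    contentType: res.headers.get('content-type'),
    isWasm: isWasmBinary(bytes),
  };
}
```

Running `diagnoseWasmUrl` on one of the files in `dist/assets/` from the deployed page shows at a glance whether the server returns the binary, an HTML page, or a 404.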
MIT — feel free to use, modify, and deploy.
Built with ❤️ using RunAnywhere Web SDK, React, Vite, Bun, and Tailwind CSS.