🌐 Let'sTalk.live

Real-time AI voice translation with on-device speech-to-text, LLM translation, and neural text-to-speech — running entirely in your browser via WebAssembly. No server, no API key, full privacy.

✨ Features

| Feature | Details |
|---|---|
| 🎙️ Live voice recording | Tap the mic, speak — VAD auto-detects when you've finished |
| 🧠 On-device STT | Whisper Tiny EN (sherpa-onnx WASM) |
| 🤖 On-device LLM translation | LFM2 350M (llama.cpp WASM, Liquid AI) |
| 🔊 On-device TTS | Piper TTS EN Lessac (sherpa-onnx WASM) |
| 🌍 Language selector | Source + Target language (8 languages) |
| 🎭 Voice clone toggle | UI ready (pipeline hook available) |
| 📊 Live metrics | Real latency, accuracy, words/sec from each turn |
| 🌙 Dark / Light mode | System-aware toggle |
| 📱 PWA | Installable on mobile & desktop |
| ⚡ Vite + Bun | Sub-200ms builds, fast HMR |

🏗️ Architecture

User speaks
    │
    ▼
AudioCapture (16 kHz mic)
    │
    ▼
VAD — Silero VAD v5 (detects speech end)
    │   ← popSpeechSegment()
    ▼
VoicePipeline.processTurn(audioSamples)
    │
    ├── STT  ——→ Whisper Tiny EN  ——→ transcript text
    │
    ├── LLM  ——→ LFM2 350M        ——→ translation (streamed token by token)
    │
    └── TTS  ——→ Piper EN Lessac  ——→ Float32Array audio → AudioPlayback

All 4 models run in-browser using WebAssembly (llama.cpp + sherpa-onnx). Zero network calls after the one-time model download.
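The flow in the diagram can be sketched as three awaited stages chained in order. This is only an illustration, not the RunAnywhere `VoicePipeline` implementation: `runTurn`, `stubStages`, and the three stage functions are hypothetical names, and the stubs stand in for the real models.

```javascript
// Hypothetical sketch of one turn: STT -> LLM -> TTS.
// The stage functions are injected; none of these names come from the SDK.
async function runTurn(audioSamples, stages) {
  const transcript = await stages.stt(audioSamples);

  // Collect the streamed tokens into the final translation string.
  let translation = '';
  for await (const token of stages.translate(transcript)) {
    translation += token;
  }

  const speech = await stages.tts(translation);
  return { transcript, translation, speech };
}

// Stub stages standing in for the real models (assumptions for illustration).
const stubStages = {
  stt: async (samples) => (samples.length > 0 ? 'hello' : ''),
  translate: async function* (text) {
    for (const token of ['[', text, ']']) yield token;
  },
  tts: async (text) => new Float32Array(text.length),
};
```

Passing the stages in as arguments keeps the ordering logic separate from the model backends, so the same function can be exercised with stubs like the ones above.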

🚀 Quick Start

Prerequisites

Bun ≥ 1.x
Chrome 96+ or Edge 96+ (required for SharedArrayBuffer under the `credentialless` Cross-Origin-Embedder-Policy used by this project)

Install & Run

# Clone / enter the project
cd WebVoiceAgent

# Install dependencies
bun install

# Start dev server (with COOP/COEP headers for SharedArrayBuffer)
bun run dev

Open http://localhost:5173 and tap the microphone button.

⚠️ First run: The app downloads ~425 MB of AI models (VAD 5 MB + STT 105 MB + LLM 250 MB + TTS 65 MB). They are stored in your browser's OPFS, so later starts load from the local cache instead of the network. Clearing the site's data removes them and triggers a fresh download.
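Before starting a download of this size, it can help to check that the browser has enough storage headroom. The sketch below is an assumption about how such a check might look, not code from this project; `MODEL_SIZES`, `totalDownloadBytes`, and `hasRoomFor` are hypothetical names, and the sizes are the approximate figures quoted above.

```javascript
// Approximate model sizes in bytes, taken from the figures quoted above.
const MODEL_SIZES = {
  vad: 5e6,
  stt: 105e6,
  llm: 250e6,
  tts: 65e6,
};

// Sum of all model downloads (about 425 MB).
function totalDownloadBytes(sizes) {
  return Object.values(sizes).reduce((sum, bytes) => sum + bytes, 0);
}

// `estimate` has the shape returned by navigator.storage.estimate():
// { usage, quota }, both in bytes.
function hasRoomFor(estimate, requiredBytes) {
  const free = (estimate.quota ?? 0) - (estimate.usage ?? 0);
  return free >= requiredBytes;
}
```

In the browser this could be called as `hasRoomFor(await navigator.storage.estimate(), totalDownloadBytes(MODEL_SIZES))`.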

📁 Project Structure

WebVoiceAgent/
├── public/                  # Static assets (favicon, PWA icons)
├── src/
│   ├── components/          # Reusable UI components
│   │   ├── Header.jsx       # App bar, theme, and status
│   │   ├── RecordingHub.jsx # Mic & Upload interaction center
│   │   ├── ConfigPanel.jsx  # Language & download settings
│   │   ├── SourceTranscript.jsx # User speech transcription
│   │   ├── TranslatedAudio.jsx # Translation playback & text
│   │   ├── MetricsDisplay.jsx # Latency & speed metrics
│   │   └── CustomAudioPlayer.jsx # Premium player with progress
│   ├── utils/               # Logic helpers
│   │   ├── audio.js         # WAV encoding utilities
│   │   └── time.js          # Timestamp formatting
│   ├── App.jsx              # Main orchestrator (state & pipeline)
│   ├── runanywhere.js       # SDK init & model registration
│   ├── constants.js         # Language lists & stage labels
│   ├── index.css            # Tailwind + custom brand styles
│   └── main.jsx             # React entry point
├── .env                     # Model IDs & LLM settings
├── vite.config.js           # Vite + PWA + WASM config
└── RunanywhereGuide.md      # SDK reference docs

⚙️ Environment Variables (`.env`)

All model IDs and LLM settings live in .env. Change these to swap models without touching code.

# RunAnywhere Model IDs
VITE_MODEL_LLM_ID=lfm2-350m-q4_k_m
VITE_MODEL_STT_ID=sherpa-onnx-whisper-tiny.en
VITE_MODEL_TTS_ID=vits-piper-en_US-lessac-medium
VITE_MODEL_VAD_ID=silero-vad-v5

# LLM generation settings
VITE_LLM_MAX_TOKENS=80
VITE_LLM_TEMPERATURE=0.7
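Vite exposes `VITE_`-prefixed variables to client code as strings, so numeric settings need parsing before use. A minimal sketch of that step follows; `readLlmSettings` is a hypothetical helper, not a function from this repository, and the fallback values are simply the ones shown above.

```javascript
// Hypothetical helper: turn the string values from .env into typed settings.
// In a Vite app the argument would be the import.meta.env object.
function readLlmSettings(env) {
  const maxTokens = Number.parseInt(env.VITE_LLM_MAX_TOKENS ?? '', 10);
  const temperature = Number.parseFloat(env.VITE_LLM_TEMPERATURE ?? '');
  return {
    modelId: env.VITE_MODEL_LLM_ID ?? 'lfm2-350m-q4_k_m',
    // Fall back to the values shown above when a variable is missing or malformed.
    maxTokens: Number.isFinite(maxTokens) ? maxTokens : 80,
    temperature: Number.isFinite(temperature) ? temperature : 0.7,
  };
}
```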

Swapping Models

| Slot | Current | Upgrade option |
|---|---|---|
| LLM | `lfm2-350m-q4_k_m` | `lfm2-1.2b-tool-q4_k_m` (800MB, better quality) |
| STT | `sherpa-onnx-whisper-tiny.en` | Whisper Base EN (larger, more accurate) |
| TTS | `vits-piper-en_US-lessac-medium` | Any Piper voice |
| VAD | `silero-vad-v5` | Same (no alternatives needed) |

🔧 Key Files Explained

`src/runanywhere.js` — SDK Initialization

This file initialises the RunAnywhere SDK once (idempotent cached-promise pattern) and registers all 4 AI models pulled from .env.

await initSDK();  // safe to call from multiple components — runs only once

It exports:

initSDK() — async, idempotent SDK init
ModelManager — download/load/query models
ModelCategory — enum for VAD, STT, LLM, TTS
MODEL_IDS — model IDs read from .env
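The idempotent cached-promise pattern mentioned above can be sketched as follows. This is a generic illustration rather than the contents of `src/runanywhere.js`; `initOnce`, `doInit`, and `initRuns` are hypothetical names, and the body of `doInit` is a placeholder for the real SDK setup.

```javascript
// Minimal sketch of an idempotent, cached-promise initializer.
let initPromise = null;
let initRuns = 0; // counts how many times the real work was started

async function doInit() {
  initRuns += 1;
  // Placeholder for the real work (loading WASM backends, registering models).
  await Promise.resolve();
  return { ready: true };
}

function initOnce() {
  if (initPromise === null) {
    initPromise = doInit().catch((err) => {
      initPromise = null; // allow a retry after a failed init
      throw err;
    });
  }
  return initPromise;
}
```

Every caller receives the same promise, so concurrent calls from several components wait on a single initialization instead of starting their own.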

`src/App.jsx` — Orchestration

The main App component acts as the central state machine and orchestrator. It uses specific components from src/components to keep the UI modular and clean.

| Stage | Meaning |
|---|---|
| `idle` | Waiting for user to tap mic |
| `sdk_init` | Initialising WASM backends |
| `downloading` | Downloading model files (cached in OPFS) |
| `loading` | Loading models into memory |
| `listening` | Mic active, VAD processing audio |
| `processing` | Speech segment detected, running STT |
| `generating` | LLM generating translation |
| `speaking` | TTS audio playing |
| `error` | Error state shown in RecordingHub |
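One way to keep such a stage machine predictable is an explicit transition table. The sketch below uses the stage names from the table, but the allowed transitions are an assumption for illustration and are not read from `App.jsx`; `TRANSITIONS` and `nextStage` are hypothetical names.

```javascript
// Hypothetical transition table for the stages listed above.
// The allowed transitions are an assumption, not taken from App.jsx.
const TRANSITIONS = {
  idle: ['sdk_init', 'listening'],
  sdk_init: ['downloading', 'loading', 'error'],
  downloading: ['loading', 'error'],
  loading: ['listening', 'error'],
  listening: ['processing', 'idle', 'error'],
  processing: ['generating', 'listening', 'error'],
  generating: ['speaking', 'error'],
  speaking: ['listening', 'idle', 'error'],
  error: ['idle'],
};

function nextStage(current, requested) {
  const allowed = TRANSITIONS[current] ?? [];
  // Reject transitions that are not listed, keeping the current stage.
  return allowed.includes(requested) ? requested : current;
}
```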

`src/components/` — UI Modules

The UI is broken down into small, focused components:

RecordingHub: Handles microphone capture, file uploads, and status animations.
SourceTranscript: Manages the list of source speech with quick playback.
TranslatedAudio: Displays translated text and handles the target audio playback with a progress bar.
ConfigPanel: Manages source/target languages and voice cloning preferences.
CustomAudioPlayer: An audio player with a progress bar, used for both source and target playback.
MetricsDisplay: Displays latency, accuracy, and speed at a glance.
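As an illustration of the kind of numbers MetricsDisplay shows, latency and words per second can be derived from a few timestamps per turn. This is a sketch under assumptions: `turnMetrics` and its field names are hypothetical, and how the app actually measures these values (or accuracy) is not specified here.

```javascript
// Hypothetical per-turn metrics: latency and translation speed.
// Timestamps and durations are in milliseconds (e.g. from performance.now()).
function turnMetrics({ speechEndMs, audioStartMs, translation, generationMs }) {
  const words = translation.trim().split(/\s+/).filter(Boolean).length;
  return {
    // Time from the end of the user's speech to the start of TTS playback.
    latencyMs: audioStartMs - speechEndMs,
    wordsPerSec: generationMs > 0 ? words / (generationMs / 1000) : 0,
  };
}
```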

`vite.config.js` — Build Configuration

Critical settings for RunAnywhere to work:

server: {
  headers: {
    'Cross-Origin-Opener-Policy': 'same-origin',
    'Cross-Origin-Embedder-Policy': 'credentialless',
  }
}
// ↑ Required for SharedArrayBuffer (multi-threaded WASM)

optimizeDeps: {
  exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
}
// ↑ Critical: prevents Vite from pre-bundling WASM packages
//   so import.meta.url resolves to correct WASM file paths

worker: { format: 'es' }
// ↑ Required for VLM web workers (future use)

The copyWasmPlugin() copies WASM binaries from node_modules into dist/assets/ at build time so they're served correctly in production.

🛠️ Available Scripts

bun run dev       # Start dev server at http://localhost:5173 (with COOP/COEP headers)
bun run build     # Production build → dist/
bun run preview   # Serve the built dist/ at http://localhost:4173

🌐 Deploying to Production

The dist/ folder is a self-contained static site. Deploy it anywhere that supports custom HTTP headers.

Vercel (`vercel.json`)

{
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "Cross-Origin-Opener-Policy", "value": "same-origin" },
        { "key": "Cross-Origin-Embedder-Policy", "value": "credentialless" }
      ]
    },
    {
      "source": "/assets/(.*).wasm",
      "headers": [
        { "key": "Content-Type", "value": "application/wasm" },
        { "key": "Cache-Control", "value": "public, max-age=31536000, immutable" }
      ]
    }
  ]
}

Netlify (`netlify.toml`)

[[headers]]
  for = "/*"
  [headers.values]
    Cross-Origin-Opener-Policy = "same-origin"
    Cross-Origin-Embedder-Policy = "credentialless"

⚠️ The COOP/COEP headers are required. Without them, SharedArrayBuffer is unavailable and WASM falls back to single-threaded mode (significantly slower).
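Whether the headers took effect can be checked at runtime through the standard `crossOriginIsolated` global. The helper below is a small sketch; `canUseWasmThreads` is a hypothetical name, and it takes the global scope as a parameter so it can be exercised outside a browser.

```javascript
// Returns true when multi-threaded WASM should be possible in `scope`.
// In the browser, pass `globalThis`.
function canUseWasmThreads(scope) {
  return scope.crossOriginIsolated === true &&
    typeof scope.SharedArrayBuffer === 'function';
}
```

Calling `canUseWasmThreads(globalThis)` from the browser console on the deployed site gives a quick signal that the COOP/COEP headers are being served.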

📦 Dependencies

| Package | Purpose |
|---|---|
| `react` + `react-dom` | UI framework |
| `@runanywhere/web` | Core SDK — RunAnywhere, ModelManager, VoicePipeline, AudioCapture, EventBus |
| `@runanywhere/web-llamacpp` | LLM/VLM backend — llama.cpp compiled to WASM |
| `@runanywhere/web-onnx` | STT/TTS/VAD backend — sherpa-onnx compiled to WASM |
| `vite` | Build tool |
| `vite-plugin-pwa` | Progressive Web App support |
| `tailwindcss` | Styling |

🤖 AI Models Used

All models are downloaded from HuggingFace on first use and cached locally in the browser's OPFS.

| Model | HuggingFace Repo | Size | Role |
|---|---|---|---|
| LFM2 350M Q4_K_M | `LiquidAI/LFM2-350M-GGUF` | ~250 MB | Translation LLM |
| Whisper Tiny EN | `runanywhere/sherpa-onnx-whisper-tiny.en` | ~105 MB | Speech-to-Text |
| Piper TTS Lessac | `runanywhere/vits-piper-en_US-lessac-medium` | ~65 MB | Text-to-Speech |
| Silero VAD v5 | `runanywhere/silero-vad-v5` | ~5 MB | Voice Activity Detection |

Total download: ~425 MB (one-time; cached in OPFS until the site's browser data is cleared).

🧩 How to Add a New Language (TTS)

Find a Piper voice model for your language

Add the model entry to src/runanywhere.js:

{
  id: 'vits-piper-de_DE-thorsten-medium',
  name: 'Piper TTS German',
  url: 'https://huggingface.co/runanywhere/vits-piper-de_DE-thorsten-medium/resolve/main/vits-piper-de_DE-thorsten-medium.tar.gz',
  framework: LLMFramework.ONNX,
  modality: ModelCategory.SpeechSynthesis,
  memoryRequirement: 65_000_000,
  artifactType: 'archive',
}

Update .env: VITE_MODEL_TTS_ID=vits-piper-de_DE-thorsten-medium
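If several voices are added, the entry can be generated rather than copied by hand. The sketch below assumes each voice is published under the same repository layout as the German example above, which may not hold for every voice; `piperModelEntry` is a hypothetical helper, and the `framework` and `modality` values are passed in by the caller (in the app they would be `LLMFramework.ONNX` and `ModelCategory.SpeechSynthesis`).

```javascript
// Hypothetical helper that builds a registration entry for a Piper voice.
// Assumes the voice follows the same repo layout as the example above.
function piperModelEntry(id, name, { framework, modality }) {
  return {
    id,
    name,
    url: `https://huggingface.co/runanywhere/${id}/resolve/main/${id}.tar.gz`,
    framework,
    modality,
    memoryRequirement: 65_000_000, // copied from the example; adjust per voice
    artifactType: 'archive',
  };
}
```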

🐛 Troubleshooting

| Problem | Fix |
|---|---|
| WASM fails to load | Ensure COOP/COEP headers are set in dev server |
| Tab crashes on download | Close other browser tabs, use Chrome/Edge |
| Safari issues | Use Chrome or Edge — Safari has limited OPFS support |
| `SharedArrayBuffer is not defined` | Missing COOP/COEP headers — check `vite.config.js` |
| Models not found after page refresh | Normal — they load from OPFS cache automatically |
| `WASM expected magic word` error | The server is returning something other than the `.wasm` binary (often an HTML fallback page); check that static file routes are not being intercepted |
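The magic-word error means the response body does not begin with the four bytes that open every WebAssembly binary. A quick way to confirm what the server is returning is to inspect those bytes; `looksLikeWasm` below is a hypothetical helper for that check.

```javascript
// A WebAssembly binary starts with the bytes 0x00 0x61 0x73 0x6d ("\0asm").
const WASM_MAGIC = [0x00, 0x61, 0x73, 0x6d];

function looksLikeWasm(bytes) {
  return bytes.length >= 4 && WASM_MAGIC.every((b, i) => bytes[i] === b);
}
```

In the browser, `looksLikeWasm(new Uint8Array(await (await fetch(url)).arrayBuffer()))` returns `false` when the server answers with an HTML page instead of the binary.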

📄 License

MIT — feel free to use, modify, and deploy.

Built with ❤️ using RunAnywhere Web SDK, React, Vite, Bun, and Tailwind CSS.
