VerbalCoding

The voice layer for any coding agent: real barge-in, low-latency streaming, and the agents you already use.

한국어 · 日本語 · 中文 · Español · Français · Русский

Why it exists

VerbalCoding turns a Discord voice channel into a hands-free cockpit for any CLI coding agent. Hermes ships its own `/voice join`, but only for Hermes. VerbalCoding is a thin, agent-agnostic layer that puts the same loop on top of Hermes, Claude Code, Codex, Gemini, OpenCode, OpenClaw, Aider, Cursor CLI, or any non-interactive shell command. It also smooths out rough edges that other voice frontends still list on their roadmaps:

- True audio barge-in: interrupt the agent mid-sentence. Hermes' built-in voice pauses its listener during TTS.
- Streaming pipeline: the first sentence plays while the agent is still writing (Hermes lists this as a future Phase-4 item).
- Smart progress narration: describes intent ("wiring the new login route"), not file lists.
- Voice plan mode: say "plan it first", edit by voice ("skip step 3"), then say "approve" to execute.
- Cross-agent routing by voice: "ask Codex what it thinks" routes a single turn, "switch to Aider" makes the choice sticky, and "back to default" restores the default. The plan can also emit a `which_agent` slot so the agent itself picks the next backend.
- Phone-down mode: when a long task completes and the room is empty, you get a push notification with a voice summary.
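Here is a minimal sketch of how the three routing phrases could be told apart. It assumes transcripts arrive as plain strings and that agents are matched by a one-word name. `parseRoutingIntent`, the `AGENTS` list, and the phrase patterns are hypothetical illustrations, not VerbalCoding's actual grammar.

```javascript
// Hypothetical agent keywords. Multi-word names such as "Claude Code" are
// matched by their first word only in this sketch.
const AGENTS = ["hermes", "claude", "codex", "gemini", "opencode", "openclaw", "aider", "cursor"];

// Classify one transcribed utterance into a routing intent (hypothetical grammar).
function parseRoutingIntent(utterance, agents = AGENTS) {
  const text = utterance.trim();

  // "back to default": drop any sticky override.
  if (/^back to default\b/i.test(text)) {
    return { kind: "reset" };
  }

  // "switch to <agent>": sticky, applies to every later turn.
  let m = text.match(/^switch to (\w+)/i);
  if (m && agents.includes(m[1].toLowerCase())) {
    return { kind: "sticky", agent: m[1].toLowerCase() };
  }

  // "ask <agent> <prompt>": single turn, then routing falls back.
  m = text.match(/^ask (\w+)\s+(.+)$/i);
  if (m && agents.includes(m[1].toLowerCase())) {
    return { kind: "single", agent: m[1].toLowerCase(), prompt: m[2] };
  }

  // Everything else goes to whichever backend is currently active.
  return { kind: "passthrough", prompt: text };
}
```

Unknown names fall through to the active agent instead of being treated as a routing command. For example, "ask the team about this" is passed through unchanged.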

What feels different

| Capability | Why it matters |
| --- | --- |
| Agent choice, first-class | Hermes Agent, Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw, Aider, Cursor CLI, or any custom command. `vc setup` auto-detects what's installed. |
| Cross-agent voice routing | Say "ask Codex …" (single turn), "switch to Aider" (sticky), or "back to default". Missing binaries are detected, and the bridge offers to fall back to the default agent. Handoff prompts carry recent utterances and the latest plan decisions to the new agent. |
| Real barge-in | VAD thresholds tuned for indoor and noisy rooms; cut in mid-utterance and resume the conversation. |
| Streaming end-to-end | `STREAMING_TTS=1` plays sentences as the agent produces them; first audio in well under a second on a warm cache. |
| Smart progress | Optional LLM summarizer collapses raw events into one plain-language sentence; falls back to the existing regex labels when no summarizer key is configured. |
| Plan mode by voice | Narrated, editable, voice-driven plans without touching the keyboard. |
| Phone-down handoff | Long task + empty voice channel = push notification (`ntfy`/`pushover`) with a redacted one-line summary and a tap-to-rejoin link. |
| Local speech loop | Discord audio is transcribed locally by `whisper-cli`; TTS via Edge, OpenVoice, SpeechSwift/CosyVoice, or Supertonic. |
| Operations support | `vc doctor` auto-fixes, Docker UDP guidance, latency metrics, multi-instance project rooms, redacted config checks. |
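The phone-down rule (long task plus empty voice channel) can be sketched as a predicate and a request builder for ntfy's publish API. That API accepts a plain-text POST to `https://ntfy.sh/<topic>` and reads the `Title` and `Click` headers. The two-minute threshold, the redaction pattern, and the function names are assumptions for illustration. Pushover uses a different API.

```javascript
// Hypothetical threshold: tasks shorter than this never trigger a push.
const LONG_TASK_MS = 2 * 60 * 1000;

// Notify only when the task ran long AND no humans remain in the voice channel.
function shouldNotify({ elapsedMs, humansInChannel }) {
  return elapsedMs >= LONG_TASK_MS && humansInChannel === 0;
}

// Collapse to one line, mask long token-like strings, and cap the length.
// The 32-character masking rule is an illustrative assumption.
function redactSummary(text, maxLen = 140) {
  const oneLine = text.replace(/\s+/g, " ").trim();
  const masked = oneLine.replace(/[A-Za-z0-9_\-.]{32,}/g, "[redacted]");
  return masked.length > maxLen ? masked.slice(0, maxLen - 1) + "…" : masked;
}

// Build (but do not send) an ntfy publish request: the body is the message,
// Title sets the heading, and Click is the URL opened when the user taps it.
function buildNtfyRequest(topic, summary, rejoinUrl) {
  return {
    url: `https://ntfy.sh/${encodeURIComponent(topic)}`,
    init: {
      method: "POST",
      headers: { Title: "VerbalCoding task finished", Click: rejoinUrl },
      body: redactSummary(summary),
    },
  };
}
```

`fetch(url, init)` sends the request; `fetch` is a global in Node 18 and later. A `rejoinUrl` such as `https://discord.com/channels/<guild-id>/<channel-id>` opens the voice channel when tapped.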

Already using Hermes Agent? Hermes already has a working Discord voice loop via `/voice join` and `/voice channel`. Use VerbalCoding when you want an agent-agnostic loop, want barge-in and streaming today, or want plan mode, push handoff, and smart narration on top of the same loop. The two coexist: VerbalCoding can drive Hermes as its backend.

Quick Start

```bash
npm install -g verbalcoding@latest
vc setup       # detects installed agents and lets you pick
vc doctor
vc start
```

`vc setup` is the usual interactive path. Keep the Discord Developer Portal open while it asks for your bot token, application (client) ID, transcript target, and voice channel names.

For scripted installs, skip the prompts and fill in the Discord details later:

```bash
vc setup --yes
vc setup token <bot-token> --client-id <discord-client-id>
vc setup channels "General,Team Voice"
vc doctor
```

Contributor clone path:

```bash
git clone https://github.com/ca1773130n/VerbalCoding.git
cd VerbalCoding
./scripts/install.sh
vc doctor
./run.sh
```

Discord setup in one minute

1. Create a Discord application and bot at https://discord.com/developers/applications.
2. Enable the Message Content privileged intent.
3. Run `vc setup` and paste the bot token and application (client) ID when prompted.
4. Enter the exact names of the voice channels the bot should auto-join.
5. Invite the bot with:

```bash
vc bot invite <discord-client-id>
vc bot invite <discord-client-id> --guild <guild-id>
```
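A Discord bot invite is an OAuth2 authorize URL. It carries `client_id`, `scope=bot`, and a `permissions` bitfield, and an optional `guild_id` pre-selects a server. The sketch below assumes a permission set (view channel, send messages, read message history, connect, speak) that seems to match what the bridge needs. The set `vc bot invite` actually requests may differ, and `buildInviteUrl` is a hypothetical helper.

```javascript
// Discord permission bits (from Discord's permissions bitfield). Which bits
// VerbalCoding really requests is an assumption in this sketch.
const PERMISSIONS = {
  VIEW_CHANNEL: 1 << 10,
  SEND_MESSAGES: 1 << 11,
  READ_MESSAGE_HISTORY: 1 << 16,
  CONNECT: 1 << 20,
  SPEAK: 1 << 21,
};

// Hypothetical helper: build an OAuth2 bot invite URL, optionally locked to one server.
function buildInviteUrl(clientId, guildId) {
  const bits = Object.values(PERMISSIONS).reduce((acc, bit) => acc | bit, 0);
  const url = new URL("https://discord.com/oauth2/authorize");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("scope", "bot");
  url.searchParams.set("permissions", String(bits));
  if (guildId) {
    // Pre-select the server and hide the server picker.
    url.searchParams.set("guild_id", guildId);
    url.searchParams.set("disable_guild_select", "true");
  }
  return url.toString();
}
```

`buildInviteUrl("123")` yields `https://discord.com/oauth2/authorize?client_id=123&scope=bot&permissions=3214336`.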

Secrets are stored in git-ignored local env files with mode 0600 (owner read/write only) and are never printed back by `vc doctor`.

Tiny command map

```bash
vc setup                               # guided setup with agent auto-detection
vc setup --yes                         # non-interactive bootstrap/starter config
vc setup token                         # rotate or add Discord bot token/client ID later
vc setup channels "General,Team Voice" # update auto-join voice channel names
vc bot invite CLIENT_ID                # generate a Discord bot invite URL
vc status                              # show active language, TTS, bridge settings, and resolved backend
vc language ko|en|auto                 # switch STT/progress/TTS language preset
vc doctor                              # redacted health check with auto-fix suggestions
vc start                               # start the default bridge
vc instance setup NAME                 # create an isolated project voice bot
vc instance start NAME                 # run that bot in the background
```

In Discord:

| Command | What it does |
| --- | --- |
| `!join` / `!leave` | Join or leave your current voice channel. |
| `!ask <prompt>` | Send a text prompt to the same agent backend that voice uses. |
| `!verbose on\|off` | Toggle short progress updates. |
| `!latency` / `!metrics` | Summarize recent STT, agent, and TTS latency. |
| `!sensitivity normal\|conservative` | Tune barge-in for indoor or noisy environments. |
| `!session new <name> <workdir> [context] --voice <voice-channel>` | Bind a project session to a voice room. |
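The `!sensitivity` presets can be illustrated with a minimal energy-based barge-in detector. This is only a sketch. The dBFS thresholds, frame counts, and `BargeInDetector` are hypothetical, and a production VAD is typically more robust than a plain loudness gate. The conservative preset demands louder speech sustained over more frames. That trades a slightly slower interrupt for fewer false triggers in noisy rooms.

```javascript
// Hypothetical preset values; VerbalCoding's real thresholds are not documented here.
const PRESETS = {
  normal: { thresholdDbfs: -45, minSpeechFrames: 3 },
  conservative: { thresholdDbfs: -35, minSpeechFrames: 6 },
};

// Root-mean-square level of a 16-bit PCM frame in dBFS (0 dBFS = full scale).
function frameDbfs(samples) {
  let sumSq = 0;
  for (const s of samples) sumSq += (s / 32768) ** 2;
  const rms = Math.sqrt(sumSq / samples.length);
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

// Fires once `minSpeechFrames` consecutive loud frames arrive while TTS is
// playing; requiring a run of loud frames ignores short transients.
class BargeInDetector {
  constructor(preset = "normal") {
    this.config = PRESETS[preset];
    if (!this.config) throw new Error(`unknown sensitivity: ${preset}`);
    this.run = 0; // current count of consecutive loud frames
  }

  // Returns true when playback should be interrupted.
  onFrame(samples, ttsPlaying) {
    if (!ttsPlaying) {
      this.run = 0;
      return false;
    }
    const loud = frameDbfs(samples) >= this.config.thresholdDbfs;
    this.run = loud ? this.run + 1 : 0;
    return this.run >= this.config.minSpeechFrames;
  }
}
```

A frame at about -41 dBFS counts as speech under `normal` but is ignored under `conservative`. That gap is the knob the two presets expose.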

Roadmap

The differentiation work is tracked in `docs/ROADMAP.md`. Five phases deliver the capabilities above:

| # | Phase | What it adds |
| --- | --- | --- |
| 1 | Streaming pipeline | Sentence-by-sentence TTS while the agent is still writing. |
| 2 | Agent-agnostic adapters | First-class Aider and Cursor CLI support; `vc setup` auto-detects installed agents. |
| 6 | Smart progress | LLM-summarized narration; falls back to the current regex labels. |
| 7 | Voice plan mode | Narrate the plan, edit it by voice, approve to execute. |
| 10 | Push notification handoff | ntfy/Pushover notification when a long task ends and the room is empty. |
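Phase 1's sentence-by-sentence behavior can be pictured as a small buffer that releases each sentence as soon as its terminator arrives. This is a sketch under assumptions. `SentenceChunker` and the `speak` callback are hypothetical names, and the boundary rule (`.`, `!`, or `?` followed by whitespace) is a simplification, so abbreviations such as "e.g. " will still split.

```javascript
// Hypothetical sketch of sentence chunking for streaming TTS. Text arrives in
// arbitrary deltas; each complete sentence goes to `speak` immediately instead
// of waiting for the agent's full reply.
class SentenceChunker {
  constructor(speak) {
    this.speak = speak; // callback that queues one sentence for TTS
    this.buffer = "";   // text received but not yet spoken
  }

  push(delta) {
    this.buffer += delta;
    // Boundary: . ! or ? (plus optional closing quotes/brackets) followed by
    // whitespace. Requiring whitespace keeps "v1.2" intact.
    const boundary = /[.!?]+["')\]]*\s+/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.buffer.slice(start, end).trim();
      if (sentence) this.speak(sentence);
      start = end;
    }
    this.buffer = this.buffer.slice(start); // keep the unfinished tail
  }

  // Call when the agent finishes so a trailing unterminated sentence is spoken.
  flush() {
    const rest = this.buffer.trim();
    if (rest) this.speak(rest);
    this.buffer = "";
  }
}
```

Waiting for trailing whitespace delays a sentence by at most one delta, but it avoids cutting inside version numbers. `flush()` covers the final sentence, which usually has no trailing space.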

Learn more

| Guide | What you get |
| --- | --- |
| Docs hub | One page linking every guide and localized doc set. |
| Roadmap | Differentiation plan and per-phase implementation plans. |
| Fresh Install | npm/global setup, Discord app setup, token/channel commands, first run. |
| Usage Guide | CLI commands, Discord commands, run modes, voice changes, latency metrics. |
| Hermes Built-in Voice vs VerbalCoding | What Hermes already supports and when VerbalCoding is worth adding. |
| Configuration | `.env`, agent backends, MCP server, TTS backends, operational notes. |
| Troubleshooting | Docker host networking, UDP voice failures, missing token/channel diagnostics. |
| Multi-Instance | One permanent Discord voice room per project. |
| Release Notes | Current capabilities, checks, and public-release gaps. |

Requirements

| Layer | Default |
| --- | --- |
| Runtime | Node.js 20+ and npm; setup can install them via Homebrew, apt, dnf, or pacman where supported. |
| Audio | `ffmpeg`; setup and doctor can install it on supported OSes. |
| Speech recognition | Local `whisper-cli` from whisper.cpp plus `models/ggml-small-q5_1.bin`. |
| TTS | Edge TTS by default; optional OpenVoice, SpeechSwift/CosyVoice, Supertonic, OmniVoice, and Qwen3 TTS CLI paths. |
| Discord | Bot token, Message Content intent, voice permissions, and auto-join channel names that match your server. |
| Agent | At least one CLI harness installed; `vc setup` auto-detects Hermes, Claude Code, Codex, Gemini, OpenCode, OpenClaw, Aider, and Cursor CLI. |
| Platform focus | macOS on Apple Silicon is the most tested; Linux bootstrap is best-effort; Windows is unsupported for now. |
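Here is a rough sketch of the local speech step. It assumes each utterance is first written out as raw 48 kHz stereo 16-bit PCM, which is what decoded Discord Opus audio typically looks like, and then converted to the 16 kHz mono WAV that whisper.cpp expects. The flags are standard `ffmpeg` and `whisper-cli` options. The helper names, file paths, and the exact invocation VerbalCoding uses are assumptions.

```javascript
// Hypothetical helper: ffmpeg arguments to convert one raw Discord utterance
// (48 kHz, stereo, signed 16-bit little-endian) to 16 kHz mono WAV.
function ffmpegArgs(pcmPath, wavPath) {
  return [
    "-y", // overwrite the output file if it exists
    "-f", "s16le", "-ar", "48000", "-ac", "2", "-i", pcmPath, // raw PCM input
    "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wavPath, // 16 kHz mono WAV output
  ];
}

// Hypothetical helper: whisper-cli arguments for one utterance.
function whisperArgs(wavPath, language = "auto", model = "models/ggml-small-q5_1.bin") {
  return [
    "-m", model,    // quantized small model from the table above
    "-f", wavPath,  // 16 kHz mono WAV input
    "-l", language, // e.g. "ko", "en", or "auto"
    "-nt",          // print the transcript without timestamps
  ];
}
```

You can pass these arrays to `execFile("ffmpeg", args)` and `execFile("whisper-cli", args)` from `node:child_process`. Passing an argument array rather than a shell string keeps file names from being interpreted by a shell.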

Docker / container note

Discord text login can succeed while voice join fails if outbound UDP is blocked. If the logs show `Cannot perform IP discovery - socket closed`, use Linux host networking for the service that runs `vc start`:

```yaml
services:
  verbalcoding:
    network_mode: "host"
```

Do not combine `network_mode: "host"` with `ports:`. Docker Desktop for macOS and Windows handles host networking differently; if UDP still fails there, run VerbalCoding directly on the host or in a Linux VM.

Contributing

Run these lightweight checks before submitting changes:

```bash
node --check app-node/main.mjs
npm test
bash -n run.sh scripts/install.sh scripts/bootstrap_prereqs.sh
npm pack --dry-run
vc doctor
```

Status

Aimed at a public release, but still early. The roadmap above tracks ongoing differentiation work. A demo video/GIF, broader Linux validation, CI, and a deeper security review are still TODOs.
