Real-time voice conversation system with the Axel persona
A streaming audio pipeline that captures speech, understands context through a 4-layer memory system,
and responds with natural voice — all with sub-second latency. Also supports text chat via WebSocket.
Features
Streaming voice pipeline — Mic → Silero VAD → ElevenLabs STT → Claude → ElevenLabs TTS → Speaker, with producer-consumer audio queue for gapless playback
Text chat via WebSocket — /chat endpoint with per-connection ConversationEngine, shared memory and tool access
Dual-channel architecture — Shared ConversationEngine core for both voice and text, with channel-aware persona behavior
6-state FSM — IDLE / LISTENING / PROCESSING / SPEAKING / ACTIVE / INTERRUPTED with barge-in support for natural turn-taking
4-layer long-term memory — Semantic, episodic, emotional, and procedural memory stored in PostgreSQL + pgvector with HNSW indexes and time-decay scoring
Compaction-driven memory extraction — No per-exchange overhead; memories are extracted by Haiku/Flash when a server-side compaction event fires and again at shutdown
RAG with reranking — Voyage AI embeddings (voyage-4-large) + rerank-2.5 for high-relevance memory recall
Prompt caching — 3-block system prompt layout (persona → RAG context → dynamic) optimized for Anthropic cache hits
Server-side compaction — Anthropic context management API handles thinking clearing, tool result clearing, and compaction automatically
Agentic tool loop — Up to 3 tool-use rounds per response with Home Assistant delegation and web search
Session timeline — Per-request temporal context with session start time and recent turn timestamps
Structured logging — Modular subsystem with structured formatters, handlers, and function tracing
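The 6-state FSM with barge-in can be pictured as a small transition table. The sketch below is illustrative only: the state names come from the feature list, but the event names and the specific transitions are assumptions and may not match what state.py actually does.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    PROCESSING = auto()
    SPEAKING = auto()
    ACTIVE = auto()
    INTERRUPTED = auto()


# Hypothetical transition table: (current state, event) -> next state.
# Event names are illustrative assumptions, not the project's real events.
TRANSITIONS = {
    (State.IDLE, "speech_start"): State.LISTENING,
    (State.LISTENING, "speech_end"): State.PROCESSING,
    (State.PROCESSING, "first_audio"): State.SPEAKING,
    (State.SPEAKING, "playback_done"): State.ACTIVE,
    (State.SPEAKING, "speech_start"): State.INTERRUPTED,  # barge-in
    (State.INTERRUPTED, "playback_stopped"): State.LISTENING,
    (State.ACTIVE, "speech_start"): State.LISTENING,
    (State.ACTIVE, "timeout"): State.IDLE,
}


class StateMachine:
    """Minimal FSM that rejects any transition not listed in TRANSITIONS."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def fire(self, event: str) -> State:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"invalid transition: {self.state.name} on {event!r}")
        self.state = TRANSITIONS[key]
        return self.state
```

In this sketch, barge-in is just the `speech_start` event arriving while the machine is in SPEAKING: it moves to INTERRUPTED, and once playback has stopped it returns to LISTENING so the user's new utterance is captured.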
Architecture
graph TD
subgraph Input
MIC["Microphone<br/><sub>PyAudio</sub>"]
VAD["Silero VAD<br/><sub>Speech Detection</sub>"]
end
subgraph Core
SM["State Machine<br/><sub>6-state FSM</sub>"]
ENGINE["ConversationEngine<br/><sub>Shared Core</sub>"]
STT["ElevenLabs Scribe v2<br/><sub>WebSocket Realtime</sub>"]
LLM["Claude Sonnet 4.6<br/><sub>Adaptive Thinking</sub>"]
end
subgraph Channels
VOICE["Voice Pipeline<br/><sub>Concurrent TTS</sub>"]
CHAT["Text Chat<br/><sub>WebSocket /chat</sub>"]
end
subgraph Output
TTS["ElevenLabs TTS<br/><sub>Streaming</sub>"]
PLAY["paplay<br/><sub>PulseAudio</sub>"]
end
subgraph Tools
HASS["Home Assistant<br/><sub>Conversation API</sub>"]
WEB["Web Search"]
end
subgraph Memory
MEM["Memory Extractor<br/><sub>Haiku / Flash</sub>"]
EMB["Voyage AI<br/><sub>voyage-4-large</sub>"]
DB[("PostgreSQL<br/><sub>pgvector</sub>")]
RERANK["Reranker<br/><sub>rerank-2.5</sub>"]
CTX["Context Builder<br/><sub>3-block layout</sub>"]
end
MIC --> VAD --> SM --> STT --> ENGINE
CHAT --> ENGINE
ENGINE --> LLM
ENGINE -->|voice| VOICE --> TTS --> PLAY
ENGINE -->|text| CHAT
LLM -->|tool_use| HASS & WEB -->|tool_result| LLM
LLM -->|post-compaction| MEM --> EMB --> DB
DB --> RERANK --> CTX -->|system prompt| LLM
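The Context Builder node in the graph above feeds a 3-block system prompt to the LLM. A minimal sketch of how such a layout might be expressed for the Anthropic Messages API follows. The function name `build_system_blocks` is hypothetical, and which blocks the project actually marks as cacheable is an assumption; the `cache_control` field with type `ephemeral` is the documented way to set a cache breakpoint on a system block.

```python
def build_system_blocks(persona: str, rag_context: str, dynamic: str) -> list[dict]:
    """Return system prompt blocks ordered from most stable to least stable.

    Caching works on prefixes, so the stable persona goes first, the slower
    changing RAG context second, and per-request dynamic text last with no
    cache breakpoint (assumed layout, mirroring persona -> RAG -> dynamic).
    """
    return [
        {"type": "text", "text": persona, "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": rag_context, "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": dynamic},
    ]


blocks = build_system_blocks(
    persona="<persona>Axel</persona>",
    rag_context="Relevant memories: (none)",
    dynamic="Session started at 09:00.",
)
```

The resulting list would be passed as the `system` argument of a Messages API request. Keeping the volatile text in the final, unmarked block means a change there does not invalidate the cached persona and RAG prefixes.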
Setup
cp .env.example .env
# Fill in required API keys: ANTHROPIC_API_KEY, ELEVENLABS_API_KEY
Run
# Dev launcher (auto-kills stale port processes)
./scripts/run.sh
# Or manually with hot reload
uv run uvicorn prot.app:app --host 0.0.0.0 --port 8000 --reload
Project Structure
src/prot/
app.py # FastAPI app, lifespan, HTTP + WebSocket endpoints
engine.py # ConversationEngine — shared core for voice + text
pipeline.py # Main voice pipeline orchestrator
state.py # 6-state FSM with barge-in support
config.py # Pydantic Settings (all env vars)
context.py # 3-block system prompt builder + session timeline
persona.py # Axel persona loader (data/axel.xml)
audio.py # PyAudio microphone capture
vad.py # Silero VAD speech detection
stt.py # ElevenLabs Scribe v2 (WebSocket STT)
llm.py # Claude API — streaming + tool-use loop
hass.py # Home Assistant conversation API
tts.py # ElevenLabs TTS streaming
playback.py # paplay audio output (producer-consumer queue)
processing.py # LLM → TTS → playback orchestration
memory.py # Compaction-driven 4-layer memory extraction
graphrag.py # pgvector-backed memory storage
decay.py # AdaptiveDecayCalculator for time-decay scoring
embeddings.py # Voyage AI embeddings
reranker.py # Voyage AI reranker
db.py # asyncpg connection pool + schema init
schema.sql # PostgreSQL schema (auto-applied on startup)
logging/ # Structured logging subsystem
tests/ # Unit & integration tests
deploy/ # systemd service file (prot.service)
scripts/ # Dev launcher (run.sh)
data/ # Persona config (axel.xml) + runtime data
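The tree lists decay.py as the home of time-decay scoring but does not spell out the formula. As a rough illustration of the general idea, the sketch below applies a plain exponential half-life to a similarity score. The function name `decayed_score`, the 30-day default half-life, and the choice of exponential decay are all assumptions; the project's AdaptiveDecayCalculator may work differently.

```python
from datetime import datetime, timedelta, timezone


def decayed_score(
    similarity: float,
    last_accessed: datetime,
    now: datetime,
    half_life_days: float = 30.0,  # assumed value, not taken from the project
) -> float:
    """Scale a similarity score so it halves every `half_life_days`."""
    age_days = max((now - last_accessed).total_seconds() / 86400.0, 0.0)
    return similarity * 0.5 ** (age_days / half_life_days)


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
fresh = decayed_score(0.8, now, now)
month_old = decayed_score(0.8, now - timedelta(days=30), now)
```

With these inputs a memory touched just now keeps its full score of 0.8, while one last accessed 30 days earlier drops to 0.4, so older memories rank lower at equal similarity.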
Testing
# Unit tests (no API keys needed)
uv run pytest
# Integration tests (requires real API keys in .env)
uv run pytest -m integration
# Coverage report
uv run pytest --cov=prot --cov-report=term-missing
Test configuration: pytest-asyncio with asyncio_mode = "auto". Test files mirror source structure: test_<module>.py for each module.
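With asyncio_mode = "auto", pytest-asyncio collects plain `async def` test functions without a per-test marker. The example below shows what such a test could look like; `FakeTTS` is a hypothetical stand-in written for this illustration and is not part of prot.

```python
import asyncio


class FakeTTS:
    """Hypothetical streaming TTS stub that yields one byte chunk per word."""

    async def stream(self, text: str):
        for word in text.split():
            await asyncio.sleep(0)  # yield control, as a real network call would
            yield word.encode()


async def test_fake_tts_streams_one_chunk_per_word():
    # No @pytest.mark.asyncio needed when asyncio_mode is "auto".
    chunks = [chunk async for chunk in FakeTTS().stream("hello from Axel")]
    assert chunks == [b"hello", b"from", b"Axel"]
```

Following the naming convention above, a test for tts.py would live in test_tts.py.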