A voice note application with transcription, semantic search, and weekly digests.
- Voice Recording: Record voice notes directly in the browser
- Automatic Transcription: Uses Whisper large-v3 for accurate transcription
- Speaker Diarization: Identify different speakers in recordings (optional, uses pyannote.audio)
- Semantic Search: Find notes by meaning using sentence-transformers + ChromaDB
- Weekly Digest: See top topics and activity stats for each week
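
As a rough illustration of the weekly digest idea, the sketch below groups notes by ISO week and counts the most frequent words in each week. It uses only the standard library and a plain word count; the function name `weekly_digest`, the `STOP_WORDS` set, and the `(date, transcript)` input shape are assumptions made for this example and may differ from what `topics.py` actually does.

```python
from collections import Counter
from datetime import date

# Hypothetical stop-word list; a real topic extractor would likely be richer.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "for", "in", "on", "we", "is"}


def weekly_digest(notes):
    """Group (date, transcript) pairs by ISO week and summarize each week."""
    weeks = {}
    for day, text in notes:
        iso = day.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        entry = weeks.setdefault(key, {"note_count": 0, "words": Counter()})
        entry["note_count"] += 1
        # Strip basic punctuation and lowercase before counting.
        words = (w.strip(".,!?").lower() for w in text.split())
        entry["words"].update(w for w in words if w and w not in STOP_WORDS)
    return {
        key: {
            "note_count": entry["note_count"],
            "top_topics": [w for w, _ in entry["words"].most_common(3)],
        }
        for key, entry in weeks.items()
    }


# Made-up sample data for illustration only.
notes = [
    (date(2024, 1, 1), "Budget review for the launch"),
    (date(2024, 1, 3), "Launch checklist and budget"),
    (date(2024, 1, 9), "Hiring plan"),
]
digest = weekly_digest(notes)
```

Here the first two notes fall in ISO week 1 of 2024 and the third in week 2, so the digest has two entries.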
| Layer | Technology |
|---|---|
| Frontend | React + Vite, TailwindCSS |
| Backend | Python FastAPI |
| Transcription | OpenAI Whisper (local, large-v3) |
| Diarization | pyannote.audio (local, speaker-diarization-3.1) |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) |
| Vector Search | ChromaDB |
| Database | SQLite |
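
To make the embeddings and vector search rows more concrete, here is a minimal standard-library sketch of ranking notes by vector similarity. It is not the project's code: the three-dimensional vectors are toy stand-ins for the 384-dimensional embeddings that all-MiniLM-L6-v2 produces, the dictionary stands in for a ChromaDB collection, and cosine similarity is assumed as the scoring measure. The names `cosine_similarity`, `search`, and `toy_index` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, indexed, top_k=2):
    """Return the ids of the top_k notes most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), note_id)
        for note_id, vec in indexed.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [note_id for _, note_id in scored[:top_k]]


# Toy vectors; in the real stack these would come from the embedding model.
toy_index = {
    "note-1": [0.9, 0.1, 0.0],
    "note-2": [0.0, 1.0, 0.2],
    "note-3": [0.7, 0.3, 0.1],
}
results = search([1.0, 0.0, 0.0], toy_index)
```

With these toy vectors, `note-1` and `note-3` point in roughly the same direction as the query and rank ahead of `note-2`.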
- Python 3.10+
- Node.js 18+
- uv (Python package manager)
- ~10GB RAM for Whisper large-v3
cd backend
# Install dependencies with uv
uv sync
# Run the server
uv run uvicorn main:app --reload --host 0.0.0.0 --port 8000

The first run downloads the Whisper model (~3GB) and the sentence-transformer model (~90MB).
cd frontend
# Install dependencies
npm install
# Run dev server
npm run dev

Open http://localhost:5173 in your browser.
To enable speaker identification in recordings:
- Create a free account at https://huggingface.co
- Accept model terms at https://huggingface.co/pyannote/speaker-diarization-3.1
- Get your token at https://huggingface.co/settings/tokens
- Set the environment variable:
export HF_TOKEN=your_huggingface_token

If HF_TOKEN is not set, transcription still works, but without speaker labels.
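
The fallback described above could be checked in the backend with something like the following sketch. The helper name `diarization_enabled` and its optional `env` argument are assumptions for illustration; the actual check in `diarization.py` may look different.

```python
import os


def diarization_enabled(env=None):
    """Return True when a non-empty HF_TOKEN is available."""
    # Accept a mapping for testing; default to the real process environment.
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


if diarization_enabled():
    print("HF_TOKEN found: speaker labels can be added.")
else:
    print("HF_TOKEN not set: transcribing without speaker labels.")
```

Treating a blank or whitespace-only value as "not set" avoids attempting a model download with an unusable token.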
| Endpoint | Method | Description |
|---|---|---|
| /api/notes | GET | List all notes |
| /api/notes | POST | Upload new voice note |
| /api/notes/{id} | GET | Get single note |
| /api/notes/{id} | PUT | Update note |
| /api/notes/{id} | DELETE | Delete note |
| /api/notes/{id}/audio | GET | Get audio file |
| /api/search | GET | Semantic search |
| /api/digest | GET | Weekly digest |
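
A small client-side sketch of calling two of these endpoints with the standard library follows. The base URL matches the port used in the backend setup, but the query parameter name `q` is an assumption (the source does not document it), and the helpers `search_request` and `delete_request` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # port from the uvicorn command


def search_request(query):
    """Build a GET request for /api/search."""
    # "q" is an assumed parameter name; check the FastAPI route to confirm.
    return Request(f"{BASE_URL}/api/search?{urlencode({'q': query})}", method="GET")


def delete_request(note_id):
    """Build a DELETE request for /api/notes/{id}."""
    return Request(f"{BASE_URL}/api/notes/{note_id}", method="DELETE")


req = search_request("budget meeting")
print(req.get_method(), req.full_url)
```

The requests are only constructed here; with the backend running, passing one to `urllib.request.urlopen` would send it.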
voice-note/
├── backend/
│ ├── main.py # FastAPI app
│ ├── transcription.py # Whisper integration
│ ├── diarization.py # Speaker diarization with pyannote
│ ├── embeddings.py # Sentence transformers + ChromaDB
│ ├── topics.py # Topic extraction for digest
│ ├── database.py # SQLite models
│ └── pyproject.toml # uv dependencies
├── frontend/
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── pages/ # Route pages
│ │ ├── api.ts # API client
│ │ └── App.tsx # Main app
│ └── package.json
└── data/ # Audio files & databases
MIT