Trannote is a real-time transcription and speaker diarization system designed to deliver high-accuracy transcriptions with minimal latency. This project leverages OpenAI's Whisper model for transcription and AssemblyAI for speaker diarization. The current implementation processes audio on the server side, but future iterations will transition to client-side processing for improved efficiency.
- Real-time transcription using OpenAI's Whisper model.
- Speaker diarization powered by AssemblyAI.
- WebSocket-based communication for continuous audio streaming.
- Live text display in a simple, user-friendly web interface.
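
To illustrate how diarization output might feed the live text display, here is a minimal sketch that groups consecutive utterances by speaker into display lines. It assumes the diarization result has already been reduced to plain dictionaries with `speaker` and `text` keys; the function name `format_utterances` and that dictionary shape are hypothetical and not taken from the Trannote code.

```python
def format_utterances(utterances):
    """Merge consecutive utterances by the same speaker into one line per turn.

    `utterances` is assumed to be an ordered list of dicts with "speaker"
    and "text" keys (a hypothetical shape chosen for this sketch).
    """
    turns = []  # list of (speaker, [texts]) in speaking order
    for utt in utterances:
        if turns and turns[-1][0] == utt["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1][1].append(utt["text"])
        else:
            # Speaker changed: start a new turn.
            turns.append((utt["speaker"], [utt["text"]]))
    return [f"Speaker {speaker}: {' '.join(texts)}" for speaker, texts in turns]


sample = [
    {"speaker": "A", "text": "Hello."},
    {"speaker": "A", "text": "How are you?"},
    {"speaker": "B", "text": "Fine."},
]
lines = format_utterances(sample)
```

Each returned string can then be appended to the page as one line of the transcript.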
- Python (WebSockets, asyncio)
- Whisper (OpenAI) for transcription
- AssemblyAI for diarization
- Sounddevice for capturing audio
- Migrate diarization to Pyannote-Audio once it's stable on Hugging Face.
- Optimize transcription latency to support near-instant results.
- Shift audio input from server-side to client-side in future updates.
- Start the WebSocket server (transcription.py) to listen for audio streams.
- The web client (index.html) establishes a WebSocket connection.
- Audio is recorded and streamed from the client to the server.
- Whisper transcribes the audio in real-time and sends text back to the client.
- After stopping the recording, the entire audio file is sent for diarization.
- AssemblyAI processes the file and identifies speakers.
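
The steps above can be sketched as a small per-connection loop. This is not the code in transcription.py; it is an illustration under stated assumptions: 16 kHz, 16-bit mono PCM, a one-second transcription window, and a placeholder `fake_transcribe` standing in for the Whisper call. The names `Session`, `handle_stream`, and `demo` are hypothetical. With the `websockets` library, the connection object can usually play the role of `chunks` (it is iterable with `async for`) and its `send` method the role of `send`.

```python
import asyncio
from dataclasses import dataclass, field

SAMPLE_RATE = 16_000      # assumption: 16 kHz mono audio
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM
WINDOW_SECONDS = 1.0      # assumption: transcribe in one-second windows
WINDOW_BYTES = int(SAMPLE_RATE * BYTES_PER_SAMPLE * WINDOW_SECONDS)


@dataclass
class Session:
    """Per-connection state: a live buffer plus the full recording."""
    pending: bytearray = field(default_factory=bytearray)
    full_audio: bytearray = field(default_factory=bytearray)

    def add_chunk(self, chunk: bytes) -> list:
        """Store a chunk and return any complete windows ready to transcribe."""
        self.pending.extend(chunk)
        self.full_audio.extend(chunk)
        windows = []
        while len(self.pending) >= WINDOW_BYTES:
            windows.append(bytes(self.pending[:WINDOW_BYTES]))
            del self.pending[:WINDOW_BYTES]
        return windows


async def fake_transcribe(window: bytes) -> str:
    """Placeholder for a Whisper call; returns a marker string, not real text."""
    await asyncio.sleep(0)  # yield control, as a real model call would
    return f"<{len(window)} bytes transcribed>"


async def handle_stream(chunks, send) -> bytes:
    """Send partial text while audio arrives; return all audio for diarization."""
    session = Session()
    async for chunk in chunks:
        for window in session.add_chunk(chunk):
            await send(await fake_transcribe(window))
    # The recording has stopped: the caller can now submit this for diarization.
    return bytes(session.full_audio)


async def demo():
    """Feed five 8000-byte chunks through the loop and collect what was sent."""
    async def chunks():
        for _ in range(5):
            yield b"\x00" * 8000

    sent = []

    async def send(text):
        sent.append(text)

    audio = await handle_stream(chunks(), send)
    return sent, audio


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping `full_audio` alongside the live buffer mirrors the two-phase flow: partial text goes out while the stream is open, and the complete recording is only handed to diarization once the stream ends.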
- Clone this repository:
git clone https://github.com/REDFLAG-bugs/trannote.git
cd trannote
- Install dependencies:
pip install -r requirements.txt
- Set up the environment variable for AssemblyAI:
export ASSEMBLYAI_API_KEY=your_api_key_here
- Run the WebSocket server:
python transcription.py
- Open index.html in a browser and start transcribing!
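
If the server starts without the API key, diarization will likely fail only after a recording ends. A small startup check can surface the problem earlier. The helper below is a hypothetical sketch, not part of the repository; only the variable name `ASSEMBLYAI_API_KEY` comes from the setup steps above.

```python
import os


def load_api_key(env=os.environ, name="ASSEMBLYAI_API_KEY"):
    """Return the API key from the environment, or fail with a clear message.

    `env` is any mapping; it defaults to the process environment and can be
    replaced with a plain dict for testing.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the server")
    return key
```

Calling `load_api_key()` once at startup, before opening the WebSocket server, is one way to fail fast with a readable error.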
Feel free to contribute by reporting issues, suggesting features, or submitting pull requests. The goal is to make Trannote a truly real-time transcription powerhouse!
This project is open-source and available under the MIT License.
- Improve latency and optimize for real-time performance.
- Fine-tune Whisper models for domain-specific accuracy.
- Deploy on Hugging Face Spaces for wider accessibility.
👀 Stay tuned for updates as Trannote evolves into a fully optimized real-time transcription solution!