A real-time audio transcription tool that captures system audio and transcribes it using OpenAI's Whisper model. Perfect for meeting notes, lecture recordings, or any scenario where you need automatic transcription of audio playing on your computer.
- 🎤 Real-time Audio Capture: Captures audio from your system or microphone
- 📝 Automatic Transcription: Uses Whisper AI for accurate speech-to-text conversion
- 🔄 Automatic Resampling: Handles different audio sample rates automatically
- 💾 Persistent Storage: Saves transcriptions to a text file for easy access
- 🖥️ Cross-Platform: Works on macOS, Windows, and Linux
- ⚡ Efficient: Processes audio in 15-second chunks for optimal performance
- 🎯 Smart Device Detection: Automatically finds and uses virtual audio devices
- Rust: 1.70 or later (Install Rust)
- Cargo: Comes with Rust installation
- Whisper Model: `ggml-base.en.bin` (included in the repository, or download from OpenAI Whisper)
- macOS 10.15+ (Catalina or later)
- For system audio capture: BlackHole (recommended)
- Windows 10 or later
- For system audio capture: VB-Audio Cable or enable "Stereo Mix" in sound settings
- ALSA or PulseAudio
- For system audio capture: Configure PulseAudio loopback
```bash
git clone https://github.com/vikashviraj/whyme.git
cd whyme
```

The model file should be placed in the `model/` directory:

```bash
# Create the model directory if it doesn't exist
mkdir -p model

# Download the base English model (if not already present)
# You can download from: https://huggingface.co/ggerganov/whisper.cpp
# Or directly from: https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```

Note: The repository includes `model/ggml-base.en.bin` by default. If you need a different model or language, replace this file.
```bash
# Make the build script executable (macOS/Linux)
chmod +x build.sh

# Build and run
./build.sh run
```

```bash
# Build
cargo build --release

# Run
cargo run --release
```

If you encounter C++ compilation errors on macOS, the build.sh script automatically sets the required environment variables. For manual builds:

```bash
export CXXFLAGS="-I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1"
export MACOSX_DEPLOYMENT_TARGET=15.0
cargo build --release
```
1. Start the application: `./build.sh run` (or `cargo run --release`)

2. Select your audio source. The application will automatically detect and use:
   - macOS: BlackHole (if installed) or the default microphone
   - Windows: VB-Audio Cable, loopback devices, or the default microphone
   - Linux: the default audio input device

3. View transcriptions:
   - Transcriptions are saved to `Notes/transcript.txt`
   - Console output shows real-time transcription progress
   - Press `Ctrl+C` to stop gracefully
To see what audio devices are available on your system:
```bash
cargo run --bin list-devices
```

This is useful for troubleshooting or selecting a specific device.
Edit `src/main.rs` and modify the transcript file path:

```rust
.open("Notes/transcript.txt")?; // Change this path
```

The default chunk size is 15 seconds. To change it, modify the constant in `src/main.rs`:

```rust
const CHUNK_SECONDS: usize = 15; // Change this value
```

Replace `model/ggml-base.en.bin` with a different Whisper model:

- `ggml-tiny.en.bin` - Fastest, least accurate
- `ggml-base.en.bin` - Balanced (default)
- `ggml-small.en.bin` - Better accuracy
- `ggml-medium.en.bin` - High accuracy
- `ggml-large-v2.bin` - Best accuracy (multilingual)
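The `CHUNK_SECONDS` constant above determines how many samples are buffered before each transcription pass (at Whisper's 16 kHz, 15 seconds is 240,000 samples). A minimal illustrative sketch of that buffering, not the project's actual implementation:

```rust
const CHUNK_SECONDS: usize = 15;
const SAMPLE_RATE: usize = 16_000; // Whisper's required sample rate
const CHUNK_SAMPLES: usize = CHUNK_SECONDS * SAMPLE_RATE; // 240_000 samples

/// Accumulates incoming samples and yields a full chunk once enough arrive.
struct Chunker {
    buf: Vec<f32>,
}

impl Chunker {
    fn new() -> Self {
        Chunker { buf: Vec::with_capacity(CHUNK_SAMPLES) }
    }

    /// Push captured samples; returns a full 15-second chunk when ready.
    fn push(&mut self, samples: &[f32]) -> Option<Vec<f32>> {
        self.buf.extend_from_slice(samples);
        if self.buf.len() >= CHUNK_SAMPLES {
            Some(self.buf.drain(..CHUNK_SAMPLES).collect())
        } else {
            None
        }
    }
}

fn main() {
    let mut chunker = Chunker::new();
    // Feed 1-second blocks of silence; the 15th block completes a chunk.
    let second = vec![0.0f32; SAMPLE_RATE];
    let mut chunks = 0;
    for _ in 0..15 {
        if chunker.push(&second).is_some() {
            chunks += 1;
        }
    }
    println!("full chunks produced: {}", chunks); // 1
}
```

Raising `CHUNK_SECONDS` trades latency for fewer, larger transcription passes.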
1. Install BlackHole:

   ```bash
   brew install blackhole-16ch
   # or download from: https://github.com/ExistentialAudio/BlackHole
   ```

2. Configure audio routing:
   - Open "Audio MIDI Setup" (Applications > Utilities)
   - Create a Multi-Output Device including:
     - Your speakers/headphones
     - BlackHole 16ch
   - Set this as your system output
   - WhyMe will automatically detect and use BlackHole
Option A: VB-Audio Cable (Recommended)

- Download and install VB-Audio Cable
- Set VB-Audio Cable as your default playback device
- WhyMe will automatically detect and use it

Option B: Stereo Mix

- Right-click the speaker icon → "Sounds"
- Go to the "Recording" tab
- Right-click and enable "Show Disabled Devices"
- Enable "Stereo Mix"
- WhyMe will detect and use it automatically
Configure PulseAudio loopback:
```bash
# Load the loopback module
pactl load-module module-loopback

# Or create a null sink
pactl load-module module-null-sink sink_name=virtual_speaker
```

Problem: The application can't find any audio input devices.
Solutions:
- Run `cargo run --bin list-devices` to see available devices
- Check that your audio drivers are installed and working
- On Windows, ensure "Stereo Mix" or virtual audio cable is enabled
- On macOS, verify BlackHole is installed and running
Problem: Transcriptions are inaccurate or contain repeated words.
Solutions:
- Ensure audio levels are adequate (not too quiet or too loud)
- Check that the correct audio source is selected
- Try a larger Whisper model (e.g., `ggml-small.en.bin`)
- Verify the audio isn't being resampled incorrectly
Problem: C++ compilation errors related to the `<atomic>` header.

Solution: Use the provided build.sh script, which sets the correct environment variables:

```bash
./build.sh build
```

Problem: Error about missing model file.
Solution: Ensure `model/ggml-base.en.bin` exists under the project root:

```bash
ls model/ggml-base.en.bin
```

If missing, download it from Hugging Face or use the included model.
Problem: Application uses too much CPU.
Solutions:
- Use a smaller Whisper model (`ggml-tiny.en.bin`)
- Increase `CHUNK_SECONDS` to process larger chunks less frequently
- Close other resource-intensive applications
- Audio Capture: Uses `cpal` for cross-platform audio input
- Resampling: Custom linear interpolation for sample rate conversion
- Transcription: `whisper-rs` bindings to OpenAI's Whisper model
- Storage: Simple file-based output to `Notes/transcript.txt`
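The linear-interpolation resampling mentioned above can be sketched as a standalone function. This is an illustrative version of the general technique, not the project's exact code:

```rust
/// Resample `input` from `from_hz` to `to_hz` using linear interpolation.
/// Whisper expects 16 kHz mono, so a typical conversion is 48_000 -> 16_000.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if from_hz == to_hz || input.is_empty() {
        return input.to_vec();
    }
    let ratio = from_hz as f64 / to_hz as f64;
    let out_len = (input.len() as f64 / ratio).floor() as usize;
    let mut out = Vec::with_capacity(out_len);
    for i in 0..out_len {
        let pos = i as f64 * ratio;       // fractional position in the input
        let idx = pos as usize;
        let frac = (pos - idx as f64) as f32;
        let a = input[idx];
        let b = *input.get(idx + 1).unwrap_or(&a); // clamp at the end
        out.push(a + (b - a) * frac);     // blend the two nearest samples
    }
    out
}

fn main() {
    // One second of a ramp at 48 kHz resampled to 16 kHz: length drops 3x.
    let input: Vec<f32> = (0..48_000).map(|i| i as f32 / 48_000.0).collect();
    let output = resample_linear(&input, 48_000, 16_000);
    println!("{} -> {} samples", input.len(), output.len()); // 48000 -> 16000 samples
}
```

Linear interpolation is cheap and adequate for speech; higher-quality resamplers (windowed sinc) cost more CPU for little transcription benefit.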
- Audio captured from input device (system audio or microphone)
- Converted to mono if multi-channel
- Resampled to 16kHz if needed (Whisper's required sample rate)
- Buffered in 15-second chunks
- Normalized to prevent clipping
- Sent to Whisper for transcription
- Results written to transcript file
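The mono-conversion and normalization steps of the pipeline can be sketched as follows. This assumes interleaved samples and simple peak normalization; the project's actual implementation may differ:

```rust
/// Average interleaved multi-channel samples down to mono.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    interleaved
        .chunks(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Peak-normalize so the loudest sample sits within +/-1.0 (prevents clipping).
fn normalize_peak(samples: &mut [f32]) {
    let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
    if peak > 1.0 {
        for s in samples.iter_mut() {
            *s /= peak;
        }
    }
}

fn main() {
    // Two stereo frames: (0.5, 1.5) averages to 1.0; (-2.0, -2.0) to -2.0.
    let stereo = [0.5f32, 1.5, -2.0, -2.0];
    let mut mono = downmix_to_mono(&stereo, 2);
    normalize_peak(&mut mono); // peak of 2.0 rescales everything by 0.5
    println!("{:?}", mono); // [0.5, -1.0]
}
```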
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes with clear, documented code
- Test on your platform before submitting
- Commit with clear messages: `git commit -m "Add amazing feature"`
- Push to your fork: `git push origin feature/amazing-feature`
- Open a Pull Request
```bash
# Clone your fork
git clone https://github.com/vikashviraj/whyme.git
cd whyme

# Create a branch
git checkout -b feature/your-feature

# Make changes and test
cargo test
cargo run --release

# Submit a PR
```

- Follow Rust standard formatting: `cargo fmt`
- Run clippy: `cargo clippy`
- Ensure tests pass: `cargo test`
```bash
# Run tests
cargo test

# Check compilation
cargo check

# List audio devices (useful for testing)
cargo run --bin list-devices
```

The repository includes GitHub Actions workflows for automated Windows testing. Push to GitHub to trigger automated builds on Windows.
- Memory Usage: ~150-200 MB (model + buffers)
- CPU Usage: Moderate (depends on Whisper model size)
- Latency: ~15 seconds (chunk processing time)
- Accuracy: Depends on Whisper model (base model provides good balance)
- Currently supports English only (can be extended to other languages)
- Processes audio in 15-second chunks (not truly real-time)
- Requires local Whisper model (no cloud API)
- Best results with clear audio and minimal background noise
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper - The amazing speech recognition model
- whisper-rs - Rust bindings for Whisper
- cpal - Cross-platform audio library
- BlackHole - Virtual audio driver for macOS
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Created with ❤️ for the open-source community.
Note: This project is not affiliated with OpenAI. Whisper is used under OpenAI's terms of service.