High-precision audio transcription with word-level timestamps
Features • Installation • Usage • Benchmarks • Audiobook Pipeline
- Word-Level Precision — Each word gets accurate start/end timestamps with adaptive buffers
- Smart Segmentation — Automatically groups words into readable segments (2-15 words) based on pauses and punctuation
- Interactive CLI — Beautiful terminal UI with file picker, fuzzy search, and keyboard navigation
- Zero Config — Python environment auto-setup, no manual installation required
- Blazing Fast — Optimized with int8 quantization, processes 41s audio in ~2s
- Audiobook Support — Chapter-based pipeline for long audio files with Claude Code integration
- Two Output Modes
  - `sync` — Full JSON with timestamps, segments, and analytics
  - `text` — Plain transcription for quick exports
- Multi-Language — English, Portuguese, Spanish, French, German, and more
Audio File → faster-whisper (Python) → Word Timestamps → Segmentation Engine → Timing Optimizer → JSON Output
- Transcription — Uses faster-whisper with int8 quantization for fast CPU inference
- Segmentation — Groups words using weighted boundary scoring (pauses, punctuation, sentence structure)
- Timing — Applies per-word complexity analysis with asymmetric buffers (60% lead-in, 40% trail-out); see the sketch after this list
- Indexing — Creates time-bucket indices for O(log n) lookups
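The 60/40 split above is the documented behavior; the exact formula lives in the timing module. Below is a minimal sketch of the idea, where the complexity heuristic, base buffer size, and clamping rule are illustrative assumptions rather than the actual implementation in src/processing/timing/.

```ts
// Minimal sketch of the asymmetric buffer step described above.
// The 60/40 lead-in/trail-out split is documented; the complexity heuristic,
// base buffer size, and clamping rule are illustrative assumptions.

interface TimedWord {
  word: string;
  start: number; // seconds
  end: number;   // seconds
}

function applyBuffers(words: TimedWord[], baseBuffer = 0.05): TimedWord[] {
  return words.map((w, i) => {
    // Assumed heuristic: longer words get a slightly larger total buffer.
    const complexity = Math.min(w.word.length / 10, 1);
    const total = baseBuffer * (1 + complexity);

    // Never take more than half of the silence on either side,
    // so buffered words cannot overlap their neighbours.
    const gapBefore = i > 0 ? w.start - words[i - 1].end : w.start;
    const gapAfter = i < words.length - 1 ? words[i + 1].start - w.end : Infinity;

    const leadIn = Math.min(total * 0.6, Math.max(0, gapBefore / 2));
    const trailOut = Math.min(total * 0.4, Math.max(0, gapAfter / 2));

    return { ...w, start: w.start - leadIn, end: w.end + trailOut };
  });
}

// Example: the word boundaries widen slightly, biased toward the lead-in.
console.log(applyBuffers([{ word: "Hello", start: 0.5, end: 0.9 }]));
```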
```bash
git clone https://github.com/ojowwalker77/syncEngine.git
cd syncEngine
bun install
```

That's it. Python dependencies are installed automatically on first run.
```bash
bun run src/index.ts
```

- Type `@` and select your audio file
- Choose language, model, and output mode
- Press Enter on Use Selection
```bash
# Basic usage
bun run src/index.ts audio.mp3

# With options
bun run src/index.ts audio.mp3 -l pt -m base --mode sync

# Text-only output (fast)
bun run src/index.ts audio.mp3 --mode text -o transcript.txt
```

| Flag | Description | Default |
|---|---|---|
| `-l, --language` | Language code (en, pt, es, etc.) | `en` |
| `-m, --model` | Whisper model (tiny, base, small, medium) | `base` |
| `-o, --output` | Output file path | `outputs/<name>_sync.json` |
| `--mode` | Output mode (sync, text) | `sync` |
| `--verbose` | Detailed logging | `false` |
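The CLI can also be driven from a script. Here is a minimal sketch that batch-transcribes a few files by spawning the CLI with the documented flags; the file list and output paths are illustrative, and it assumes Bun's `Bun.spawn` API.

```ts
// Sketch: batch-transcribe several files by shelling out to the CLI.
// Only documented flags are used; the file list and output paths are examples.

const files = ["intro.mp3", "chapter_001.mp3"]; // illustrative inputs

for (const file of files) {
  const out = `outputs/${file.replace(/\.[^.]+$/, "")}_sync.json`;
  const proc = Bun.spawn(
    ["bun", "run", "src/index.ts", file, "-l", "en", "-m", "base", "--mode", "sync", "-o", out],
    { stdout: "inherit", stderr: "inherit" },
  );
  const exitCode = await proc.exited;
  if (exitCode !== 0) throw new Error(`Transcription failed for ${file} (exit ${exitCode})`);
}
```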
Test audio: 41-second Alice in Wonderland excerpt (ElevenLabs TTS)
| Model | Time | RTF | Quality |
|---|---|---|---|
| tiny | 1.66s | 0.04x | Good (minor errors) |
| base | 2.55s | 0.06x | Very Good |
| small | 8.46s | 0.21x | Excellent |
| medium | 16.06s | 0.39x | Excellent |
RTF = Real-Time Factor: processing time divided by audio duration (e.g., 2.55 s ÷ 41 s ≈ 0.06 for base), so lower is faster. All models run faster than real time.
Recommendation: Use base for best speed/quality balance. Use tiny for drafts.
See benchmark/BENCHMARK.md for detailed results.
```bash
# Run benchmark on your audio
./benchmark/run.sh your_audio.mp3
```

For long audio files (1 hr+), use the chapter-based pipeline with Claude Code integration:
Full Audio → Draft Transcription → Claude Analysis → FFmpeg Split → Quality Transcription
See CLAUDE.md for detailed instructions.
1. Generate a draft transcript:

   ```bash
   bun run src/index.ts audiobook.mp3 --model tiny --mode text -o draft.txt
   ```

2. Ask Claude Code to analyze it:

   "Analyze this transcript and identify chapter breaks. Create a breakpoints.json file."

3. Split the audio with FFmpeg (a scripted version is sketched after this list):

   ```bash
   # Claude will provide specific commands like:
   ffmpeg -i audiobook.mp3 -ss 0.000 -t 300.500 -c copy chapter_001.mp3
   ```

4. Transcribe each chapter:

   ```bash
   bun run src/index.ts chapter_001.mp3 --mode sync -o chapter_001_sync.json
   ```
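Step 3 is easy to script once breakpoints.json exists. The sketch below prints the FFmpeg split commands; the breakpoints schema (`title`, `start`, `end` in seconds) is an assumption, not a documented format, so adapt it to whatever Claude actually writes.

```ts
// Sketch: turn a hypothetical breakpoints.json into FFmpeg split commands.
// The schema below is an assumption; adapt it to the file Claude produces.

interface Breakpoint {
  title: string;
  start: number; // seconds
  end: number;   // seconds
}

const breakpoints: Breakpoint[] = await Bun.file("breakpoints.json").json();

breakpoints.forEach((bp, i) => {
  const index = String(i + 1).padStart(3, "0");
  const duration = (bp.end - bp.start).toFixed(3);
  // -c copy splits without re-encoding, so it is fast and lossless.
  console.log(
    `ffmpeg -i audiobook.mp3 -ss ${bp.start.toFixed(3)} -t ${duration} -c copy chapter_${index}.mp3`,
  );
});
```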
```json
{
  "metadata": {
    "language": "en",
    "model": "base",
    "generated_at": "2024-12-08T..."
  },
  "segments": [
    {
      "id": 1,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello world this is a test",
      "word_count": 6,
      "complexity": 0.35
    }
  ],
  "words": [
    {
      "id": 1,
      "word": "Hello",
      "start": 0.0,
      "end": 0.4,
      "buffer_applied": 0.02
    }
  ],
  "analytics": {
    "total_words": 1250,
    "total_duration": 180.5,
    "words_per_minute": 145
  }
}
```
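A typical consumer task is finding the word active at a given playback time. The sketch below binary-searches the `words` array from a sync file; the file path and the `wordAt` helper are illustrative, not part of syncEngine's API, and the words are assumed to be sorted by start time.

```ts
// Sketch: find the word active at a given playback time in a *_sync.json file.
// Binary search over word start times gives O(log n) lookups from the JSON alone.

interface SyncWord {
  id: number;
  word: string;
  start: number;
  end: number;
  buffer_applied: number;
}

const sync = await Bun.file("outputs/audio_sync.json").json(); // illustrative path
const words: SyncWord[] = sync.words; // assumed sorted by start time

function wordAt(time: number): SyncWord | undefined {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (words[mid].start <= time) lo = mid + 1;
    else hi = mid - 1;
  }
  const candidate = words[hi]; // last word starting at or before `time`
  return candidate && time <= candidate.end ? candidate : undefined;
}

console.log(wordAt(0.2)?.word); // → "Hello" for the sample output above
```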
├── src/
│ ├── cli/ # Terminal UI and argument parsing
│ ├── processing/
│ │ ├── segmentation/ # Word grouping algorithms
│ │ ├── timing/ # Buffer calculation and optimization
│ │ └── indexing/ # Performance indices
│ ├── transcription/ # Python bridge and setup
│ ├── output/ # JSON builder and analytics
│ ├── effect/ # Effect-TS services and errors
│ ├── pipeline/ # Chapter pipeline components
│ └── types/ # TypeScript definitions
├── python/
│ └── transcribe.py # faster-whisper wrapper
├── benchmark/ # Benchmark scripts and results
├── examples/ # FFmpeg and usage examples
└── outputs/              # Generated files (gitignored)
```
- Bun 1.0+
- Python 3.10-3.12 (auto-installed via Homebrew on macOS)
- FFmpeg (for audio splitting in chapter pipeline)
Contributions are welcome!
```bash
# Clone and install
git clone https://github.com/ojowwalker77/syncEngine.git
cd syncEngine
bun install

# Run in development
bun run src/index.ts

# Type check
bun run typecheck

# Lint
bun run lint
```

MIT