Local audio transcription engine with word-level timing — indexed JSON output built for downstream processing.

Sync Engine

High-precision audio transcription with word-level timestamps

Features · Installation · Usage · Benchmarks · Audiobook Pipeline


Features

  • Word-Level Precision — Each word gets accurate start/end timestamps with adaptive buffers
  • Smart Segmentation — Automatically groups words into readable segments (2-15 words) based on pauses and punctuation
  • Interactive CLI — Beautiful terminal UI with file picker, fuzzy search, and keyboard navigation
  • Zero Config — Python environment auto-setup, no manual installation required
  • Blazing Fast — Optimized with int8 quantization, processes 41s audio in ~2s
  • Audiobook Support — Chapter-based pipeline for long audio files with Claude Code integration
  • Two Output Modes
    • sync — Full JSON with timestamps, segments, and analytics
    • text — Plain transcription for quick exports
  • Multi-Language — English, Portuguese, Spanish, French, German, and more

How It Works

Audio File → faster-whisper (Python) → Word Timestamps → Segmentation Engine → Timing Optimizer → JSON Output

  1. Transcription — Uses faster-whisper with int8 quantization for fast CPU inference
  2. Segmentation — Groups words using weighted boundary scoring (pauses, punctuation, sentence structure)
  3. Timing — Applies per-word complexity analysis with asymmetric buffers (60% lead-in, 40% trail-out)
  4. Indexing — Creates time-bucket indices for O(log n) lookups
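
The lookup side of step 4 can be sketched as a binary search over words sorted by start time. This is an illustrative sketch only, not the engine's actual index code; the `Word` shape mirrors the `words` entries in the sync JSON schema shown later in this README.

```typescript
// Sketch of an O(log n) word lookup over timestamps (not the project's
// actual indexing code). Words are assumed sorted by `start`.
type Word = { word: string; start: number; end: number };

function wordAt(words: Word[], t: number): Word | null {
  // Binary search for the last word whose start is <= t.
  let lo = 0, hi = words.length - 1, best = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (words[mid].start <= t) { best = mid; lo = mid + 1; }
    else hi = mid - 1;
  }
  if (best === -1) return null;
  const w = words[best];
  // t may fall in a pause between two words; report no active word then.
  return t <= w.end ? w : null;
}
```

A time-bucket index layers on top of this idea: pre-partitioning words into fixed-width time buckets narrows the search range before the binary search runs.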

Installation

git clone https://github.com/ojowwalker77/syncEngine.git
cd syncEngine
bun install

That's it. Python dependencies are installed automatically on first run.

Usage

Interactive Mode (Recommended)

bun run src/index.ts
  1. Type @ and select your audio file
  2. Choose language, model, and output mode
  3. Press Enter on Use Selection

CLI Mode

# Basic usage
bun run src/index.ts audio.mp3

# With options
bun run src/index.ts audio.mp3 -l pt -m base --mode sync

# Text-only output (fast)
bun run src/index.ts audio.mp3 --mode text -o transcript.txt

Options

| Flag | Description | Default |
|------|-------------|---------|
| `-l, --language` | Language code (`en`, `pt`, `es`, etc.) | `en` |
| `-m, --model` | Whisper model (`tiny`, `base`, `small`, `medium`) | `base` |
| `-o, --output` | Output file path | `outputs/<name>_sync.json` |
| `--mode` | Output mode (`sync`, `text`) | `sync` |
| `--verbose` | Detailed logging | `false` |

Benchmarks

Test audio: 41-second Alice in Wonderland excerpt (ElevenLabs TTS)

| Model | Time | RTF | Quality |
|-------|------|-----|---------|
| tiny | 1.66s | 0.04x | Good (minor errors) |
| base | 2.55s | 0.06x | Very Good |
| small | 8.46s | 0.21x | Excellent |
| medium | 16.06s | 0.39x | Excellent |

RTF = Real-Time Factor (lower is faster). All models are faster than realtime.
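
As a worked example, RTF is simply processing time divided by audio duration, so the `base` row works out as:

```typescript
// RTF = processing time / audio duration.
// Using the `base` row: 2.55s to process 41s of audio.
const rtf = 2.55 / 41;
console.log(rtf.toFixed(2)); // "0.06" — well under 1.0, i.e. faster than realtime
```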

Recommendation: Use base for best speed/quality balance. Use tiny for drafts.

See benchmark/BENCHMARK.md for detailed results.

Run Benchmarks

# Run benchmark on your audio
./benchmark/run.sh your_audio.mp3

Audiobook Pipeline

For long audio files (1hr+), use the chapter-based pipeline with Claude Code integration:

Full Audio → Draft Transcription → Claude Analysis → FFmpeg Split → Quality Transcription

See CLAUDE.md for detailed instructions.

Quick Start

  1. Generate draft transcript:

    bun run src/index.ts audiobook.mp3 --model tiny --mode text -o draft.txt
  2. Ask Claude Code to analyze:

    "Analyze this transcript and identify chapter breaks. Create a breakpoints.json file."

  3. Split audio with FFmpeg:

    # Claude will provide specific commands like:
    ffmpeg -i audiobook.mp3 -ss 0.000 -t 300.500 -c copy chapter_001.mp3
  4. Transcribe each chapter:

    bun run src/index.ts chapter_001.mp3 --mode sync -o chapter_001_sync.json
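
Step 3 above can also be scripted. The sketch below turns a list of chapter breakpoints into the `ffmpeg` split commands; the breakpoint shape (`start`/`end` in seconds) is an assumption here, not a format the project defines, so adapt it to whatever your `breakpoints.json` actually contains.

```typescript
// Sketch: generate ffmpeg split commands from chapter breakpoints.
// The Breakpoint shape is an assumption, not a format syncEngine defines.
type Breakpoint = { start: number; end: number };

function splitCommands(input: string, chapters: Breakpoint[]): string[] {
  return chapters.map((c, i) => {
    const n = String(i + 1).padStart(3, "0"); // chapter_001, chapter_002, ...
    const dur = (c.end - c.start).toFixed(3);
    // -c copy avoids re-encoding, so splitting is near-instant.
    return `ffmpeg -i ${input} -ss ${c.start.toFixed(3)} -t ${dur} -c copy chapter_${n}.mp3`;
  });
}
```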

Output Format

Sync Mode (JSON)

{
  "metadata": {
    "language": "en",
    "model": "base",
    "generated_at": "2024-12-08T..."
  },
  "segments": [
    {
      "id": 1,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello world this is a test",
      "word_count": 6,
      "complexity": 0.35
    }
  ],
  "words": [
    {
      "Id": 1,
      "word": "Hello",
      "start": 0.0,
      "end": 0.4,
      "buffer_applied": 0.02
    }
  ],
  "analytics": {
    "total_words": 1250,
    "total_duration": 180.5,
    "words_per_minute": 145
  }
}
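
For downstream processing, the schema above maps naturally onto a TypeScript interface. This is a sketch derived from the example JSON, not the project's own type definitions:

```typescript
// Types inferred from the sync JSON example above (a sketch, not the
// project's own definitions — field sets may differ in newer versions).
interface Segment {
  id: number;
  start: number;
  end: number;
  text: string;
  word_count: number;
  complexity: number;
}

// Find the segment active at time t (seconds), e.g. for subtitle display.
function segmentAt(segments: Segment[], t: number): Segment | null {
  return segments.find((s) => s.start <= t && t <= s.end) ?? null;
}
```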

Project Structure

syncEngine/
├── src/
│   ├── cli/              # Terminal UI and argument parsing
│   ├── processing/
│   │   ├── segmentation/ # Word grouping algorithms
│   │   ├── timing/       # Buffer calculation and optimization
│   │   └── indexing/     # Performance indices
│   ├── transcription/    # Python bridge and setup
│   ├── output/           # JSON builder and analytics
│   ├── effect/           # Effect-TS services and errors
│   ├── pipeline/         # Chapter pipeline components
│   └── types/            # TypeScript definitions
├── python/
│   └── transcribe.py     # faster-whisper wrapper
├── benchmark/            # Benchmark scripts and results
├── examples/             # FFmpeg and usage examples
└── outputs/              # Generated files (gitignored)

Requirements

  • Bun 1.0+
  • Python 3.10-3.12 (auto-installed via Homebrew on macOS)
  • FFmpeg (for audio splitting in chapter pipeline)

Contributing

Contributions are welcome!

# Clone and install
git clone https://github.com/ojowwalker77/syncEngine.git
cd syncEngine
bun install

# Run in development
bun run src/index.ts

# Type check
bun run typecheck

# Lint
bun run lint

License

MIT
