LLM Transcribe

LLM Transcribe is a Python CLI tool that leverages state-of-the-art Large Language Models to convert audio files into richly annotated transcriptions with timestamps, speaker identification, and emotional context.

Built for developers and technical users who need high-quality transcription with minimal setup, it supports all major LLM providers (Google Gemini, OpenAI, Anthropic) through a single interface and handles arbitrarily long audio files, from quick voice notes to multi-hour meetings. The structured output captures not just what was said but how it was said, including non-verbal cues like laughter, hesitation, or frustration.

Installation

We recommend installing with uv:

uv tool install llm-transcribe

After installation, the llm-transcribe command will be available in your PATH.

Usage

# Simplest way; the transcript is written to audio.txt
llm-transcribe audio.wav

# Specify model and output filename
llm-transcribe -m gemini-2.5-flash -o audio.out.txt audio.wav

# Test setup without processing audio
llm-transcribe audio.wav --test

# Add context for better transcription
llm-transcribe -c "Meeting between John and Kate about Q3 revenue projections" audio.wav

# Get help
llm-transcribe --help

How It Works

  1. Audio Processing: The audio file is split into 10-minute chunks with a 1-minute overlap, using a sliding-window approach (see the first sketch after this list).

    • Chunks after the first run up to 11 minutes to preserve context across boundaries
    • Timing: [0:00-10:00], [9:00-20:00], [19:00-30:00], etc.
    • Supports all common audio formats (WAV, MP3, M4A, etc.)
  2. LLM Transcription: Each chunk is transcribed by a modern LLM rather than a traditional speech-to-text model.

    • The audio is base64-encoded and sent along with a carefully crafted prompt
    • Context from the previous chunk's final minute keeps speaker labels consistent
    • Understanding of the conversation improves accuracy and captures emotional nuances
    • Output format: timestamped lines with speaker identification and non-verbal cues
  3. Context Management: Maintains conversation flow across chunks.

    • The previous chunk's final minute provides context for the next chunk
    • Speaker names and roles are preserved throughout the entire transcription
    • Converts chunk-relative timestamps to absolute time (also covered in the first sketch)
  4. Production Ready: Built on a modern Python stack with robust error handling.

    • Retry logic for API failures and rate limiting (see the second sketch after this list)
    • Cost tracking and provider abstraction via LiteLLM
    • Supports all major LLM providers through a unified interface
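
The two sketches below make the pipeline concrete. First, the sliding-window timing from step 1 and the timestamp arithmetic from step 3. This is a minimal illustration, not the tool's actual code: the Chunk class, the function names, and the FFmpeg invocation are our assumptions, while the 10-minute/1-minute constants and the chunk layout come straight from the description above.

import subprocess
from dataclasses import dataclass

CHUNK_S = 10 * 60    # nominal chunk length: 10 minutes
OVERLAP_S = 1 * 60   # audio shared with the previous chunk: 1 minute

@dataclass
class Chunk:
    index: int
    start: float  # absolute position within the recording, in seconds
    end: float

def chunk_spans(duration_s: float) -> list[Chunk]:
    """Sliding-window spans: [0:00-10:00], [9:00-20:00], [19:00-30:00], ..."""
    spans = []
    i = 0
    while True:
        start = max(0, i * CHUNK_S - OVERLAP_S)  # chunks after the first start 1 min early
        end = min((i + 1) * CHUNK_S, duration_s)
        spans.append(Chunk(i, start, end))
        if end >= duration_s:
            return spans
        i += 1

def extract_chunk(src: str, chunk: Chunk, dst: str) -> None:
    """Cut one chunk out of the source file; FFmpeg does the audio work."""
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(chunk.start),
         "-t", str(chunk.end - chunk.start), "-i", src, dst],
        check=True,
    )

def to_absolute(chunk: Chunk, rel: str) -> str:
    """Turn a chunk-relative MM:SS stamp into an absolute HH:MM:SS stamp."""
    m, s = map(int, rel.split(":"))
    total = int(chunk.start) + m * 60 + s
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

# chunk_spans(1500) -> [0:00-10:00], [9:00-20:00], [19:00-25:00]
# to_absolute(Chunk(1, 540, 1200), "03:12") -> "00:12:12" (540 s + 192 s)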
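
Second, the per-chunk model call from steps 2 and 4. This sketch assumes the OpenAI-style input_audio content format, which LiteLLM forwards to audio-capable models, and relies on LiteLLM's built-in num_retries for transient failures; the prompt wording and the function name are illustrative, not the tool's actual prompt.

import base64

import litellm

PROMPT = (
    "Transcribe this audio. Prefix each line with a [MM:SS] timestamp, "
    "identify speakers by name or role, and note non-verbal cues."
)

def transcribe_chunk(path: str, model: str, context: str = "") -> str:
    """Send one audio chunk to the chosen model via LiteLLM."""
    with open(path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode()
    text = PROMPT if not context else PROMPT + "\nContext so far: " + context
    response = litellm.completion(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {"type": "input_audio",
                 "input_audio": {"data": audio_b64, "format": "wav"}},
            ],
        }],
        num_retries=3,  # LiteLLM retries transient failures and rate limits
    )
    return response.choices[0].message.content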

Output Format

The transcription output includes timestamps and speaker identification:

[01:23:45] John: We should start the meeting now.
[01:23:52] Sarah: I agree. Let me pull up the agenda.
[01:24:05] Speaker 1: Can everyone hear me clearly?
[01:24:08] John: Yes, you're coming through fine.
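
Since every line follows the same [HH:MM:SS] Speaker: text shape, transcripts are easy to post-process. A minimal parsing sketch (the regex and names are ours, not part of the tool):

import re

# [HH:MM:SS] Speaker: text
LINE = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\] ([^:]+): (.*)")

def parse_line(line: str) -> tuple[int, str, str] | None:
    """Return (seconds, speaker, text), or None for a non-matching line."""
    m = LINE.match(line)
    if m is None:
        return None
    h, mm, s, speaker, text = m.groups()
    return int(h) * 3600 + int(mm) * 60 + int(s), speaker, text

# parse_line("[01:23:45] John: We should start the meeting now.")
# -> (5025, 'John', 'We should start the meeting now.')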

Speaker Identification Priority

  1. Named speakers: When identifiable by name (John, Sarah, etc.)
  2. Role-based: When identifiable by role (Manager, Salesperson, etc.)
  3. Numerical fallback: Unknown speakers labeled as "Speaker 1", "Speaker 2", etc.

Supported Models

Works with any LLM provider supported by LiteLLM:

  • Gemini: gemini-2.5-flash, gemini-1.5-pro
  • OpenAI: gpt-4o, gpt-4o-mini
  • Anthropic: claude-3-5-sonnet, claude-3-5-haiku
  • And many more: see the LiteLLM documentation

Requirements

  • Python 3.13
  • FFmpeg (for audio processing)
  • API key for your chosen LLM provider

Configuration

Set your API key as an environment variable:

# For Gemini
export GOOGLE_API_KEY=your_api_key_here

# For OpenAI
export OPENAI_API_KEY=your_api_key_here

# For Anthropic
export ANTHROPIC_API_KEY=your_api_key_here
