ErikBahena/transcript-ai

TranscriptAI

Automatic topic segmentation and analysis for VTT transcripts, powered by local LLMs via Ollama.

What it does

Give it a .vtt transcript file and it will:

  1. Detect topic boundaries using embedding-based DeepTiling — no LLM calls needed for segmentation
  2. Analyze each topic with a single LLM call — titles, summaries, keywords, key quotes, and subtopics
  3. Output structured JSON ready for downstream consumption
  4. Visualize topics on a timeline

A 5-minute transcript processes in ~30 seconds (0.10x real-time factor) on Apple Silicon with gemma3:4b.
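The first step, parsing .vtt cues into timestamped segments, can be sketched as follows. This is illustrative, not the repository's actual VTTParser: it assumes HH:MM:SS.mmm cue timings and ignores cue identifiers and settings.

```python
import re

# Matches VTT cue timings such as "00:00:16.660 --> 00:00:49.659".
CUE_RE = re.compile(
    r"(\d+):(\d\d):(\d\d)\.(\d{3}) --> (\d+):(\d\d):(\d\d)\.(\d{3})"
)

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_vtt(text):
    """Return a list of (start_seconds, end_seconds, caption) tuples."""
    segments = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        m = CUE_RE.search(line)
        if not m:
            continue
        start = to_seconds(*m.groups()[:4])
        end = to_seconds(*m.groups()[4:])
        caption = []
        for nxt in lines[i + 1:]:  # caption runs until the next blank line
            if not nxt.strip():
                break
            caption.append(nxt.strip())
        segments.append((start, end, " ".join(caption)))
    return segments
```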

How it works

Boundary detection uses a DeepTiling algorithm:

  • Sliding window embeddings via nomic-embed-text
  • Cosine similarity curves between left/right context windows
  • Depth score computation to find topic transitions
  • Automatic merging of short segments using cached embeddings (zero additional API calls)
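The depth-score idea comes from TextTiling: a topic boundary is a valley in the similarity curve whose depth, measured against the nearest peaks on either side, exceeds a cutoff. A minimal sketch over a precomputed similarity curve (the cutoff formula and `sensitivity` semantics here are assumptions, not the repository's exact code):

```python
import numpy as np

def depth_scores(similarities):
    """For each position, sum how far similarity has dropped from the
    highest peak to the left and the highest peak to the right."""
    sims = np.asarray(similarities, dtype=float)
    scores = np.zeros_like(sims)
    for i, s in enumerate(sims):
        left_peak = sims[: i + 1].max()
        right_peak = sims[i:].max()
        scores[i] = (left_peak - s) + (right_peak - s)
    return scores

def boundaries(similarities, sensitivity=0.1):
    """Indices whose depth exceeds mean + sensitivity * std
    (a TextTiling-style cutoff; higher sensitivity = fewer boundaries)."""
    scores = depth_scores(similarities)
    cutoff = scores.mean() + sensitivity * scores.std()
    return [i for i, d in enumerate(scores) if d > cutoff and d > 0]
```

A sharp dip in the curve (here at index 2) is the only point deep enough to pass the cutoff, so it becomes the topic boundary.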

Analysis sends all topics to the LLM in a single call, producing:

  • Specific, descriptive titles using proper nouns
  • Narrative summaries explaining arguments and tensions
  • Named entity keywords (not generic category words)
  • Verbatim key quotes with context explaining why they matter
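The single-call analysis can be sketched against Ollama's `/api/generate` endpoint. This is a hedged illustration: the prompt wording, helper names, and key list are assumptions, not the repository's actual prompt.

```python
import json
import requests

def build_prompt(topic_texts):
    """Number every topic and ask for one JSON object per topic."""
    numbered = "\n\n".join(
        f"### Topic {i + 1}\n{text}" for i, text in enumerate(topic_texts)
    )
    return (
        "For each numbered topic below, return a JSON array of objects "
        'with keys "title", "summary", "keywords", "key_quotes", and '
        '"subtopics".\n\n' + numbered
    )

def analyze_topics(topic_texts, model="gemma3:4b",
                   url="http://localhost:11434", num_ctx=3072):
    """One request for all topics; format="json" constrains the output."""
    resp = requests.post(
        f"{url}/api/generate",
        json={
            "model": model,
            "prompt": build_prompt(topic_texts),
            "stream": False,
            "format": "json",
            "options": {"num_ctx": num_ctx},
        },
        timeout=300,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])
```

Batching all topics into one prompt amortizes the model's fixed per-request overhead, which is why a whole transcript can be analyzed with a single call.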

Requirements

  • Python 3.x
  • Ollama running locally
  • Models: gemma3:4b (analysis) + nomic-embed-text (embeddings)

pip install requests matplotlib pandas numpy
ollama pull gemma3:4b
ollama pull nomic-embed-text

Usage

# Basic analysis
python main.py transcript.vtt

# With visualization
python main.py transcript.vtt --visualize

# Custom output path
python main.py transcript.vtt -o analysis.json

# List available models
python main.py transcript.vtt --list-models

CLI Options

Flag                    Default                 Description
--output, -o            vtt_analysis.json       Output JSON path
--visualize, -v         off                     Generate timeline visualization
--visualization-output  topic_timeline.png      Visualization output path
--llm-url               http://localhost:11434  Ollama API URL
--model                 gemma3:4b               LLM model for analysis
--num-ctx               3072                    LLM context window size
--max-workers           4                       Concurrent LLM requests (fallback mode)
--boundary-method       embedding               embedding, llm, or hybrid
--boundary-sensitivity  0.1                     Higher = fewer boundaries (0.0–2.0)
--embedding-model       nomic-embed-text        Model for embeddings

Example Output

{
  "topics": [
    {
      "title": "Washaway Beach: A Cycle of Loss",
      "summary": "A coastal town faces devastating property loss due to rising sea levels...",
      "keywords": ["Washaway Beach", "coastal erosion", "home prices"],
      "key_quotes": [
        {
          "quote": "Year after year, more homes would fall in the water.",
          "context": "Establishes the scale and ongoing nature of the erosion crisis."
        }
      ],
      "subtopics": [...],
      "timespan": "0:00:16 - 0:00:49",
      "start_seconds": 16.66,
      "end_seconds": 49.659
    }
  ],
  "summary": {
    "total_duration": "0:05:09",
    "topic_count": 7
  }
}
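Because the output is plain JSON, downstream tools need nothing beyond the standard library to consume it. A small sketch (key names taken from the sample above; the helper name is made up):

```python
def topic_outline(analysis):
    """Flatten the analysis JSON into 'timespan  title' lines,
    e.g. for pasting into video chapter descriptions."""
    return [f"{t['timespan']}  {t['title']}" for t in analysis["topics"]]
```

Load the file with `json.load(open("vtt_analysis.json"))` and pass the result in.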

Timeline Visualization

Generated with --visualize and saved to topic_timeline.png by default.

Architecture

Component                  Role
VTTParser                  Parses VTT files into timestamped segments
EmbeddingBoundaryDetector  DeepTiling boundary detection with embedding cache
LLMClient                  Ollama API interface with connection pooling and retry
VTTAnalyzer                Orchestrates the full pipeline
TopicSegment               Data model for analyzed topics
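The TopicSegment data model might look roughly like this. It is a sketch: the field names follow the example output above, but the actual class definition in the repository may differ.

```python
from dataclasses import dataclass, field

@dataclass
class TopicSegment:
    """One analyzed topic (field names assumed from the example output)."""
    title: str
    summary: str
    start_seconds: float
    end_seconds: float
    keywords: list = field(default_factory=list)

    @property
    def duration(self):
        """Length of the segment in seconds."""
        return self.end_seconds - self.start_seconds
```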

License

MIT
