Automatic topic segmentation and analysis for VTT transcripts, powered by local LLMs via Ollama.
Give it a .vtt transcript file and it will:
- Detect topic boundaries using embedding-based DeepTiling — no LLM calls needed for segmentation
- Analyze each topic with a single LLM call — titles, summaries, keywords, key quotes, and subtopics
- Output structured JSON ready for downstream consumption
- Visualize topics on a timeline
A 5-minute transcript processes in ~30 seconds (0.10x real-time factor) on Apple Silicon with gemma3:4b.
Boundary detection uses a DeepTiling algorithm:
- Sliding window embeddings via `nomic-embed-text`
- Cosine similarity curves between left/right context windows
- Depth score computation to find topic transitions
- Automatic merging of short segments using cached embeddings (zero additional API calls)
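The boundary-detection steps above can be sketched in a few lines of pure Python. This is a minimal illustration of the general TextTiling-style idea, not the project's actual code: the function names, the window size, and the thresholding rule (`mean + sensitivity * std`, chosen so that a higher sensitivity yields fewer boundaries) are all assumptions made for the example. Embeddings are passed in as plain lists of floats; in the real pipeline they would come from the embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def similarity_curve(embeddings, window=2):
    """Similarity between the averaged left and right context at every gap."""
    sims = []
    for gap in range(1, len(embeddings)):
        left = embeddings[max(0, gap - window):gap]
        right = embeddings[gap:gap + window]
        sims.append(cosine(mean_vector(left), mean_vector(right)))
    return sims

def depth_scores(sims):
    """Depth of each valley: how far similarity rises on both sides of a gap."""
    depths = []
    for i, s in enumerate(sims):
        left = i
        while left > 0 and sims[left - 1] >= sims[left]:
            left -= 1
        right = i
        while right < len(sims) - 1 and sims[right + 1] >= sims[right]:
            right += 1
        depths.append((sims[left] - s) + (sims[right] - s))
    return depths

def pick_boundaries(depths, sensitivity=0.1):
    """Return segment indices that start a new topic (assumed threshold rule)."""
    mean = sum(depths) / len(depths)
    std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    threshold = mean + sensitivity * std
    return [i + 1 for i, d in enumerate(depths) if d > threshold]

if __name__ == "__main__":
    # Toy 2-D "embeddings": three segments on one topic, then three on another.
    toy = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
    print(pick_boundaries(depth_scores(similarity_curve(toy))))
```

Because every step operates on embeddings that are already in memory, nothing here requires another model call, which is consistent with the cached-embedding merging described above.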
Analysis sends all topics to the LLM in a single call, producing:
- Specific, descriptive titles using proper nouns
- Narrative summaries explaining arguments and tensions
- Named entity keywords (not generic category words)
- Verbatim key quotes with context explaining why they matter
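To make the single-call idea concrete, here is a hedged sketch of how all topic texts could be packed into one request body for Ollama's `/api/generate` endpoint. The helper name `build_analysis_request` and the prompt wording are hypothetical and are not taken from this project; the request fields (`model`, `prompt`, `stream`, `format`, `options.num_ctx`) are standard Ollama API parameters. The sketch only builds the payload and does not send it.

```python
import json

def build_analysis_request(topics, model="gemma3:4b", num_ctx=3072):
    """Build one /api/generate payload covering every topic (illustrative prompt)."""
    numbered = "\n\n".join(
        f"[Topic {i}]\n{text}" for i, text in enumerate(topics, start=1)
    )
    prompt = (
        "For each numbered topic below, return a JSON object with the keys "
        "title, summary, keywords, key_quotes, and subtopics. "
        "Respond with a JSON list, one object per topic.\n\n" + numbered
    )
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # ask for one complete response
        "format": "json",                 # request JSON-formatted output
        "options": {"num_ctx": num_ctx},  # context window size
    }

if __name__ == "__main__":
    payload = build_analysis_request(["First topic text.", "Second topic text."])
    print(json.dumps(payload, indent=2))
```

The resulting dictionary could then be sent as the JSON body of a POST to the configured Ollama URL followed by `/api/generate`, for example with `requests.post(url, json=payload)`.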
- Python 3.x
- Ollama running locally
- Models: `gemma3:4b` (analysis) + `nomic-embed-text` (embeddings)
pip install requests matplotlib pandas numpy
ollama pull gemma3:4b
ollama pull nomic-embed-text

# Basic analysis
python main.py transcript.vtt
# With visualization
python main.py transcript.vtt --visualize
# Custom output path
python main.py transcript.vtt -o analysis.json
# List available models
python main.py transcript.vtt --list-models

| Flag | Default | Description |
|---|---|---|
| `--output`, `-o` | `vtt_analysis.json` | Output JSON path |
| `--visualize`, `-v` | off | Generate timeline visualization |
| `--visualization-output` | `topic_timeline.png` | Visualization output path |
| `--llm-url` | `http://localhost:11434` | Ollama API URL |
| `--model` | `gemma3:4b` | LLM model for analysis |
| `--num-ctx` | `3072` | LLM context window size |
| `--max-workers` | `4` | Concurrent LLM requests (fallback mode) |
| `--boundary-method` | `embedding` | `embedding`, `llm`, or `hybrid` |
| `--boundary-sensitivity` | `0.1` | Higher = fewer boundaries (0.0–2.0) |
| `--embedding-model` | `nomic-embed-text` | Model for embeddings |

{
"topics": [
{
"title": "Washaway Beach: A Cycle of Loss",
"summary": "A coastal town faces devastating property loss due to rising sea levels...",
"keywords": ["Washaway Beach", "coastal erosion", "home prices"],
"key_quotes": [
{
"quote": "Year after year, more homes would fall in the water.",
"context": "Establishes the scale and ongoing nature of the erosion crisis."
}
],
"subtopics": [...],
"timespan": "0:00:16 - 0:00:49",
"start_seconds": 16.66,
"end_seconds": 49.659
}
],
"summary": {
"total_duration": "0:05:09",
"topic_count": 7
}
}

| Component | Role |
|---|---|
| `VTTParser` | Parses VTT files into timestamped segments |
| `EmbeddingBoundaryDetector` | DeepTiling boundary detection with embedding cache |
| `LLMClient` | Ollama API interface with connection pooling and retry |
| `VTTAnalyzer` | Orchestrates the full pipeline |
| `TopicSegment` | Data model for analyzed topics |

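As a rough illustration of the first pipeline stage, the sketch below parses WebVTT cues into timestamped segments and formats a time range in the same `H:MM:SS - H:MM:SS` shape as the `timespan` field in the JSON output. It is a simplified stand-in written for this README, not the actual `VTTParser`: the function names and the segment dictionary layout are assumptions, and cue settings, identifiers, and styling tags are not handled.

```python
import re
from datetime import timedelta

# Matches "HH:MM:SS.mmm --> HH:MM:SS.mmm"; the hours part is optional in WebVTT.
CUE_RE = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)

def to_seconds(hours, minutes, seconds, millis):
    """Convert matched timestamp parts (strings, hours may be None) to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000

def parse_vtt(text):
    """Return a list of {"start", "end", "text"} dicts, one per cue."""
    segments = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = CUE_RE.search(lines[i])
        if not match:
            i += 1
            continue
        parts = match.groups()
        start, end = to_seconds(*parts[:4]), to_seconds(*parts[4:])
        i += 1
        payload = []
        # A cue's text runs until the next blank line.
        while i < len(lines) and lines[i].strip():
            payload.append(lines[i].strip())
            i += 1
        segments.append({"start": start, "end": end, "text": " ".join(payload)})
    return segments

def format_timespan(start_seconds, end_seconds):
    """Format two second offsets as 'H:MM:SS - H:MM:SS', dropping fractions."""
    def fmt(value):
        return str(timedelta(seconds=int(value)))
    return f"{fmt(start_seconds)} - {fmt(end_seconds)}"

if __name__ == "__main__":
    sample = "WEBVTT\n\n00:00:01.000 --> 00:00:03.500\nHello there.\n"
    print(parse_vtt(sample))
    print(format_timespan(16.66, 49.659))
```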
MIT

