[DMP 2026] Intelligent CC Suggestion Tool Complete Pipeline (Goals 1, 2 & 3)#8
Open
Ashutoshx7 wants to merge 28 commits into
Goals 1, 2 & 3: full end-to-end pipeline.

Goal 1: Sound Event Detection
- YAMNet-based audio classification (521 AudioSet classes)
- WebRTC VAD / energy-based speech filtering
- Consecutive event merging with peak confidence

Goal 2: Speaker Reaction Detection
- Temporal reaction windows (300ms-1500ms after sound onset)
- Scene cut detection (histogram comparison) to prevent false positives
- MediaPipe PoseLandmarker (flinch/head turn via shoulder/ear/nose landmarks)
- MediaPipe FaceLandmarker (surprise via mouth openness)
- Multi-person scoring (max reaction across all detected people)

Goal 3: CC Decision Engine + SRT Output
- Category-aware fusion weights (high_impact, interactive, social, ambient)
- Speech-pause bonus for interrupted dialogue
- Scene-cut fallback to audio-only scoring
- Standard SRT output with human-readable summary
- IoU-based evaluation framework (P/R/F1/overcaption rate)

19/19 tests passing. Full pipeline tested end-to-end.
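To illustrate the "consecutive event merging with peak confidence" step listed under Goal 1, here is a minimal sketch. The `SoundEvent` class, the `merge_consecutive` function, and the `max_gap` parameter are hypothetical names chosen for illustration; they are not taken from the PR's code, and the real implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class SoundEvent:
    """Hypothetical container for one detected sound (not the PR's actual class)."""
    label: str
    start: float       # seconds
    end: float         # seconds
    confidence: float  # classifier score in [0, 1]


def merge_consecutive(events, max_gap=0.5):
    """Merge adjacent events with the same label, keeping the peak confidence.

    max_gap is an assumed tolerance (seconds) between the end of one
    detection and the start of the next.
    """
    merged = []
    for ev in sorted(events, key=lambda e: e.start):
        last = merged[-1] if merged else None
        if last and last.label == ev.label and ev.start - last.end <= max_gap:
            # Extend the previous event and keep the highest confidence seen.
            last.end = max(last.end, ev.end)
            last.confidence = max(last.confidence, ev.confidence)
        else:
            # Copy so the caller's input objects are not mutated.
            merged.append(SoundEvent(ev.label, ev.start, ev.end, ev.confidence))
    return merged
```

With this sketch, two back-to-back "Siren" frames collapse into a single event spanning both frames, scored with the higher of the two confidences.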
- FastAPI backend: upload, async pipeline processing, event toggle, SRT export
- Minimal black & white design: Inter + JetBrains Mono, 1px borders, subtle surfaces
- Results page: stats bar, video player, interactive timeline, event cards with toggles
- SRT live preview, accept/reject per event, download export
- Audio extractor: OpenCV fallback for environments without ffmpeg
…on support
- Created eval/ground_truth/test_clip.json with 3 annotated events
- Added eval/__init__.py for clean package imports
- Verified --evaluate mode runs end-to-end with P/R/F1 output
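The IoU-based evaluation (P/R/F1/overcaption rate) mentioned in this PR can be sketched as follows. This is only an illustration under stated assumptions: events are reduced to `(start, end)` tuples, matching is greedy, the `iou_threshold` default of 0.5 is a guess, and "overcaption rate" is assumed to mean the fraction of predicted captions that match no ground-truth event. None of these details are confirmed by the PR.

```python
def iou(a, b):
    """Temporal intersection-over-union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def evaluate(predicted, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of predictions to ground truth by IoU."""
    matched = set()
    tp = 0
    for p in predicted:
        best_idx, best = None, 0.0
        for i, g in enumerate(ground_truth):
            if i in matched:
                continue  # each ground-truth event can be matched only once
            score = iou(p, g)
            if score > best:
                best_idx, best = i, score
        if best_idx is not None and best >= iou_threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(predicted) - tp
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Assumed definition: share of predictions with no ground-truth match.
    overcaption_rate = fp / len(predicted) if predicted else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "overcaption_rate": overcaption_rate}
```

A ground-truth file such as eval/ground_truth/test_clip.json would first need to be loaded and reduced to `(start, end)` pairs; its schema is not shown here, so that step is left out.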
- requirements.txt: added fastapi, uvicorn, python-multipart, pytest
- .gitignore: added web/uploads/, stopped ignoring .avi/.mp4 source files
- README.md: documented web UI, setup, testing, Hindi support
- setup.sh: auto-generates test data, prints web UI usage
- pipeline.py: fixed WAV cleanup to preserve pre-existing test audio
- generate_demo_data.py: 15s video with siren, alarm, bell, knock
- speech_filter.py: raised energy thresholds to prevent non-speech sounds from being filtered as speech
- demo pipeline now shows 15 detected → 6 accepted with category filtering
New features:
- report_generator.py: professional HTML report with dark monochrome design (stats grid, category distribution, color-coded event table, SRT preview)
- report_generator.py: JSON report with full event data for integration
- label_mapper.py: expanded from 30 to 120+ AudioSet class mappings (high impact, interactive, social, transport, physical, India-specific, nature)
- Pipeline now auto-generates _report.html and _report.json alongside SRT

Tests: 30 passing (was 19)
- TestReportGenerator: JSON structure, HTML elements, filter rate
- TestExtendedLabels: India-specific, high impact, social, transport, nature
- TestEnergyVAD: threshold behavior, silent/loud detection
…, timeline, contact
…ke animation for high-impact
…= 90% filter rate
…+ smarter detection logic
… reduce false reactions + auto cache-bust
…overlay, top-3 detection)
…erlay, 150+ labels
…eyboard shortcuts (Space/arrows/J/K) + SLS export button
…s, and correct repo links
Author
Hi @abinash-sketch, I’m available for the interview. I’m happy to connect at any time that works best for you. Thank you.
Intelligent CC Suggestion Tool
Fixes #2
Proposal document: https://docs.google.com/document/d/18DmqvkRuaiw3bRKe-c_-T5-KrA5auziadDFyorAlWG8/edit?usp=sharing
AI-powered tool that identifies moments in a video where a Closed Caption (CC) annotation is genuinely necessary — such as when a non-speech audio event meaningfully affects the speakers or the scene — and suggests contextually relevant CC text, without over-captioning routine or low-impact sounds.
PLanet.Read.mp4
YouTube link to the video: https://youtu.be/zOPK43g-OwQ?si=iPUpVk_uKhRCEQDV
Architecture
Key Innovations
Setup
The setup script downloads MediaPipe model files to models/.
Usage
CLI (Command Line)
Web UI
python web/app.py  # Open http://localhost:8000
The web interface provides:
- Space play/pause, ← → seek ±5s, J/K jump between events
Output
The CLI produces:
- <video>_cc.srt: Standard SRT subtitle file with CC annotations
- <video>_cc.sls: SLS (Same Language Subtitling) format with score metadata
- <video>_cc_summary.txt: Human-readable report showing accepted/rejected events with scores
Example SRT Output
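As an illustration of the standard SRT cue layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, text, blank line), here is a minimal sketch of a formatter. The function names `format_timestamp` and `to_srt`, and the `[Siren]` caption text, are hypothetical and not taken from the PR; the tool's actual caption wording may differ.

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Render (start, end, text) tuples as SRT cues numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(1.0, 2.5, "[Siren]"), (5.0, 6.0, "[Knock on door]")]))
```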
Configuration
All thresholds are tunable via YAML config, with no hardcoded magic numbers.
- config/default.yaml: Pipeline settings (confidence thresholds, reaction window timing, fusion weights)
- config/sound_categories.yaml: Category-aware weights per sound type
Sound Categories
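The category-aware fusion described in this PR (per-category weights, a speech-pause bonus, and a scene-cut fallback to audio-only scoring) could look roughly like the sketch below. The weight values, the `pause_bonus` default, and the `fuse_score` function are assumptions made up for illustration; in the project these numbers would be read from config/sound_categories.yaml and config/default.yaml, whose actual contents are not shown here.

```python
# Hypothetical weights per category; the real values live in the YAML config.
CATEGORY_WEIGHTS = {
    "high_impact": {"audio": 0.7, "reaction": 0.3},
    "interactive": {"audio": 0.5, "reaction": 0.5},
    "social":      {"audio": 0.4, "reaction": 0.6},
    "ambient":     {"audio": 0.3, "reaction": 0.7},
}


def fuse_score(category, audio_conf, reaction_score,
               scene_cut=False, speech_pause=False, pause_bonus=0.1):
    """Combine audio confidence and visual reaction into one CC score.

    - On a scene cut the visual reaction is unreliable, so fall back to
      the audio confidence alone.
    - If dialogue paused around the sound, add an assumed fixed bonus.
    """
    if scene_cut:
        score = audio_conf
    else:
        weights = CATEGORY_WEIGHTS[category]
        score = (weights["audio"] * audio_conf
                 + weights["reaction"] * reaction_score)
    if speech_pause:
        score += pause_bonus
    return min(score, 1.0)  # keep the result in [0, 1]
```

Under these assumed weights, an ambient sound with no visible reaction scores low even at high audio confidence, which matches the stated goal of not over-captioning routine sounds.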
Project Structure
Testing
Tech Stack
Evaluation Metrics
Hindi/Regional Content Support
Known Limitations
default.yaml.
What I'd Improve Next