feat: intelligent CC suggestion pipeline — all 3 goals + Hindi support by Naitik120gupta · Pull Request #13 · PlanetRead/Intelligent-cc-generation

Naitik120gupta · 2026-05-09T10:26:19Z

Intelligent CC Suggestion Tool — DMP 2026 Demo

Contributor: Naitik | Issue: #2

What this PR contains

A single-file, end-to-end pipeline that covers all 3 goals from the ticket. It runs as one script and needs no additional project structure.

Architecture

Video → [Module 1: YAMNet SED] → audio events + timestamps → [Module 2: MediaPipe Reaction] → visual confidence scores → [Module 3: Decision Engine] → SRT / JSON output
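The first hop in this flow (per-frame audio scores becoming "audio events + timestamps") can be sketched as below. This is a minimal illustration, not the code in intelligent_cc_pipeline.py: the AudioEvent type, the frames_to_events helper, the min_conf cutoff, and the two-class example are all hypothetical. It assumes scores arrive as one row per YAMNet frame and that frames are spaced 0.48 s apart, which is YAMNet's default patch hop.

```python
from dataclasses import dataclass

# Assumption: one score row per YAMNet frame, frames 0.48 s apart.
FRAME_HOP_S = 0.48


@dataclass
class AudioEvent:
    label: str
    start_s: float
    confidence: float


def frames_to_events(frame_scores, class_names, min_conf=0.5):
    """Turn per-frame class scores into timestamped audio events.

    frame_scores: list of rows, each row holding one score per class.
    class_names:  label for each column (hypothetical subset of classes).
    min_conf:     hypothetical cutoff below which a frame is ignored.
    """
    events = []
    for i, row in enumerate(frame_scores):
        # Pick the highest-scoring class for this frame.
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= min_conf:
            start = round(i * FRAME_HOP_S, 2)
            events.append(AudioEvent(class_names[best], start, row[best]))
    return events
```

This sketch emits one event per qualifying frame; a real implementation would likely merge consecutive frames with the same label into a single event.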

What makes this submission different

1. Hindi CC label support (ticket requirement — others missed this)
The ticket explicitly targets "Hindi and regional-language content."
This pipeline is the only submission with native Hindi output:

python intelligent_cc_pipeline.py --video input.mp4 --lang hi
# Output: [तालियाँ], [विस्फोट], [गोलीबारी], [सायरन]
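One plausible way to produce the localized labels is a lookup table keyed by the English label, with a fallback to English when no translation exists. The sketch below is an assumption about the approach, not the PR's actual code: HINDI_LABELS and localize_label are hypothetical names, and only the four label strings shown in this description are included.

```python
# Hypothetical lookup table; the Hindi strings are the ones quoted in this PR.
HINDI_LABELS = {
    "APPLAUSE": "तालियाँ",
    "EXPLOSION": "विस्फोट",
    "GUNSHOT": "गोलीबारी",
    "SIREN": "सायरन",
}


def localize_label(label_en, lang="en"):
    """Return a bracketed CC label in the requested language.

    Falls back to the upper-cased English label when the language is not
    "hi" or when no Hindi translation is known for the label.
    """
    key = label_en.upper()
    if lang == "hi" and key in HINDI_LABELS:
        return f"[{HINDI_LABELS[key]}]"
    return f"[{key}]"
```

Falling back to English keeps the output usable for sound classes that have no translation yet, rather than dropping the caption.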

2. Audio-only bypass for high-impact events
Safety-critical sounds (gunshot, explosion, siren, alarm) get approved
on strong audio confidence alone (≥ 0.75), even without a visible face reaction.
Rationale: a gunshot off-camera still warrants a CC. The threshold is exposed
as the --audio-only-thresh flag, which the user can tune or disable.
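The bypass rule described above can be sketched as a small predicate. This is an illustration under stated assumptions: the function name should_bypass and the HIGH_IMPACT set are hypothetical, and passing None to disable the bypass is a guess at how "disable" might work, not the flag's documented behaviour.

```python
# The four sound classes named in the description as safety-critical.
HIGH_IMPACT = {"gunshot", "explosion", "siren", "alarm"}


def should_bypass(label, audio_conf, audio_only_thresh=0.75):
    """Return True when an event may be approved on audio evidence alone.

    audio_only_thresh: minimum audio confidence for the bypass; None is
    treated here (as an assumption) as "bypass disabled".
    """
    if audio_only_thresh is None:
        return False
    return label.lower() in HIGH_IMPACT and audio_conf >= audio_only_thresh
```

For example, should_bypass("gunshot", 0.95) is True, while should_bypass("applause", 1.00) is False because applause is not in the high-impact set and would need another approval path.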

3. Freeze response detection in Module 2
Most implementations only detect motion spikes. This pipeline also scores
sudden stillness after an event — the startle freeze response — as a
reaction signal. This catches a class of reactions other visual models miss.
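To make the idea concrete, here is a minimal sketch that scores both motion spikes and sudden stillness from a per-frame motion signal. It is not the scoring used in Module 2: the function name, the window size, and the ratio-based formulas are assumptions, and the motion values are taken to be something like the mean face-landmark displacement between consecutive frames.

```python
from statistics import mean


def reaction_score(motion, event_idx, window=5, eps=1e-6):
    """Score a visual reaction around an audio event, in the range [0, 1].

    motion:    per-frame motion magnitudes (assumed non-negative).
    event_idx: index of the frame where the audio event starts.
    window:    number of frames compared before and after the event.
    """
    before = motion[max(0, event_idx - window):event_idx]
    after = motion[event_idx:event_idx + window]
    if not before or not after:
        return 0.0  # not enough context to compare against a baseline

    baseline = mean(before) + eps
    spike = max(after) / baseline - 1.0    # positive when motion jumps
    freeze = 1.0 - mean(after) / baseline  # positive when motion drops
    return max(0.0, min(1.0, max(spike, freeze)))
```

Taking the larger of the two scores means a subject who goes completely still after a loud sound is scored as strongly as one who flinches.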

4. Transparent decision basis in JSON output
Every accepted CC includes a decision_basis field:
"audio+visual", "audio_only_high_confidence", or "high_impact_bypass"
so editors know exactly why each CC was approved.
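A report entry with this field might be built as follows. Only the decision_basis field name and its three values come from this description; the other field names (time_s, label, audio_confidence, visual_confidence) and the report_entry helper are hypothetical and may not match sample_report.json.

```python
import json

# The three values listed in the description.
ALLOWED_BASES = {"audio+visual", "audio_only_high_confidence", "high_impact_bypass"}


def report_entry(time_s, label, audio_conf, visual_conf, decision_basis):
    """Build one JSON-serializable record for an accepted CC."""
    if decision_basis not in ALLOWED_BASES:
        raise ValueError(f"unknown decision_basis: {decision_basis!r}")
    return {
        "time_s": time_s,                  # hypothetical field name
        "label": label,                    # hypothetical field name
        "audio_confidence": audio_conf,    # hypothetical field name
        "visual_confidence": visual_conf,  # hypothetical field name
        "decision_basis": decision_basis,
    }


entry = report_entry(13.44, "[गोलीबारी]", 0.95, 0.0, "high_impact_bypass")
# ensure_ascii=False keeps Hindi labels readable in the JSON file.
print(json.dumps(entry, ensure_ascii=False, indent=2))
```

Validating the value at write time keeps the report limited to the documented set, so downstream tools can filter on it reliably.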

Sample output (Hindi mode, canva.mp4)

Time	Label (EN)	Label (HI)	Audio	Visual	Combined / decision
6.24s	[APPLAUSE]	[तालियाँ]	1.00	0.00	approved
13.92s	[EXPLOSION]	[विस्फोट]	0.97	0.50	0.71
13.44s	[GUNSHOT]	[गोलीबारी]	0.95	0.00	bypass
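Rows like these map onto SRT cues once the start time is converted to the HH:MM:SS,mmm form that SRT uses. The sketch below shows that conversion; the helper names and the fixed 2-second cue duration are assumptions, since the description does not say how long each CC stays on screen.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index, start_s, label, duration_s=2.0):
    """Build one SRT cue block; duration_s is a hypothetical default."""
    start = srt_timestamp(start_s)
    end = srt_timestamp(start_s + duration_s)
    return f"{index}\n{start} --> {end}\n{label}\n"


print(srt_cue(1, 6.24, "[तालियाँ]"))
```

SRT cues are numbered from 1 and separated by a blank line, so a writer would join the cue blocks with "\n" and save the file as UTF-8 to preserve the Hindi labels.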

Files

intelligent_cc_pipeline.py — full pipeline, all 3 modules
README.md — installation, usage, design decisions, known limitations
sample_output_en.srt — English CC output
sample_output_hi.srt — Hindi CC output
sample_report.json — full JSON report with decision_basis per event

Demo video

intelligent_cc_pipeline.mp4

YouTube link: https://youtu.be/zn3huIukfiY

Known limitations

YAMNet has no culturally-specific Indian sound classes (dhol, shehnai)
Module 2 tracks one face — multi-speaker scenes need extension
CPU-only; GPU would significantly reduce processing time
SLS format needs PlanetRead's exact spec to finalize byte-level output

feat: intelligent CC suggestion pipeline — all 3 goals + Hindi support

cf62141

Naitik120gupta wants to merge 1 commit into PlanetRead:main from Naitik120gupta:feat/intelligent-cc-pipeline-naitik
