feat: Implement foundational MVP pipeline for Whisper-based closed caption generation#4

Open
ravencore06 wants to merge 8 commits into PlanetRead:main from ravencore06:cc-audio-extraction

Conversation

@ravencore06

This PR introduces the foundational Minimum Viable Product (MVP) pipeline for the Intelligent Closed Caption Generation tool. It establishes the core project structure, dependencies, and functional modules to ingest video, transcribe speech via AI, and output precisely timed subtitle files.

Fixes #2

Key Changes

  • Project Scaffolding: Established a clean, modular src/ directory structure separating audio extraction, ASR, and subtitle formatting.
  • Audio Extraction (src/audio_extractor.py): Integrated moviepy to dynamically strip audio tracks from incoming video files and convert them to 16 kHz WAV, the sample rate Whisper expects.
  • ASR Engine (src/speech_to_text.py): Integrated openai-whisper for highly accurate speech-to-text with precise segment timestamps.
  • Zero-Config FFmpeg Handling: Engineered a runtime injection of imageio-ffmpeg to automatically expose bundled FFmpeg binaries to Whisper. This prevents the pipeline from crashing on Windows machines that do not have FFmpeg manually installed in their system PATH.
  • Subtitle Generation (src/subtitle_generator.py): Created a custom formatter to convert raw Whisper segment timestamps into standard SubRip (.srt) syntax.
  • CLI Entry Point (main.py): Added a simple command-line interface utilizing argparse to allow easy execution of the pipeline.
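The actual code in src/subtitle_generator.py is not shown in this description; as a rough sketch of the segment-to-SRT conversion it describes (function names here are hypothetical, and `segments` mirrors the shape of openai-whisper's `result["segments"]`):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SubRip 'HH:MM:SS,mmm' timestamp."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render an SRT document: index, 'start --> end', text, blank line."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

SRT uses a comma (not a period) before the milliseconds field, which is the one detail most hand-rolled formatters get wrong.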

How to Test

  1. Pull this branch and ensure your virtual environment is active.
  2. Install the new dependencies:
    pip install -r requirements.txt
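Note that no manual FFmpeg install is needed. The zero-config handling described under Key Changes can be sketched roughly as below; the function name is hypothetical, and in the real pipeline the binary path would come from imageio-ffmpeg's `get_ffmpeg_exe()`:

```python
import os

def expose_ffmpeg_on_path(ffmpeg_exe: str) -> None:
    """Prepend the directory holding an FFmpeg binary to PATH so that
    subprocess lookups (e.g. Whisper's audio loader) can find it."""
    ffmpeg_dir = os.path.dirname(ffmpeg_exe)
    os.environ["PATH"] = ffmpeg_dir + os.pathsep + os.environ.get("PATH", "")

# In the pipeline this would be driven by the bundled binary:
#   import imageio_ffmpeg
#   expose_ffmpeg_on_path(imageio_ffmpeg.get_ffmpeg_exe())
```

Prepending (rather than appending) ensures the bundled binary wins even if a broken FFmpeg is already on the system PATH.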

@ravencore06
Author

@keerthiseelan-planetread , could you please review this MVP and suggest if the direction aligns with project goals?

@abinash-sketch

Let me know when we can connect.

@ravencore06
Author

ravencore06 commented May 7, 2026

Hi @abinash-sketch, thank you for reaching out.
I’d be happy to connect and discuss the project further.

Email: srinidhisadhanala@gmail.com
LinkedIn: https://www.linkedin.com/in/srinidhi-sadhanala-raven21

My schedule is flexible, so please feel free to connect whenever is convenient for you.

@ravencore06
Author

ravencore06 commented May 8, 2026

Module 1: Sound Event Detection Demo

Detects sounds like car horns, gunshots, dog barks, and sirens from audio using YAMNet + MediaPipe. Outputs timestamps and confidence scores.

Files: sound_event_detection.py (core), demo_module1.py (runner), create_demo_samples.py (downloads 6 sample clips)

To run:

pip install -r requirements.txt
python create_demo_samples.py && python demo_module1.py --all
Results: All 6 test sounds detected correctly (car horn 80%, siren 67%, dog bark 85%, glass breaking 33%, gunshot 89%, engine 67%).
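The core of sound_event_detection.py is not shown here; a minimal sketch of the post-processing step it implies (collapsing per-window classifier scores into timestamped events) might look like the following. The function name and the 0.48 s hop are assumptions — 0.48 s is YAMNet's approximate frame hop, and the real module runs the model via MediaPipe:

```python
def scores_to_events(frames, hop_s=0.48, threshold=0.3):
    """Collapse per-window (label, score) predictions into timestamped events.

    `frames` is a list of (label, score) pairs, one per analysis window.
    Consecutive windows with the same label above `threshold` are merged
    into one event carrying start/end times and the peak confidence.
    """
    events, current = [], None
    for i, (label, score) in enumerate(frames):
        t = i * hop_s
        if score >= threshold and current and current["label"] == label:
            # extend the open event and keep the best score seen
            current["end"] = t + hop_s
            current["confidence"] = max(current["confidence"], score)
        elif score >= threshold:
            if current:
                events.append(current)
            current = {"label": label, "start": t, "end": t + hop_s,
                       "confidence": score}
        else:
            if current:
                events.append(current)
            current = None
    if current:
        events.append(current)
    return events
```

Merging adjacent windows is what turns raw model output into the single "car horn 80%" style entries reported above.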

Sample inputs:
car_horn.wav

dog_bark.wav

glass_break.wav

Demo video:
https://drive.google.com/file/d/1t2Wag1vHcxlU2NVxShGO8zol5845mjb0/view?usp=sharing

@ravencore06
Author

Module 2: Visual Scene/Action Detection

Detects visual events from video frames:

  • Objects: person, vehicle, helicopter, dog, etc. (via MediaPipe Image Classifier / MobileNet)
  • Actions: punching, fall down (via MediaPipe Pose keypoint analysis)
  • Scene changes: camera cuts, location switches (via histogram comparison)

Outputs events with timestamps as JSON.
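The histogram-comparison idea behind the scene-change detector can be sketched without the video stack; this dependency-free version (hypothetical names, grayscale pixel lists standing in for decoded frames) uses histogram intersection, one of the standard comparison metrics:

```python
def histogram(frame, bins=16):
    """Normalized grayscale histogram of a frame (iterable of 0-255 pixels)."""
    counts = [0] * bins
    for px in frame:
        counts[min(px * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]

def is_scene_cut(prev_frame, frame, threshold=0.5):
    """Flag a cut when histogram intersection drops below `threshold`.

    Intersection is sum(min(h1, h2)) over bins: 1.0 for identical
    pixel distributions, near 0.0 for completely different ones.
    """
    h1, h2 = histogram(prev_frame), histogram(frame)
    return sum(min(a, b) for a, b in zip(h1, h2)) < threshold
```

In practice the real module would compute histograms on decoded video frames (e.g. via OpenCV), but the cut decision reduces to this comparison.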

To run: python demo_module2.py "Avengers vs Ultron.mp4"

Input video:

https://drive.google.com/file/d/1mcOrIeyXr_A3lWq6fktQUJ14eS8vw6xQ/view?usp=sharing

Demo video:

https://drive.google.com/file/d/1mMGjgSg7qD4oFw_Npz7l8Omrud60FCmx/view?usp=sharing

@ravencore06
Author

Module 3 takes sounds + visuals + speech and combines them into a single subtitle file.

It decides which caption is most important at every moment. If a gunshot happens during dialogue, the gunshot shows first, and the speech waits until it's quiet again. It outputs a .srt file you can load into VLC or any other video player.
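The prioritization described above can be sketched as a simple merge where sound events preempt overlapping speech; the function name and dict shape are hypothetical, not the actual Module 3 code:

```python
def merge_captions(speech, sounds):
    """Merge speech and sound-event captions so sounds take priority.

    Both inputs are lists of dicts with 'start', 'end', 'text' (seconds).
    When a sound event overlaps a speech segment, the speech caption is
    delayed until the sound ends; the result is one sorted caption track.
    """
    merged = [dict(s) for s in sounds]
    for seg in speech:
        seg = dict(seg)
        for snd in sounds:
            # intervals overlap: push the speech caption past the sound
            if seg["start"] < snd["end"] and snd["start"] < seg["end"]:
                shift = snd["end"] - seg["start"]
                seg["start"] += shift
                seg["end"] += shift
        merged.append(seg)
    return sorted(merged, key=lambda c: c["start"])
```

The merged track can then be fed through the same SRT formatter used for speech-only output.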

Input video: https://drive.google.com/file/d/1mcOrIeyXr_A3lWq6fktQUJ14eS8vw6xQ/view?usp=sharing

Demo video:

Screen.Recording.2026-05-12.140457.mp4



Development

Successfully merging this pull request may close these issues.

[DMP 2026]: Create Intelligent Closed Caption (CC) Suggestion Tool

2 participants