project setup by mishradev1 · Pull Request #5 · PlanetRead/Intelligent-cc-generation

mishradev1 · 2026-05-04T19:05:28Z

This PR establishes the foundational project structure and core utilities for the Intelligent Closed Caption Generation tool. It sets up a clean, modular architecture designed to support the three main goals: Sound Event Detection, Speaker Reaction Detection, and the CC Decision Engine.

Fixes #2

Key Changes

- Project Setup: Established a modular src/ directory structure with separate packages for utils/, detectors/, models/, engine/, and output/, each mapped to a specific pipeline stage.
- Configuration System (config/settings.py): Created a centralized configuration module with documented defaults for all pipeline stages: audio extraction, sound event detection (YAMNet), reaction detection (MediaPipe), CC decision thresholds, and output formatting. This ensures all future modules share consistent, tunable parameters.
- Audio Extraction (src/utils/audio_extractor.py): Built an AudioExtractor class that calls FFmpeg directly via subprocess to extract the audio track from a video file and convert it to 16 kHz mono WAV, the format required by YAMNet for sound event classification. Includes input validation for 8 video formats, error handling for FFmpeg failures, and timeout protection.
- Comprehensive README: Added full project documentation with an architecture diagram, installation guide, usage examples, project structure breakdown, and tech stack overview.
- Dependencies & Packaging: Added requirements.txt with all planned dependencies (TensorFlow, OpenCV, MediaPipe, librosa, etc.) and setup.py with a cc-suggest console script entry point.
- Test Suite: Added 15 unit tests covering AudioExtractor initialization, input validation, extraction success/failure, timeouts, and output path generation. All tests use a mocked FFmpeg, so no external tools are needed to run them.
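For reviewers who want a quick picture of the extraction step, here is a minimal sketch of how an FFmpeg-via-subprocess extractor could look. It is not the code in src/utils/audio_extractor.py: the class name `AudioExtractorSketch`, the exception `AudioExtractionError`, the default timeout, and the extension set are all assumptions made for illustration (the PR's actual list of 8 formats is not reproduced here). The FFmpeg flags used (`-y`, `-i`, `-vn`, `-acodec pcm_s16le`, `-ar`, `-ac`) are standard options.

```python
import subprocess
from pathlib import Path

# Illustrative subset only (assumption); the PR's real list of 8 formats is not shown here.
ASSUMED_VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}


class AudioExtractionError(RuntimeError):
    """Raised when FFmpeg fails or times out (hypothetical exception name)."""


class AudioExtractorSketch:
    """Hypothetical stand-in for the PR's AudioExtractor class."""

    def __init__(self, sample_rate=16000, channels=1, timeout=300):
        self.sample_rate = sample_rate  # 16 kHz, the rate YAMNet expects
        self.channels = channels        # 1 = mono
        self.timeout = timeout          # seconds before FFmpeg is abandoned

    def build_command(self, video_path, output_path):
        """Return the FFmpeg argument list without running anything."""
        return [
            "ffmpeg", "-y",             # overwrite the output file if it exists
            "-i", str(video_path),
            "-vn",                      # drop the video stream
            "-acodec", "pcm_s16le",     # 16-bit PCM WAV
            "-ar", str(self.sample_rate),
            "-ac", str(self.channels),
            str(output_path),
        ]

    def extract(self, video_path, output_path=None):
        video_path = Path(video_path)
        # Validate the extension first, then check that the file exists.
        if video_path.suffix.lower() not in ASSUMED_VIDEO_EXTENSIONS:
            raise ValueError(f"Unsupported video format: {video_path.suffix}")
        if not video_path.is_file():
            raise FileNotFoundError(video_path)
        output_path = Path(output_path) if output_path else video_path.with_suffix(".wav")
        try:
            result = subprocess.run(
                self.build_command(video_path, output_path),
                capture_output=True, text=True, timeout=self.timeout,
            )
        except subprocess.TimeoutExpired as exc:
            raise AudioExtractionError(f"FFmpeg timed out after {self.timeout}s") from exc
        if result.returncode != 0:
            raise AudioExtractionError(f"FFmpeg failed: {result.stderr.strip()}")
        return output_path
```

Keeping `build_command` separate from `extract` is a design choice of this sketch: the argument list can be checked in tests without invoking FFmpeg at all.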

How to Test

1. Pull this branch and create a virtual environment.
2. Install dependencies:
   pip install -r requirements.txt
3. Run the test suite:
   python -m pytest tests/ -v

Expected: 15 passed
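To illustrate why the suite needs no FFmpeg binary, here is a small sketch of the mocking pattern, assuming the extractor calls `subprocess.run`. The helper `run_ffmpeg` and the test names are hypothetical and are not taken from the PR's tests/ directory; only `unittest.mock.patch`, `subprocess.CompletedProcess`, and `subprocess.TimeoutExpired` are real standard-library APIs.

```python
import subprocess
from unittest import mock


def run_ffmpeg(video_path, wav_path, timeout=60):
    """Hypothetical stand-in for the extractor's FFmpeg call; returns True on success."""
    cmd = ["ffmpeg", "-y", "-i", video_path, "-vn", "-ar", "16000", "-ac", "1", wav_path]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def test_success_uses_mocked_ffmpeg():
    fake = subprocess.CompletedProcess(args=[], returncode=0, stdout="", stderr="")
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert run_ffmpeg("clip.mp4", "clip.wav") is True
    # The real binary was never invoked; only the recorded call is inspected.
    assert run.call_args.args[0][0] == "ffmpeg"


def test_nonzero_exit_reports_failure():
    fake = subprocess.CompletedProcess(args=[], returncode=1, stdout="", stderr="boom")
    with mock.patch("subprocess.run", return_value=fake):
        assert run_ffmpeg("clip.mp4", "clip.wav") is False


def test_timeout_reports_failure():
    err = subprocess.TimeoutExpired(cmd="ffmpeg", timeout=60)
    with mock.patch("subprocess.run", side_effect=err):
        assert run_ffmpeg("clip.mp4", "clip.wav") is False
```

Saved as a file under tests/, functions named `test_*` like these would be collected by `python -m pytest tests/ -v` in the same way as the PR's own tests.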

- Add project structure with src/, config/, tests/ packages
- Add comprehensive README with architecture diagram, setup, and usage
- Add requirements.txt with all pipeline dependencies
- Add setup.py with console script entry point
- Add .gitignore for Python, ML models, and media files
- Add AudioExtractor class using FFmpeg for video-to-audio conversion
- Add centralized config/settings.py with defaults for all modules
- Add 15 unit tests for AudioExtractor (all passing)

mishradev1 · 2026-05-04T19:07:16Z

@abinash-sketch @keerthiseelan-planetread
Could you please review this initial setup and let me know if the project structure and direction align with the goals? The next PR I am working on will build on this to add the Sound Event Detection module (Goal 1) and Speaker Reaction Detection module (Goal 2).

abinash-sketch · 2026-05-07T04:42:13Z

Let me know when we can connect.

mishradev1 and others added 2 commits May 5, 2026 00:14

Update README.md

22f4f0e

mishradev1 marked this pull request as draft May 4, 2026 19:08

mishradev1 marked this pull request as ready for review May 4, 2026 19:08

mishradev1 wants to merge 2 commits into PlanetRead:main from mishradev1:feat/project-setup
