
project setup #5

Open
mishradev1 wants to merge 2 commits into PlanetRead:main from mishradev1:feat/project-setup

@mishradev1 mishradev1 commented May 4, 2026

This PR establishes the foundational project structure and core utilities for the Intelligent Closed Caption Generation tool. It sets up a modular architecture designed to support the three main goals: Sound Event Detection, Speaker Reaction Detection, and the CC Decision Engine.

Fixes #2

Key Changes

  • Project Setup: Established a modular src/ directory structure with separate packages for utils/, detectors/, models/, engine/, and output/ - each mapped to a specific pipeline stage.

  • Configuration System (config/settings.py): Created a centralized configuration module with documented defaults for all pipeline stages - audio extraction, sound event detection (YAMNet), reaction detection (MediaPipe), CC decision thresholds, and output formatting. This ensures all future modules share consistent, tunable parameters.

  • Audio Extraction (src/utils/audio_extractor.py): Built an AudioExtractor class that calls FFmpeg directly via subprocess to extract the audio track from a video file and convert it to 16 kHz mono WAV, the input format YAMNet expects for sound event classification. Includes input validation for 8 video formats, error handling for failed FFmpeg runs, and timeout protection.

  • Comprehensive README: Added full project documentation with an architecture diagram, installation guide, usage examples, project structure breakdown, and tech stack overview.

  • Dependencies & Packaging: Added requirements.txt with all planned dependencies (TensorFlow, OpenCV, MediaPipe, librosa, etc.) and setup.py with a cc-suggest console script entry point.

  • Test Suite: Added 15 unit tests covering AudioExtractor initialization, input validation, extraction success/failure, timeouts, and output path generation. All tests mock FFmpeg, so no external tools are needed to run them.
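
For reviewers who want a quick picture of the audio extraction step, here is a minimal sketch of the approach described above. It is not the code in src/utils/audio_extractor.py: the function names, the default timeout, and the extension list are assumptions made for illustration (the PR validates 8 formats, which are not enumerated here). The FFmpeg flags themselves are standard.

```python
import subprocess
from pathlib import Path

# Hypothetical subset; the PR validates 8 formats that are not listed in this description.
SUPPORTED_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov"}


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build the FFmpeg argument list for a 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(video_path),   # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM, the usual WAV encoding
        "-ar", str(sample_rate), # resample to the target rate
        "-ac", "1",              # downmix to mono
        str(wav_path),
    ]


def extract_audio(video_path, output_dir, timeout=300):
    """Validate the input, run FFmpeg, and return the path of the WAV file."""
    video_path = Path(video_path)
    if video_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported video format: {video_path.suffix}")

    wav_path = Path(output_dir) / (video_path.stem + ".wav")
    try:
        subprocess.run(
            build_ffmpeg_command(video_path, wav_path),
            check=True,           # raise CalledProcessError on a non-zero exit
            capture_output=True,  # keep FFmpeg's stderr for the error message
            timeout=timeout,      # raise TimeoutExpired if FFmpeg hangs
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"FFmpeg timed out after {timeout}s") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(f"FFmpeg failed: {exc.stderr!r}") from exc
    return wav_path
```

Keeping the command construction in its own function makes the flags easy to check in a unit test without running FFmpeg at all.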

How to Test

  1. Pull this branch and create a virtual environment

  2. Install dependencies:
    pip install -r requirements.txt

  3. Run the test suite:
    python -m pytest tests/ -v

Expected: 15 passed
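
To illustrate why the suite runs without FFmpeg installed, here is a rough sketch of the mocking pattern, assuming the tests patch subprocess.run with unittest.mock. The run_ffmpeg helper and the test names are hypothetical stand-ins, not the actual tests in tests/.

```python
import subprocess
from unittest.mock import patch


def run_ffmpeg(args, timeout=60):
    """Hypothetical stand-in for the code under test: True if FFmpeg exits cleanly."""
    try:
        subprocess.run(
            ["ffmpeg", *args], check=True, capture_output=True, timeout=timeout
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False
    return True


def test_success_without_ffmpeg_installed():
    # subprocess.run is replaced by a mock, so no real process is started.
    with patch("subprocess.run") as mock_run:
        assert run_ffmpeg(["-version"]) is True
        mock_run.assert_called_once()


def test_timeout_is_reported_as_failure():
    # side_effect makes the mocked call raise, simulating a hung FFmpeg process.
    timeout_error = subprocess.TimeoutExpired(cmd="ffmpeg", timeout=60)
    with patch("subprocess.run", side_effect=timeout_error):
        assert run_ffmpeg(["-version"]) is False
```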

mishradev1 and others added 2 commits May 5, 2026 00:14
- Add project structure with src/, config/, tests/ packages
- Add comprehensive README with architecture diagram, setup, and usage
- Add requirements.txt with all pipeline dependencies
- Add setup.py with console script entry point
- Add .gitignore for Python, ML models, and media files
- Add AudioExtractor class using FFmpeg for video-to-audio conversion
- Add centralized config/settings.py with defaults for all modules
- Add 15 unit tests for AudioExtractor (all passing)
@mishradev1
Author

@abinash-sketch @keerthiseelan-planetread
Could you please review this initial setup and let me know if the project structure and direction align with the goals? The next PR I am working on will build on this to add the Sound Event Detection module (Goal 1) and Speaker Reaction Detection module (Goal 2).

@mishradev1 mishradev1 marked this pull request as draft May 4, 2026 19:08
@mishradev1 mishradev1 marked this pull request as ready for review May 4, 2026 19:08
@abinash-sketch

Let me know when we can connect.

Development

Successfully merging this pull request may close these issues.

[DMP 2026]: Create Intelligent Closed Caption (CC) Suggestion Tool
