project setup #5
Open
mishradev1 wants to merge 2 commits into
- Add project structure with src/, config/, tests/ packages
- Add comprehensive README with architecture diagram, setup, and usage
- Add requirements.txt with all pipeline dependencies
- Add setup.py with console script entry point
- Add .gitignore for Python, ML models, and media files
- Add AudioExtractor class using FFmpeg for video-to-audio conversion
- Add centralized config/settings.py with defaults for all modules
- Add 15 unit tests for AudioExtractor (all passing)
Author
@abinash-sketch @keerthiseelan-planetread Let me know when we can connect.
This PR establishes the foundational project structure and core utilities for the Intelligent Closed Caption Generation tool. It sets up a modular architecture designed to support the project's three main goals: Sound Event Detection, Speaker Reaction Detection, and the CC Decision Engine.
Fixes #2
Key Changes
- Project Setup: Established a modular `src/` directory structure with separate packages for `utils/`, `detectors/`, `models/`, `engine/`, and `output/`, each mapped to a specific pipeline stage.
- Configuration System (`config/settings.py`): Created a centralized configuration module with documented defaults for all pipeline stages: audio extraction, sound event detection (YAMNet), reaction detection (MediaPipe), CC decision thresholds, and output formatting. This ensures all future modules share consistent, tunable parameters.
- Audio Extraction (`src/utils/audio_extractor.py`): Built an `AudioExtractor` class that calls FFmpeg directly via `subprocess` to strip audio from video files and convert it to 16 kHz mono WAV, the format required by YAMNet for sound event classification. Includes input validation for 8 video formats, proper error handling, and timeout protection.
- Comprehensive README: Added full project documentation with an architecture diagram, installation guide, usage examples, project structure breakdown, and tech stack overview.
- Dependencies & Packaging: Added `requirements.txt` with all planned dependencies (TensorFlow, OpenCV, MediaPipe, librosa, etc.) and `setup.py` with a `cc-suggest` console script entry point.
- Test Suite: Added 15 unit tests covering AudioExtractor initialization, input validation, extraction success/failure, timeouts, and output path generation. All tests use mocked FFmpeg, so no external tools are needed to run them.
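For reviewers who want a quick picture of the audio extraction step, here is a minimal sketch of how FFmpeg can be driven through `subprocess` to produce 16 kHz mono WAV. It is not the code in `src/utils/audio_extractor.py`: the function names, the timeout value, and the list of eight extensions are assumptions made for illustration, and the real defaults live in `config/settings.py`.

```python
import subprocess
from pathlib import Path

# Hypothetical defaults for illustration; the PR keeps the real ones in config/settings.py.
SAMPLE_RATE = 16000      # 16 kHz, as stated in the PR description
CHANNELS = 1             # mono
TIMEOUT_SECONDS = 300    # assumed value
# Assumed set of eight extensions; the PR does not enumerate them here.
SUPPORTED_FORMATS = {".mp4", ".avi", ".mkv", ".mov", ".wmv", ".flv", ".webm", ".m4v"}


def build_ffmpeg_command(video_path, output_path):
    """Return the FFmpeg argument list for video-to-WAV conversion."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", str(video_path),     # input video
        "-vn",                     # drop the video stream
        "-acodec", "pcm_s16le",    # 16-bit PCM, i.e. plain WAV
        "-ar", str(SAMPLE_RATE),   # output sample rate
        "-ac", str(CHANNELS),      # output channel count
        str(output_path),
    ]


def extract_audio(video_path, output_path=None):
    """Validate the input, run FFmpeg, and return the path of the WAV file."""
    video_path = Path(video_path)
    if video_path.suffix.lower() not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported video format: {video_path.suffix}")
    if not video_path.is_file():
        raise FileNotFoundError(f"Video not found: {video_path}")

    output_path = Path(output_path) if output_path else video_path.with_suffix(".wav")
    try:
        subprocess.run(
            build_ffmpeg_command(video_path, output_path),
            check=True,              # raise CalledProcessError on a non-zero exit code
            capture_output=True,     # keep FFmpeg's stderr for the error message
            timeout=TIMEOUT_SECONDS,
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"FFmpeg timed out after {TIMEOUT_SECONDS}s") from exc
    except subprocess.CalledProcessError as exc:
        detail = (exc.stderr or b"").decode(errors="replace")
        raise RuntimeError(f"FFmpeg failed: {detail}") from exc
    return output_path
```

Keeping the command construction in its own function makes the FFmpeg flags easy to assert on in unit tests without running FFmpeg at all.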
How to Test
1. Pull this branch and create a virtual environment.
2. Install dependencies: `pip install -r requirements.txt`
3. Run the test suite: `python -m pytest tests/ -v`

Expected: 15 passed
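The tests pass without FFmpeg installed because the FFmpeg call is mocked. The sketch below shows one common way to do that with `unittest.mock`; it may differ from how the 15 tests in this PR are written. `convert_to_wav` is a hypothetical stand-in for the extractor's FFmpeg call, not a function from this branch.

```python
import subprocess
from unittest import mock


def convert_to_wav(video_path, wav_path, timeout=60):
    """Hypothetical stand-in for the extractor: True on success, False on failure."""
    cmd = ["ffmpeg", "-y", "-i", video_path, "-vn", "-ar", "16000", "-ac", "1", wav_path]
    try:
        subprocess.run(cmd, check=True, capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False
    return True


def test_success_invokes_ffmpeg_once():
    # Replace subprocess.run with a mock so no real process is started.
    with mock.patch("subprocess.run") as fake_run:
        assert convert_to_wav("clip.mp4", "clip.wav") is True
    fake_run.assert_called_once()
    assert fake_run.call_args.args[0][0] == "ffmpeg"


def test_timeout_is_reported_as_failure():
    # Simulate FFmpeg hanging past the timeout.
    timeout = subprocess.TimeoutExpired(cmd="ffmpeg", timeout=60)
    with mock.patch("subprocess.run", side_effect=timeout):
        assert convert_to_wav("clip.mp4", "clip.wav") is False


def test_nonzero_exit_is_reported_as_failure():
    # Simulate FFmpeg exiting with an error code.
    failure = subprocess.CalledProcessError(returncode=1, cmd="ffmpeg")
    with mock.patch("subprocess.run", side_effect=failure):
        assert convert_to_wav("clip.mp4", "clip.wav") is False
```

Saved as a file under `tests/`, these functions would be collected by the same `python -m pytest tests/ -v` command.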