Feat: Intelligent Context-Aware Engine & Localized Subtitles (Resolves #2, Resolves #26) #27
Open
krishk2 wants to merge 2 commits into
…ration into feature/intelligent-cc-engine
Resolves Issue #2 and Issue #26
🎥 Demo Link
View Pipeline Execution Demo
🚀 Overview
This PR completely overhauls the AutoCC Multimodal Pipeline to solve critical localization, inference overhead, and foley-misclassification issues. By injecting an intelligent context-routing engine, we bypass YAMNet's inherent Western acoustic biases and gracefully handle dense, music-heavy audio environments.
⚙️ Pipeline Explanation
The AutoCC engine operates in 4 highly optimized phases:
… cv2 video pointers in RAM for zero-latency frame jumping.
… Sewing Machine ➔ [Rapid punches]), and generates the final context-aware output.srt.
🧠 Unique Architectural Approaches
1. Overcoming Western Bias via Transfer Learning (Custom RF Classifier)
YAMNet natively misclassifies localized sounds (e.g., it cannot identify a Rickshaw Horn or a Dhak drum, mapping them to generic bells or noise).
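A rough sketch of how a local classifier could sit on top of YAMNet. The PR text names a RandomForestClassifier but does not say which features it consumes, so this assumes YAMNet's 1024-dimensional frame embeddings, mean-pooled into one vector per clip. The names pool_embeddings and train_local_classifier and the hyperparameters are hypothetical, not taken from the PR.

```python
import numpy as np

EMBEDDING_DIM = 1024  # size of one YAMNet frame embedding


def pool_embeddings(frame_embeddings) -> np.ndarray:
    """Average per-frame embeddings of shape (n_frames, 1024) into one clip vector."""
    frames = np.asarray(frame_embeddings, dtype=np.float32)
    if frames.ndim != 2 or frames.shape[1] != EMBEDDING_DIM:
        raise ValueError(f"expected shape (n_frames, {EMBEDDING_DIM}), got {frames.shape}")
    return frames.mean(axis=0)


def train_local_classifier(clip_vectors, labels, n_estimators=200, seed=0):
    """Fit a random forest on pooled clip vectors (assumed feature choice)."""
    # Imported lazily so the pooling helper works without scikit-learn installed.
    from sklearn.ensemble import RandomForestClassifier

    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    clf.fit(np.stack(clip_vectors), labels)
    return clf
```

In practice the frame embeddings would come from running YAMNet over each 16 kHz mono clip. That call is left out so the sketch carries no TensorFlow dependency.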
… RandomForestClassifier (trained on 5,800+ clips from the SAS-KIIT and Mendeley Indian Urban Environment datasets).
… Indian Crowd/Human (Local Context)).
2. Defeating Background Interference via HPSS Music Stripping
Indian educational and cinematic media is notorious for aggressive background music. This causes YAMNet to endlessly detect "Music," masking the actual ambient events and stalling the pipeline with hundreds of false-positive visual checks.
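A minimal sketch of music stripping with librosa's harmonic-percussive source separation (HPSS). It is an illustration under assumptions, not the PR's code: the function name percussive_only and the injectable separator argument are hypothetical, and only the "indian" value comes from the --context flag named in this description.

```python
from typing import Any, Callable, Optional, Tuple


def percussive_only(
    waveform: Any,
    context: str = "western",
    separator: Optional[Callable[[Any], Tuple[Any, Any]]] = None,
) -> Any:
    """Return the audio to hand to the classifier for the given context.

    For any context other than "indian" the waveform is returned untouched.
    For "indian", the harmonic part (sustained, melodic content) is dropped
    and only the percussive part (transients) is kept.
    """
    if context != "indian":
        return waveform
    if separator is None:
        # Imported lazily so the routing logic can be exercised without librosa.
        import librosa

        separator = librosa.effects.hpss  # returns (harmonic, percussive)
    _harmonic, percussive = separator(waveform)
    return percussive


# Typical use (the file name is a placeholder):
#   y, sr = librosa.load("clip.wav", sr=16000, mono=True)  # YAMNet expects 16 kHz mono
#   y_perc = percussive_only(y, context="indian")
```

Passing the separator in as an argument keeps the routing testable with a stub. A real run would fall through to librosa.effects.hpss.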
… librosa.
… --context indian, the script performs an acoustic "X-Ray": it splits the waveform, discards the harmonic component (melodic music, sustained chords), and feeds only the percussive transients (horns, crashes, dog barks) into YAMNet.
3. Intelligent Foley-to-Semantic Mapping
Audio models are "blind" and take sounds literally. Rapid punches in an action scene are systematically mislabeled by YAMNet as [Sewing Machine] or [Fusillade] due to acoustic similarities.
… CaptionGenerator.
… [Sewing Machine] detection coupled with a high visual flinch score is intelligently rewritten into [Rapid punches].
🛠️ Additional Optimizations Included
… <2.0.0 to resolve fatal _multiarray_umath crashes with TensorFlow.
… MediaProcessor to persist the cv2.VideoCapture object in RAM, cutting video processing time from 10+ minutes down to ~15 seconds by eliminating redundant disk reads.
📦 Installation & Requirements
To run this pipeline, install the dependencies using the newly provided requirements.txt file.
Warning
CRITICAL: The requirements.txt explicitly pins numpy<2.0. TensorFlow's C-API crashes when running YAMNet on newer versions of NumPy.
💻 How to Run
To run the pipeline on a standard/Western video:
To run the pipeline on an Indian cinematic/educational video (enables HPSS Music Stripping & Local Models):
The final context-aware subtitles will be saved directly to output.srt.
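For reviewers who want to picture the last step, here is a small sketch of how a foley label could be rewritten and serialized as SubRip cues. It is not the PR's CaptionGenerator: the Event fields, the 0.6 threshold, and the function names are hypothetical. Only the Sewing Machine / Fusillade to Rapid punches rewrite and the output.srt target come from this description.

```python
from dataclasses import dataclass
from typing import Iterable

# Literal audio labels that a high-motion scene suggests are really punches.
FOLEY_REWRITES = {"Sewing Machine": "Rapid punches", "Fusillade": "Rapid punches"}


@dataclass
class Event:
    start: float         # seconds
    end: float           # seconds
    label: str           # label reported by the audio model
    motion_score: float  # hypothetical visual "flinch" score in [0, 1]


def rewrite_label(label: str, motion_score: float, threshold: float = 0.6) -> str:
    """Swap a literal label for its semantic one only when motion is high."""
    if motion_score >= threshold and label in FOLEY_REWRITES:
        return FOLEY_REWRITES[label]
    return label


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SubRip timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(events: Iterable[Event]) -> str:
    """Render events as numbered SubRip cues with bracketed captions."""
    cues = []
    for index, ev in enumerate(events, start=1):
        text = rewrite_label(ev.label, ev.motion_score)
        cues.append(
            f"{index}\n{srt_timestamp(ev.start)} --> {srt_timestamp(ev.end)}\n[{text}]\n"
        )
    return "\n".join(cues)
```

Writing the result is then a single call such as `Path("output.srt").write_text(to_srt(events), encoding="utf-8")`.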