feat: Add Sound Event Detection (SED) module using YAMNet backbone #14

Open
Alokkohli200 wants to merge 2 commits into PlanetRead:main from Alokkohli200:feature/sed-module-alok

Conversation

@Alokkohli200

Overview

I have implemented the Sound Event Detection (SED) module as part of the Intelligent CC Suggestion Tool. This module is designed to identify contextually significant non-speech audio events to enhance accessibility for regional content.

Technical Implementation

  • Model: Uses the pretrained YAMNet model from TensorFlow Hub, chosen for its accuracy in environmental sound classification (it predicts 521 AudioSet event classes).
  • Pipeline: Extracts the audio track from the video and converts it to a 16 kHz mono waveform, the input format YAMNet requires.
  • Filtering Logic: Applies a confidence-based filter (threshold 0.25) that ignores ambient noise (Silence, White noise) while capturing impactful events such as footsteps, music, or sirens.
  • Environment: Optimized and verified on Apple Silicon (M4) with Python 3.12; protobuf is pinned to <5 to ensure compatibility.
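
For reviewers, the filtering step might look roughly like the sketch below. It is not the code in this PR. It assumes YAMNet-style output (one row of class scores per frame, with frames starting every 0.48 s), and the names `Detection` and `filter_events`, the top-1-per-frame rule, and the `>=` comparison are all illustrative assumptions.

```python
from typing import List, NamedTuple, Sequence


class Detection(NamedTuple):
    start_s: float      # frame start time in seconds
    label: str          # class name of the top-scoring class
    confidence: float   # score of that class


def filter_events(
    scores: Sequence[Sequence[float]],
    class_names: Sequence[str],
    threshold: float = 0.25,                      # threshold stated in the PR
    ignored: Sequence[str] = ("Silence", "White noise"),
    hop_s: float = 0.48,                          # assumed YAMNet frame hop
) -> List[Detection]:
    """Keep the top class of each frame if it is confident and not ambient."""
    detections = []
    for i, frame in enumerate(scores):
        # Index of the highest-scoring class in this frame.
        best = max(range(len(frame)), key=lambda k: frame[k])
        label = class_names[best]
        conf = float(frame[best])
        if conf >= threshold and label not in ignored:
            detections.append(Detection(round(i * hop_s, 2), label, conf))
    return detections


if __name__ == "__main__":
    names = ["Silence", "Siren", "Footsteps"]
    demo_scores = [
        [0.90, 0.05, 0.05],   # silence: dropped as ambient
        [0.10, 0.70, 0.20],   # siren: kept
        [0.20, 0.24, 0.10],   # nothing above threshold: dropped
    ]
    for det in filter_events(demo_scores, names):
        print(det)
```

Because the function only iterates over rows, it should also accept the score matrix returned by the model once it is converted to a NumPy array or nested list.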

Demo & Verification

Future Improvements

  • Implement temporal smoothing to merge consecutive short-window detections into single, continuous captions.
  • Integrate with the SRT generator module for automated caption insertion.
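
As a starting point for discussion, the temporal smoothing idea could be sketched as follows. This is only one possible approach, not a committed design. It assumes each detection is a `(start_s, label)` pair covering a 0.96 s window, and that same-label detections separated by at most `max_gap_s` belong to one caption. `merge_detections` and both parameter names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def merge_detections(
    detections: Sequence[Tuple[float, str]],
    window_s: float = 0.96,    # assumed length of one analysis window
    max_gap_s: float = 0.5,    # hypothetical tolerance between windows
) -> List[Tuple[float, float, str]]:
    """Merge nearby same-label windows into (start_s, end_s, label) spans."""
    merged: List[list] = []              # each entry: [start_s, end_s, label]
    last_for_label: Dict[str, int] = {}  # label -> index of its latest span
    for start, label in sorted(detections):
        end = start + window_s
        idx = last_for_label.get(label)
        if idx is not None and start - merged[idx][1] <= max_gap_s:
            # Close enough to the previous span of this label: extend it.
            merged[idx][1] = max(merged[idx][1], end)
        else:
            # Otherwise open a new span for this label.
            last_for_label[label] = len(merged)
            merged.append([start, end, label])
    return [(s, e, lab) for s, e, lab in merged]


if __name__ == "__main__":
    windows = [(0.0, "Siren"), (0.48, "Siren"), (0.96, "Siren"), (5.0, "Siren")]
    for span in merge_detections(windows):
        print(span)
```

Tracking the latest span per label lets overlapping events (for example music under a siren) merge independently, and the resulting spans map naturally onto start and end timestamps for an SRT entry.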
