feat: Speaker Reaction Detection module (Goal 2 / Mid-Point Milestone)#21

Open
bhuvan-somisetty wants to merge 2 commits into PlanetRead:main from bhuvan-somisetty:feat/visual-reaction-goal2

Conversation

@bhuvan-somisetty

Summary

Implements Goal 2 — Speaker Reaction Detection Module from issue #2, completing the mid-point milestone.

  • src/vision/extractor.py — FrameExtractor: extracts a configurable reaction window (100 ms before onset → 1500 ms after event end, sampled at 5 fps) from any OpenCV-readable video file. Heavy cv2 imports are deferred so the module is importable without OpenCV installed.
  • src/vision/reaction.py — ReactionDetector: analyses a FrameWindow and returns a ReactionResult with a reaction_score in [0, 1]. Three signals are fused with fixed weights (45 % / 35 % / 20 %):
    1. Optical-flow motion (cv2.calcOpticalFlowFarneback) — catches sudden scene-wide movement after the event onset
    2. Face-landmark head shift (MediaPipe Face Mesh nose-tip displacement) — catches head turns and startled flinches
    3. Mouth-open ratio (lip/eye landmark distance) — catches surprised open-mouth expressions
      If MediaPipe is not installed, head/mouth scores fall back to 0 and motion alone contributes — no ImportError at runtime.
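The fixed-weight fusion and the MediaPipe fallback described above can be sketched roughly as follows. Only the 45/35/20 weights, the [0, 1] range, and the fall-back-to-0 behaviour come from this PR; the function name `fuse_reaction_score` and its signature are hypothetical, not the actual `ReactionDetector` API.

```python
def fuse_reaction_score(motion, head_shift=None, mouth_open=None):
    """Combine per-signal scores (each in [0, 1]) into one reaction_score.

    Illustrative sketch only. When MediaPipe is unavailable, the two
    landmark-based signals are absent (None here) and fall back to 0,
    so optical-flow motion alone contributes, weighted at 0.45.
    """
    head = head_shift if head_shift is not None else 0.0
    mouth = mouth_open if mouth_open is not None else 0.0
    # Fixed weights from the PR description: 45 % motion, 35 % head, 20 % mouth.
    score = 0.45 * motion + 0.35 * head + 0.20 * mouth
    # Clamp into [0, 1] so downstream fusion with AudioEvent.confidence
    # always sees a valid confidence value (clamping is an assumption).
    return max(0.0, min(1.0, score))
```

With all three signals saturated the score is 1.0; with motion only, it tops out at 0.45, which keeps a motion-only (no-MediaPipe) deployment conservative by construction.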

Test plan

  • 26 new tests covering FrameExtractor and ReactionDetector — all cv2 / mediapipe calls are mocked at the sys.modules level, so no ML stack is needed to run the suite
  • All 52 tests pass (pytest tests/ -v): 26 new (Goal 2) + 26 existing (Goal 1)
  • ReactionResult output connects directly into the CC Decision Engine (Goal 3): reaction_score is the visual confidence value that will be fused with AudioEvent.confidence
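The sys.modules-level mocking mentioned in the test plan can be sketched like this. Only the module names cv2 and mediapipe come from the PR; the test body, the fake call to VideoCapture, and the file name "clip.mp4" are illustrative assumptions about how such a test might look.

```python
import sys
from unittest import mock


def test_runs_without_opencv():
    fake_cv2 = mock.MagicMock()
    fake_mp = mock.MagicMock()
    # patch.dict swaps the sys.modules entries for the duration of the
    # block, so any `import cv2` inside code under test resolves to the
    # stub; the originals (or their absence) are restored on exit.
    with mock.patch.dict(sys.modules, {"cv2": fake_cv2, "mediapipe": fake_mp}):
        import cv2  # resolves to fake_cv2 — no OpenCV install required

        cv2.VideoCapture("clip.mp4")  # recorded on the mock, never executed
        fake_cv2.VideoCapture.assert_called_once_with("clip.mp4")
```

Because the stubs live in sys.modules rather than being injected into the module under test, this pattern also exercises the deferred-import path: the vision modules can be imported and driven end-to-end on a machine with neither OpenCV nor MediaPipe installed.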

Refs #2

Signed-off-by: bhuvan-somisetty <somisettybhuvan5@gmail.com>