feat: Speaker Reaction Detection module (Goal 2 / Mid-Point Milestone)#21

Open
bhuvan-somisetty wants to merge 2 commits into PlanetRead:main from bhuvan-somisetty:feat/visual-reaction-goal2

Conversation

@bhuvan-somisetty

Summary

Implements Goal 2 — Speaker Reaction Detection Module from issue #2, completing the mid-point milestone.

  • src/vision/extractor.py — FrameExtractor: extracts a configurable reaction window (100 ms before onset → 1500 ms after event end, sampled at 5 fps) from any OpenCV-readable video file. Heavy cv2 imports are deferred so the module is importable without OpenCV installed.
  • src/vision/reaction.py — ReactionDetector: analyses a FrameWindow and returns a ReactionResult with a reaction_score in [0, 1]. Three signals are fused with fixed weights (45 % / 35 % / 20 %):
    1. Optical-flow motion (cv2.calcOpticalFlowFarneback) — catches sudden scene-wide movement after the event onset
    2. Face-landmark head shift (MediaPipe Face Mesh nose-tip displacement) — catches head turns and startled flinches
    3. Mouth-open ratio (lip/eye landmark distance) — catches surprised open-mouth expressions
      If MediaPipe is not installed, head/mouth scores fall back to 0 and motion alone contributes — no ImportError at runtime.
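The fixed-weight fusion and the MediaPipe fallback described above can be sketched roughly as follows. Only the 45/35/20 weights, the [0, 1] range, and the fall-back-to-0 behaviour come from this PR; the function name `fuse_reaction_score` and its signature are hypothetical, not the actual `ReactionDetector` API.

```python
def fuse_reaction_score(motion, head_shift=None, mouth_open=None):
    """Combine per-signal scores (each in [0, 1]) into one reaction_score.

    Illustrative sketch only. When MediaPipe is unavailable, the two
    landmark-based signals are absent (None here) and fall back to 0,
    so optical-flow motion alone contributes, weighted at 0.45.
    """
    head = head_shift if head_shift is not None else 0.0
    mouth = mouth_open if mouth_open is not None else 0.0
    # Fixed weights from the PR description: 45 % motion, 35 % head, 20 % mouth.
    score = 0.45 * motion + 0.35 * head + 0.20 * mouth
    # Clamp into [0, 1] so downstream fusion with AudioEvent.confidence
    # always sees a valid confidence value (clamping is an assumption).
    return max(0.0, min(1.0, score))
```

With all three signals saturated the score is 1.0; with motion only, it tops out at 0.45, which keeps a motion-only (no-MediaPipe) deployment conservative by construction.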

Test plan

  • 26 new tests covering FrameExtractor and ReactionDetector — all cv2 / mediapipe calls are mocked at the sys.modules level, so no ML stack is needed to run the suite
  • All 52 tests pass (pytest tests/ -v): 26 new (Goal 2) + 26 existing (Goal 1)
  • ReactionResult output connects directly into the CC Decision Engine (Goal 3): reaction_score is the visual confidence value that will be fused with AudioEvent.confidence
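The sys.modules-level mocking mentioned in the test plan can be sketched like this. Only the module names cv2 and mediapipe come from the PR; the test body, the fake call to VideoCapture, and the file name "clip.mp4" are illustrative assumptions about how such a test might look.

```python
import sys
from unittest import mock


def test_runs_without_opencv():
    fake_cv2 = mock.MagicMock()
    fake_mp = mock.MagicMock()
    # patch.dict swaps the sys.modules entries for the duration of the
    # block, so any `import cv2` inside code under test resolves to the
    # stub; the originals (or their absence) are restored on exit.
    with mock.patch.dict(sys.modules, {"cv2": fake_cv2, "mediapipe": fake_mp}):
        import cv2  # resolves to fake_cv2 — no OpenCV install required

        cv2.VideoCapture("clip.mp4")  # recorded on the mock, never executed
        fake_cv2.VideoCapture.assert_called_once_with("clip.mp4")
```

Because the stubs live in sys.modules rather than being injected into the module under test, this pattern also exercises the deferred-import path: the vision modules can be imported and driven end-to-end on a machine with neither OpenCV nor MediaPipe installed.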

Refs #2

Signed-off-by: bhuvan-somisetty <somisettybhuvan5@gmail.com>