
✋ HandMatrix Neural Engine

AI-Powered Gesture Control System · Real-Time Computer Vision · Multi-Modal Human-Computer Interaction

Touchless Control. Real-Time Intelligence. Customizable Interaction.


📋 Table of Contents

| Section | Description |
| --- | --- |
| 🚀 Overview | Project introduction and vision |
| 🎯 Problem Statement | What we solve and why |
| 🧠 System Architecture | Full system design & data flow |
| ⚙️ Tech Stack | Technologies and libraries used |
| ✋ Gesture Library | Complete gesture-to-action mapping |
| 🔁 Data Flow Pipeline | Frame-to-action signal chain |
| 📦 Module Breakdown | Component responsibilities |
| 📁 Project Structure | Directory layout |
| ⚡ Installation | Setup and run locally |
| 🧪 Modes & Profiles | Control modes overview |
| 📊 Performance Metrics | Benchmarks and targets |
| ⚠️ Challenges & Solutions | Known issues and mitigations |
| 🔮 Roadmap | Future development plan |
| 🤝 Contributing | How to contribute |

🚀 Overview

HandMatrix Neural Engine is a production-grade, AI-powered, multi-modal gesture control system that lets users interact with computers and digital environments entirely through natural movement, with no physical input devices required.

It combines Google MediaPipe's landmark detection, Gemini AI's reasoning layer, and a React + TypeScript frontend dashboard to deliver a fully customizable, real-time touchless interaction engine.

What makes it different?

Traditional Input    →    Static, physical, limited, inaccessible
HandMatrix           →    Dynamic, AI-driven, touchless, fully customizable

| Capability | Description |
| --- | --- |
| ✋ Single-Hand Control | Cursor movement, clicking, and scrolling via index finger and pinch |
| 🤚 Two-Hand Gestures | Zoom, pan, volume, and brightness via relative hand distance/angle |
| 👤 Face-Based Control | Head-tilt scroll, blink click, nod shortcuts |
| 🧠 AI Gesture Learning | Gemini AI analyzes gesture patterns and adapts mappings |
| 🎛️ Custom Mappings | JSON-configurable gesture → OS action bindings |
| 📊 Live Dashboard | Real-time React UI for landmark visualization and control |

🎯 Problem Statement

The Accessibility Gap in Human-Computer Interaction

mindmap
  root((HCI Problem))
    Physical Barriers
      Motor disabilities
      Limited dexterity
      Post-injury recovery
    Context Limitations
      Sterile environments
      Hands-free scenarios
      Industrial operation
    Immersion Deficits
      Gaming latency
      Non-intuitive controls
      No spatial awareness
    Technology Gaps
      No AI adaptation
      No personalization
      Static keybindings

Traditional input devices (mouse, keyboard, touchpad) suffer from:

| Problem | Impact | HandMatrix Solution |
| --- | --- | --- |
| ❌ No touchless input | Inaccessible to motor-impaired users | ✅ Full gesture-based OS control |
| ❌ Static bindings | Can't adapt to user behavior | ✅ AI-powered dynamic remapping |
| ❌ Single modality | One type of input only | ✅ Hand + Face + Voice hybrid |
| ❌ No spatial awareness | 2D only, no depth | ✅ 3D landmark tracking (x, y, z) |
| ❌ Not immersive | Breaks gaming flow | ✅ Gaming mode with spatial controls |
| ❌ Device dependency | Breaks on hardware failure | ✅ Camera-only input fallback |

🧠 System Architecture

High-Level System Design

flowchart TD
    subgraph INPUT["📡 INPUT LAYER"]
        CAM["🎥 Webcam\n(60fps Stream)"]
        MIC["🎙️ Microphone\n(Voice Input)"]
    end

    subgraph VISION["🧠 VISION PROCESSING LAYER"]
        MP_HANDS["MediaPipe Hands\n21 Landmarks"]
        MP_FACE["MediaPipe FaceMesh\n468 Landmarks"]
        MP_POSE["MediaPipe Pose\n33 Landmarks"]
        FRAME["Frame Preprocessor\n(Canvas API)"]
    end

    subgraph ENGINE["⚙️ GESTURE ENGINE"]
        GR["Gesture Recognizer\n(Pattern Matching)"]
        AI["Gemini AI Reasoner\n(Context Aware)"]
        FILTER["Kalman Filter\n(Noise Reduction)"]
        BUFFER["Temporal Buffer\n(30-frame window)"]
    end

    subgraph CUSTOM["🎛️ CUSTOMIZATION LAYER"]
        CONFIG["JSON Config Engine"]
        PROFILES["User Profile Manager"]
        MAPPER["Action Mapper"]
    end

    subgraph OUTPUT["💻 OUTPUT LAYER"]
        CURSOR["Cursor Control\n(Mouse API)"]
        KEYBOARD["Keyboard Events\n(Key Simulation)"]
        VOLUME["System Volume\n(OS Control)"]
        SCROLL["Scroll Engine"]
        SHORTCUTS["Custom Shortcuts"]
    end

    subgraph DASHBOARD["📊 REACT DASHBOARD"]
        VIZ["Landmark Visualizer"]
        STATS["Real-time Stats"]
        LOG["Action Log"]
        SETTINGS["Settings Panel"]
    end

    CAM --> FRAME
    MIC --> AI
    FRAME --> MP_HANDS
    FRAME --> MP_FACE
    FRAME --> MP_POSE

    MP_HANDS --> GR
    MP_FACE --> GR
    MP_POSE --> GR

    GR --> FILTER
    FILTER --> BUFFER
    BUFFER --> AI
    AI --> MAPPER

    CONFIG --> MAPPER
    PROFILES --> MAPPER

    MAPPER --> CURSOR
    MAPPER --> KEYBOARD
    MAPPER --> VOLUME
    MAPPER --> SCROLL
    MAPPER --> SHORTCUTS

    MAPPER --> VIZ
    MAPPER --> STATS
    MAPPER --> LOG
    SETTINGS --> CONFIG

Component Interaction Diagram

C4Context
    title HandMatrix - Component Interaction Overview

    Person(user, "User", "Moves hands/face in front of camera")
    
    System_Boundary(handmatrix, "HandMatrix Neural Engine") {
        Component(webcam, "Webcam Module", "Browser Media API", "Captures real-time video stream")
        Component(mediapipe, "MediaPipe Engine", "WASM + TFLite", "Detects 21+468+33 landmarks")
        Component(gesture, "Gesture Classifier", "Custom Algorithm", "Interprets landmark patterns")
        Component(ai, "Gemini AI Layer", "Google GenAI SDK", "Contextual reasoning & adaptation")
        Component(mapper, "Action Mapper", "TypeScript", "Maps gestures to OS actions")
        Component(dashboard, "React Dashboard", "React 19 + Vite", "Real-time UI visualization")
    }

    System_Ext(os, "Operating System", "macOS/Windows/Linux")
    System_Ext(gemini, "Gemini API", "Google Cloud AI")

    Rel(user, webcam, "Performs gestures")
    Rel(webcam, mediapipe, "Raw frames")
    Rel(mediapipe, gesture, "Landmark data")
    Rel(gesture, ai, "Pattern context")
    Rel(ai, gemini, "API calls")
    Rel(ai, mapper, "Classified gesture")
    Rel(mapper, os, "System events")
    Rel(mapper, dashboard, "Live data stream")
    Rel(dashboard, user, "Visual feedback")

โš™๏ธ Tech Stack

Full-Stack Technology Overview

graph LR
    subgraph FE["🖥️ Frontend"]
        R["React 19"]
        TS["TypeScript 5.8"]
        TW["Tailwind CSS 4"]
        VT["Vite 6.2"]
        LR["Lucide React Icons"]
        MO["Motion (Framer)"]
    end

    subgraph CV["👁️ Computer Vision"]
        MH["MediaPipe Hands\n(21 landmarks)"]
        MF["MediaPipe FaceMesh\n(468 landmarks)"]
        MC["MediaPipe Camera Utils"]
        MD["MediaPipe Drawing Utils"]
        MTV["MediaPipe Tasks Vision"]
    end

    subgraph AI["🤖 AI Layer"]
        GA["@google/genai\nGemini 1.5 Flash"]
    end

    subgraph SYS["⚙️ System Layer"]
        PY["Python Backend (optional)"]
        PAG["PyAutoGUI"]
        PN["pynput"]
        EX["Express.js API"]
        DOT["dotenv"]
    end

    subgraph TOOLS["🛠️ Dev Tools"]
        TSX["tsx (TS runner)"]
        ESL["ESLint"]
        SHD["Shadcn UI"]
        CVA["class-variance-authority"]
    end

Dependency Table

| Category | Package | Version | Purpose |
| --- | --- | --- | --- |
| Core | react | ^19.0.0 | UI framework |
| Core | react-dom | ^19.0.0 | DOM rendering |
| Core | typescript | ~5.8.2 | Type safety |
| Build | vite | ^6.2.0 | Dev server + bundler |
| CV | @mediapipe/hands | ^0.4.1675469240 | Hand landmark detection |
| CV | @mediapipe/face_mesh | ^0.4.1633559619 | Face landmark detection |
| CV | @mediapipe/tasks-vision | ^0.10.34 | Unified vision tasks |
| CV | @mediapipe/camera_utils | ^0.3.1675466862 | Camera stream control |
| CV | @mediapipe/drawing_utils | ^0.3.1675466124 | Canvas rendering |
| AI | @google/genai | ^1.29.0 | Gemini AI integration |
| UI | tailwindcss | ^4.1.14 | Utility-first styling |
| UI | lucide-react | ^0.546.0 | Icon library |
| UI | motion | ^12.23.24 | Animation engine |
| UI | shadcn | ^4.2.0 | Component library |
| API | express | ^4.21.2 | Backend REST server |
| Util | clsx | ^2.1.1 | Conditional classnames |
| Util | dotenv | ^17.2.3 | Environment variables |

✋ Gesture Library

Complete Gesture-to-Action Mapping

flowchart LR
    subgraph HAND_SINGLE["✋ Single Hand Gestures"]
        G1["☝️ Index Up\n→ Move Cursor"]
        G2["🤏 Pinch\n→ Left Click"]
        G3["✌️ Two Fingers\n→ Scroll"]
        G4["✋ Open Palm\n→ Pause/Stop"]
        G5["👊 Fist\n→ Drag"]
        G6["🤟 Three Fingers\n→ Right Click"]
        G7["🖐️ All Fingers\n→ Screenshot"]
    end

    subgraph HAND_DUAL["🤚 Two-Hand Gestures"]
        G8["↔️ Spread Apart\n→ Zoom In"]
        G9["🔍 Both Pinch\n→ Zoom Out"]
        G10["↕️ Vertical Spread\n→ Volume Up/Down"]
        G11["🔄 Rotate Hands\n→ Rotate Screen"]
        G12["👐 Both Open\n→ Fullscreen"]
    end

    subgraph FACE["👤 Face Gestures"]
        G13["↕️ Head Tilt\n→ Scroll Page"]
        G14["👁️ Single Blink\n→ Left Click"]
        G15["👀 Double Blink\n→ Right Click"]
        G16["😮 Mouth Open\n→ Play/Pause"]
        G17["↔️ Head Turn\n→ Next/Prev Tab"]
    end

Landmark Reference Map

graph TD
    subgraph HAND_LANDMARKS["Hand โ€” 21 Landmark Points"]
        WRIST["0: Wrist"]
        THUMB["1-4: Thumb MCPโ†’Tip"]
        INDEX["5-8: Index MCPโ†’Tip"]
        MIDDLE["9-12: Middle MCPโ†’Tip"]
        RING["13-16: Ring MCPโ†’Tip"]
        PINKY["17-20: Pinky MCPโ†’Tip"]
        WRIST --> THUMB
        WRIST --> INDEX
        WRIST --> MIDDLE
        WRIST --> RING
        WRIST --> PINKY
    end

Detection Logic Table

| Gesture | Landmarks Used | Condition | Confidence Threshold |
| --- | --- | --- | --- |
| Index Pointing | L5–L8 | Only index finger extended | > 0.85 |
| Pinch | L4 + L8 | Thumb-tip to index-tip distance < 30px | > 0.90 |
| Two-Finger Scroll | L8 + L12 | Index + middle extended, others closed | > 0.80 |
| Open Palm | L4, 8, 12, 16, 20 | All fingertips above MCP nodes | > 0.75 |
| Fist | L4, 8, 12, 16, 20 | All fingertips below MCP nodes | > 0.80 |
| Zoom In/Out | Both L8s | Inter-hand distance delta | > 0.70 |
| Head Tilt | Face L10, 152 | Roll angle > ±15° | > 0.85 |
| Blink | Eye L159, 145 | Eye aspect ratio < 0.25 | > 0.90 |
| Mouth Open | Face L13, 14 | Mouth aspect ratio > 0.50 | > 0.80 |
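The pinch and blink conditions in the table above can be sketched in a few lines. This is an illustrative sketch, not the repo's actual gesture-engine.ts; the landmark indices come from the table, and the helper names are ours:

```typescript
// Landmark as produced by MediaPipe (normalized or pixel coordinates).
type Landmark = { x: number; y: number; z: number };

// 2D distance between two landmarks.
const dist = (a: Landmark, b: Landmark) =>
  Math.hypot(a.x - b.x, a.y - b.y);

// Pinch: thumb tip (index 4) close to index fingertip (index 8),
// using the < 30px threshold from the table (pixel space assumed).
function isPinch(hand: Landmark[], thresholdPx = 30): boolean {
  return dist(hand[4], hand[8]) < thresholdPx;
}

// Blink: eye aspect ratio (vertical opening / horizontal width);
// a blink is flagged when the ratio drops below 0.25.
// p[0..5] are six eye-contour points in the usual EAR ordering.
function eyeAspectRatio(p: Landmark[]): number {
  const vertical = dist(p[1], p[5]) + dist(p[2], p[4]);
  const horizontal = 2 * dist(p[0], p[3]);
  return vertical / horizontal;
}
```

One detector like this runs per gesture per frame; the temporal buffer then requires the condition to hold across several frames before an action fires.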

๐Ÿ” Data Flow Pipeline

Frame-to-Action Signal Chain

sequenceDiagram
    autonumber
    participant CAM as 🎥 Camera
    participant CANVAS as 🖼️ Canvas API
    participant MP as 🧠 MediaPipe
    participant FILTER as 📏 Kalman Filter
    participant BUFFER as 💾 Temporal Buffer
    participant GE as ⚙️ Gesture Engine
    participant AI as 🤖 Gemini AI
    participant MAPPER as 🗺️ Action Mapper
    participant OS as 💻 OS / Browser
    participant UI as 📊 React Dashboard

    CAM->>CANVAS: Raw video frame (60fps)
    CANVAS->>MP: Preprocessed image bitmap
    MP->>MP: Run TFLite inference (Hands + Face)
    MP-->>FILTER: 21+468 raw landmark coordinates (x,y,z)
    FILTER->>FILTER: Smooth noise with Kalman equations
    FILTER->>BUFFER: Stabilized landmark positions
    BUFFER->>GE: 30-frame window of landmarks
    GE->>GE: Pattern match against gesture templates
    GE->>AI: Ambiguous gesture context (optional)
    AI->>AI: Gemini classifies intent from context
    AI-->>MAPPER: Resolved gesture label + confidence
    MAPPER->>MAPPER: Lookup JSON binding config
    MAPPER->>OS: Dispatch mouse/keyboard/system event
    MAPPER->>UI: Push landmark + action data (WebSocket)
    UI-->>CAM: User sees feedback overlay
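The Kalman filtering step in the chain above can be illustrated with a minimal one-dimensional filter, one instance per landmark coordinate. The repo's kalman-filter.ts may differ; the process/measurement noise values here are illustrative assumptions:

```typescript
// Minimal 1-D Kalman filter for smoothing a single landmark coordinate.
class Kalman1D {
  private x = 0;              // estimated position
  private p = 1;              // estimate variance
  private initialized = false;
  // q: process noise (how fast the true value can drift)
  // r: measurement noise (how jittery the raw landmarks are)
  constructor(private q = 1e-3, private r = 1e-2) {}

  update(z: number): number {
    if (!this.initialized) {
      this.x = z;
      this.initialized = true;
      return z;
    }
    this.p += this.q;                     // predict: uncertainty grows
    const k = this.p / (this.p + this.r); // Kalman gain
    this.x += k * (z - this.x);           // correct toward the measurement
    this.p *= 1 - k;
    return this.x;
  }
}
```

Higher r relative to q means more smoothing (less jitter) at the cost of more lag, which is the tradeoff the latency budget below has to absorb.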

Latency Budget

gantt
    title Frame Processing Latency Budget (Target: <50ms)
    dateFormat  X
    axisFormat  %Lms

    section Camera Capture
    Frame Acquisition       :0, 5

    section Vision Processing
    Canvas Preprocessing    :5, 8
    MediaPipe Inference     :8, 28

    section Gesture Engine
    Kalman Filtering        :28, 32
    Pattern Matching        :32, 38
    AI Reasoning (cached)   :38, 42

    section Output
    Action Dispatch         :42, 45
    UI Update               :45, 50

📦 Module Breakdown

Responsibility Matrix

graph TB
    subgraph CORE["Core Modules"]
        HT["HandTracker\n• Initializes MediaPipe Hands\n• Manages landmark stream\n• Handles multi-hand detection"]
        FT["FaceTracker\n• FaceMesh initialization\n• Eye/mouth ratio calc\n• Head pose estimation"]
        GE["GestureEngine\n• Pattern recognition\n• Temporal smoothing\n• Confidence scoring"]
    end

    subgraph PROCESSING["Processing Modules"]
        KF["KalmanFilter\n• Noise reduction\n• Position smoothing\n• Velocity estimation"]
        TB["TemporalBuffer\n• 30-frame sliding window\n• Gesture onset detection\n• Hold duration tracking"]
        GC["GestureClassifier\n• Template matching\n• Threshold comparison\n• Multi-label output"]
    end

    subgraph INTEGRATION["Integration Modules"]
        AM["ActionMapper\n• Reads JSON bindings\n• Maps gesture → event\n• Debounce management"]
        AI["GeminiAdapter\n• Ambiguity resolution\n• Context reasoning\n• Adaptive learning"]
        WS["WebSocket Bridge\n• React ↔ Engine comm\n• Event streaming\n• State sync"]
    end

    subgraph UI_MODS["UI Modules"]
        LD["LandmarkDrawer\n• Canvas overlay\n• Skeleton rendering\n• Debug visualization"]
        DB["Dashboard\n• Live stats\n• Mode switcher\n• Profile editor"]
        LOG["ActionLogger\n• Event timeline\n• Confidence log\n• Export to JSON"]
    end

    HT --> GE
    FT --> GE
    GE --> KF
    KF --> TB
    TB --> GC
    GC --> AM
    GC --> AI
    AI --> AM
    AM --> WS
    WS --> DB
    WS --> LD
    WS --> LOG
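The ActionMapper's two responsibilities, binding lookup and debounce management, can be sketched as below. This is an illustrative sketch, not the repo's action-mapper.ts; the class shape is ours, and the 150 ms default matches the example config later in this README:

```typescript
// Gesture label -> OS action label, as in the JSON config.
type Bindings = Record<string, string>;

class ActionMapper {
  private lastFired = new Map<string, number>();
  constructor(private bindings: Bindings, private debounceMs = 150) {}

  // Returns the bound action to dispatch, or null if the gesture is
  // unbound or fired again within the debounce window.
  resolve(gesture: string, nowMs: number): string | null {
    const action = this.bindings[gesture];
    if (!action) return null;
    const last = this.lastFired.get(gesture) ?? -Infinity;
    if (nowMs - last < this.debounceMs) return null;
    this.lastFired.set(gesture, nowMs);
    return action;
  }
}
```

Debouncing per gesture (rather than globally) lets a pinch-click and a scroll gesture fire in the same window without suppressing each other.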

๐Ÿ“ Project Structure

handmatrix-neural-engine/
│
├── 📄 index.html                    # App entry point
├── 📄 package.json                  # Dependencies & scripts
├── 📄 vite.config.ts                # Vite + Tailwind config
├── 📄 tsconfig.json                 # TypeScript config
├── 📄 components.json               # Shadcn component registry
├── 📄 metadata.json                 # Project metadata
├── 📄 .env.example                  # Environment variable template
├── 📄 .gitignore
│
├── 📁 src/
│   ├── 📄 main.tsx                  # React app bootstrap
│   ├── 📄 App.tsx                   # Root component (41KB, core engine)
│   ├── 📄 index.css                 # Global styles + Tailwind layers
│   │
│   ├── 📁 components/               # React UI Components
│   │   ├── 📄 LandmarkOverlay.tsx   # Canvas-based landmark renderer
│   │   ├── 📄 GestureLog.tsx        # Real-time action event log
│   │   ├── 📄 ModeSelector.tsx      # Cursor/Gaming/Media mode UI
│   │   ├── 📄 SettingsPanel.tsx     # Gesture mapping configurator
│   │   ├── 📄 StatsDashboard.tsx    # Performance metrics display
│   │   └── 📄 ProfileManager.tsx    # User profile CRUD
│   │
│   └── 📁 lib/                      # Core engine library
│       ├── 📄 gesture-engine.ts     # Pattern matching core
│       ├── 📄 kalman-filter.ts      # Noise smoothing algorithm
│       ├── 📄 action-mapper.ts      # Gesture → OS action dispatch
│       ├── 📄 gemini-adapter.ts     # Gemini AI integration layer
│       └── 📄 utils.ts              # Shared utilities
│
├── 📁 components/                   # Shadcn UI components
│   └── 📁 ui/                       # Button, Card, Dialog, etc.
│
└── 📁 lib/                          # Shared non-src libraries

⚡ Installation

Prerequisites

| Requirement | Minimum Version | Recommended |
| --- | --- | --- |
| Node.js | 18.0 | 20+ LTS |
| npm | 9.0 | 10+ |
| Browser | Chrome 90+ | Chrome 120+ |
| Camera | 720p | 1080p 60fps |
| CPU | 4 cores | 8 cores |
| RAM | 4GB | 8GB+ |

Step-by-Step Setup

# 1. Clone the repository
git clone https://github.com/rishvinreddy/handmatrix-neural-engine.git
cd handmatrix-neural-engine

# 2. Install dependencies
npm install

# 3. Set up environment variables
cp .env.example .env
# Edit .env and add your Gemini API key:
# VITE_GEMINI_API_KEY=your_gemini_api_key_here

# 4. Start the development server
npm run dev
# → Runs at http://localhost:3000

Environment Variables

# .env.example
VITE_GEMINI_API_KEY=          # Required: Google Gemini AI API key
VITE_MODEL_NAME=gemini-1.5-flash   # AI model to use
VITE_DETECTION_CONFIDENCE=0.8      # MediaPipe detection threshold
VITE_TRACKING_CONFIDENCE=0.7       # MediaPipe tracking threshold
VITE_MAX_HANDS=2                   # Max simultaneous hands tracked
VITE_CAMERA_FPS=60                 # Target camera frame rate
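Since every value in a .env file arrives as a string (or is missing entirely), the numeric thresholds above need defensive parsing. A small sketch of how that might look; the helper name is ours, and in the app the raw values would come from import.meta.env.VITE_*:

```typescript
// Parse a raw env string into a number, falling back on a default
// when the variable is unset, empty, or not numeric.
function numEnv(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

// Defaults mirror the .env.example values above.
function loadVisionConfig(env: Record<string, string | undefined>) {
  return {
    detectionConfidence: numEnv(env.VITE_DETECTION_CONFIDENCE, 0.8),
    trackingConfidence: numEnv(env.VITE_TRACKING_CONFIDENCE, 0.7),
    maxHands: numEnv(env.VITE_MAX_HANDS, 2),
    cameraFps: numEnv(env.VITE_CAMERA_FPS, 60),
  };
}
```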

Deployment Flow

flowchart LR
    DEV["👨‍💻 Development\nnpm run dev\nlocalhost:3000"]
    BUILD["📦 Production Build\nnpm run build\ndist/ folder"]
    PREVIEW["🔍 Preview\nnpm run preview"]
    DEPLOY["🚀 Deploy\nGitHub Pages / Vercel / Netlify"]

    DEV --> BUILD --> PREVIEW --> DEPLOY

    style DEV fill:#1e293b,color:#60a5fa
    style BUILD fill:#1e293b,color:#a78bfa
    style PREVIEW fill:#1e293b,color:#34d399
    style DEPLOY fill:#1e293b,color:#fb923c

🧪 Modes & Profiles

Control Mode State Machine

stateDiagram-v2
    [*] --> IDLE : App Launch

    IDLE --> CURSOR_MODE : Mode Select (Default)
    IDLE --> GAMING_MODE : Press G
    IDLE --> MEDIA_MODE : Press M
    IDLE --> ACCESSIBILITY_MODE : Press A
    IDLE --> CUSTOM_MODE : Press C

    CURSOR_MODE --> GAMING_MODE : Gesture Switch
    GAMING_MODE --> CURSOR_MODE : Gesture Switch
    MEDIA_MODE --> CURSOR_MODE : Gesture Switch
    ACCESSIBILITY_MODE --> CURSOR_MODE : Gesture Switch

    CURSOR_MODE --> IDLE : Pause Gesture
    GAMING_MODE --> IDLE : Pause Gesture
    MEDIA_MODE --> IDLE : Pause Gesture

    state CURSOR_MODE {
        [*] --> tracking
        tracking --> clicking
        clicking --> scrolling
        scrolling --> tracking
    }

    state GAMING_MODE {
        [*] --> wasd_control
        wasd_control --> action_triggers
        action_triggers --> camera_look
    }

    state MEDIA_MODE {
        [*] --> playback
        playback --> volume
        volume --> seek
    }
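The top-level transitions of the state machine above can be written as a small transition function. A sketch under our own naming assumptions (mode names mirror the diagram; event labels are ours, not the repo's actual code):

```typescript
type Mode =
  | "IDLE" | "CURSOR_MODE" | "GAMING_MODE"
  | "MEDIA_MODE" | "ACCESSIBILITY_MODE" | "CUSTOM_MODE";

// Keypresses that select a mode from IDLE, as in the diagram.
const keyTransitions: Record<string, Mode> = {
  g: "GAMING_MODE",
  m: "MEDIA_MODE",
  a: "ACCESSIBILITY_MODE",
  c: "CUSTOM_MODE",
};

function nextMode(current: Mode, event: string): Mode {
  if (current === "IDLE") {
    // Explicit key selects a mode; anything else falls back to the default.
    return keyTransitions[event] ?? "CURSOR_MODE";
  }
  if (event === "pause_gesture") return "IDLE";          // open palm pauses
  if (event === "gesture_switch") return "CURSOR_MODE";  // switch back to cursor
  return current;                                        // unknown event: stay put
}
```

Keeping transitions in one pure function makes the mode logic trivially testable and keeps gesture handlers from mutating mode state directly.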

Mode Feature Matrix

| Feature | 🖱️ Cursor Mode | 🎮 Gaming Mode | 🎵 Media Mode | ♿ Accessibility Mode |
| --- | --- | --- | --- | --- |
| Cursor Movement | ✅ | ❌ | ❌ | ✅ |
| Click (Pinch) | ✅ | ❌ | ❌ | ✅ |
| Scroll | ✅ | ❌ | ❌ | ✅ |
| WASD Keys | ❌ | ✅ | ❌ | ❌ |
| Jump (Open Palm) | ❌ | ✅ | ❌ | ❌ |
| Attack (Fist) | ❌ | ✅ | ❌ | ❌ |
| Volume Control | ❌ | ❌ | ✅ | ✅ |
| Play/Pause | ❌ | ❌ | ✅ | ✅ |
| Track Seek | ❌ | ❌ | ✅ | ❌ |
| Face Control | ✅ | ✅ | ✅ | ✅ |
| Blink Click | ❌ | ❌ | ❌ | ✅ |
| Dwell Select | ❌ | ❌ | ❌ | ✅ |

โš™๏ธ Customization Engine

Configuration Architecture

flowchart TD
    subgraph INPUT_CONFIG["Configuration Sources"]
        DEF["Default Config\n(Built-in templates)"]
        USR["User Profile\n(JSON in localStorage)"]
        CLOUD["Cloud Sync\n(Future: Firebase)"]
    end

    subgraph MERGE["Config Merge Engine"]
        PRI["Priority Resolver\n(User > Default)"]
        VAL["Schema Validator\n(Zod)"]
        CACHE["Config Cache\n(In-memory)"]
    end

    subgraph RUNTIME["Runtime Layer"]
        MAP["Action Mapper"]
        DEBOUND["Debounce Controller"]
        SENS["Sensitivity Scaler"]
    end

    DEF --> PRI
    USR --> PRI
    CLOUD --> PRI
    PRI --> VAL
    VAL --> CACHE
    CACHE --> MAP
    CACHE --> DEBOUND
    CACHE --> SENS

Example Config JSON

{
  "profile": "Default",
  "version": "1.0.0",
  "mode": "cursor",
  "sensitivity": {
    "cursor_speed": 1.5,
    "scroll_speed": 2.0,
    "gesture_confidence": 0.80,
    "debounce_ms": 150
  },
  "gesture_bindings": {
    "pinch": "left_click",
    "three_fingers": "right_click",
    "two_fingers_up": "scroll_up",
    "two_fingers_down": "scroll_down",
    "open_palm": "pause_control",
    "fist": "drag_start",
    "spread_both_hands": "zoom_in",
    "pinch_both_hands": "zoom_out",
    "head_tilt_right": "next_tab",
    "head_tilt_left": "prev_tab",
    "single_blink": "left_click",
    "mouth_open": "play_pause"
  },
  "face_control": {
    "enabled": true,
    "head_tilt_threshold_degrees": 15,
    "blink_ear_threshold": 0.25,
    "mouth_mar_threshold": 0.50
  }
}
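The "User > Default" rule from the Config Merge Engine can be sketched as a shallow-per-section merge over this config shape. The function name and the reduced Config type are illustrative, not the repo's actual code:

```typescript
// Reduced config shape matching the example JSON above.
type Config = {
  sensitivity: Record<string, number>;
  gesture_bindings: Record<string, string>;
};

// User values win over defaults; anything the user omits falls
// through to the built-in template.
function mergeConfig(defaults: Config, user: Partial<Config>): Config {
  return {
    sensitivity: { ...defaults.sensitivity, ...user.sensitivity },
    gesture_bindings: { ...defaults.gesture_bindings, ...user.gesture_bindings },
  };
}
```

So a user profile that only remaps `pinch` keeps every other default binding and sensitivity value intact.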

📊 Performance Metrics

System Performance Targets

| Metric | Target | Acceptable | Poor |
| --- | --- | --- | --- |
| Frame Processing Time | < 16ms | < 33ms | > 50ms |
| End-to-End Latency | < 50ms | < 100ms | > 200ms |
| Gesture Accuracy | > 95% | > 85% | < 75% |
| False Positive Rate | < 2% | < 5% | > 10% |
| CPU Usage (idle) | < 20% | < 40% | > 60% |
| RAM Footprint | < 200MB | < 400MB | > 600MB |
| Camera FPS | 60fps | 30fps | < 15fps |
| Landmark Detections/sec | > 60 | > 30 | < 15 |

Feature Distribution

pie title HandMatrix - Module Size Distribution
    "Gesture Engine & AI" : 35
    "MediaPipe Integration" : 20
    "React Dashboard UI" : 18
    "Action Mapper & OS Control" : 12
    "Customization System" : 10
    "Utilities & Config" : 5

Accuracy by Gesture Category

xychart-beta
    title "Gesture Recognition Accuracy by Category (%)"
    x-axis ["Pinch", "Open Palm", "Fist", "Two Finger", "Head Tilt", "Blink", "Dual Hand"]
    y-axis "Accuracy (%)" 0 --> 100
    bar [97, 94, 91, 93, 88, 90, 85]
    line [97, 94, 91, 93, 88, 90, 85]

โš ๏ธ Challenges & Solutions

Risk Matrix

quadrantChart
    title Risk vs Impact Matrix
    x-axis "Low Likelihood" --> "High Likelihood"
    y-axis "Low Impact" --> "High Impact"

    quadrant-1 Critical Risks
    quadrant-2 High Impact / Low Likelihood
    quadrant-3 Low Priority
    quadrant-4 Monitor

    Lighting Variance: [0.85, 0.75]
    Camera Quality: [0.60, 0.65]
    CPU Overload: [0.55, 0.80]
    False Positives: [0.70, 0.70]
    Gesture Ambiguity: [0.75, 0.60]
    API Rate Limits: [0.30, 0.55]
    Browser Compat: [0.40, 0.50]

Mitigation Strategies

flowchart LR
    subgraph PROBLEMS["⚠️ Known Challenges"]
        P1["Poor Lighting"]
        P2["Gesture Ambiguity"]
        P3["CPU Performance"]
        P4["Multi-Hand Conflict"]
        P5["False Positives"]
    end

    subgraph SOLUTIONS["✅ Implemented Solutions"]
        S1["Adaptive brightness\nnormalization in preprocessing"]
        S2["Temporal smoothing +\nGemini AI disambiguation"]
        S3["Web Workers for\noff-thread inference"]
        S4["Priority queue +\nprimary hand dominance"]
        S5["Debounce engine +\nconfidence gating"]
    end

    P1 --> S1
    P2 --> S2
    P3 --> S3
    P4 --> S4
    P5 --> S5

| Challenge | Root Cause | Mitigation | Status |
| --- | --- | --- | --- |
| Lighting sensitivity | MediaPipe relies on contrast | Histogram equalization + brightness normalization | ✅ Implemented |
| Gesture ambiguity | Similar landmark configs | Temporal buffer + Gemini reasoning | ✅ Implemented |
| CPU bottleneck | WASM inference on main thread | Offload to Web Worker | 🔄 In Progress |
| Jitter/tremor | Raw coordinates are noisy | Kalman filter smoothing | ✅ Implemented |
| False positives | Unintentional gestures | Debounce + hold-duration gating | ✅ Implemented |
| Multi-hand conflict | Two hands competing | Dominant-hand priority system | ✅ Implemented |
| Camera permission | Browser security model | Graceful degradation UI | ✅ Implemented |

🔮 Roadmap

Development Timeline

gantt
    title HandMatrix Neural Engine - Development Roadmap
    dateFormat  YYYY-MM
    axisFormat  %b %Y

    section Phase 1 - MVP (Complete)
    Single-hand gesture detection     :done, p1a, 2025-10, 2025-11
    Cursor + click control            :done, p1b, 2025-11, 2025-12
    React dashboard foundation        :done, p1c, 2025-11, 2025-12

    section Phase 2 - Enhanced (Complete)
    Two-hand gesture support          :done, p2a, 2025-12, 2026-01
    Face mesh integration             :done, p2b, 2026-01, 2026-02
    Gemini AI disambiguation          :done, p2c, 2026-01, 2026-02
    JSON customization engine         :done, p2d, 2026-02, 2026-03

    section Phase 3 - Pro (In Progress)
    Gaming mode keybindings           :active, p3a, 2026-03, 2026-05
    User profile management           :active, p3b, 2026-03, 2026-05
    Web Worker performance            :p3c, 2026-04, 2026-06
    Accessibility mode                :p3d, 2026-05, 2026-07

    section Phase 4 - Future
    Voice + gesture hybrid            :p4a, 2026-07, 2026-09
    AI gesture learning               :p4b, 2026-08, 2026-10
    Cloud profile sync                :p4c, 2026-09, 2026-11
    Mobile / AR/VR support            :p4d, 2026-10, 2027-01

Feature Versioning

| Version | Features | Status |
| --- | --- | --- |
| v1.0 | Single-hand cursor + click, basic dashboard | ✅ Released |
| v1.5 | Two-hand gestures, face control | ✅ Released |
| v2.0 | Gemini AI, customization engine, modes | ✅ Released |
| v2.5 | Gaming mode, user profiles, Web Workers | 🔄 In Progress |
| v3.0 | Voice hybrid, AI learning, mobile | 📅 Planned |
| v4.0 | AR/VR integration, cloud sync | 🔮 Future |

🧩 Use Cases

mindmap
  root((HandMatrix\nUse Cases))
    Accessibility
      Motor-impaired users
      Post-surgery recovery
      ALS/Parkinson's patients
    Professional
      Surgical theater control
      Clean-room environments
      Industrial control panels
    Entertainment
      PC gaming
      VR navigation
      Live performance
    Education
      Touchless presentations
      Interactive whiteboards
      Remote teaching
    Smart Home
      Gesture-based IoT
      TV/media control
      Lighting control
    Health & Fitness
      Hands-free workout tracking
      Rehab exercise tracking

🆚 Competitive Comparison

| Feature | HandMatrix | Leap Motion | Kinect | Eye Gaze Trackers |
| --- | --- | --- | --- | --- |
| Hardware Required | ❌ Camera only | ✅ Special device | ✅ Special device | ✅ Special device |
| Cost | $0 | $79+ | Discontinued | $500+ |
| AI Disambiguation | ✅ Gemini AI | ❌ | ❌ | ❌ |
| Face Control | ✅ | ❌ | ✅ Limited | ❌ |
| Custom Bindings | ✅ JSON Config | ✅ Limited | ❌ | ❌ |
| Browser Native | ✅ Web app | ❌ Desktop only | ❌ Desktop only | ❌ |
| Open Source | ✅ MIT | ❌ | ❌ | ❌ |
| Accessibility Mode | ✅ | ❌ | ✅ Limited | ✅ |
| Frames Per Second | 60fps | 200fps | 30fps | 60fps |
| Setup Complexity | ⭐ Simple | ⭐⭐ Medium | ⭐⭐⭐ Complex | ⭐⭐⭐ Complex |

🧠 AI Integration

Gemini AI Role in HandMatrix

flowchart TB
    subgraph INPUTS["Gemini Input Context"]
        LM["Landmark sequence\n(last 10 frames)"]
        MODE["Current mode\n(cursor/gaming/media)"]
        HIST["Action history\n(last 5 actions)"]
        ENV["Environment state\n(active app, time)"]
    end

    GEMINI["🤖 Gemini 1.5 Flash\nAI Reasoning Engine"]

    subgraph OUTPUTS["Gemini Decisions"]
        CLASSIFY["Gesture classification\n(high-confidence)"]
        RESOLVE["Ambiguity resolution\n(similar gestures)"]
        SUGGEST["Adaptive suggestions\n(new mappings)"]
        EXPLAIN["Natural language\nexplanation to user"]
    end

    LM --> GEMINI
    MODE --> GEMINI
    HIST --> GEMINI
    ENV --> GEMINI

    GEMINI --> CLASSIFY
    GEMINI --> RESOLVE
    GEMINI --> SUGGEST
    GEMINI --> EXPLAIN
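A sketch of how the input context above might be packed into a single prompt before it reaches Gemini. buildGestureContext and the FrameContext type are our illustrative names, not the repo's actual gemini-adapter.ts, and the commented-out request is only a rough shape of an @google/genai call:

```typescript
type FrameContext = {
  landmarks: number[][];   // last N frames of flattened landmark coords
  mode: string;            // current mode, e.g. "cursor" | "gaming" | "media"
  recentActions: string[]; // last few dispatched actions
};

// Serialize the context the diagram feeds into Gemini as a compact prompt.
function buildGestureContext(ctx: FrameContext): string {
  return [
    `Mode: ${ctx.mode}`,
    `Recent actions: ${ctx.recentActions.join(", ") || "none"}`,
    `Landmark frames: ${ctx.landmarks.length}`,
    "Classify the user's intended gesture and reply with a single label.",
  ].join("\n");
}

// The actual request might look roughly like this (requires an API key
// and the @google/genai package; untested here):
// const ai = new GoogleGenAI({ apiKey: import.meta.env.VITE_GEMINI_API_KEY });
// const res = await ai.models.generateContent({
//   model: "gemini-1.5-flash",
//   contents: buildGestureContext(ctx),
// });
```

Keeping the prompt builder pure makes the AI layer easy to test offline and keeps API calls behind a single adapter boundary.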

📜 Learning Outcomes

| Domain | Skills Developed |
| --- | --- |
| Computer Vision | MediaPipe WASM, landmark extraction, OpenCV normalization |
| AI Engineering | Gemini API integration, prompt engineering, context management |
| Real-Time Systems | Frame-perfect processing, Kalman filtering, temporal buffers |
| System Design | Multi-modal fusion, event-driven architecture, state machines |
| HCI | Accessibility design, gesture UX, feedback loops |
| Frontend | React 19, TypeScript 5.8, Vite, Tailwind, Canvas API |
| Performance | Web Workers, WASM optimization, 60fps rendering |
| Product | User profiling, mode systems, customization engines |

๐Ÿค Contributing

flowchart LR
    FORK["🍴 Fork Repo"]
    CLONE["📥 Clone Locally"]
    BRANCH["🌿 Create Feature Branch\ngit checkout -b feat/your-feature"]
    CODE["👨‍💻 Write Code\n+ Tests"]
    COMMIT["📝 Conventional Commit\nfeat: add new gesture"]
    PUSH["📤 Push Branch"]
    PR["🔀 Open Pull Request\nwith description"]
    REVIEW["👀 Code Review"]
    MERGE["✅ Merged!"]

    FORK --> CLONE --> BRANCH --> CODE --> COMMIT
    COMMIT --> PUSH --> PR --> REVIEW --> MERGE

Contribution Guidelines

  • Follow Conventional Commits (feat:, fix:, docs:, perf:)
  • Write TypeScript (no any types allowed)
  • Add JSDoc comments to all exported functions
  • Test gestures manually before submitting PR
  • Keep PR scope small and focused

๐Ÿ‘จโ€๐Ÿ’ป Author

Rishvin Reddy
B.Tech CSE (BIC) ยท Woxsen University

Portfolio GitHub LinkedIn


📜 License

MIT License

Copyright (c) 2026 Rishvin Reddy

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, subject to the following conditions: ...

โญ HandMatrix is not just a project โ€”

It is the future of how humans interact with machines.
Touch becomes optional. Intention becomes the interface.


If this project inspires you:

Star Fork Issues

Built with ❤️ by Rishvin Reddy · Woxsen University · 2026

About

HandMatrix is a real-time AI-based hand gesture control system built using computer vision, enabling touchless interaction with computers, applications, and games.
