HandMatrix Neural Engine is a production-grade, AI-powered, multi-modal gesture control system that lets users interact with computers and digital environments entirely through natural movement, with no physical input devices required.
It combines Google MediaPipe's landmark detection, Gemini AI's reasoning layer, and a React + TypeScript frontend dashboard to deliver a fully customizable, real-time touchless interaction engine.
```mermaid
graph LR
    subgraph FE["🖥️ Frontend"]
        R["React 19"]
        TS["TypeScript 5.8"]
        TW["Tailwind CSS 4"]
        VT["Vite 6.2"]
        LR["Lucide React Icons"]
        MO["Motion (Framer)"]
    end
    subgraph CV["👁️ Computer Vision"]
        MH["MediaPipe Hands\n(21 landmarks)"]
        MF["MediaPipe FaceMesh\n(468 landmarks)"]
        MC["MediaPipe Camera Utils"]
        MD["MediaPipe Drawing Utils"]
        MTV["MediaPipe Tasks Vision"]
    end
    subgraph AI["🤖 AI Layer"]
        GA["@google/genai\nGemini 1.5 Flash"]
    end
    subgraph SYS["⚙️ System Layer"]
        PY["Python Backend (optional)"]
        PAG["PyAutoGUI"]
        PN["pynput"]
        EX["Express.js API"]
        DOT["dotenv"]
    end
    subgraph TOOLS["🛠️ Dev Tools"]
        TSX["tsx (TS runner)"]
        ESL["ESLint"]
        SHD["Shadcn UI"]
        CVA["class-variance-authority"]
    end
```
### Dependency Table

| Category | Package | Version | Purpose |
| --- | --- | --- | --- |
| Core | `react` | ^19.0.0 | UI framework |
| Core | `react-dom` | ^19.0.0 | DOM rendering |
| Core | `typescript` | ~5.8.2 | Type safety |
| Build | `vite` | ^6.2.0 | Dev server + bundler |
| CV | `@mediapipe/hands` | ^0.4.1675469240 | Hand landmark detection |
| CV | `@mediapipe/face_mesh` | ^0.4.1633559619 | Face landmark detection |
| CV | `@mediapipe/tasks-vision` | ^0.10.34 | Unified vision tasks |
| CV | `@mediapipe/camera_utils` | ^0.3.1675466862 | Camera stream control |
| CV | `@mediapipe/drawing_utils` | ^0.3.1675466124 | Canvas rendering |
| AI | `@google/genai` | ^1.29.0 | Gemini AI integration |
| UI | `tailwindcss` | ^4.1.14 | Utility-first styling |
| UI | `lucide-react` | ^0.546.0 | Icon library |
| UI | `motion` | ^12.23.24 | Animation engine |
| UI | `shadcn` | ^4.2.0 | Component library |
| API | `express` | ^4.21.2 | Backend REST server |
| Util | `clsx` | ^2.1.1 | Conditional classnames |
| Util | `dotenv` | ^17.2.3 | Environment variables |
## ✋ Gesture Library

### Complete Gesture-to-Action Mapping
```mermaid
flowchart LR
    subgraph HAND_SINGLE["☝️ Single-Hand Gestures"]
        G1["☝️ Index Up\n→ Move Cursor"]
        G2["🤏 Pinch\n→ Left Click"]
        G3["✌️ Two Fingers\n→ Scroll"]
        G4["✋ Open Palm\n→ Pause/Stop"]
        G5["✊ Fist\n→ Drag"]
        G6["🤟 Three Fingers\n→ Right Click"]
        G7["🖐️ All Fingers\n→ Screenshot"]
    end
    subgraph HAND_DUAL["🤲 Two-Hand Gestures"]
        G8["↔️ Spread Apart\n→ Zoom In"]
        G9["🤏 Both Pinch\n→ Zoom Out"]
        G10["↕️ Vertical Spread\n→ Volume Up/Down"]
        G11["🔄 Rotate Hands\n→ Rotate Screen"]
        G12["🖐️ Both Open\n→ Fullscreen"]
    end
    subgraph FACE["🙂 Face Gestures"]
        G13["↘️ Head Tilt\n→ Scroll Page"]
        G14["👁️ Single Blink\n→ Left Click"]
        G15["👀 Double Blink\n→ Right Click"]
        G16["😮 Mouth Open\n→ Play/Pause"]
        G17["↔️ Head Turn\n→ Next/Prev Tab"]
    end
```
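The mapping above is resolved at runtime through the JSON customization engine. As a rough sketch of that lookup step (the `bindings` shape and `resolveAction` helper here are hypothetical, since the README does not show the project's actual schema):

```typescript
// Hypothetical binding table -- illustrative names only, not the
// project's actual JSON schema.
type Binding = { action: string; minConfidence: number };

const bindings: Record<string, Binding> = {
  pinch:        { action: "left_click",  minConfidence: 0.90 },
  three_finger: { action: "right_click", minConfidence: 0.80 },
  open_palm:    { action: "pause",       minConfidence: 0.75 },
};

// Resolve a detected gesture label to an action, or null when the
// gesture is unbound or its confidence falls below the threshold.
function resolveAction(gesture: string, confidence: number): string | null {
  const binding = bindings[gesture];
  if (!binding || confidence < binding.minConfidence) return null;
  return binding.action;
}
```

Keeping the bindings in data rather than code is what lets users remap any gesture to any action without touching the engine.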
### Landmark Reference Map

```mermaid
graph TD
    subgraph HAND_LANDMARKS["Hand: 21 Landmark Points"]
        WRIST["0: Wrist"]
        THUMB["1-4: Thumb MCP→Tip"]
        INDEX["5-8: Index MCP→Tip"]
        MIDDLE["9-12: Middle MCP→Tip"]
        RING["13-16: Ring MCP→Tip"]
        PINKY["17-20: Pinky MCP→Tip"]
        WRIST --> THUMB
        WRIST --> INDEX
        WRIST --> MIDDLE
        WRIST --> RING
        WRIST --> PINKY
    end
```
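These indices are what the gesture conditions below are written against. As an illustration (a sketch, not the repo's code): a finger can be treated as extended when its tip sits above its PIP joint in image space, where the y-axis grows downward:

```typescript
// Sketch of finger-state checks using MediaPipe's 21-point hand indexing.
// In image coordinates, y grows downward, so an upright extended finger
// has its tip at a *smaller* y than its PIP joint.
type Landmark = { x: number; y: number; z: number };

// Tip and PIP indices per finger, following the landmark map above.
const FINGERS = {
  index:  { tip: 8,  pip: 6 },
  middle: { tip: 12, pip: 10 },
  ring:   { tip: 16, pip: 14 },
  pinky:  { tip: 20, pip: 18 },
} as const;

function isExtended(lm: Landmark[], finger: keyof typeof FINGERS): boolean {
  const { tip, pip } = FINGERS[finger];
  return lm[tip].y < lm[pip].y; // tip above PIP => extended
}

// "Index pointing" = only the index finger extended (the thumb is
// ignored here; its extension is better judged along the x-axis).
function isIndexPointing(lm: Landmark[]): boolean {
  return (
    isExtended(lm, "index") &&
    !isExtended(lm, "middle") &&
    !isExtended(lm, "ring") &&
    !isExtended(lm, "pinky")
  );
}
```

Real detectors add hysteresis and the temporal buffer described later, so a single noisy frame cannot flip a finger's state.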
### Detection Logic Table

| Gesture | Landmarks Used | Condition | Confidence Threshold |
| --- | --- | --- | --- |
| Index Pointing | L5-L8 | Only index finger extended | > 0.85 |
| Pinch | L4 + L8 | Thumb-tip to index-tip distance < 30 px | > 0.90 |
| Two-Finger Scroll | L8 + L12 | Index + middle extended, others closed | > 0.80 |
| Open Palm | L4, L8, L12, L16, L20 | All fingertips above MCP nodes | > 0.75 |
| Fist | L4, L8, L12, L16, L20 | All fingertips below MCP nodes | > 0.80 |
| Zoom In/Out | Both L8s | Inter-hand distance delta | > 0.70 |
| Head Tilt | Face L10, L152 | Roll angle > ±15° | > 0.85 |
| Blink | Eye L159, L145 | Eye aspect ratio < 0.25 | > 0.90 |
| Mouth Open | Face L13, L14 | Mouth aspect ratio > 0.50 | > 0.80 |
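For example, the pinch condition can be sketched as a pixel-space distance check between the thumb tip (L4) and index tip (L8). `isPinch` below is a hypothetical helper: MediaPipe reports normalized [0, 1] coordinates, so they must be scaled by the frame size before comparing against the 30 px threshold:

```typescript
// Sketch (assumed helper, not from the repo): pinch = thumb tip and
// index tip closer than a pixel threshold after scaling the normalized
// MediaPipe coordinates to the camera frame's dimensions.
type Point = { x: number; y: number };

function isPinch(
  thumbTip: Point,  // landmark 4, normalized [0, 1] coords
  indexTip: Point,  // landmark 8, normalized [0, 1] coords
  frameW: number,
  frameH: number,
  thresholdPx = 30,
): boolean {
  const dx = (thumbTip.x - indexTip.x) * frameW;
  const dy = (thumbTip.y - indexTip.y) * frameH;
  return Math.hypot(dx, dy) < thresholdPx;
}
```

Scaling before thresholding matters: a fixed normalized threshold would make the pinch easier or harder to trigger depending on camera resolution.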
## 🔄 Data Flow Pipeline

### Frame-to-Action Signal Chain
```mermaid
sequenceDiagram
    autonumber
    participant CAM as 🎥 Camera
    participant CANVAS as 🖼️ Canvas API
    participant MP as 🧠 MediaPipe
    participant FILTER as 📉 Kalman Filter
    participant BUFFER as 💾 Temporal Buffer
    participant GE as ⚙️ Gesture Engine
    participant AI as 🤖 Gemini AI
    participant MAPPER as 🗺️ Action Mapper
    participant OS as 💻 OS / Browser
    participant UI as 📊 React Dashboard
    CAM->>CANVAS: Raw video frame (60fps)
    CANVAS->>MP: Preprocessed image bitmap
    MP->>MP: Run TFLite inference (Hands + Face)
    MP-->>FILTER: 21+468 raw landmark coordinates (x,y,z)
    FILTER->>FILTER: Smooth noise with Kalman equations
    FILTER->>BUFFER: Stabilized landmark positions
    BUFFER->>GE: 30-frame window of landmarks
    GE->>GE: Pattern match against gesture templates
    GE->>AI: Ambiguous gesture context (optional)
    AI->>AI: Gemini classifies intent from context
    AI-->>MAPPER: Resolved gesture label + confidence
    MAPPER->>MAPPER: Lookup JSON binding config
    MAPPER->>OS: Dispatch mouse/keyboard/system event
    MAPPER->>UI: Push landmark + action data (WebSocket)
    UI-->>CAM: User sees feedback overlay
```
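The smoothing step in this chain can be illustrated with a minimal one-dimensional constant-position Kalman filter. The noise parameters below are assumed values for illustration (the repo's actual tuning is not shown here); in practice each landmark coordinate gets its own filter instance:

```typescript
// Minimal 1-D constant-position Kalman filter: predict (inflate the
// estimate variance by process noise), then correct (blend the new
// measurement in proportion to the Kalman gain).
class Kalman1D {
  private x = 0;            // state estimate
  private p = 1;            // estimate variance
  private initialized = false;
  constructor(
    private q = 1e-3,       // process noise (assumed value)
    private r = 1e-2,       // measurement noise (assumed value)
  ) {}

  update(z: number): number {
    if (!this.initialized) {
      this.x = z;           // seed the state with the first measurement
      this.initialized = true;
      return this.x;
    }
    this.p += this.q;                     // predict: variance grows
    const k = this.p / (this.p + this.r); // Kalman gain
    this.x += k * (z - this.x);           // correct toward measurement
    this.p *= 1 - k;                      // variance shrinks after update
    return this.x;
  }
}
```

A larger `r` relative to `q` trusts the model over the camera and smooths harder, at the cost of cursor lag; tuning that trade-off is the whole game for jitter-free pointing.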
```bash
# 1. Clone the repository
git clone https://github.com/rishvinreddy/handmatrix-neural-engine.git
cd handmatrix-neural-engine

# 2. Install dependencies
npm install

# 3. Set up environment variables
cp .env.example .env
# Edit .env and add your Gemini API key:
# VITE_GEMINI_API_KEY=your_gemini_api_key_here

# 4. Start the development server
npm run dev
# → Runs at http://localhost:3000
```
### Environment Variables

```bash
# .env.example
VITE_GEMINI_API_KEY=              # Required: Google Gemini AI API key
VITE_MODEL_NAME=gemini-1.5-flash  # AI model to use
VITE_DETECTION_CONFIDENCE=0.8     # MediaPipe detection threshold
VITE_TRACKING_CONFIDENCE=0.7      # MediaPipe tracking threshold
VITE_MAX_HANDS=2                  # Max simultaneous hands tracked
VITE_CAMERA_FPS=60                # Target camera frame rate
```
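These variables can be collected into a typed config with the documented defaults. The `parseConfig` helper below is a sketch, not the project's code; in the Vite app it would be fed from `import.meta.env`:

```typescript
// Sketch (assumed helper): parse the VITE_* variables above into a typed
// config, falling back to the documented defaults for missing or
// non-numeric values.
type EngineConfig = {
  modelName: string;
  detectionConfidence: number;
  trackingConfidence: number;
  maxHands: number;
  cameraFps: number;
};

function parseConfig(env: Record<string, string | undefined>): EngineConfig {
  const num = (v: string | undefined, fallback: number): number => {
    if (v === undefined || v === "") return fallback;
    const n = Number(v);
    return Number.isFinite(n) ? n : fallback;
  };
  return {
    modelName: env.VITE_MODEL_NAME ?? "gemini-1.5-flash",
    detectionConfidence: num(env.VITE_DETECTION_CONFIDENCE, 0.8),
    trackingConfidence: num(env.VITE_TRACKING_CONFIDENCE, 0.7),
    maxHands: num(env.VITE_MAX_HANDS, 2),
    cameraFps: num(env.VITE_CAMERA_FPS, 60),
  };
}
```

Validating once at startup keeps a malformed `.env` from silently producing `NaN` thresholds deep inside the detection loop.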
### Deployment Flow

```mermaid
flowchart LR
    DEV["👨‍💻 Development\nnpm run dev\nlocalhost:3000"] --> BUILD["📦 Production Build\nnpm run build\ndist/ folder"]
    BUILD --> PREVIEW["🔍 Preview\nnpm run preview"]
    PREVIEW --> DEPLOY["🚀 Deploy\nGitHub Pages / Vercel / Netlify"]
    style DEV fill:#1e293b,color:#60a5fa
    style BUILD fill:#1e293b,color:#a78bfa
    style PREVIEW fill:#1e293b,color:#34d399
    style DEPLOY fill:#1e293b,color:#fb923c
```
```mermaid
flowchart LR
    subgraph PROBLEMS["⚠️ Known Challenges"]
        P1["Poor Lighting"]
        P2["Gesture Ambiguity"]
        P3["CPU Performance"]
        P4["Multi-Hand Conflict"]
        P5["False Positives"]
    end
    subgraph SOLUTIONS["✅ Implemented Solutions"]
        S1["Adaptive brightness\nnormalization in preprocessing"]
        S2["Temporal smoothing +\nGemini AI disambiguation"]
        S3["Web Workers for\noff-thread inference"]
        S4["Priority queue +\nprimary hand dominance"]
        S5["Debounce engine +\nconfidence gating"]
    end
    P1 --> S1
    P2 --> S2
    P3 --> S3
    P4 --> S4
    P5 --> S5
```
| Challenge | Root Cause | Mitigation | Status |
| --- | --- | --- | --- |
| Lighting sensitivity | MediaPipe relies on contrast | Histogram equalization + brightness normalization | ✅ Implemented |
| Gesture ambiguity | Similar landmark configs | Temporal buffer + Gemini reasoning | ✅ Implemented |
| CPU bottleneck | WASM inference on main thread | Offload to Web Worker | 🚧 In Progress |
| Jitter/tremor | Raw coordinates noisy | Kalman filter smoothing | ✅ Implemented |
| False positives | Unintentional gestures | Debounce + hold duration gating | ✅ Implemented |
| Multi-hand conflict | Two hands competing | Dominant hand priority system | ✅ Implemented |
| Camera permission | Browser security model | Graceful degradation UI | ✅ Implemented |
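The "debounce + hold duration gating" mitigation can be sketched as a small state machine (assumed behavior, not the repo's code): a gesture only fires after being held continuously for a minimum duration, and then cannot fire again until a cooldown elapses:

```typescript
// Sketch of hold-duration gating with a cooldown. update() is called
// once per frame with whether the gesture is currently detected; it
// returns true only on the frame the bound action should fire.
class GestureGate {
  private heldSince: number | null = null; // when the current hold began
  private lastFired = -Infinity;           // timestamp of the last firing
  constructor(private holdMs = 200, private cooldownMs = 500) {}

  update(active: boolean, nowMs: number): boolean {
    if (!active) {
      this.heldSince = null;  // gesture released: reset the hold timer
      return false;
    }
    if (this.heldSince === null) this.heldSince = nowMs;
    const heldLongEnough = nowMs - this.heldSince >= this.holdMs;
    const cooledDown = nowMs - this.lastFired >= this.cooldownMs;
    if (heldLongEnough && cooledDown) {
      this.lastFired = nowMs;
      return true;
    }
    return false;
  }
}
```

The hold requirement filters out transit poses (the hand passing through a pinch shape on its way elsewhere), while the cooldown stops one sustained gesture from machine-gunning clicks.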
## 🔮 Roadmap

### Development Timeline
```mermaid
gantt
    title HandMatrix Neural Engine – Development Roadmap
    dateFormat YYYY-MM
    axisFormat %b %Y
    section Phase 1 – MVP (Complete)
    Single-hand gesture detection :done, p1a, 2025-10, 2025-11
    Cursor + click control :done, p1b, 2025-11, 2025-12
    React dashboard foundation :done, p1c, 2025-11, 2025-12
    section Phase 2 – Enhanced (Complete)
    Two-hand gesture support :done, p2a, 2025-12, 2026-01
    Face mesh integration :done, p2b, 2026-01, 2026-02
    Gemini AI disambiguation :done, p2c, 2026-01, 2026-02
    JSON customization engine :done, p2d, 2026-02, 2026-03
    section Phase 3 – Pro (In Progress)
    Gaming mode keybindings :active, p3a, 2026-03, 2026-05
    User profile management :active, p3b, 2026-03, 2026-05
    Web Worker performance :p3c, 2026-04, 2026-06
    Accessibility mode :p3d, 2026-05, 2026-07
    section Phase 4 – Future
    Voice + gesture hybrid :p4a, 2026-07, 2026-09
    AI gesture learning :p4b, 2026-08, 2026-10
    Cloud profile sync :p4c, 2026-09, 2026-11
    Mobile / AR/VR support :p4d, 2026-10, 2027-01
```
### Feature Versioning

| Version | Features | Status |
| --- | --- | --- |
| v1.0 | Single-hand cursor + click, basic dashboard | ✅ Released |
| v1.5 | Two-hand gestures, face control | ✅ Released |
| v2.0 | Gemini AI, customization engine, modes | ✅ Released |
| v2.5 | Gaming mode, user profiles, Web Workers | 🚧 In Progress |
| v3.0 | Voice hybrid, AI learning, mobile | 📋 Planned |
| v4.0 | AR/VR integration, cloud sync | 🔮 Future |
## 🧩 Use Cases

```mermaid
mindmap
  root((HandMatrix\nUse Cases))
    Accessibility
      Motor-impaired users
      Post-surgery recovery
      ALS/Parkinson's patients
    Professional
      Surgical theater control
      Clean-room environments
      Industrial control panels
    Entertainment
      PC gaming
      VR navigation
      Live performance
    Education
      Touchless presentations
      Interactive whiteboards
      Remote teaching
    Smart Home
      Gesture-based IoT
      TV/media control
      Lighting control
    Health & Fitness
      Hands-free workout tracking
      Rehab exercise tracking
```
Rishvin Reddy
B.Tech CSE (BIC) · Woxsen University
## 📜 License

MIT License

Copyright (c) 2026 Rishvin Reddy
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, subject to the following conditions: ...
⭐ HandMatrix is not just a project:
it is the future of how humans interact with machines. Touch becomes optional. Intention becomes the interface.
If this project inspires you, leave a ⭐.
Built with ❤️ by Rishvin Reddy · Woxsen University · 2026