SPICExLAB/Handmocap
4-Camera Hand Motion Capture

This pipeline pairs four synchronized cameras with 2D hand pose detection (via MMPose or MediaPipe) to reconstruct 3D hand landmarks and visualize reprojection quality. Camera 0 defines the world frame; all coordinates are reported in meters relative to that camera.

Quick Start

Hardware checklist

  • 4 cameras mounted in the diamond layout (Cam0 bottom-left, Cam1 top-left, Cam2 top-right, Cam3 bottom-right)
  • Rigid mounts with matching heights and slight inward tilt (≈10–15°)
  • 9×6 inner-corner chessboard (square size 23 mm unless you change the scripts)
  • Even, diffuse lighting across the workspace

Software setup

pip install -r requirements.txt
mkdir -p video/camera video/hand

Download (or symlink) an MMDetection hand detector config/checkpoint and an MMPose top-down hand pose config/checkpoint; you will pass their paths to hand_inference.py via CLI flags.

Workflow

  1. Record calibration videos
    Place the chessboard throughout the capture volume while all four cameras record simultaneously. Save the recordings as video/camera/cam0.mp4 through cam3.mp4.

  2. Run calibration

    python calibration.py

    Produces output/calibration/multi_camera_calib.npz and multi_camera_rectify.npz. Target per-camera RMS < 0.5 px and stereo RMS < 1.0 px.

  3. Record hand-motion videos
    Capture synchronized hand footage and store it in video/hand/0.mp4 through 3.mp4 (or pass --sequence to change the folder).

  4. Run 2D hand inference
    Pick the variant that best fits your use case:

    • MMPose (default / highest accuracy)
      python hand_inference.py \
          --det-config models/det/rtmdet_tiny_8xb32-300e_coco.py \
          --det-checkpoint models/det/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth \
          --pose-config models/pose/rtmpose-m_8xb256-210e_hand5-256x256.py \
          --pose-checkpoint models/pose/rtmpose-m_simcc-hand5_pt-aic-coco_210e-256x256-74fb594_20230320.pth \
          --device cuda:0 \
          --sequence hand
      hand_inference.py is a thin wrapper around hand_inference_mmpose.py, so you can call either script with the same flags.
    • MediaPipe (no configs/checkpoints required)
      python hand_inference_mediapipe.py --sequence hand
      Useful for quick tests on CPU-only machines. Produces the same pickle format, so downstream steps remain unchanged.

    Both variants write cached detections to output/detections/<sequence>_2d_detections.pkl. During inference with --preview, both show real-time visualization of detected hand skeletons (21 keypoints + connections) overlaid on each camera view.

  5. Triangulate 3D hands

    python hand_triangulation.py \
        --detections output/detections/hand_2d_detections.pkl \
        --display

    Writes multi- and single-hand 3D trajectories under output/tracking/.

  6. Evaluate results (optional but recommended)

    python evaluate.py
    python check_hand_consistency.py

    Generates a reprojection video and diagnostic plots under output/evaluation/ and output/visualization/.

  7. Check calibration quality (optional)
    python checkerboard_eval.py summarizes checkerboard reprojection errors for sanity checking.

Hand detector / pose configs (hand_inference*.py)

  • --det-config / --det-checkpoint: MMDetection hand detector (e.g., RTMDet hand). Set --det-cat-id if your detector uses a different class index (default 0).
  • --pose-config / --pose-checkpoint: MMPose top-down hand pose model (e.g., RTMPose). Make sure the model predicts 21 keypoints that follow the MediaPipe ordering.
  • --device: cpu, cuda:0, etc. Defaults to cuda:0 if available, otherwise cpu.
  • Optional --det-score-thr / --pose-score-thr tune per-camera detection filtering; --max-hands-per-view limits per-camera tracking.
  • Use --sequence to target a different video/<sequence>/cam.mp4 folder, and --output to rename the cached detection pickle.
  • MediaPipe variant ignores detector/pose config flags; tune its behavior via --det-score-thr, --pose-score-thr, --max-hands-per-view, and --max-frames.

Project Layout

Hand_MoCap/
├── calibration.py                      # Step 1: chessboard-based calibration
├── hand_inference_mmpose.py            # Step 2a: MMPose detection + pose estimation
├── hand_inference_mediapipe.py         # Step 2b: MediaPipe detection + pose estimation
├── hand_triangulation.py               # Step 3: multi-view matching & 3D reconstruction
├── evaluate.py                         # Step 4: reprojection video generation
├── checkerboard_eval.py                # Optional: calibration quality diagnostics
├── video_utils.py                      # Utility: video discovery helpers
├── video/                              # Input footage (camera + hand recordings)
│   ├── camera/                         # Calibration videos
│   └── hand/                           # Hand motion videos
├── output/                             # Generated artifacts (auto-created)
│   ├── calibration/                    # Calibration parameters
│   ├── detections/                     # Cached 2D detections
│   ├── tracking/                       # 3D hand trajectories
│   └── evaluation/                     # Reprojection videos
├── models/                             # MMPose/MMDetection model files
│   ├── det/                            # Detection model configs/checkpoints
│   └── pose/                           # Pose model configs/checkpoints
└── requirements.txt                    # Python dependencies

Output directories

  • output/calibration/
    • multi_camera_calib.npz – intrinsics (K0–K3), distortion (D0–D3), rotations (R1–R3), and translations (T1–T3) expressed from camera 0
    • multi_camera_rectify.npz – rectification transforms (R1_01, P2_03, Q_02, …) for stereo matching
  • output/detections/
    • <sequence>_2d_detections.pkl – cached per-frame, per-camera 2D keypoints produced by hand_inference.py
  • output/tracking/
    • hand_3d_positions_multi.pkl – list of frames; each frame contains 0–2 hands with (21, 3) arrays in meters
    • hand_3d_positions.npy – single-hand array (first hand per frame) with NaNs when no hand is present
  • output/evaluation/
    • reprojection_4cam.mp4 – 2×2 grid showing original footage with reprojected landmarks
  • output/visualization/
    • hand_consistency_check.png – coverage, motion, and ID-consistency plots

Delete output/ to reset the workspace; scripts recreate folders as needed.

Camera Setup Guidance

Top view (diamond layout):

    Cam1          Cam2
      \            /
       \          /
   45°  \        /  45°
         \      /
      [Hand Workspace]
         /      \
   45°  /        \  45°
       /          \
      /            \
    Cam0          Cam3
  • Positioning: keep all cameras ~0.5 m from the workspace center at a common height (~30 cm above the surface) and tilt inward by ~10–15°.
  • Baselines: expect ~0.7 m between adjacent cameras; Camera 1↔3 forms the widest pair (~1 m) and provides strong depth cues.
  • Coverage: the most reliable capture volume is a 20 cm cube at the center; quality remains good out to ≈35 cm before dropping to two-camera coverage near the edges.

Setup checklist

  • Mounts are rigid and heights match
  • Lighting is uniform with minimal glare or shadows
  • Cameras share the same resolution and frame rate (≥30 FPS)
  • Auto-exposure/white balance are consistent or locked
  • Recording start times are tightly synchronized (<100 ms skew)

Calibration & Tracking Details

calibration.py

  • Defaults: chessboard_size = (9, 6) inner corners, square_size = 0.023 m.
  • Samples every sample_every_n_frames (default 50) up to max_frames sets.
  • Prints per-camera RMS and stereo RMS errors plus baselines. Recalibrate if RMS > 1.0 px or baselines are inconsistent.
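The RMS figures the script prints can be reproduced by hand: the per-camera RMS is the root of the mean squared distance between detected chessboard corners and their reprojections. A minimal NumPy sketch (the corner arrays here are synthetic, not real calibration output):

```python
import numpy as np

# Synthetic example: 9x6 = 54 detected corners vs. their reprojections,
# perturbed by sub-pixel noise to mimic a well-calibrated camera.
detected = np.random.default_rng(0).uniform(0, 640, size=(54, 2))
reprojected = detected + np.random.default_rng(1).normal(0, 0.3, size=(54, 2))

# RMS reprojection error in pixels: sqrt of the mean squared corner distance.
rms = np.sqrt(np.mean(np.sum((detected - reprojected) ** 2, axis=1)))
print(f"RMS reprojection error: {rms:.3f} px")
```

This is the same statistic OpenCV's calibrateCamera returns, so the "< 0.5 px" target above is directly comparable.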

hand_inference_mmpose.py / hand_inference_mediapipe.py

  • Runs 2D hand detection per camera view to cache 2D joints for every frame in a synchronized sequence.
  • MMPose variant: Uses MMDetection + MMPose pipeline; accepts --det-config, --pose-config, --device, and related flags.
  • MediaPipe variant: CPU-friendly alternative requiring no model downloads; tune via --det-score-thr and --pose-score-thr.
  • Both support --preview for real-time skeleton visualization (green lines + keypoints) and --sequence to select input folder.
  • Writes a pickle containing per-frame, per-camera detections (keypoints + confidences) under output/detections/.

hand_triangulation.py

  • Consumes the cached detections, camera calibration, and (optionally) the raw videos to match hands across views and triangulate them.
  • Performs bundle-adjusted triangulation with per-landmark outlier rejection; enable --debug-matching for verbose pairing logs and --display for reprojected overlays.
  • Saves both multi-hand (pickle) and single-hand (NumPy) trajectories under output/tracking/.
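At its core, multi-view triangulation of each landmark reduces to a linear (DLT) least-squares problem. A self-contained two-view sketch with synthetic cameras (the projection matrices and point below are placeholders, not this repo's calibration):

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: 2D pixel coordinates."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize

# Synthetic setup: reference camera at the origin, second camera 0.7 m away
# along x (roughly the adjacent-camera baseline described above).
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.7], [0], [0]])])

X_true = np.array([0.1, -0.05, 0.5])                 # landmark in meters
x1 = P1 @ np.append(X_true, 1); x1 = x1[:2] / x1[2]  # project into view 1
x2 = P2 @ np.append(X_true, 1); x2 = x2[:2] / x2[2]  # project into view 2
X_est = triangulate_dlt(P1, P2, x1, x2)
print(X_est)  # recovers [0.1, -0.05, 0.5]
```

With four cameras the same A matrix simply gains two rows per extra view, and outlier rejection amounts to dropping rows whose reprojection residual is too large.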

evaluate.py

  • Reprojects tracked 3D landmarks into all cameras to visually validate alignment.
  • Video layout is Cam0 | Cam1 over Cam2 | Cam3; green landmarks mark the first hand, magenta the second.

check_hand_consistency.py

  • Aggregates statistics such as per-frame hand counts, wrist trajectories, and inter-hand distances to spot ID swaps or dropouts.

Data Formats

  • multi_camera_calib.npz
    Load with np.load, access K*, D*, R*, T*, E*, F*. Rotations/Translations map camera 0 coordinates into other camera frames (P_i = R_i @ P_0 + T_i).

  • hand_3d_positions_multi.pkl

    import pickle
    with open("output/tracking/hand_3d_positions_multi.pkl", "rb") as f:
        frames = pickle.load(f)
    # frames[frame_idx][hand_idx][landmark_idx] -> (x, y, z) in meters

    Landmarks follow MediaPipe ordering (0 wrist, 4 thumb tip, 8 index tip, 12 middle tip, 16 ring tip, 20 pinky tip).

  • hand_3d_positions.npy
    Shape (num_frames, 21, 3); NaN rows indicate frames without a detected hand.
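Under the convention above, reprojecting a tracked 3D point into camera i combines the extrinsic transform with the pinhole model. A minimal sketch (K1, R1, T1, and the landmark are synthetic placeholders, not values loaded from a real multi_camera_calib.npz; distortion is ignored):

```python
import numpy as np

# Placeholder calibration standing in for K1, R1, T1 from multi_camera_calib.npz.
K1 = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
R1 = np.eye(3)                     # rotation: camera-0 frame -> camera-1 frame
T1 = np.array([0.0, 0.0, 0.1])     # translation, meters

P0 = np.array([0.05, -0.02, 0.5])  # a landmark in camera-0 (world) coordinates

# P_i = R_i @ P_0 + T_i, then pinhole projection to pixels.
P1_cam = R1 @ P0 + T1
uv = (K1 @ P1_cam)[:2] / P1_cam[2]
print(uv)  # pixel coordinates in camera 1's image: [370. 220.]
```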

Troubleshooting & Tips

  • Chessboard not detected: improve lighting, slow down board motion, confirm chessboard_size/square_size match the physical board.
  • High calibration error: capture more diverse poses (cover corners and tilt angles), ensure cameras remain fixed, clean lenses.
  • Hands appear gray or unmatched: check synchronization, lighting balance, and detection thresholds in hand_inference.py (--det-score-thr, --pose-score-thr).
  • Jittery trajectories: recalibrate, verify camera mounts, or apply temporal smoothing to the exported data.
  • Large reprojection error: re-run checkerboard_eval.py to locate problematic frames/cameras; recalibrate if mean error exceeds a few pixels.
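For the temporal-smoothing tip, a NaN-aware moving average over the exported single-hand array is often enough; dropout frames get filled from valid neighbors. A sketch with a synthetic trajectory (window size and validity test are illustrative choices, not part of this repo):

```python
import numpy as np

def smooth_nan(traj, window=5):
    """Centered moving average over frames, skipping NaN (dropout) frames.
    traj: (num_frames, 21, 3) array shaped like hand_3d_positions.npy."""
    half = window // 2
    out = np.full_like(traj, np.nan)
    for t in range(len(traj)):
        chunk = traj[max(0, t - half): t + half + 1]
        valid = ~np.isnan(chunk[:, 0, 0])   # frame is valid if wrist x is finite
        if valid.any():
            out[t] = chunk[valid].mean(axis=0)
    return out

# Synthetic: 10 frames of a slowly moving hand with one dropout frame.
traj = np.tile(np.zeros((21, 3)), (10, 1, 1))
traj += np.arange(10)[:, None, None] * 0.01  # linear motion on all axes
traj[4] = np.nan                             # simulated missed detection
smoothed = smooth_nan(traj)
print(np.isnan(smoothed[4]).any())           # False: the gap is interpolated
```

A centered window adds latency-free smoothing for offline data; for live use you would restrict the window to past frames only.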

Requirements & Performance

  • Python 3.8+ with PyTorch, OpenCV, NumPy, and Matplotlib (see requirements.txt).
  • For MMPose: Requires MMPose, MMDetection, MMEngine, and MMCV packages plus model checkpoints.
  • For MediaPipe: Only requires the mediapipe package (installed via requirements.txt).
  • Typical run times on a modern GPU laptop: calibration ≈1 min (25 frames), tracking 5–10 FPS for 4 cameras, evaluation ≈30 FPS for rendering. CPU-only inference works but is significantly slower.

Next Steps

  • Use the pickle output for gesture recognition, biomechanics analysis, or downstream machine learning.
  • Tune detection settings via hand_inference.py flags (score thresholds, max hands) and triangulation heuristics with hand_triangulation.py (--max-hands-total, --reproj-rejection) to balance robustness and speed.
  • Extend the evaluator or diagnostics scripts to suit your application (e.g., export CSV summaries or integrate temporal filters).
