YogaFix is a real-time yoga pose detection and feedback system built with Python, OpenCV, and Mediapipe for pose estimation. The system is served via a FastAPI backend that captures webcam frames server-side, processes them to detect poses, and provides real-time feedback over WebSocket connections.
Important: This version relies on server-side webcam access (using `cv2.VideoCapture(0)`). It must be deployed on hardware with an attached webcam (e.g., a local machine or a dedicated server/VPS with USB passthrough). Cloud platforms like Render and similar PaaS offerings do not provide direct hardware access.
## Table of Contents

- Overview
- Features
- Tech Stack
- Directory Structure
- How It Works
- API Endpoints
- Running Locally
- Performance & Concurrency Considerations
- Future Enhancements
- License
## Overview

This module provides real-time feedback for yoga poses by:
- Capturing webcam video directly on the server.
- Processing each frame with a pose detection algorithm.
- Comparing the user's pose to predefined “ideal” poses.
- Returning annotated frames and detailed feedback over WebSockets.
The system is designed for scenarios where server-side processing is viable (e.g., dedicated hardware) and offers low-latency feedback for enhanced user interaction.
## Features

- Real-Time Processing: Captures and processes frames from a physical webcam attached to the server.
- Multiple Pose Detection: Supports a variety of yoga poses, including:
  - T Pose
  - Tree Pose
  - Warrior 3 Pose
  - Bridge Pose
  - Cat Pose
  - Cobra Pose
  - Crescent Lunge Pose
  - Downward Facing Dog Pose
  - Leg-Up-The-Wall Pose
  - Mountain Pose
  - Padmasana (Lotus Pose)
  - Pigeon Pose
  - Seated Forward Bend
  - Standing Forward Bend
  - Triangle Pose
  - Warrior Pose
- Detailed Feedback: Computes similarity scores based on joint angles and generates corrective feedback.
- WebSocket Communication: Uses FastAPI’s asynchronous WebSocket support for real-time bi-directional communication.
- CORS Enabled: Easily integrates with separate front-end applications.
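For reference, CORS in a FastAPI app is typically enabled with the built-in `CORSMiddleware`. The snippet below is a minimal sketch; the actual origins and options configured in this project's application file may differ.

```python
# Minimal CORS setup sketch; the project's actual allowed origins may differ.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],      # restrict this to your front-end's origin in production
    allow_methods=["*"],
    allow_headers=["*"],
)
```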
## Tech Stack

- Programming Language: Python 3.x
- Backend Framework: FastAPI
- WebSocket Server: Uvicorn (ASGI server)
- Computer Vision: OpenCV
- Pose Estimation: Mediapipe
- Data Processing: NumPy
- Asynchronous Programming: asyncio
## Directory Structure

```
YogaModule/
├─ api/
│  └─ main.py                  # FastAPI application with server-side webcam processing
├─ logic/
│  ├─ __init__.py
│  ├─ T_pose.py                # T Pose detection logic
│  ├─ traingle_pose.py         # Triangle Pose detection logic
│  ├─ Tree_pose.py             # Tree Pose detection logic
│  ├─ Crescent_lunge_pose.py   # Crescent Lunge detection logic
│  ├─ warrior_pose.py          # Warrior Pose detection logic
│  └─ mountain_pose.py         # Mountain Pose detection logic
├─ tests/
│  └─ index.htm                # Client-side code to test the API
└─ README.md                   # This README file
```
- `web-app/app.py`: Contains the FastAPI backend, which captures frames from a server-side webcam, processes them, and sends back annotated frames and feedback.
- `logic/`: Contains the pose checker classes that perform frame processing, angle calculations, and feedback generation.
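To make that division of responsibilities concrete, here is a hypothetical skeleton of what a pose checker class in `logic/` could look like; the class and method names are illustrative assumptions, not the module's actual interface.

```python
# Illustrative pose checker skeleton; the class and method names are assumptions.
import cv2
import mediapipe as mp
import numpy as np

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

class TreePoseChecker:
    """Processes frames for one pose type and produces an annotated frame plus feedback."""

    def __init__(self):
        self.pose = mp_pose.Pose()

    def process(self, frame_bgr: np.ndarray) -> tuple[np.ndarray, dict]:
        """Return (annotated_frame, feedback) for a single BGR frame."""
        results = self.pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        annotated = frame_bgr.copy()
        feedback = {"similarity": 0.0,
                    "feedback_text": "No person detected",
                    "joint_similarities": {}}
        if results.pose_landmarks:
            mp_drawing.draw_landmarks(annotated, results.pose_landmarks,
                                      mp_pose.POSE_CONNECTIONS)
            # ... compute joint angles, compare against the ideal Tree Pose, fill feedback ...
        return annotated, feedback
```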
## How It Works

- Webcam Capture:
  The API opens a connection to a physical webcam using `cv2.VideoCapture(0)`.
- Frame Processing:
  - Each frame is read, flipped for a mirror view, and passed to the selected pose checker.
  - The pose checker uses Mediapipe to extract landmarks and compute joint angles.
  - A similarity score is calculated by comparing the user's pose with the ideal pose (see the sketch after this list).
  - Annotated frames are generated by drawing landmarks using Mediapipe's drawing utilities.
- WebSocket Communication:
  - The processed frame (encoded as a JPEG and then base64) and the feedback (similarity score, joint details, textual corrections) are sent back to the client via a WebSocket connection.
  - A connection manager handles multiple clients and processing tasks concurrently.
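The following is a minimal sketch of the kind of joint-angle computation a pose checker might perform. The helper names, the elbow landmarks, and the similarity formula in the closing comment are assumptions for illustration, not the module's actual code.

```python
# Hypothetical sketch of joint-angle extraction with Mediapipe (not the actual logic/ code).
import math

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def angle(a, b, c):
    """Angle at landmark b (in degrees) formed by the segments b-a and b-c."""
    ang = math.degrees(
        math.atan2(c.y - b.y, c.x - b.x) - math.atan2(a.y - b.y, a.x - b.x)
    )
    ang = abs(ang)
    return 360 - ang if ang > 180 else ang

def elbow_angles(frame_bgr):
    """Return (left_elbow, right_elbow) angles for one BGR frame, or None if no person."""
    with mp_pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.pose_landmarks:
        return None
    lm = results.pose_landmarks.landmark
    L = mp_pose.PoseLandmark
    left = angle(lm[L.LEFT_SHOULDER], lm[L.LEFT_ELBOW], lm[L.LEFT_WRIST])
    right = angle(lm[L.RIGHT_SHOULDER], lm[L.RIGHT_ELBOW], lm[L.RIGHT_WRIST])
    return left, right

# A per-joint similarity could then be derived from the angle error, for example
# 1 - min(abs(measured - ideal), 180) / 180, and averaged into an overall score.
```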
## API Endpoints

### WebSocket Endpoint

- URL: `/ws/{client_id}`
- Method: WebSocket
- Description:
  When a client connects and sends a JSON message containing a `"pose_type"`, the API starts processing frames from the server-side webcam. It continuously sends back a JSON response containing:
  - `frame`: Base64-encoded annotated JPEG image.
  - `feedback`: An object with:
    - `similarity`: A float value representing overall pose similarity.
    - `feedback_text`: A textual description of the feedback.
    - `joint_similarities`: Detailed feedback per joint (if applicable).
- Stop Command:
  Clients can send `{"command": "stop"}` to disconnect and stop processing.

### Health Check

- URL: `/health`
- Method: GET
- Description:
  Returns a JSON response indicating the server status: `{ "status": "healthy" }`
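The bundled `tests/index.htm` exercises the WebSocket endpoint from a browser. For a quick command-line check, here is a sketch using the third-party `websockets` package (`pip install websockets`); the `"tree_pose"` value is an assumption and should be replaced with a pose type the server actually accepts (see the checkers in `logic/`).

```python
# Minimal test client sketch, assuming the server runs on localhost:8000
# and accepts a pose_type string such as "tree_pose" (check logic/ for real values).
import asyncio
import json

import websockets  # pip install websockets

async def main():
    async with websockets.connect("ws://localhost:8000/ws/test-client") as ws:
        await ws.send(json.dumps({"pose_type": "tree_pose"}))
        for _ in range(5):                      # read a few frames, then stop
            message = json.loads(await ws.recv())
            feedback = message.get("feedback", {})
            print("similarity:", feedback.get("similarity"),
                  "|", feedback.get("feedback_text"))
        await ws.send(json.dumps({"command": "stop"}))

asyncio.run(main())
```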
## Running Locally

- Clone the Repository:

  ```bash
  git clone https://github.com/yourusername/YogaModule.git
  cd YogaModule
  ```

- Set Up a Virtual Environment (Optional but Recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate   # For Linux/Mac
  # or
  venv\Scripts\activate      # For Windows
  ```

- Install Dependencies:

  ```bash
  pip install fastapi uvicorn opencv-python mediapipe numpy
  ```

  If a `requirements.txt` is available, run:

  ```bash
  pip install -r requirements.txt
  ```

- Run the FastAPI Server:

  ```bash
  cd web-app
  uvicorn app:app --reload --host 0.0.0.0 --port 8000
  ```

  The server will start at http://localhost:8000.

- Connect a WebSocket Client:
  Use a WebSocket client or a browser-based front-end to connect to `ws://localhost:8000/ws/{client_id}` and send JSON messages as described above.

Note: Ensure that the machine running the server has a webcam attached. If `cv2.VideoCapture(0)` fails, verify the webcam index or hardware permissions.
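If the camera is in doubt, a short standalone check along these lines can isolate hardware issues before starting the server. The device index 0 is an assumption; try other indices if you have multiple cameras.

```python
# Quick webcam sanity check, independent of the YogaModule server.
import cv2

cap = cv2.VideoCapture(0)          # try 1, 2, ... if index 0 is not your webcam
if not cap.isOpened():
    raise SystemExit("Could not open webcam: check the device index and OS permissions.")
ok, frame = cap.read()
cap.release()
if ok:
    print("Captured a frame with shape", frame.shape)
else:
    print("Webcam opened but reading a frame failed.")
```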
## Performance & Concurrency Considerations

- CPU-Intensive Processing:
  Frame processing (especially with OpenCV and Mediapipe) is CPU-bound. For multiple concurrent connections, consider:
  - Offloading heavy computations to separate worker threads or processes (see the sketch after this list).
  - Horizontal scaling (running multiple instances) if using dedicated hardware.
- Vertical Scaling:
  Since server-side webcam processing avoids network transmission delays and base64 overhead from the client, it can offer faster processing. However, vertical scaling (upgrading CPU/RAM) is crucial if many clients connect concurrently.
- Hardware Constraints:
  This approach requires a physical webcam. In cloud environments, server-side webcam access is typically not available, so this setup is best suited for dedicated hardware or on-premise servers.
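One way to keep the event loop responsive is to push the blocking per-frame work onto a worker thread with `asyncio.to_thread`. The sketch below is not the project's actual handler: `process_frame` is a hypothetical stand-in for the synchronous OpenCV/Mediapipe work, and the stop command and base64 frame encoding are omitted.

```python
# Hypothetical sketch: offloading CPU-bound frame processing from the WebSocket handler.
import asyncio

import cv2
from fastapi import FastAPI, WebSocket

app = FastAPI()

def process_frame(frame, pose_type: str) -> dict:
    """Placeholder for the synchronous OpenCV/Mediapipe work done per frame."""
    # ... run the pose checker for `pose_type` here ...
    return {"similarity": 0.0, "feedback_text": "stub"}

@app.websocket("/ws/{client_id}")
async def ws_endpoint(websocket: WebSocket, client_id: str):
    await websocket.accept()
    config = await websocket.receive_json()
    cap = cv2.VideoCapture(0)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Run the heavy, blocking work in a thread so the event loop stays free.
            feedback = await asyncio.to_thread(process_frame, frame, config["pose_type"])
            await websocket.send_json({"feedback": feedback})
    finally:
        cap.release()
```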
A demo of the project can be seen in this playlist.
## Future Enhancements

- GPU Acceleration:
  Integrate CUDA/TensorRT to speed up pose estimation on GPUs.
- Asynchronous Processing:
  Use thread pools or asynchronous libraries to better handle CPU-bound tasks without blocking the event loop.
- Client-Side Integration:
  Develop a web or mobile front-end that dynamically connects via WebSockets for real-time feedback.
- Support for Multiple Cameras:
  Extend the module to support multiple simultaneous camera inputs or multiple users.
## License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.