The Photo Validation System is an automated tool designed to validate ID photos against official requirements. It ensures that uploaded images meet specific criteria, such as correct dimensions, neutral expression, white background, proper face positioning, and absence of accessories like glasses or headwear.
- Face Detection – Ensures a face is present in the image.
- Dimension Validation – Ensures the image size is exactly 413x531 pixels.
- Background Check – Confirms the background is white.
- Expression Analysis – Checks for a neutral expression (no smiling or teeth showing).
- Pose Estimation – Verifies direct gaze, visible shoulders, and unobstructed ears.
- Accessory Detection – Identifies whether glasses or headwear are present.
```shell
git clone <repository-url>
cd "Photo Validation System"
pip install -r requirements.txt
pip install retinaface --no-deps
```

Set your Together API key, then start the backend:

```shell
export TOGETHER_API_KEY=your_api_key_here
uvicorn main:app --host 0.0.0.0 --port 8000
```

Launch the frontend in a separate terminal:

```shell
streamlit run app.py
```

- The API will be available at http://127.0.0.1:8000/.
- The Streamlit interface will open, allowing users to upload images for validation.
The Photo Validation System follows a modular approach, leveraging computer vision and machine learning models to validate images based on official ID photo requirements. The system performs the following checks:
- Face Detection: Identifies whether a face is present.
- Dimension Validation: Ensures the image is exactly 413x531 pixels.
- Background Check: Ensures the background is white.
- Pose Estimation: Checks if the subject is looking directly at the camera with visible shoulders and unobstructed ears.
- Expression Analysis: Ensures a neutral expression (no smiling or teeth showing).
- Accessory Detection: Identifies whether glasses or headwear are present.
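The checks above can be composed as a simple validation pipeline that runs every check and collects all failures, so users see every problem at once. A minimal sketch in plain Python (function names and the result-dict fields are illustrative, not the project's actual API; real checks would call MTCNN, OpenCV, etc.):

```python
from typing import Callable, Dict, List

def run_validation(image, checks: List[Callable]) -> Dict[str, object]:
    """Run every check against the image and collect the failure reasons."""
    failures = []
    for check in checks:
        result = check(image)
        if not result["passed"]:
            failures.append(result["reason"])
    return {"valid": not failures, "failures": failures}

# Illustrative stand-in checks operating on a plain dict instead of real pixels.
def check_dimensions(image):
    h, w = image["height"], image["width"]
    ok = (w, h) == (413, 531)
    return {"passed": ok, "reason": None if ok else f"expected 413x531, got {w}x{h}"}

def check_face(image):
    ok = image.get("faces", 0) >= 1
    return {"passed": ok, "reason": None if ok else "no face detected"}
```

Collecting all failures (rather than stopping at the first) lets the frontend report everything the user must fix in one upload round-trip.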
- FastAPI – Backend framework
- Streamlit – Frontend for user interaction
- OpenCV – Image processing
- MTCNN (FaceNet-PyTorch) – Face detection
- RetinaFace – Background validation
- DeepFace – Expression analysis
- MediaPipe – Pose estimation
- Meta-Llama 3.2 Vision Model – Accessory detection (temporary solution)
- Uses MTCNN to detect faces.
- If no face is detected, validation fails.
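With facenet-pytorch, `MTCNN.detect` returns `None` for the boxes when no face is found, so the pass/fail rule reduces to a null/empty check. A sketch of that decision (the detector call is shown commented out, since it requires model weights):

```python
# from facenet_pytorch import MTCNN
# boxes, probs = MTCNN().detect(pil_image)  # boxes is None when no face is found

def face_detected(boxes) -> bool:
    """Return True when the detector produced at least one bounding box."""
    return boxes is not None and len(boxes) > 0
```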
- Uses OpenCV + RetinaFace to identify if the background is white.
- Faces are masked out before calculating the white background ratio.
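The white-ratio idea can be sketched with NumPy alone: mask out the face region reported by the detector, then count near-white pixels among the remaining background. The 200-per-channel threshold and 0.9 pass ratio below are illustrative assumptions, not the project's tuned values:

```python
import numpy as np

def white_background_ratio(rgb: np.ndarray, face_box) -> float:
    """rgb: H x W x 3 uint8 image; face_box: (x1, y1, x2, y2) from the detector."""
    mask = np.ones(rgb.shape[:2], dtype=bool)
    x1, y1, x2, y2 = face_box
    mask[y1:y2, x1:x2] = False                      # exclude the face region
    background = rgb[mask]                          # N x 3 array of background pixels
    near_white = np.all(background >= 200, axis=1)  # all channels close to 255
    return float(near_white.mean())

def background_is_white(rgb, face_box, threshold=0.9) -> bool:
    return white_background_ratio(rgb, face_box) >= threshold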
- Uses MediaPipe to verify:
  - Looking directly at the camera
  - Shoulders are visible
  - Ears are unobstructed
- If any of these checks fail, the validation fails.
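A simplified version of the gaze check can be built on face landmarks: MediaPipe returns normalized (x, y) coordinates, and if the head is turned, the nose drifts away from the midpoint between the eyes. This sketch and its symmetry tolerance are assumptions, not the project's actual heuristic:

```python
def looking_at_camera(left_eye, right_eye, nose, tolerance=0.08) -> bool:
    """Each landmark is a normalized (x, y) pair, as produced by MediaPipe.

    Compares the nose's horizontal offset from the eye midpoint against the
    eye-to-eye distance; a turned head pushes this ratio past the tolerance.
    """
    eye_mid_x = (left_eye[0] + right_eye[0]) / 2
    eye_span = abs(right_eye[0] - left_eye[0])
    if eye_span == 0:
        return False  # degenerate landmarks; fail safe
    return abs(nose[0] - eye_mid_x) / eye_span <= tolerance
```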
- Uses DeepFace to analyze the dominant emotion.
- Only neutral expressions pass.
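`DeepFace.analyze(..., actions=["emotion"])` reports a dominant emotion per detected face, so the rule reduces to checking for "neutral". A sketch of that decision (the `dominant_emotion` field name matches DeepFace's documented output, but verify it against your installed version):

```python
def expression_is_neutral(analysis: dict) -> bool:
    """analysis: one result dict from DeepFace.analyze(img, actions=["emotion"])."""
    return analysis.get("dominant_emotion") == "neutral"
```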
- Uses OpenCV to check if the image is 413x531 pixels.
- If dimensions do not match, validation fails.
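With an image loaded via `cv2.imread`, the array shape is `(height, width, channels)`, so the check is a direct comparison. Shown here on a NumPy array so it runs without OpenCV installed:

```python
import numpy as np

REQUIRED_SIZE = (413, 531)  # (width, height) per the spec

def dimensions_valid(image: np.ndarray) -> bool:
    h, w = image.shape[:2]  # cv2.imread returns (H, W, C)
    return (w, h) == REQUIRED_SIZE
```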
- Originally planned as a computer vision (CV) approach using object detection models.
- Due to accuracy issues, Meta-Llama 3.2 (Vision-Language Model) was used instead.
- Future improvements involve fine-tuning a dedicated YOLOv8 model to replace this approach.
- Issue: The CV-based approach for accessory detection lacked precision.
- Temporary Fix: Used a Vision-Language Model (Meta-Llama 3.2) via the Together API.
- Limitation: External API dependency increases cost.
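The VLM check boils down to sending the photo with a yes/no prompt and parsing the reply. A sketch of the prompt and response parsing only (the prompt wording and strict one-word protocol are assumptions, not the project's actual prompt; the Together API call itself is omitted):

```python
ACCESSORY_PROMPT = (
    "Look at this ID photo. Is the person wearing glasses or any headwear "
    "(hat, cap, scarf, etc.)? Answer with exactly one word: yes or no."
)

def accessories_detected(model_reply: str) -> bool:
    """Parse the model's reply; anything other than a clear 'no' fails safe."""
    answer = model_reply.strip().lower().rstrip(".")
    return answer != "no"
```

Failing safe on ambiguous replies means an unclear model answer rejects the photo rather than silently passing an accessory through.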
- Issue: The white background detection may fail under uneven lighting conditions.
- Limitation: If the lighting varies significantly, the HSV threshold method may not work reliably.
- Issue: Some models (e.g., MTCNN, RetinaFace) are computationally expensive.
- Limitation: Slower validation times, especially for high-resolution images.
- Issue: The current implementation lacks the accuracy and efficiency required for large-scale production deployment.
- Limitation: The system needs further fine-tuning and optimization, especially for reducing false positives in accessory detection and improving real-time performance.
- Approach: Fine-tune a YOLOv8-based object detection model for accessories.
- Benefit: Eliminates reliance on Vision-Language Models, reducing cost.
- Approach: Implement a deep learning segmentation model.
- Benefit: Improves robustness against a more varied dataset.
- Approach: Refine the user interface with improved frontend design, such as clearer per-check validation feedback.
- Benefit: Improves the overall user experience.
