This project is an automated tool that analyzes student marksheets (images or PDFs) and uses OCR and machine learning to give personalized recommendations based on marks and attendance.
1️⃣ Upload: Users upload a marksheet (PNG/JPG/PDF).
2️⃣ Extract: OCR (EasyOCR, OpenCV) reads text from the document.
3️⃣ Process: The extracted text is cleaned and parsed into subject marks and attendance.
4️⃣ Predict: A trained RandomForestClassifier predicts one of:
- Eligible for Advanced Courses
- Needs Improvement
- High Risk of Failure
5️⃣ Display: Results are shown with charts in a dashboard.
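Step 3 depends on the marksheet layout, which is not described here. As a minimal sketch, assuming OCR output arrives as lines such as `Mathematics 85`, `Physics: 72/100` and `Attendance: 91%` (hypothetical formats), the parsing could look like this; `parse_marksheet` and both regular expressions are illustrative names, not the project's code.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical mark lines, e.g. "Mathematics 85" or "Physics: 72/100".
MARK_RE = re.compile(
    r"^\s*([A-Za-z][A-Za-z .&]*?)\s*[:\-]?\s*(\d{1,3})(?:\s*/\s*100)?\s*$"
)
# Hypothetical attendance lines, e.g. "Attendance: 91%" or "Attendance 91.5 %".
ATTENDANCE_RE = re.compile(
    r"attendance\s*[:\-]?\s*(\d{1,3}(?:\.\d+)?)\s*%?", re.IGNORECASE
)


def parse_marksheet(text: str) -> Tuple[Dict[str, int], Optional[float]]:
    """Split OCR text into {subject: mark} and an attendance percentage."""
    marks: Dict[str, int] = {}
    attendance: Optional[float] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # Check attendance first so it is never mistaken for a subject.
        att = ATTENDANCE_RE.search(line)
        if att:
            attendance = float(att.group(1))
            continue
        match = MARK_RE.match(line)
        if match:
            # Normalize whitespace and casing of the subject name.
            subject = " ".join(match.group(1).split()).title()
            score = int(match.group(2))
            if 0 <= score <= 100:  # drop totals and OCR noise outside 0-100
                marks[subject] = score
    return marks, attendance
```

For example, `parse_marksheet("Mathematics 85\nAttendance: 91%")` returns `({'Mathematics': 85}, 91.0)`. Because any word followed by a number is accepted as a subject, a real marksheet would likely need a whitelist of subject names.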
- 📂 Image & PDF uploads
- 🔍 OCR text extraction
- 📈 Visual results: bar & pie charts
- ⚙️ ML model with 98% accuracy on test data
- 🔑 Easy-to-use Flask web app
| Tech | Use |
|---|---|
| Python 3.7+ | Language |
| Flask | Web server |
| EasyOCR, OpenCV | OCR text extraction |
| pdf2image, Poppler | PDF to image |
| pandas, numpy | Data handling |
| scikit-learn | ML model (RandomForestClassifier) |
| matplotlib, seaborn | Charts |
| HTML, CSS | UI templates |
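Steps 1 and 2 can be sketched with the libraries in the table. This is a minimal sketch, assuming OpenCV is used only to load pages and convert them to grayscale before EasyOCR reads them (the actual preprocessing is not described) and that the text is English; `upload_kind` and `extract_text` are hypothetical names. The heavy libraries are imported inside `extract_text` so the file-type check works without them.

```python
from pathlib import Path
from typing import List

# Upload types named in the pipeline: PNG/JPG/PDF.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def upload_kind(filename: str) -> str:
    """Classify an upload as 'pdf' or 'image' by its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported file type: {filename}")


def extract_text(path: str, languages=("en",)) -> str:
    """Return OCR text for every page of an image or PDF upload."""
    # Imported lazily so this module loads even without these packages.
    import cv2
    import easyocr
    import numpy as np
    from pdf2image import convert_from_path  # requires Poppler to be installed

    if upload_kind(path) == "pdf":
        # pdf2image returns one RGB PIL image per page.
        pages = [cv2.cvtColor(np.array(page), cv2.COLOR_RGB2GRAY)
                 for page in convert_from_path(path)]
    else:
        image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if image is None:  # cv2.imread returns None instead of raising
            raise ValueError(f"Could not read image: {path}")
        pages = [image]

    reader = easyocr.Reader(list(languages))
    lines: List[str] = []
    for page in pages:
        # detail=0 makes readtext return only the recognized strings.
        lines.extend(reader.readtext(page, detail=0))
    return "\n".join(lines)
```

EasyOCR returns one string per detected text box, so a subject and its mark in the same table row may come back as separate lines; grouping boxes by their vertical position (available with `detail=1`) is one way to rebuild rows before parsing.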
- Algorithm: RandomForestClassifier
- Accuracy: 98% on test data
- ROC AUC: 1.00 for all classes
- Example confusion matrix:
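The accuracy figure is the share of test samples on the diagonal of the confusion matrix (rows are true classes and columns are predicted classes, following scikit-learn's `confusion_matrix` convention). The sketch below computes accuracy and per-class recall from such a matrix; the function names are illustrative, and the 3×3 matrix in the usage note is made up and is not the project's result.

```python
from typing import List, Sequence


def accuracy_from_confusion(matrix: Sequence[Sequence[int]]) -> float:
    """Accuracy = correctly classified samples (diagonal) / all samples."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("empty confusion matrix")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_recall(matrix: Sequence[Sequence[int]]) -> List[float]:
    """Recall for class i = matrix[i][i] / number of samples whose true class is i."""
    recalls = []
    for i, row in enumerate(matrix):
        support = sum(row)
        recalls.append(row[i] / support if support else 0.0)
    return recalls
```

With the hypothetical matrix `[[8, 2, 0], [1, 9, 0], [0, 0, 10]]`, 27 of 30 samples lie on the diagonal, so accuracy is 0.9 and per-class recall is 0.8, 0.9 and 1.0.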

- Python 3.7+
- Python packages: `pip install flask easyocr opencv-python pillow pdf2image pandas numpy scikit-learn matplotlib seaborn`
- Poppler (required by pdf2image):
  - Windows: Poppler for Windows
  - Linux: `sudo apt-get install poppler-utils`
- Model files: `student_recommendation_model.pkl`, `label_encoder.pkl`
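Prediction needs both model files. This is a minimal sketch, assuming they were written with `pickle.dump` (if they were saved with `joblib.dump`, load them with `joblib.load` instead), that the encoder is a scikit-learn `LabelEncoder`, and that the model takes one row of numeric features. The feature order is not documented here, so `features` is a hypothetical input, and `load_artifacts` and `recommend` are illustrative names.

```python
import pickle
from typing import Any, Sequence, Tuple


def load_artifacts(model_path: str = "student_recommendation_model.pkl",
                   encoder_path: str = "label_encoder.pkl") -> Tuple[Any, Any]:
    """Unpickle the trained classifier and its label encoder."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(encoder_path, "rb") as f:
        encoder = pickle.load(f)
    return model, encoder


def recommend(model: Any, encoder: Any, features: Sequence[float]) -> str:
    """Predict for a single student and map the class id back to its label."""
    # Models with a scikit-learn-style API expect a 2-D input: one row per sample.
    encoded = model.predict([list(features)])
    return str(encoder.inverse_transform(encoded)[0])
```

Unpickling the model still requires scikit-learn to be installed, ideally the same version used to train it.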