Skin Condition Classifier

A machine learning classifier for six dermatological skin conditions, built on clinical and histopathological data.
I have a background in Clinical Laboratory Sciences and I am currently completing a 900-hour Master Esthetician program at Atelier Esthetique. I know what eosinophil infiltrate looks like under a microscope and I know what erythema looks like on a client. This project is about using that background to actually build something.
I trained two machine learning models to classify six dermatological skin conditions using 34 clinical and histopathological features. I wanted to build an end-to-end pipeline that reflects how skin conditions are actually differentiated in clinical practice, not just run a model on a dataset I found online.
Six erythemato-squamous conditions that come up in both clinical lab work and esthetic practice:
| Condition | Clinical presentation |
|---|---|
| Psoriasis | Chronic, immune-mediated, disrupts the skin barrier |
| Seborrhoeic Dermatitis | Scalp and facial, driven by yeast-related inflammation |
| Lichen Planus | T-cell mediated, polygonal papule presentation |
| Pityriasis Rosea | Herald patch, self-resolving, commonly misread |
| Chronic Dermatitis | Compromised barrier, persistent inflammation |
| Pityriasis Rubra Pilaris | Rare, frequently misdiagnosed, important edge case |
Dataset: UCI Dermatology Dataset (366 patients, 34 clinical features, peer-reviewed).
Citation: Ilter, N. and Guvenir, H.A. (1998). Differentiating Erythemato-Squamous Diseases. UCI Machine Learning Repository.
| Model | Test Accuracy | Macro ROC-AUC |
|---|---|---|
| Random Forest | 94.6% | 0.9982 |
| Neural Network (MLP) | 89.2% | 0.9953 |
The Random Forest performed better and it was the right choice for this dataset. 366 patients and 34 features is not a deep learning problem. The feature importance output also maps directly back to clinical markers, so you can read why the model made a prediction, not just that it did.
A macro ROC-AUC of 0.9982 means near-perfect separation across all six conditions. A random classifier scores 0.5, regardless of how many classes there are.
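The macro ROC-AUC in the table can be computed with scikit-learn's one-vs-rest averaging. This is an illustrative sketch on synthetic data shaped like the dataset (366 samples, 34 features, 6 classes), not the project's actual pipeline:

```python
# Sketch: macro one-vs-rest ROC-AUC for a six-class problem.
# Synthetic stand-in data, not the real UCI Dermatology dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=366, n_features=34, n_informative=20,
    n_classes=6, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
proba = clf.predict_proba(X_test)

# "macro" averages the per-class one-vs-rest AUCs; 0.5 = random guessing
auc = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
print(f"macro ROC-AUC: {auc:.4f}")
```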
There are three scripts; run them in order: `python src/load_data.py`, `python src/explore.py`, `python src/train.py`. All charts and model outputs save to `results/` automatically.
Random Forest over deep learning: Small clinical datasets and tree-based methods are a natural fit. Neural networks would overfit here. Random Forest also gives you feature importance natively, which matters when the features have real clinical meaning.
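The interpretability point above comes from the `feature_importances_` attribute that Random Forests expose natively. A minimal illustration, assuming ordinal 0-3 severity scores as features (the names below are a subset of the dataset's real clinical markers; the data and importances here are synthetic):

```python
# Illustrative only: mapping Random Forest feature importances back
# to named clinical markers. Synthetic data, not the real results.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["erythema", "scaling", "itching", "koebner_phenomenon",
                 "acanthosis", "parakeratosis"]
X = rng.integers(0, 4, size=(366, len(feature_names)))  # 0-3 severity scores
y = rng.integers(0, 6, size=366)                        # six condition labels

clf = RandomForestClassifier(random_state=0).fit(X, y)

# Rank markers by importance so each prediction reads back clinically
for name, imp in sorted(zip(feature_names, clf.feature_importances_),
                        key=lambda t: t[1], reverse=True):
    print(f"{name:20s} {imp:.3f}")
```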
Class weights: The data has a 5.6x imbalance between the most and least common condition. Without correction the model would underperform on rare conditions, which in a clinical context are often the most important ones to catch.
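The correction can be done with scikit-learn's "balanced" weighting, which reweights each class inversely to its frequency. The class counts below match the UCI dataset's reported distribution (psoriasis is the largest class at 112, pityriasis rubra pilaris the smallest at 20, giving the 5.6x imbalance); treat them as an assumption if in doubt:

```python
# Sketch of the class-weight correction: "balanced" weights are
# n_samples / (n_classes * count), so rarer conditions count more.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Class counts with the dataset's ~5.6x imbalance (112 vs 20)
y = np.repeat(np.arange(6), [112, 61, 72, 49, 52, 20])

weights = compute_class_weight(class_weight="balanced",
                               classes=np.arange(6), y=y)
for cls, w in zip(range(6), weights):
    print(f"class {cls}: weight {w:.2f}")

# In training this is just: RandomForestClassifier(class_weight="balanced")
```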
Median imputation: Eight missing values, all in the age column. Median imputation is standard for clinical data with small amounts of missingness and does not distort the distribution.
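The imputation step is one line with scikit-learn's `SimpleImputer`. A minimal sketch with made-up ages, assuming age is the only column with missingness (as in the dataset):

```python
# Minimal sketch of median imputation on an age column with NaNs.
# Ages here are invented examples, not the dataset's values.
import numpy as np
from sklearn.impute import SimpleImputer

age = np.array([[22.0], [25.0], [np.nan], [40.0], [np.nan], [55.0]])

imputer = SimpleImputer(strategy="median")
age_filled = imputer.fit_transform(age)

# Median of the observed ages (22, 25, 40, 55) is 32.5
print(age_filled.ravel())
```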
On the features: The 34 features split into two layers: clinical observations you would make during a skin assessment (erythema, scaling, itching, border definition, Koebner phenomenon) and histopathological markers you would see under a microscope (acanthosis, hyperkeratosis, parakeratosis, eosinophil infiltrate, PNL infiltrate). My CLS training means I can read both layers of this dataset fluently, which shaped how I approached the analysis.
Built an end-to-end ML pipeline from scratch. Woo hoo! I'll come back to this to see how I can improve it. Production systems like Haut.AI (my inspiration for this project) work from high-resolution, controlled-lighting photos rather than structured clinical features, so image-based classification is the obvious next step, and a much harder one: skin conditions are genuinely difficult to classify from a single photo without controlled lighting.