Heart Disease Classification

A machine learning project comparing multiple classification models for predicting heart disease presence and severity using the UCI Heart Disease Dataset. Built as part of MSU's CSE 404 (Machine Learning) course.

Overview

This project implements and benchmarks five different model types — Neural Networks, SVM, Decision Tree, Random Forest, and XGBoost — on a 5-class classification task (no disease + 4 severity levels) as well as a binary presence/absence task.

Tech Stack

Python, PyTorch, scikit-learn, XGBoost, Pandas, Matplotlib, Seaborn

How to Run

Install dependencies:

pip install torch scikit-learn xgboost pandas matplotlib seaborn ucimlrepo imbalanced-learn

Run any model directly:

python model_with_advanced_stats.py
python svm_model_changed.py
python random_forest.py
python decision_tree_model.py

For XGBoost, open and run xgboost.ipynb in Jupyter Notebook. The dataset is fetched automatically from the UCI ML Repository — no manual download needed.


Heart Disease Classification Neural Network Structures and Performance Summary

1. ThreeLayerNetwork

Architecture:

  • Input layer: 13 features
  • Hidden layers:
    • Layer 1: 128 neurons, ReLU activation
    • Layer 2: 64 neurons, ReLU activation
  • Output layer: 5 neurons (no activation applied)

Performance:

  • Training (Epoch 250):
    • Accuracy: 61.3%
    • Average Loss: 0.9996
  • Validation:
    • Accuracy: 36.7%
    • Average Loss: 1.3301
  • Test:
    • Accuracy: 47.5%
    • Average Loss: 1.2531
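A minimal PyTorch sketch of this architecture (illustrative only; the repository's actual class and training loop may differ). The other network variants below differ only in the number and width of hidden layers:

```python
import torch
import torch.nn as nn

class ThreeLayerNetwork(nn.Module):
    """13 input features -> 128 -> 64 -> 5 output logits."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(13, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 5),  # raw logits: CrossEntropyLoss applies log-softmax itself
        )

    def forward(self, x):
        return self.layers(x)

model = ThreeLayerNetwork()
logits = model(torch.randn(4, 13))  # batch of 4 samples -> shape (4, 5)
```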

2. FourLayerNetwork

Architecture:

  • Input layer: 13 features
  • Hidden layers:
    • Layer 1: 128 neurons, ReLU activation
    • Layer 2: 64 neurons, ReLU activation
    • Layer 3: 32 neurons, ReLU activation
  • Output layer: 5 neurons (no activation applied)

Performance:

  • Training (Epoch 250):
    • Accuracy: 63.7%
    • Average Loss: 0.9793
  • Validation:
    • Accuracy: 43.3%
    • Average Loss: 1.3079
  • Test:
    • Accuracy: 50.8%
    • Average Loss: 1.2819

3. ThreeLayerExpandedNetwork

Architecture:

  • Input layer: 13 features
  • Hidden layers:
    • Layer 1: 512 neurons, ReLU activation
    • Layer 2: 128 neurons, ReLU activation
  • Output layer: 5 neurons (no activation applied)

Performance:

  • Training (Epoch 250):
    • Accuracy: 73.1%
    • Average Loss: 0.8119
  • Validation:
    • Accuracy: 43.3%
    • Average Loss: 1.4602
  • Test:
    • Accuracy: 52.5%
    • Average Loss: 1.2090

4. FourLayerExpandedNetwork

Architecture:

  • Input layer: 13 features
  • Hidden layers:
    • Layer 1: 512 neurons, ReLU activation
    • Layer 2: 256 neurons, ReLU activation
    • Layer 3: 128 neurons, ReLU activation
  • Output layer: 5 neurons (no activation applied)

Performance:

  • Training (Epoch 250):
    • Accuracy: 66.5%
    • Average Loss: 0.8162
  • Validation:
    • Accuracy: 40.0%
    • Average Loss: 1.4989
  • Test:
    • Accuracy: 57.4%
    • Average Loss: 1.1832

5. ThreeLayerSoftmaxNetwork

Architecture:

  • Input layer: 13 features
  • Hidden layers:
    • Layer 1: 128 neurons, ReLU activation
    • Layer 2: 64 neurons, ReLU activation
  • Output layer: 5 neurons, Softmax activation

Performance:

  • Training (Epoch 250):
    • Accuracy: 55.7%
    • Average Loss: 1.3482
  • Validation:
    • Accuracy: 46.7%
    • Average Loss: 1.4382
  • Test:
    • Accuracy: 52.5%
    • Average Loss: 1.3802
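The softmax variant's noticeably higher training loss (1.3482 vs. roughly 0.98-1.00 for the logits-only networks) is consistent with a known pitfall: PyTorch's CrossEntropyLoss already applies log-softmax internally, so an explicit Softmax output layer applies it twice. A small illustration with made-up logits:

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0, -2.0]])  # made-up scores, class 0 correct
target = torch.tensor([0])

loss_from_logits = ce(logits, target)                       # intended usage
loss_from_probs = ce(torch.softmax(logits, dim=1), target)  # softmax applied twice

print(loss_from_logits.item(), loss_from_probs.item())
```

The doubled softmax compresses the network's outputs into [0, 1] before the loss is taken, so even a confident correct prediction yields a large loss and a weak gradient.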

Advanced Stats for Model in model_with_advanced_stats.py

Final Epoch Results:

Training:

  • Regular Accuracy: 66.0%
  • Binary Accuracy: 83.0%
  • Precision (Multi-Class): 60.5%
  • Recall (Multi-Class): 66.0%
  • F1 Score (Multi-Class): 61.2%
  • Precision (Binary): 90.7%
  • Recall (Binary): 70.1%
  • F1 Score (Binary): 79.1%
  • Average Loss: 0.8250

Validation:

  • Regular Accuracy: 43.3%
  • Binary Accuracy: 83.3%
  • Precision: 38.2%
  • Recall: 30.0%
  • F1 Score: 22.5%
  • ROC AUC: 86.8%
  • Average Loss: 1.2962

Test:

  • Regular Accuracy: 59.0%
  • Binary Accuracy: 85.2%
  • Precision (Multi-Class): 43.2%
  • Recall (Multi-Class): 34.0%
  • F1 Score (Multi-Class): 24.9%
  • Precision (Binary): 79.2%
  • Recall (Binary): 82.6%
  • F1 Score (Binary): 80.9%
  • ROC AUC: 85%
  • Average Loss: 0.9168
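The binary metrics above collapse the five severity classes to presence/absence (class 0 = no disease, classes 1-4 = disease), which is why binary accuracy runs well above the 5-class figure: confusing two nonzero severity levels still counts as a correct binary call. A sketch of the mapping (toy labels, not the project's data):

```python
import numpy as np

# toy labels: 0 = no disease, 1-4 = increasing severity
y_true = np.array([0, 2, 1, 0, 4, 3, 0])
y_pred = np.array([0, 1, 0, 0, 4, 0, 2])

multi_acc = (y_pred == y_true).mean()               # exact severity match
binary_acc = ((y_pred > 0) == (y_true > 0)).mean()  # presence/absence match
```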

Summary of Results:

Among the five network variants, the FourLayerExpandedNetwork achieved the best test accuracy (57.4%) and the lowest test loss (1.1832), generalizing best to unseen data. The advanced-stats model, however, highlights the value of binary accuracy and ROC AUC: collapsing prediction to disease presence/absence yields strong results (85.2% binary test accuracy) even though fine-grained severity classification remains much harder.

4. Random Forest Model Results

Model Performance

•	Overall Test Accuracy: 54.95%

Confusion Matrix

Predicted / Actual   Class 0   Class 1   Class 2   Class 3   Class 4
Class 0                   45         1         1         1         0
Class 1                   10         2         3         1         1
Class 2                    4         2         2         4         0
Class 3                    1         6         2         1         0
Class 4                    1         1         1         1         0
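The reported 54.95% accuracy can be recovered from the matrix itself: the diagonal holds the correctly classified samples (50 of 91). A quick check with NumPy:

```python
import numpy as np

# the Random Forest test confusion matrix from above
cm = np.array([
    [45,  1,  1,  1,  0],
    [10,  2,  3,  1,  1],
    [ 4,  2,  2,  4,  0],
    [ 1,  6,  2,  1,  0],
    [ 1,  1,  1,  1,  0],
])

accuracy = np.trace(cm) / cm.sum()  # 50 correct out of 91 samples
print(f"{accuracy:.2%}")  # 54.95%
```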

5. Decision Tree Model Results

Model Performance

Test Classification Report:

              precision    recall  f1-score   support

           0       0.79      0.75      0.77        36
           1       0.08      0.07      0.07        15
           2       0.00      0.00      0.00         4
           3       0.00      0.00      0.00         5
           4       0.00      0.00      0.00         1

    accuracy                           0.46        61
   macro avg       0.17      0.16      0.17        61
weighted avg       0.49      0.46      0.47        61

Test Confusion Matrix:

[[27  9  0  0  0]
 [ 5  1  4  5  0]
 [ 2  1  0  1  0]
 [ 0  1  2  0  2]
 [ 0  1  0  0  0]]

6. Support Vector Machine (SVM) Results

Best Parameters:

  • C: 10
  • Gamma: 0.1
  • Kernel: rbf

Binary Accuracy with Best Parameters:

  • 0.87

Classification Report:

Class   Precision   Recall   F1-Score   Support
0            0.88     0.94       0.91        31
1            0.86     0.78       0.82        23
2            0.88     0.88       0.88        16
3            0.75     0.79       0.77        19
4            0.96     0.93       0.94        27
  • Overall Accuracy: 0.87
  • Macro Average:
    • Precision: 0.86
    • Recall: 0.86
    • F1-Score: 0.86
  • Weighted Average:
    • Precision: 0.87
    • Recall: 0.87
    • F1-Score: 0.87

Confusion Matrix:

           Predicted 0   Predicted 1   Predicted 2   Predicted 3   Predicted 4
Actual 0            29             1             0             1             0
Actual 1             2            18             1             2             0
Actual 2             1             0            14             1             0
Actual 3             1             1             1            15             1
Actual 4             0             1             0             1            25

This demonstrates the performance of the Support Vector Machine (SVM) model using the best parameters obtained from hyperparameter tuning.
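The best parameters above are the kind of result produced by an exhaustive grid search; a minimal scikit-learn sketch (the synthetic data and grid values are illustrative, not the project's actual search space):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# synthetic stand-in for the 13-feature, 5-class heart disease data
X, y = make_classification(n_samples=300, n_features=13, n_informative=8,
                           n_classes=5, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1], "kernel": ["rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)  # 5-fold CV over all 9 combinations
search.fit(X, y)
print(search.best_params_)
```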


7. XGBoost Results

Best Parameters:

  • 'colsample_bytree': 1.0
  • 'learning_rate': 0.2
  • 'max_depth': 5
  • 'n_estimators': 200
  • 'subsample': 0.8

Binary Accuracy with Best Parameters:

  • 0.94

Classification Report:

Class   Precision   Recall   F1-Score   Support
0            0.85     0.81       0.83        27
1            0.96     0.90       0.93        29
2            0.79     0.79       0.79        24
3            0.73     0.76       0.74        21
4            0.93     1.00       0.96        27
  • Overall Accuracy: 0.86
  • Macro Average:
    • Precision: 0.85
    • Recall: 0.85
    • F1-Score: 0.85
  • Weighted Average:
    • Precision: 0.86
    • Recall: 0.86
    • F1-Score: 0.86

Confusion Matrix:

           Predicted 0   Predicted 1   Predicted 2   Predicted 3   Predicted 4
Actual 0            22             1             2             2             0
Actual 1             2            26             0             1             0
Actual 2             2             0            19             3             0
Actual 3             0             0             3            16             2
Actual 4             0             0             0             0            27
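Precision and recall in the report can be read straight off the confusion matrix: recall divides each diagonal entry by its row (actual) total, precision by its column (predicted) total. Checking class 4 against the matrix above:

```python
import numpy as np

# the XGBoost test confusion matrix from above (rows = actual, columns = predicted)
cm = np.array([
    [22,  1,  2,  2,  0],
    [ 2, 26,  0,  1,  0],
    [ 2,  0, 19,  3,  0],
    [ 0,  0,  3, 16,  2],
    [ 0,  0,  0,  0, 27],
])

recall = np.diag(cm) / cm.sum(axis=1)     # per-class recall
precision = np.diag(cm) / cm.sum(axis=0)  # per-class precision
accuracy = np.trace(cm) / cm.sum()

print(recall[4], precision[4], accuracy)  # 1.0, ~0.93, ~0.86
```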
