Heart Disease Classification

A machine learning project comparing multiple classification models for predicting heart disease presence and severity using the UCI Heart Disease Dataset. Built as part of MSU's CSE 404 (Machine Learning) course.

Overview

This project implements and benchmarks five different model types — Neural Networks, SVM, Decision Tree, Random Forest, and XGBoost — on a 5-class classification task (no disease + 4 severity levels) as well as a binary presence/absence task.

Tech Stack

Python, PyTorch, scikit-learn, XGBoost, Pandas, Matplotlib, Seaborn

How to Run

Install dependencies:

pip install torch scikit-learn xgboost pandas matplotlib seaborn ucimlrepo imbalanced-learn

Run any model directly:

python model_with_advanced_stats.py
python svm_model_changed.py
python random_forest.py
python decision_tree_model.py

For XGBoost, open and run xgboost.ipynb in Jupyter Notebook. The dataset is fetched automatically from the UCI ML Repository — no manual download needed.


Heart Disease Classification Neural Network Structures and Performance Summary

1. ThreeLayerNetwork

Architecture:

  • Input layer: 13 features
  • Hidden layers:
    • Layer 1: 128 neurons, ReLU activation
    • Layer 2: 64 neurons, ReLU activation
  • Output layer: 5 neurons (no activation applied)

Performance:

  • Training (Epoch 250):
    • Accuracy: 61.3%
    • Average Loss: 0.9996
  • Validation:
    • Accuracy: 36.7%
    • Average Loss: 1.3301
  • Test:
    • Accuracy: 47.5%
    • Average Loss: 1.2531
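A minimal PyTorch sketch of this architecture (illustrative only; the repository's actual class and training loop may differ). The other network variants below differ only in the number and width of hidden layers:

```python
import torch
import torch.nn as nn

class ThreeLayerNetwork(nn.Module):
    """13 input features -> 128 -> 64 -> 5 output logits."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(13, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 5),  # raw logits: CrossEntropyLoss applies log-softmax itself
        )

    def forward(self, x):
        return self.layers(x)

model = ThreeLayerNetwork()
logits = model(torch.randn(4, 13))  # batch of 4 samples -> shape (4, 5)
```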

2. FourLayerNetwork

Architecture:

  • Input layer: 13 features
  • Hidden layers:
    • Layer 1: 128 neurons, ReLU activation
    • Layer 2: 64 neurons, ReLU activation
    • Layer 3: 32 neurons, ReLU activation
  • Output layer: 5 neurons (no activation applied)

Performance:

  • Training (Epoch 250):
    • Accuracy: 63.7%
    • Average Loss: 0.9793
  • Validation:
    • Accuracy: 43.3%
    • Average Loss: 1.3079
  • Test:
    • Accuracy: 50.8%
    • Average Loss: 1.2819

3. ThreeLayerExpandedNetwork

Architecture:

  • Input layer: 13 features
  • Hidden layers:
    • Layer 1: 512 neurons, ReLU activation
    • Layer 2: 128 neurons, ReLU activation
  • Output layer: 5 neurons (no activation applied)

Performance:

  • Training (Epoch 250):
    • Accuracy: 73.1%
    • Average Loss: 0.8119
  • Validation:
    • Accuracy: 43.3%
    • Average Loss: 1.4602
  • Test:
    • Accuracy: 52.5%
    • Average Loss: 1.2090

4. FourLayerExpandedNetwork

Architecture:

  • Input layer: 13 features
  • Hidden layers:
    • Layer 1: 512 neurons, ReLU activation
    • Layer 2: 256 neurons, ReLU activation
    • Layer 3: 128 neurons, ReLU activation
  • Output layer: 5 neurons (no activation applied)

Performance:

  • Training (Epoch 250):
    • Accuracy: 66.5%
    • Average Loss: 0.8162
  • Validation:
    • Accuracy: 40.0%
    • Average Loss: 1.4989
  • Test:
    • Accuracy: 57.4%
    • Average Loss: 1.1832

5. ThreeLayerSoftmaxNetwork

Architecture:

  • Input layer: 13 features
  • Hidden layers:
    • Layer 1: 128 neurons, ReLU activation
    • Layer 2: 64 neurons, ReLU activation
  • Output layer: 5 neurons, Softmax activation

Performance:

  • Training (Epoch 250):
    • Accuracy: 55.7%
    • Average Loss: 1.3482
  • Validation:
    • Accuracy: 46.7%
    • Average Loss: 1.4382
  • Test:
    • Accuracy: 52.5%
    • Average Loss: 1.3802
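The softmax variant's noticeably higher training loss (1.3482 vs. roughly 0.98-1.00 for the logits-only networks) is consistent with a known pitfall: PyTorch's CrossEntropyLoss already applies log-softmax internally, so an explicit Softmax output layer applies it twice. A small illustration with made-up logits:

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0, -2.0]])  # made-up scores, class 0 correct
target = torch.tensor([0])

loss_from_logits = ce(logits, target)                       # intended usage
loss_from_probs = ce(torch.softmax(logits, dim=1), target)  # softmax applied twice

print(loss_from_logits.item(), loss_from_probs.item())
```

The doubled softmax compresses the network's outputs into [0, 1] before the loss is taken, so even a confident correct prediction yields a large loss and a weak gradient.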

Advanced Stats for Model in model_with_advanced_stats.py

Final Epoch Results:

Training:

  • Regular Accuracy: 66.0%
  • Binary Accuracy: 83.0%
  • Precision (Multi-Class): 60.5%
  • Recall (Multi-Class): 66.0%
  • F1 Score (Multi-Class): 61.2%
  • Precision (Binary): 90.7%
  • Recall (Binary): 70.1%
  • F1 Score (Binary): 79.1%
  • Average Loss: 0.8250

Validation:

  • Regular Accuracy: 43.3%
  • Binary Accuracy: 83.3%
  • Precision: 38.2%
  • Recall: 30.0%
  • F1 Score: 22.5%
  • ROC AUC: 86.8%
  • Average Loss: 1.2962

Test:

  • Regular Accuracy: 59.0%
  • Binary Accuracy: 85.2%
  • Precision (Multi-Class): 43.2%
  • Recall (Multi-Class): 34.0%
  • F1 Score (Multi-Class): 24.9%
  • Precision (Binary): 79.2%
  • Recall (Binary): 82.6%
  • F1 Score (Binary): 80.9%
  • ROC AUC: 85%
  • Average Loss: 0.9168
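The binary metrics above collapse the five severity classes to presence/absence (class 0 = no disease, classes 1-4 = disease), which is why binary accuracy runs well above the 5-class figure: confusing two nonzero severity levels still counts as a correct binary call. A sketch of the mapping (toy labels, not the project's data):

```python
import numpy as np

# toy labels: 0 = no disease, 1-4 = increasing severity
y_true = np.array([0, 2, 1, 0, 4, 3, 0])
y_pred = np.array([0, 1, 0, 0, 4, 0, 2])

multi_acc = (y_pred == y_true).mean()               # exact severity match
binary_acc = ((y_pred > 0) == (y_true > 0)).mean()  # presence/absence match
```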

Summary of Results:

Among the five network variants, the FourLayerExpandedNetwork achieved the best test accuracy (57.4%) and the lowest test loss (1.1832), generalizing best to unseen data. The advanced-stats model, however, highlights the value of binary accuracy and ROC AUC: collapsing prediction to disease presence/absence yields strong results (85.2% binary test accuracy) even though fine-grained severity classification remains much harder.

4. Random Forest Model Results

Model Performance

•	Overall Test Accuracy: 54.95%

Confusion Matrix

Predicted / Actual   Class 0   Class 1   Class 2   Class 3   Class 4
Class 0                   45         1         1         1         0
Class 1                   10         2         3         1         1
Class 2                    4         2         2         4         0
Class 3                    1         6         2         1         0
Class 4                    1         1         1         1         0
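The reported 54.95% accuracy can be recovered from the matrix itself: the diagonal holds the correctly classified samples (50 of 91). A quick check with NumPy:

```python
import numpy as np

# the Random Forest test confusion matrix from above
cm = np.array([
    [45,  1,  1,  1,  0],
    [10,  2,  3,  1,  1],
    [ 4,  2,  2,  4,  0],
    [ 1,  6,  2,  1,  0],
    [ 1,  1,  1,  1,  0],
])

accuracy = np.trace(cm) / cm.sum()  # 50 correct out of 91 samples
print(f"{accuracy:.2%}")  # 54.95%
```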

5. Decision Tree Model Results

Model Performance

Test Classification Report:

              precision    recall  f1-score   support

           0       0.79      0.75      0.77        36
           1       0.08      0.07      0.07        15
           2       0.00      0.00      0.00         4
           3       0.00      0.00      0.00         5
           4       0.00      0.00      0.00         1

    accuracy                           0.46        61
   macro avg       0.17      0.16      0.17        61
weighted avg       0.49      0.46      0.47        61

Test Confusion Matrix:

[[27  9  0  0  0]
 [ 5  1  4  5  0]
 [ 2  1  0  1  0]
 [ 0  1  2  0  2]
 [ 0  1  0  0  0]]

6. Support Vector Machine (SVM) Results

Best Parameters:

  • C: 10
  • Gamma: 0.1
  • Kernel: rbf

Binary Accuracy with Best Parameters:

  • 0.87

Classification Report:

Class   Precision   Recall   F1-Score   Support
0            0.88     0.94       0.91        31
1            0.86     0.78       0.82        23
2            0.88     0.88       0.88        16
3            0.75     0.79       0.77        19
4            0.96     0.93       0.94        27
  • Overall Accuracy: 0.87
  • Macro Average:
    • Precision: 0.86
    • Recall: 0.86
    • F1-Score: 0.86
  • Weighted Average:
    • Precision: 0.87
    • Recall: 0.87
    • F1-Score: 0.87

Confusion Matrix:

           Predicted 0   Predicted 1   Predicted 2   Predicted 3   Predicted 4
Actual 0            29             1             0             1             0
Actual 1             2            18             1             2             0
Actual 2             1             0            14             1             0
Actual 3             1             1             1            15             1
Actual 4             0             1             0             1            25

This demonstrates the performance of the Support Vector Machine (SVM) model using the best parameters obtained from hyperparameter tuning.
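The best parameters above are the kind of result produced by an exhaustive grid search; a minimal scikit-learn sketch (the synthetic data and grid values are illustrative, not the project's actual search space):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# synthetic stand-in for the 13-feature, 5-class heart disease data
X, y = make_classification(n_samples=300, n_features=13, n_informative=8,
                           n_classes=5, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1], "kernel": ["rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)  # 5-fold CV over all 9 combinations
search.fit(X, y)
print(search.best_params_)
```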


7. XGBoost Results

Best Parameters:

  • 'colsample_bytree': 1.0
  • 'learning_rate': 0.2
  • 'max_depth': 5
  • 'n_estimators': 200
  • 'subsample': 0.8

Binary Accuracy with Best Parameters:

  • 0.94

Classification Report:

Class   Precision   Recall   F1-Score   Support
0            0.85     0.81       0.83        27
1            0.96     0.90       0.93        29
2            0.79     0.79       0.79        24
3            0.73     0.76       0.74        21
4            0.93     1.00       0.96        27
  • Overall Accuracy: 0.86
  • Macro Average:
    • Precision: 0.85
    • Recall: 0.85
    • F1-Score: 0.85
  • Weighted Average:
    • Precision: 0.86
    • Recall: 0.86
    • F1-Score: 0.86

Confusion Matrix:

           Predicted 0   Predicted 1   Predicted 2   Predicted 3   Predicted 4
Actual 0            22             1             2             2             0
Actual 1             2            26             0             1             0
Actual 2             2             0            19             3             0
Actual 3             0             0             3            16             2
Actual 4             0             0             0             0            27
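Precision and recall in the report can be read straight off the confusion matrix: recall divides each diagonal entry by its row (actual) total, precision by its column (predicted) total. Checking class 4 against the matrix above:

```python
import numpy as np

# the XGBoost test confusion matrix from above (rows = actual, columns = predicted)
cm = np.array([
    [22,  1,  2,  2,  0],
    [ 2, 26,  0,  1,  0],
    [ 2,  0, 19,  3,  0],
    [ 0,  0,  3, 16,  2],
    [ 0,  0,  0,  0, 27],
])

recall = np.diag(cm) / cm.sum(axis=1)     # per-class recall
precision = np.diag(cm) / cm.sum(axis=0)  # per-class precision
accuracy = np.trace(cm) / cm.sum()

print(recall[4], precision[4], accuracy)  # 1.0, ~0.93, ~0.86
```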
