This project implements a machine learning pipeline to detect Amyotrophic Lateral Sclerosis (ALS) from CSV-formatted acoustic data. It uses a shallow neural network (MLPClassifier) and a support vector machine (SVC) to distinguish between ALS and normal samples, reaching about 94.6% (neural network) and 96.5% (SVM) accuracy under leave-one-out cross-validation.
This repository contains code and data for classifying ALS vs. normal samples using acoustic features. Both models are trained on the provided CSV files and evaluated on accuracy, sensitivity, and specificity.
- Data Preparation:
  - Data is split into ALS (`A01.csv` to `A11.csv`) and normal (`N01.csv` to `N11.csv`) samples in the `ALSDetection_data/` folder.
  - Each file contains acoustic features as comma-separated values.
  - For training, one pair (ALS and normal) is randomly left out for testing (leave-one-out cross-validation).
- Model:
  - Two approaches are provided:
    - A shallow neural network (`MLPClassifier` from scikit-learn) with early stopping and L2 regularization to prevent overfitting.
    - A support vector machine (SVM) classifier (`SVC` from scikit-learn) with an RBF kernel, regularized through the `C` parameter to help prevent overfitting.
  - Data is standardized using `StandardScaler` in both approaches.
- Evaluation:
  - Model performance is measured using accuracy, sensitivity, and specificity.
  - A confusion matrix is printed for detailed analysis.
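The pipeline described above can be sketched with scikit-learn as follows. This is a minimal sketch, not the repository's `test.py` or `svm_test.py`: the function names `make_models` and `leave_one_pair_out_predict` are hypothetical, every hyperparameter value is a placeholder rather than the tuned setting, and it assumes each row carries a group ID equal to its pair number (so `A03.csv` and `N03.csv` share group 3). Unlike the README's "randomly left out" pair, `LeaveOneGroupOut` rotates through every pair once. Putting `StandardScaler` inside the pipeline means the scaler is fit only on the training pairs of each fold.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_models(random_state=0):
    """Return both pipelines; all hyperparameter values are placeholders."""
    nn = make_pipeline(
        StandardScaler(),
        MLPClassifier(
            hidden_layer_sizes=(32,),   # a single hidden layer: a "shallow" network
            alpha=1e-3,                 # L2 regularization strength
            early_stopping=True,        # stop when the internal validation score stalls
            learning_rate_init=0.01,
            max_iter=500,
            random_state=random_state,
        ),
    )
    svm = make_pipeline(
        StandardScaler(),
        SVC(kernel="rbf", C=1.0, gamma="scale"),  # smaller C = stronger regularization
    )
    return {"nn": nn, "svm": svm}


def leave_one_pair_out_predict(model, X, y, groups):
    """Out-of-fold predictions; each fold holds out one (ALS, normal) pair.

    groups[i] is the pair number of row i, so both files of a pair are
    always tested together and never seen during that fold's training.
    """
    return cross_val_predict(model, X, y, groups=groups, cv=LeaveOneGroupOut())
```

`cross_val_predict` clones the pipeline for every fold, so each held-out pair is scored by a model that never saw it; the returned array lines up row-for-row with `y` and can be fed straight into a confusion matrix.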
- Clone the repository:
git clone https://github.com/aryanjverma/als-detection.git
cd als-detection
- Install dependencies:
Ensure you have Python 3.8+ installed. Then run:
pip install -r requirements.txt
- Test the neural network model:
python test.py
This will process the data, load the neural network, and print the confusion matrix, sensitivity, specificity, and overall accuracy.
- Test the SVM model:
python svm_test.py
This will print the confusion matrix, sensitivity, specificity, and overall accuracy for the SVM approach.
- Predict on your own data (neural network or SVM): make sure `model.joblib` and `svm.joblib` exist, then run:
python predict.py
The script will prompt you for the location of the data file you want to analyze, for example:
Enter the file you want to test (must be in correct format): ALSDetection_data\N04.csv
Then, when prompted, choose either the SVM or the NN. The selected model will then predict whether the person has ALS.
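The contents of `predict.py` are not reproduced here. The sketch below only illustrates one way a saved scikit-learn model could be applied to a single CSV file. Several assumptions are not confirmed by this README: the helper `predict_file` and its majority-vote rule are hypothetical, the CSV is assumed to have no header row, label `1` is assumed to mean ALS, and the saved object is assumed to be a fitted pipeline that includes its own `StandardScaler` (if the scaler is saved separately, it has to be applied before `predict`).

```python
import joblib
import numpy as np


def predict_file(model_path, csv_path, threshold=0.5):
    """Classify every row of one CSV file, then aggregate by majority vote.

    Hypothetical helper: assumes a header-less numeric CSV, a fitted
    pipeline saved with joblib (scaler included), and label 1 = ALS.
    """
    model = joblib.load(model_path)
    X = np.loadtxt(csv_path, delimiter=",", ndmin=2)  # ndmin=2 keeps one-row files 2-D
    row_preds = model.predict(X)
    als_fraction = float(np.mean(row_preds == 1))     # share of rows voted ALS
    verdict = "ALS" if als_fraction > threshold else "normal"
    return verdict, als_fraction
```

Returning the ALS fraction alongside the verdict makes borderline files visible instead of hiding them behind a single label.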
- All data is stored in the `ALSDetection_data/` directory.
- Files are named `A01.csv` to `A11.csv` (ALS) and `N01.csv` to `N11.csv` (normal).
- Each file contains rows of comma-separated acoustic features.
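Given that layout, the files can be stacked into one feature matrix with a label and a pair ID per row. This is a minimal sketch assuming every file is header-less, purely numeric, and has the same number of columns; the function `load_als_dataset` and the encoding (ALS = 1, normal = 0, group = pair number) are illustrative choices, not taken from the repository's code.

```python
from pathlib import Path

import numpy as np


def load_als_dataset(data_dir="ALSDetection_data", n_pairs=11):
    """Stack A01..A11 (label 1) and N01..N11 (label 0) into X, y, groups.

    Hypothetical loader: assumes header-less numeric CSVs with matching
    column counts. groups[i] is the pair number shared by Akk and Nkk.
    """
    data_dir = Path(data_dir)
    X_parts, y_parts, g_parts = [], [], []
    for k in range(1, n_pairs + 1):
        for prefix, label in (("A", 1), ("N", 0)):
            rows = np.loadtxt(data_dir / f"{prefix}{k:02d}.csv", delimiter=",", ndmin=2)
            X_parts.append(rows)
            y_parts.append(np.full(len(rows), label))  # one label per row
            g_parts.append(np.full(len(rows), k))      # pair ID for leave-one-pair-out
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(g_parts)
```

Keeping the pair ID per row is what lets a cross-validation splitter hold out both files of a pair at once.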
- Neural Network (MLPClassifier):
- Accuracy: ~94.6%
- Sensitivity: ~98.5%
- Specificity: ~84.9%
- SVM:
- Accuracy: ~96.5%
- Sensitivity: ~99.8%
- Specificity: ~88.2%
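Assuming ALS is the positive class, sensitivity is the fraction of ALS samples classified as ALS, TP / (TP + FN), and specificity is the fraction of normal samples classified as normal, TN / (TN + FP). Under that reading, both models above miss fewer ALS samples than they misclassify normal ones. The helper `summarize` below is a hypothetical sketch of how these figures can be derived from scikit-learn's confusion matrix; it is not the repository's evaluation code.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def summarize(y_true, y_pred):
    """Accuracy, sensitivity and specificity, treating label 1 (ALS) as positive."""
    # With labels=[0, 1] the matrix is [[TN, FP], [FN, TP]].
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # ALS samples correctly flagged as ALS
        "specificity": tn / (tn + fp),  # normal samples correctly flagged as normal
    }
```

For example, `summarize([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 1, 1])` gives accuracy 0.75, sensitivity 1.0 and specificity 0.5: every ALS sample is caught, but half of the normal samples are flagged as ALS.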
- The approach uses leave-one-out cross-validation for robust evaluation.
- By tuning hyperparameters, I optimized both the SVM and the NN for high accuracy, specificity, and sensitivity.
- Make sure your samples are in the same format as the CSV files in `ALSDetection_data/`.
- ChatGPT helped me write the code and taught me about the different model types.