CS470 Final Project | Aiden Seay - Spring 2024

Problem

This program, which runs in a Jupyter Notebook, uses several machine learning algorithms to classify whether an email is spam.

About the Dataset

All data can be found in the Data directory.

You can find the dataset here.

The dataset is split as follows:

Training (80%)
Test (20%)

Within the training set, 5-fold cross-validation is performed (each fold uses 80% of the training data for training and 20% for evaluation).
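The split described above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: it assumes the emails are held in a Python list, and the helper names `split_train_test` and `k_fold_splits`, the fixed seed, and the interleaved fold assignment are hypothetical choices.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test).

    Hypothetical helper, not taken from EmailClassifier.ipynb; the fixed
    seed only makes the split reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_splits(items, k=5):
    """Yield (train, evaluation) pairs for k-fold cross-validation.

    Assumes items are already shuffled (e.g. the training half returned by
    split_train_test). Each item lands in exactly one evaluation fold, so
    with k=5 every round trains on about 80% and evaluates on about 20%.
    """
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        evaluation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, evaluation


if __name__ == "__main__":
    data = list(range(100))  # stand-in for 100 labeled emails
    train, test = split_train_test(data)
    print(len(train), len(test))  # 80 20
    for fold_train, fold_eval in k_fold_splits(train, k=5):
        print(len(fold_train), len(fold_eval))  # 64 16 for each of the 5 folds
```

Because the test set is carved off before cross-validation, it is never seen during model selection and can be used once for the final evaluation.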

Algorithms Implemented

All algorithms can be found in the Utilities directory.

The following algorithms are implemented:

Naive Bayes
Logistic Regression
K-Nearest Neighbors (KNN)
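The three classifiers listed above can be sketched in plain Python. This is an illustrative sketch, not the code in the Utilities directory: it assumes each email has already been converted into a vector of word counts with label 1 for spam and 0 for not spam, uses the multinomial variant of Naive Bayes (the README does not say which variant is used), and the function names, the smoothing parameter `alpha`, the learning rate, the epoch count, and `k=3` are hypothetical choices.

```python
import math
from collections import Counter

# Assumption: each email is a vector of word counts; label 1 = spam, 0 = not spam.
# All names and hyperparameters below are hypothetical, not from the Utilities code.


def train_naive_bayes(X, y, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing.

    Assumes both classes appear in y.
    """
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        totals = [sum(col) for col in zip(*rows)]  # total count of each word in class c
        denom = sum(totals) + alpha * n_features
        log_prior = math.log(len(rows) / len(X))
        log_likelihoods = [math.log((t + alpha) / denom) for t in totals]
        model[c] = (log_prior, log_likelihoods)
    return model


def predict_naive_bayes(model, x):
    """Pick the class with the highest log posterior (up to a shared constant)."""
    def score(c):
        log_prior, log_likelihoods = model[c]
        return log_prior + sum(n * lp for n, lp in zip(x, log_likelihoods))
    return max((0, 1), key=score)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba_logistic(model, x):
    """Estimated probability that x is spam."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def predict_knn(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training emails (Euclidean distance)."""
    neighbors = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy counts of the words ("free", "winner", "meeting") in six emails.
    X = [[3, 2, 0], [2, 3, 0], [4, 1, 0], [0, 0, 3], [0, 1, 2], [1, 0, 4]]
    y = [1, 1, 1, 0, 0, 0]
    email = [2, 2, 0]
    print(predict_naive_bayes(train_naive_bayes(X, y), email))  # 1
    print(predict_proba_logistic(train_logistic_regression(X, y), email))
    print(predict_knn(X, y, email, k=3))  # 1
```

Naive Bayes and logistic regression both learn a model up front, while KNN stores the training data and does all of its work at prediction time, which makes it the slowest of the three to query on a large email corpus.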

Measuring Performance

Performance is measured with the following metrics:

Accuracy (acc): The ratio of correctly predicted observations to the total number of observations.
False Positive (fp): The number of negative instances wrongly predicted as positive.
True Positive (tp): The number of positive instances correctly predicted as positive.
Area Under the Curve (AUC): The area under the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) across classification thresholds and so summarizes the trade-off between the two.
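The four metrics above can be computed directly from true labels, predicted labels, and classifier scores. The sketch below is plain Python and assumes label 1 means spam (the positive class); the function names, the 0.5 decision threshold, and the toy data are illustrative rather than taken from the notebook. AUC is computed with the rank-based (Mann-Whitney) formulation instead of by tracing the ROC curve point by point; both give the same value.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp), treating label 1 (spam) as the positive class (assumption)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fp


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Counting ties as one half makes this equal to the trapezoidal area under
    the empirical ROC curve. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7]  # e.g. predicted spam probabilities
    y_pred = [int(s >= 0.5) for s in scores]  # hypothetical 0.5 threshold
    print(*confusion_counts(y_true, y_pred))  # 2 1
    print(round(accuracy(y_true, y_pred), 3))  # 0.667
    print(round(roc_auc(y_true, scores), 3))  # 0.667
```

Accuracy, fp, and tp depend on the chosen threshold, whereas AUC uses only the ordering of the scores, so it compares classifiers independently of where the spam cutoff is placed.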
