email-classification-models

https://github.com/UrielV1/email-classification-models/blob/master/ham_spam

Project objective

The goal of this project is to build a model that classifies emails as ham (legitimate) or spam.

Project description

  • A dataset of 5574 emails with 2 columns (text, label) is given
  • The text was cleaned by removing punctuation and stopwords
  • The clean text was vectorized using TfidfVectorizer
  • Classic models were built: logistic regression (LogR), random forest (RF) and gradient boosting (GB)
  • Accuracy, precision and recall were calculated for each model
  • A resampling technique was applied to balance the classes
  • A basic RNN model (sequential) was built
  • Dimensionality reduction was performed using PCA
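
The vectorize, train and score steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the toy messages and labels are invented, the choice of spam as the positive class is an assumption, and cleaning is delegated to `TfidfVectorizer` (its default tokenizer drops punctuation and `stop_words="english"` drops stopwords), which may differ from the cleaning the author actually used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.pipeline import make_pipeline

# Toy messages invented for illustration; NOT the project's dataset.
texts = [
    "Win a free prize now, click here!",
    "Urgent: claim your free cash reward today",
    "Free entry to win tickets, text now",
    "Are we still meeting for lunch tomorrow?",
    "Please send me the report when you can",
    "Thanks, see you at the office later",
]
labels = [1, 1, 1, 0, 0, 0]  # assumption: 1 = spam, 0 = ham

# TF-IDF features followed by logistic regression in one pipeline.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Scored on the training data only to show the API calls;
# a real evaluation needs a held-out test split.
preds = model.predict(texts)
acc = accuracy_score(labels, preds)
pre = precision_score(labels, preds, zero_division=0)
rec = recall_score(labels, preds, zero_division=0)
print(f"acc={acc:.3f} pre={pre:.3f} rec={rec:.3f}")
```

Swapping `LogisticRegression` for `RandomForestClassifier` or `GradientBoostingClassifier` from `sklearn.ensemble` would give the other two classic models with the same pipeline shape.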

Conclusions

  • The highest scores were achieved by the random forest model: accuracy = 0.976, precision = 0.972, recall = 1
  • Resampling improved the results slightly, but increased computation time
  • Models with high accuracy/precision but low recall are not suitable for this type of project
  • The fastest model was logistic regression, which still gave very good results: accuracy = 0.95, precision = 0.95, recall = 0.995
  • Reducing dimensionality with PCA gave good results and less overfitting
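
To see why high accuracy and precision can coexist with low recall on imbalanced data, the three measures can be worked out from confusion-matrix counts. The sketch below uses hypothetical counts chosen only for illustration; they are not results from this project, and the helper name `classification_scores` is made up for the example.

```python
def classification_scores(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts: 1000 messages, 130 of them in the positive class,
# and a model that finds only 40 of those without any false alarms.
acc, pre, rec = classification_scores(tp=40, fp=0, fn=90, tn=870)
print(f"acc={acc:.3f} pre={pre:.3f} rec={rec:.3f}")
# prints: acc=0.910 pre=1.000 rec=0.308
```

Here accuracy and precision look strong even though more than two thirds of the positive class is missed, which is the situation the third conclusion rules out.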

The most accurate model for this dataset was random forest (RF), and the fastest was logistic regression (LogR).
