A machine learning project that classifies messages (Email/SMS) as Spam or Not Spam.
The project involves EDA, feature engineering, model building, hyperparameter tuning, and a Streamlit-based UI for interactive testing.
-
Data Preprocessing
- Text normalization (lowercasing, tokenization, removing punctuation & stopwords).
- Stemming using NLTK.
- Feature extraction using TF-IDF and CountVectorizer.
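The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `transform_text`, the sample messages, and the `max_features` value are assumptions, and scikit-learn's built-in English stopword list stands in for NLTK's stopword corpus so the snippet runs without a corpus download.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

stemmer = PorterStemmer()


def transform_text(text):
    """Lowercase, tokenize, drop punctuation and stopwords, then stem."""
    # Keeping only alphanumeric runs tokenizes and strips punctuation in one step.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Stopword list is scikit-learn's (an assumption; the project may use NLTK's).
    kept = [t for t in tokens if t not in ENGLISH_STOP_WORDS]
    return " ".join(stemmer.stem(t) for t in kept)


# Hypothetical sample messages, for illustration only.
messages = [
    "WINNING a FREE prize!!! Claim now.",
    "Are we still meeting for lunch tomorrow?",
]
cleaned = [transform_text(m) for m in messages]

# max_features is an illustrative cap, not a value taken from the project.
vectorizer = TfidfVectorizer(max_features=3000)
X = vectorizer.fit_transform(cleaned)

print(cleaned)
print(X.shape)
```

`CountVectorizer` exposes the same `fit_transform` interface, so it can be swapped in for `TfidfVectorizer` to compare raw counts against TF-IDF weights.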
-
Exploratory Data Analysis (EDA)
- Spam vs Ham distribution.
- Word frequency visualization.
- WordCloud generation.
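As a rough sketch of the EDA steps above, the class distribution and the most frequent words per class can be computed with pandas and `collections.Counter`. The toy rows and the column names `target` and `text` are assumptions made for this example, not the project's dataset.

```python
from collections import Counter

import pandas as pd

# Toy data; the column names "target" and "text" are assumptions.
df = pd.DataFrame({
    "target": ["ham", "spam", "ham", "spam", "ham"],
    "text": [
        "see you at lunch",
        "free prize claim free cash",
        "call me later",
        "claim your free reward",
        "lunch at noon",
    ],
})

# Spam vs Ham distribution, as counts and as percentages.
class_counts = df["target"].value_counts()
class_share = (df["target"].value_counts(normalize=True) * 100).round(1)


def top_words(texts, n=2):
    """Return the n most frequent whitespace-separated words in texts."""
    counter = Counter()
    for text in texts:
        counter.update(text.split())
    return counter.most_common(n)


spam_top = top_words(df.loc[df["target"] == "spam", "text"])

print(class_counts.to_dict())
print(spam_top)
```

The resulting counts can be passed to a Matplotlib or Seaborn bar plot. If the separate `wordcloud` package is installed, a dictionary of word frequencies can also be rendered as a word cloud.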
-
Modeling & Evaluation
- Implemented multiple classifiers:
  - SVC, K-Nearest Neighbors, Naive Bayes, Decision Tree, Logistic Regression, Random Forest
  - AdaBoost, Bagging, Extra Trees
- Compared performance across models.
- Tuned hyperparameters to improve accuracy.
- Implemented Stacking Classifier for ensemble learning.
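The modeling workflow above (compare classifiers, tune hyperparameters, then stack) might look something like the sketch below. The toy messages, the three chosen candidates, the parameter grid, and all hyperparameter values are assumptions for illustration; they are not the project's actual settings or results.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy corpus (hypothetical): 1 = spam, 0 = ham.
texts = [
    "win a free prize now", "free cash claim your prize",
    "urgent claim free reward", "free entry win cash today",
    "claim your free ringtone", "winner urgent prize waiting",
    "see you at lunch tomorrow", "can you call me later",
    "meeting moved to monday", "are you home tonight",
    "thanks for the lift today", "lunch at noon works for me",
]
labels = [1] * 6 + [0] * 6

# 1) Compare a few candidate classifiers with cross-validated accuracy.
candidates = {
    "NB": MultinomialNB(),
    "SVC": SVC(kernel="sigmoid", gamma=1.0),
    "RF": RandomForestClassifier(n_estimators=50, random_state=2),
}
scores = {}
for name, clf in candidates.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores[name] = cross_val_score(pipe, texts, labels, cv=3, scoring="accuracy").mean()

# 2) Tune one model; the grid over alpha is illustrative only.
grid = GridSearchCV(
    make_pipeline(TfidfVectorizer(), MultinomialNB()),
    param_grid={"multinomialnb__alpha": [0.1, 0.5, 1.0]},
    cv=3,
    scoring="accuracy",
)
grid.fit(texts, labels)

# 3) Stack base models under a logistic-regression meta-learner.
stack = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(
        estimators=[("nb", MultinomialNB()), ("svc", SVC(kernel="sigmoid", gamma=1.0))],
        final_estimator=LogisticRegression(),
        cv=2,  # small fold count only because the toy corpus is tiny
    ),
)
stack.fit(texts, labels)

print(scores)
print(grid.best_params_)
print(stack.predict(["free prize waiting for you"]))
```

For spam filtering, accuracy alone can hide false positives (legitimate messages flagged as spam), so passing `scoring="precision"` to the same calls is a reasonable alternative to consider.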
-
User Interface
- Built with Streamlit.
- User-friendly input box to test custom messages.
-
Tech Stack
- Language: Python
- Libraries: NLTK, Scikit-learn, XGBoost, Pandas, NumPy, Matplotlib, Seaborn
- Deployment/UI: Streamlit

