Emotion Classification from Tweets Using TF-IDF and Logistic Regression
This project focuses on building a multi-class emotion classifier for English-language tweets using a lightweight, interpretable NLP pipeline.
Instead of relying on deep learning or transformer-based models, this experiment uses TF-IDF vectorization and Logistic Regression to predict the emotion expressed in a tweet. The pipeline is simple and fast to train, and it shows that classical methods still have real value in NLP.
- Detect the emotion behind a tweet (joy, sadness, anger, fear, love, or surprise)
- Use traditional NLP tools to keep the model interpretable and low-resource
- Serve as a baseline for future experimentation with deep learning
We use the mteb/emotion dataset from Hugging Face, which includes:
- ~20,000 tweets
- Six emotion labels: joy, sadness, anger, fear, love, surprise
- Pre-split into train, validation, and test sets
- Preprocessing: lowercasing, punctuation and stopword removal
- Feature Engineering: text → TF-IDF representation
- Model Training: Logistic Regression classifier
- Evaluation: accuracy, F1-score, and a confusion matrix
- Visualization: emotion label distribution and confusion matrix heatmap
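
The steps above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the repository's actual code: the `preprocess` and `build_model` helpers, the use of scikit-learn's built-in English stopword list, the `max_iter` setting, and the tiny hand-written example tweets are all assumptions made for the sketch. In practice the texts and labels would come from the train and test splits of the dataset.

```python
import string

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.pipeline import make_pipeline

LABELS = ["anger", "fear", "joy", "love", "sadness", "surprise"]

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and drop English stopwords."""
    tokens = text.lower().translate(_PUNCT_TABLE).split()
    return " ".join(t for t in tokens if t not in ENGLISH_STOP_WORDS)


def build_model():
    """TF-IDF features feeding a multi-class Logistic Regression (hypothetical settings)."""
    return make_pipeline(
        TfidfVectorizer(preprocessor=preprocess),
        LogisticRegression(max_iter=1000),
    )


# Toy data invented for illustration only; not taken from the dataset.
train_texts = [
    "I am so happy today!", "What a wonderful joyful morning",
    "I feel sad and lonely", "Crying all night, so much sorrow",
    "This makes me furious!", "I am angry at the rude driver",
    "I am scared of the dark", "Terrified and afraid of the storm",
    "I love my family dearly", "Feeling loving and caring tonight",
    "Wow, I did not expect that!", "Totally shocked and amazed by the news",
]
train_labels = [
    "joy", "joy", "sadness", "sadness", "anger", "anger",
    "fear", "fear", "love", "love", "surprise", "surprise",
]
test_texts = ["so happy and joyful", "angry and furious", "scared and afraid"]
test_labels = ["joy", "anger", "fear"]

model = build_model()
model.fit(train_texts, train_labels)
predictions = model.predict(test_texts)

# Evaluation: accuracy, macro-averaged F1, and a 6x6 confusion matrix.
accuracy = accuracy_score(test_labels, predictions)
macro_f1 = f1_score(test_labels, predictions, average="macro", zero_division=0)
cm = confusion_matrix(test_labels, predictions, labels=LABELS)
```

Passing `labels=LABELS` to `confusion_matrix` keeps the matrix at a fixed six-by-six shape even when some emotions are missing from a given split, which makes the heatmap axes consistent across runs.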
- Achieved strong performance using only classical tools
- Model is fast to train, runs on CPUs, and is easy to understand
- Demonstrates that traditional NLP methods are still useful and relevant
- A solid starting point for more advanced models like BERT or RoBERTa
- Clone the Repository
git clone https://github.com/isjustabhi/Emotion-Classification-Tweets.git
cd Emotion-Classification-Tweets