This project performs sentiment analysis on Twitter data using machine learning. It classifies tweets as positive or negative based on their text, using the labeled tweets of the Sentiment140 dataset.
- Preprocessing: Cleans tweets by removing URLs, mentions, hashtags, and special characters.
- Feature Engineering: Uses TF-IDF Vectorization to convert text into numerical form.
- Model Training: Implements Logistic Regression for classification.
- Evaluation: Assesses model performance using accuracy, precision, recall, and F1-score.
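The steps above can be sketched end to end in a few lines. This is a minimal illustration, not the code in this repository: the `clean_tweet` helper, its regular expressions, and the toy tweets are assumptions made for the example, and scikit-learn's default settings are used throughout.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def clean_tweet(text: str) -> str:
    """Hypothetical cleaner: drop URLs, mentions, hashtags, and special characters."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)                # mentions and hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)            # anything that is not a letter
    return re.sub(r"\s+", " ", text).strip().lower()    # collapse whitespace, lowercase


# Toy data for illustration only (0 = negative, 1 = positive).
tweets = [
    "I love this phone! http://example.com",
    "@bob what a great day #happy",
    "This is the worst service ever",
    "I hate waiting in line, so awful",
]
labels = [1, 1, 0, 0]

# TF-IDF turns each cleaned tweet into a sparse vector; logistic regression classifies it.
model = make_pipeline(
    TfidfVectorizer(preprocessor=clean_tweet),
    LogisticRegression(),
)
model.fit(tweets, labels)

print(clean_tweet("@bob I LOVE it!!! http://example.com #happy"))  # i love it
print(model.predict(["I love this great phone"]))
```

Passing the cleaner as `preprocessor` keeps cleaning and vectorization in one object, so the same transformation is applied at training and prediction time.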
- Python
- Pandas
- NumPy
- NLTK
- Scikit-learn
The dataset used is Sentiment140, which contains 1.6 million labeled tweets:
- 0: Negative sentiment
- 1: Positive sentiment
- The dataset includes tweet text, polarity, and metadata
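For orientation, the snippet below sketches how such a file might be loaded with Pandas. It assumes the commonly distributed Sentiment140 CSV layout: no header row, six columns, Latin-1 encoding, and positive tweets stored as 4 in the raw file (remapped to 1 here to match the labels above). The column names and the `load_sentiment140` helper are illustrative and not taken from this repository.

```python
import io

import pandas as pd

# Assumed column names; the raw Sentiment140 CSV ships without a header row.
COLUMNS = ["polarity", "id", "date", "query", "user", "text"]


def load_sentiment140(source) -> pd.DataFrame:
    """Load the raw CSV (path or binary file-like object) as text plus a 0/1 label."""
    df = pd.read_csv(source, header=None, names=COLUMNS, encoding="latin-1")
    df = df[df["polarity"].isin([0, 4])].copy()     # keep negative and positive rows only
    df["label"] = df["polarity"].map({0: 0, 4: 1})  # remap raw 4 to 1
    return df[["text", "label"]].reset_index(drop=True)


# Two made-up rows in the assumed layout, used here instead of the real file.
sample = io.BytesIO(
    b'"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","alice","so tired today"\n'
    b'"4","2","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","bob","what a lovely morning"\n'
)
df = load_sentiment140(sample)
print(df)
```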
- Clone the repository:
git clone https://github.com/your-username/twitter-sentiment-analysis.git
cd twitter-sentiment-analysis
- Download the dataset and place it in the project directory.
- Preprocess the data
python preprocess.py
- Train the model
python train.py
- Evaluate the model
python evaluate.py
- Make predictions
python predict.py --text "I love this product!"
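The `--text` flag shown above implies a small command-line entry point. The sketch below shows one way such a script could be wired with `argparse`. It is not the repository's `predict.py`: the `classify` stub is a hypothetical stand-in for a trained model that a real script would load from disk.

```python
import argparse


def classify(text: str) -> int:
    """Placeholder for a trained model: returns 1 (positive) or 0 (negative).

    A real script would load the fitted vectorizer and classifier instead.
    """
    return 1 if "love" in text.lower() else 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Classify the sentiment of one tweet.")
    parser.add_argument("--text", required=True, help="Tweet text to classify.")
    return parser


def main(argv=None) -> str:
    # argv=None makes argparse read sys.argv, which is what a real CLI run would use.
    args = build_parser().parse_args(argv)
    label = "Positive" if classify(args.text) == 1 else "Negative"
    print(f"{args.text!r} -> {label}")
    return label


# Simulate the command line from the usage example above.
main(["--text", "I love this product!"])  # prints 'I love this product!' -> Positive
```

Accepting `argv` as a parameter keeps the entry point easy to call from tests without touching `sys.argv`.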
- Accuracy: ~85%
- F1-score: High performance on both positive and negative tweets.
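Accuracy, precision, recall, and F1-score all derive from the counts of true and false positives and negatives. The sketch below computes them from two label lists in plain Python to make the definitions concrete. The example labels are made up and do not reproduce the results above; in practice, `sklearn.metrics` provides equivalent functions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = len(pairs) - tp - fp - fn                               # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration (0 = negative, 1 = positive).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Calling the function a second time with `positive=0` gives the same metrics for the negative class, which is how performance "on both positive and negative tweets" can be checked separately.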
- Add deep learning models (LSTMs, Transformers)
- Implement real-time Twitter API integration
- Enhance text preprocessing with advanced NLP techniques