Twitter Sentiment Analysis Tool
This tool consumes the Twitter Streaming API according to predefined keywords, determines the sentiment of incoming tweets, and allows a user to view statistics regarding the sentiment of the collected tweets.
The application is built from 5 components:
- Producer - subscribes to the Twitter Streaming API and publishes incoming tweets to a message broker.
- Consumer - receives tweets from the message broker and determines their sentiment using TextBlob; processed tweets are saved in a DB.
- Message broker - RabbitMQ.
- DB - MongoDB.
- SAT-API - simple flask REST-API which allows querying the DB.
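The consumer's sentiment step can be sketched with TextBlob's polarity score. This is a sketch, not the project's actual code: the `label_polarity` helper and its thresholds are assumptions.

```python
def label_polarity(polarity: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    Hypothetical helper; the real consumer may use different thresholds.
    """
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

# In the consumer, the score would come from TextBlob, e.g.:
# from textblob import TextBlob
# label = label_polarity(TextBlob(tweet_text).sentiment.polarity)
```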
Add your Twitter API key and access token as a twiter-api-credentials.json file in the config directory:

```json
{
  "consumer-key": {
    "key": "your-key",
    "secret": "your-secret"
  },
  "access-token": {
    "token": "your-access-token",
    "secret": "your-token-secret"
  }
}
```

Set keywords, hashtags and handles in config/person-config.json. This determines which tweets will be pushed from the streaming API. The default is:
```json
{
  "keywords": [
    "Donald Trump",
    "Trump",
    "#potus",
    "@therealdonalntrump"
  ]
}
```

Since there are two ways to deploy the app, the prerequisites depend on the deployment of choice:
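As a sketch of how the producer might consume this config file, the helper below reads the keyword list to pass as the streaming API's track parameter. The function name and lack of error handling are assumptions, not the project's actual code.

```python
import json

def load_track_terms(path="config/person-config.json"):
    """Read person-config.json and return the keyword list to track.

    Hypothetical helper illustrating the config format above.
    """
    with open(path) as f:
        cfg = json.load(f)
    return cfg["keywords"]
```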
For a Docker deployment:
- Docker 18.*
- Docker-Compose 1.20.*

For a local deployment:
- Python 3
- pip
- MongoDB
- RabbitMQ
- open a terminal in the project's directory
- in the terminal enter (sudo may be required):

```shell
docker-compose up --build
```
The app image will be built and there should be three containers running:
- sat
- mongo
- rabbitmq
- go to config/hosts.json and change "env":"docker" to "env":"local" (hosts differ between Docker and local deployment)
- open a terminal in the project's directory and run the following commands (sudo may be required for pip):
```shell
pip install -r requirements.txt
python -m textblob.download_corpora
python producer.py
python consumer.py
python sat-api.py
```
Open your browser and enter the URL http://localhost:5000/distribution. You should get a response
of the following structure, with tweet distribution by sentiment and date range (last hour, day, and week):
```json
{
  "from": "Fri, 06 Apr 2018 12:55:52 GMT",
  "to": "Fri, 06 Apr 2018 13:55:52 GMT",
  "tweet-distribution": [
    {
      "_id": "negative",
      "count": 2211
    },
    {
      "_id": "positive",
      "count": 2966
    },
    {
      "_id": "neutral",
      "count": 4062
    }, ...
  ]
}
```

You can also use http://localhost:5000/distribution/byspan?span=W to get data
for tweets within the last hour, day or week (H, D, W).
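The byspan endpoint presumably maps the span query parameter to a time window before querying the DB. A minimal sketch of that mapping follows; the `span_range` name and its exact behavior are assumptions, not the project's actual code.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from the ?span= query value to a window length
SPANS = {"H": timedelta(hours=1), "D": timedelta(days=1), "W": timedelta(weeks=1)}

def span_range(span, now=None):
    """Return the (from, to) datetime window for a span code: H, D or W."""
    if span not in SPANS:
        raise ValueError("span must be one of H, D, W; got %r" % span)
    now = now or datetime.utcnow()
    return now - SPANS[span], now
```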
- Front-end was left out, but should be quite trivial to build with a modern JS framework.
- Python was chosen due to its low development overhead.
- For a bigger project:
  - allow keyword configuration at runtime, per user
  - fine-tune the message broker
  - add DB authentication