M4R Project
This repository contains only the code. Training data, collected data, trained model weights, etc. are therefore not included.
Contents
- Data Harvesting
  - tweet_harvester.py: Harvests tweets by search term or by tweet ID.
  - user_harvester.py: Harvests tweets from specific users, or user account data. This is used specifically to fill out the retweet and reply networks, i.e. to make account-level detection possible.
  - full_text_tokeniser.py: Tokenises the tweets.
- Bot Detection Methods
  - feature_selection.py: Performs feature selection using RFC feature importances, recursive feature elimination, and ANOVA.
  - account_level_detection.py: Trains the account-level detection model, comparing models and resampling techniques. Also applies the account-level detection model to the collected datasets.
  - tweet_level_detection.py: Trains the tweet-level detection model, comparing models.
- Application to Election Data
  - data_exploration.py: Explores summary statistics of the Georgia and US datasets, plots their distributions, and compares them with the training dataset.
  - sentiment_analysis.py: Performs VADER sentiment analysis, plotting sentiment distributions and sentiment over time.
  - reply_network.py: Builds simple and more complex reply networks for the Georgia dataset.
  - hashtag_cooccurrence_network.py: Builds a hashtag co-occurrence network that can be plotted in Gephi.
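The ANOVA step in feature_selection.py ranks features by how well they separate bots from humans. The sketch below is a minimal, standard-library illustration of the one-way ANOVA F statistic computed per feature. It is not the repository's code: the toy feature names and values are hypothetical, and in practice scikit-learn's `sklearn.feature_selection.f_classif` computes the same per-feature F-value.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic for a single feature, grouped by class label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    k, n = len(groups), len(values)
    grand_mean = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of each value around its own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")  # classes perfectly separated with no spread
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(features, labels):
    """Return (feature_name, F) pairs, highest F (most discriminative) first."""
    scores = {name: anova_f_score(col, labels) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: label 1 = bot, 0 = human.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "a": [1, 2, 3, 7, 8, 9],  # well separated by class
    "b": [5, 1, 9, 2, 8, 4],  # barely separated by class
}
print(rank_features(features, labels))
```

A high F means the between-class spread dominates the within-class spread, so feature "a" ranks above feature "b" here.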
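As an illustration of what a simple reply network looks like, here is a minimal sketch of a weighted, directed reply network, not the implementation in reply_network.py. It assumes each tweet has already been reduced to a dict with hypothetical `user` and `in_reply_to_user` keys. Real harvested tweets will use whatever field names the Twitter API returns. Dropping self-replies (users threading their own tweets) is also an assumption made for this example.

```python
from collections import Counter


def build_reply_edges(tweets):
    """Weighted directed edges: (replier, replied_to) -> number of replies."""
    edges = Counter()
    for tweet in tweets:
        target = tweet.get("in_reply_to_user")  # hypothetical field name
        # Skip tweets that are not replies, and self-replies (threads).
        if target and target != tweet["user"]:
            edges[(tweet["user"], target)] += 1
    return edges


def replies_received(edges):
    """Weighted in-degree: total replies received by each user."""
    received = Counter()
    for (_, target), weight in edges.items():
        received[target] += weight
    return received


tweets = [
    {"user": "alice", "in_reply_to_user": "bob"},
    {"user": "alice", "in_reply_to_user": "bob"},
    {"user": "carol", "in_reply_to_user": "bob"},
    {"user": "bob", "in_reply_to_user": None},   # original tweet, not a reply
    {"user": "bob", "in_reply_to_user": "bob"},  # self-reply, ignored
]
edges = build_reply_edges(tweets)
print(edges)
print(replies_received(edges))
```

The edge counter can be loaded into a graph library or exported as an edge list.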
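The hashtag co-occurrence network links two hashtags whenever they appear in the same tweet. The minimal sketch below uses only the standard library and is not the code in hashtag_cooccurrence_network.py. It counts co-occurring pairs and writes an undirected edge list as CSV. The `Source`, `Target`, `Weight`, and `Type` columns follow Gephi's spreadsheet edge-import convention. Lower-casing hashtags and the simple `#(\w+)` pattern are assumptions made for this example.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")  # simplified hashtag pattern (assumption)


def cooccurrence_edges(texts):
    """Count how often each unordered pair of hashtags appears in the same tweet."""
    edges = Counter()
    for text in texts:
        # Deduplicate and lower-case so #GAsen and #gasen are one node.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for a, b in combinations(tags, 2):  # sorted, so each pair has one key
            edges[(a, b)] += 1
    return edges


def to_gephi_csv(edges):
    """Serialise edges as a CSV edge table for Gephi's spreadsheet import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight", "Type"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight, "Undirected"])
    return buf.getvalue()


texts = [
    "Vote! #GAsen #Election2020",
    "#gasen #runoff #Election2020",
    "no tags here",
    "#solo",  # a single hashtag produces no edge
]
print(to_gephi_csv(cooccurrence_edges(texts)))
```

Writing to a file instead of a string buffer only requires passing an open file handle to `csv.writer`. The resulting CSV can then be opened in Gephi as an edges table.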