Web crawler for Fiverr forums - data collection for observing censorship.
Extracts the following data from the forums: thread_name, categories, replies, views, total_likes, creation_date, first_reply_date, last_reply_date, review_day, thread_url, frequent_posters, thread_author, thread_likes, thread_text, thread_images, thread_edits, latest_thread_edit_date, reply_author, reply_likes, reply_text, reply_images, reply_date, crawl_date
Required: Geckodriver. Make sure it is added to the $PATH.
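A quick way to confirm the requirement above is to look the executable up on the $PATH before starting a crawl. The following is a minimal sketch, not part of the crawler itself; the function name find_geckodriver is hypothetical, and it assumes the executable is named "geckodriver".

```python
import shutil


def find_geckodriver(name="geckodriver"):
    """Return the full path of the executable if it is on $PATH, else None.

    The default name "geckodriver" is an assumption about how the
    binary is installed on your system.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_geckodriver()
    if path is None:
        print("geckodriver was not found on $PATH")
    else:
        print(f"geckodriver found at {path}")
```

If the lookup returns None, add the directory containing geckodriver to $PATH and try again.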
Compile and run driver.c (gcc writes the executable to ./a.out by default):
gcc driver.c
The Fiverr forums have 6 main categories:
By default, running the program through driver.c scrapes only categories 2-5.
You can alternatively run the program by issuing the following command inside the crawler directory:
python spider.py
To scrape a specific forum category, use:
python spider.py 1 X
This restricts scraping to a single category, where X is the number of one of the main categories listed above.
Example:
python spider.py 1 2
This will scrape and analyze only the COVID-19 Discussions forums.
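The command-line behavior described above could be handled along the following lines. This is only a sketch under stated assumptions, not the actual code in spider.py: the function name select_categories is hypothetical, the categories are assumed to be numbered 1 to 6, the first argument "1" is assumed to be a flag that switches on single-category mode, and running with no arguments is assumed to fall back to the same default (categories 2-5) as driver.c.

```python
# Assumption: the 6 main categories are numbered 1..6.
NUM_CATEGORIES = 6
# From the README: the default run scrapes only categories 2-5.
DEFAULT_CATEGORIES = [2, 3, 4, 5]


def select_categories(args):
    """Map command-line arguments (without the script name) to category numbers.

    []          -> the default categories
    ["1", "X"]  -> only category X
    Anything else raises ValueError.
    """
    if not args:
        return list(DEFAULT_CATEGORIES)
    if len(args) == 2 and args[0] == "1":
        category = int(args[1])  # raises ValueError if X is not an integer
        if not 1 <= category <= NUM_CATEGORIES:
            raise ValueError(f"category must be between 1 and {NUM_CATEGORIES}")
        return [category]
    raise ValueError("usage: python spider.py [1 X]")
```

With this sketch, select_categories(["1", "2"]) returns [2], matching the example above, and select_categories([]) returns [2, 3, 4, 5]. In a script it would typically be called as select_categories(sys.argv[1:]).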