Skip to content

reversegremlin/threads-scrape-chrome

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Threads Scraper

A Python script that scrapes posts and replies from a Threads.net account and saves them to PDF, JSON, or TXT format, including images from posts.

Features

  • Scrapes posts and replies from any public Threads account
  • Downloads and includes post images in the PDF output
  • Formats content with timestamps, text, and engagement stats
  • Multiple output formats (PDF, JSON, TXT)
  • Robust error handling with automatic fallback options
  • Chromium/Chrome support with automatic browser detection
  • Customizable scroll depth for controlling how many posts to fetch

Requirements

  • Python 3.7+
  • Chrome or Chromium browser
  • ChromeDriver (installed automatically)

Installation

  1. Clone this repository:
git clone https://github.com/yourusername/threads-scraper.git
cd threads-scraper
  1. Install the required dependencies:
pip install -r requirements.txt

Usage

Basic Usage

Scrape a Threads account:

python threads_scraper_fixed.py --username USERNAME

Command-line Arguments

Argument Description Default
--username Threads username to scrape (without @ symbol) Required
--output-dir Directory to save output files output
--max-scrolls Maximum number of scrolls to perform 10
--skip-replies Skip scraping replies False
--skip-posts Skip scraping posts False
--output-format Output format: pdf, json, or txt pdf

Examples

Scrape with more posts:

python threads_scraper_fixed.py --username USERNAME --max-scrolls 20

Only scrape replies:

python threads_scraper_fixed.py --username USERNAME --skip-posts

Save as JSON:

python threads_scraper_fixed.py --username USERNAME --output-format json

Save as plain text:

python threads_scraper_fixed.py --username USERNAME --output-format txt

Browser Setup

Chrome Installation

Ubuntu/Debian:

wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
sudo apt update
sudo apt install -y google-chrome-stable

Newer Debian versions:

wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo gpg --dearmor -o /usr/share/keyrings/google-chrome.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/google-chrome.gpg] http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt update
sudo apt install -y google-chrome-stable

Chromium Installation

Ubuntu/Debian:

sudo apt update
sudo apt install -y chromium-browser

Fedora:

sudo dnf install -y chromium

Browser Path Configuration

If the script can't find your browser automatically, set the CHROME_DRIVER_PATH environment variable:

export CHROME_DRIVER_PATH=/usr/bin/chromium

Or run in one line:

CHROME_DRIVER_PATH=/usr/bin/chromium python threads_scraper_fixed.py --username USERNAME

Troubleshooting

Browser Detection Issues

If you see "Could not initialize WebDriver" errors:

  1. Verify Chrome/Chromium installation:
which google-chrome
# or
which chromium-browser
  1. Set the browser path:
export CHROME_DRIVER_PATH=/path/to/browser

PDF Generation Issues

If the PDF output is unreadable:

  1. Try a different output format:
python threads_scraper_fixed.py --username USERNAME --output-format txt
  1. Reduce the number of posts:
python threads_scraper_fixed.py --username USERNAME --max-scrolls 5
  1. Check the debug files:
  • debug_False_page.png: Screenshot of posts page
  • debug_True_page.png: Screenshot of replies page
  • debug_False_page.html: HTML source of posts page
  • debug_True_page.html: HTML source of replies page

Limitations

  • Only works with public Threads accounts
  • May be affected by Threads website changes
  • Subject to rate limiting and anti-scraping measures
  • PDF generation may fail with very large amounts of content

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This tool is for educational purposes only. Be sure to comply with Threads' terms of service and respect rate limits when using this script.

About

I needed a thing to dump my Threads to PDF

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published