Threads Scraper

A Python script that scrapes posts and replies from a public Threads.net account and saves them as PDF, JSON, or TXT. Post images are downloaded and included in the PDF output.

Features

Scrapes posts and replies from any public Threads account
Downloads and includes post images in the PDF output
Formats content with timestamps, text, and engagement stats
Multiple output formats (PDF, JSON, TXT)
Robust error handling with automatic fallback options
Chromium/Chrome support with automatic browser detection
Customizable scroll depth for controlling how many posts to fetch

Requirements

Python 3.7+
Chrome or Chromium browser
ChromeDriver (installed automatically)

Installation

Clone this repository:

git clone https://github.com/yourusername/threads-scraper.git
cd threads-scraper

Install the required dependencies:

pip install -r requirements.txt

Usage

Basic Usage

Scrape a Threads account:

python threads_scraper_fixed.py --username USERNAME

Command-line Arguments

| Argument | Description | Default |
| --- | --- | --- |
| `--username` | Threads username to scrape (without @ symbol) | Required |
| `--output-dir` | Directory to save output files | `output` |
| `--max-scrolls` | Maximum number of scrolls to perform | `10` |
| `--skip-replies` | Skip scraping replies | `False` |
| `--skip-posts` | Skip scraping posts | `False` |
| `--output-format` | Output format: `pdf`, `json`, or `txt` | `pdf` |
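For orientation, the arguments above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual parser: the function name `build_parser` and the help strings are assumptions, and only the flag names and defaults come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Scrape a public Threads account")
    parser.add_argument("--username", required=True,
                        help="Threads username to scrape (without @ symbol)")
    parser.add_argument("--output-dir", default="output",
                        help="Directory to save output files")
    parser.add_argument("--max-scrolls", type=int, default=10,
                        help="Maximum number of scrolls to perform")
    # Boolean switches: absent means False, present means True.
    parser.add_argument("--skip-replies", action="store_true",
                        help="Skip scraping replies")
    parser.add_argument("--skip-posts", action="store_true",
                        help="Skip scraping posts")
    parser.add_argument("--output-format", choices=["pdf", "json", "txt"],
                        default="pdf", help="Output format")
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["--username", "USERNAME", "--max-scrolls", "20"])
print(args.max_scrolls, args.output_format)  # prints: 20 pdf
```

Declaring `--skip-replies` and `--skip-posts` with `store_true` is one way to get the `False` defaults listed in the table.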

Examples

Scrape with more posts:

python threads_scraper_fixed.py --username USERNAME --max-scrolls 20
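To illustrate why a higher `--max-scrolls` value tends to fetch more posts, here is a rough sketch of a scroll loop over a Selenium-style driver. It is an assumption about how such a loop might look, not the script's actual code: `scroll_page` and `pause` are hypothetical names, and the only driver method used is `execute_script`.

```python
import time


def scroll_page(driver, max_scrolls=10, pause=1.0):
    """Scroll to the bottom up to max_scrolls times; stop early if the page stops growing.

    Returns the number of scrolls actually performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded posts time to appear
        scrolls += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so further scrolling is pointless
        last_height = new_height
    return scrolls
```

Each scroll gives the page a chance to load another batch of posts, so the scroll limit acts as an approximate cap on how much content is collected.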

Only scrape replies:

python threads_scraper_fixed.py --username USERNAME --skip-posts

Save as JSON:

python threads_scraper_fixed.py --username USERNAME --output-format json

Save as plain text:

python threads_scraper_fixed.py --username USERNAME --output-format txt
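As a rough picture of what the JSON and TXT outputs involve, the sketch below writes a list of post dictionaries in either format. Everything here is assumed for illustration: the function name `save_posts`, the `timestamp` and `text` keys, and the `USERNAME.FORMAT` file naming are hypothetical, and PDF generation is left out.

```python
import json
from pathlib import Path


def save_posts(posts, output_dir, username, output_format):
    """Write posts as JSON or plain text and return the output path (hypothetical layout)."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{username}.{output_format}"  # assumed naming scheme

    if output_format == "json":
        # ensure_ascii=False keeps emoji and non-Latin text readable in the file.
        path.write_text(json.dumps(posts, indent=2, ensure_ascii=False), encoding="utf-8")
    elif output_format == "txt":
        blocks = [f"[{post['timestamp']}] {post['text']}" for post in posts]
        path.write_text("\n\n".join(blocks) + "\n", encoding="utf-8")
    else:
        raise ValueError(f"Unsupported format in this sketch: {output_format}")
    return path
```

JSON suits further processing, while TXT is the simplest format to read or search by hand.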

Browser Setup

Chrome Installation

Ubuntu/Debian (older releases that still provide `apt-key`):

wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
sudo apt update
sudo apt install -y google-chrome-stable

Newer Debian/Ubuntu versions (where `apt-key` is deprecated, so the key goes into a dedicated keyring):

wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo gpg --dearmor -o /usr/share/keyrings/google-chrome.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/google-chrome.gpg] http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt update
sudo apt install -y google-chrome-stable

Chromium Installation

Ubuntu/Debian (on Debian the package is named `chromium` rather than `chromium-browser`):

sudo apt update
sudo apt install -y chromium-browser

Fedora:

sudo dnf install -y chromium

Browser Path Configuration

If the script can't find your browser automatically, set the `CHROME_DRIVER_PATH` environment variable to the path of the browser binary:

export CHROME_DRIVER_PATH=/usr/bin/chromium

Or run in one line:

CHROME_DRIVER_PATH=/usr/bin/chromium python threads_scraper_fixed.py --username USERNAME
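The lookup order described here (environment variable first, automatic detection second) could be sketched like this. It is an illustration under assumptions, not the script's real detection logic: `find_browser` and the candidate list are hypothetical, and only the `CHROME_DRIVER_PATH` name comes from this README.

```python
import os
import shutil

# Assumed executable names to probe on PATH; the real script may check others.
CANDIDATES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")


def find_browser(env=None):
    """Return the browser path from CHROME_DRIVER_PATH, else the first candidate on PATH."""
    env = os.environ if env is None else env
    override = env.get("CHROME_DRIVER_PATH")
    if override:
        return override  # an explicit setting always wins
    for name in CANDIDATES:
        found = shutil.which(name)
        if found:
            return found
    return None  # nothing found; the caller should report an error


print(find_browser({"CHROME_DRIVER_PATH": "/usr/bin/chromium"}))  # prints: /usr/bin/chromium
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.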

Troubleshooting

Browser Detection Issues

If you see "Could not initialize WebDriver" errors:

Verify Chrome/Chromium installation:

which google-chrome
# or
which chromium-browser

Set the browser path:

export CHROME_DRIVER_PATH=/path/to/browser

PDF Generation Issues

If the PDF output is unreadable:

Try a different output format:

python threads_scraper_fixed.py --username USERNAME --output-format txt

Reduce the number of posts:

python threads_scraper_fixed.py --username USERNAME --max-scrolls 5

Check the debug files:

debug_False_page.png: Screenshot of posts page
debug_True_page.png: Screenshot of replies page
debug_False_page.html: HTML source of posts page
debug_True_page.html: HTML source of replies page
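The file names above suggest that a boolean "is replies" flag is formatted directly into the name (`False` for posts, `True` for replies). Assuming that pattern, a small helper like the following could report which debug files are present in a directory. The helper name `list_debug_files` is hypothetical.

```python
from pathlib import Path


def list_debug_files(directory="."):
    """Return (filename, description, exists) for each documented debug file."""
    results = []
    for is_replies, page in ((False, "posts"), (True, "replies")):
        for ext, kind in ((".png", "Screenshot"), (".html", "HTML source")):
            # f-string formatting of a bool yields "False" or "True".
            name = f"debug_{is_replies}_page{ext}"
            exists = (Path(directory) / name).exists()
            results.append((name, f"{kind} of {page} page", exists))
    return results


for name, description, exists in list_debug_files():
    print(f"{name}: {description} ({'found' if exists else 'missing'})")
```

The screenshots show what the browser actually rendered, and the HTML files allow checking whether the expected post elements were in the page at all.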

Limitations

Only works with public Threads accounts
May be affected by Threads website changes
Subject to rate limiting and anti-scraping measures
PDF generation may fail with very large amounts of content

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This tool is for educational purposes only. Be sure to comply with Threads' terms of service and respect rate limits when using this script.
