A Python script that scrapes posts and replies from a Threads.net account and saves them to PDF, JSON, or TXT format, including images from posts.
- Scrapes posts and replies from any public Threads account
- Downloads and includes post images in the PDF output
- Formats content with timestamps, text, and engagement stats
- Multiple output formats (PDF, JSON, TXT)
- Robust error handling with automatic fallback options
- Chromium/Chrome support with automatic browser detection
- Customizable scroll depth for controlling how many posts to fetch
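The script's internal record layout isn't documented here, but as a rough, hypothetical sketch of how a scraped post with a timestamp, text, and engagement stats might be represented and written out as JSON or plain text (all field and function names below are assumptions, not the script's actual schema):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Post:
    # Hypothetical field names; the script's actual schema may differ.
    timestamp: str
    text: str
    likes: int
    replies: int
    image_paths: list

def save_json(posts, path):
    """Dump all scraped posts as a single JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(p) for p in posts], f, ensure_ascii=False, indent=2)

def save_txt(posts, path):
    """Write a simple human-readable text dump, one post per block."""
    with open(path, "w", encoding="utf-8") as f:
        for p in posts:
            f.write(f"[{p.timestamp}] {p.text}\n")
            f.write(f"  likes: {p.likes}  replies: {p.replies}\n\n")
```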
- Python 3.7+
- Chrome or Chromium browser
- ChromeDriver (installed automatically)
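Automatic ChromeDriver installation is commonly handled in Selenium projects with the webdriver-manager package; the snippet below sketches that approach. Whether this script uses webdriver-manager internally is an assumption.

```python
# Minimal sketch of automatic driver setup with Selenium 4 and
# webdriver-manager. This is one common approach, not necessarily
# the exact mechanism used by threads_scraper_fixed.py.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")  # run without a visible window

driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options,
)
driver.get("https://www.threads.net")
print(driver.title)
driver.quit()
```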
- Clone this repository:

  ```bash
  git clone https://github.com/yourusername/threads-scraper.git
  cd threads-scraper
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

Scrape a Threads account:

```bash
python threads_scraper_fixed.py --username USERNAME
```

| Argument | Description | Default |
|---|---|---|
| `--username` | Threads username to scrape (without @ symbol) | Required |
| `--output-dir` | Directory to save output files | `output` |
| `--max-scrolls` | Maximum number of scrolls to perform | 10 |
| `--skip-replies` | Skip scraping replies | False |
| `--skip-posts` | Skip scraping posts | False |
| `--output-format` | Output format: `pdf`, `json`, or `txt` | `pdf` |
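For reference, a command-line interface with exactly these options could be declared with argparse as sketched below; this mirrors the table, but is not necessarily the parser used in threads_scraper_fixed.py.

```python
# Illustrative argparse setup matching the options table above;
# not necessarily identical to the parser in threads_scraper_fixed.py.
import argparse

parser = argparse.ArgumentParser(description="Scrape a public Threads account")
parser.add_argument("--username", required=True,
                    help="Threads username to scrape (without @ symbol)")
parser.add_argument("--output-dir", default="output",
                    help="Directory to save output files")
parser.add_argument("--max-scrolls", type=int, default=10,
                    help="Maximum number of scrolls to perform")
parser.add_argument("--skip-replies", action="store_true",
                    help="Skip scraping replies")
parser.add_argument("--skip-posts", action="store_true",
                    help="Skip scraping posts")
parser.add_argument("--output-format", choices=["pdf", "json", "txt"],
                    default="pdf", help="Output format")
args = parser.parse_args()
```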
Scrape with more posts:

```bash
python threads_scraper_fixed.py --username USERNAME --max-scrolls 20
```

Only scrape replies:

```bash
python threads_scraper_fixed.py --username USERNAME --skip-posts
```

Save as JSON:

```bash
python threads_scraper_fixed.py --username USERNAME --output-format json
```

Save as plain text:

```bash
python threads_scraper_fixed.py --username USERNAME --output-format txt
```

Install Google Chrome on Debian/Ubuntu:

```bash
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
sudo apt update
sudo apt install -y google-chrome-stable
```

Or add the key to a dedicated keyring instead (apt-key is deprecated on current releases):

```bash
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo gpg --dearmor -o /usr/share/keyrings/google-chrome.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/google-chrome.gpg] http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt update
sudo apt install -y google-chrome-stable
```

Install Chromium instead (Debian/Ubuntu):

```bash
sudo apt update
sudo apt install -y chromium-browser
```

Install Chromium on Fedora:

```bash
sudo dnf install -y chromium
```

If the script can't find your browser automatically, set the CHROME_DRIVER_PATH environment variable:

```bash
export CHROME_DRIVER_PATH=/usr/bin/chromium
```

Or run in one line:

```bash
CHROME_DRIVER_PATH=/usr/bin/chromium python threads_scraper_fixed.py --username USERNAME
```
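As a rough illustration of how an explicit CHROME_DRIVER_PATH override can be combined with automatic browser detection (the helper below is hypothetical, not the script's actual code):

```python
# Illustrative helper for resolving the browser binary; the real script's
# logic may differ. find_browser is a hypothetical name.
import os
import shutil

def find_browser():
    # An explicit override via CHROME_DRIVER_PATH wins.
    override = os.environ.get("CHROME_DRIVER_PATH")
    if override and os.path.exists(override):
        return override
    # Otherwise look for common Chrome/Chromium binaries on PATH.
    for name in ("google-chrome", "google-chrome-stable",
                 "chromium", "chromium-browser"):
        path = shutil.which(name)
        if path:
            return path
    raise RuntimeError("No Chrome/Chromium binary found; set CHROME_DRIVER_PATH")
```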
If you see "Could not initialize WebDriver" errors:

- Verify the Chrome/Chromium installation:

  ```bash
  which google-chrome
  # or
  which chromium-browser
  ```

- Set the browser path:

  ```bash
  export CHROME_DRIVER_PATH=/path/to/browser
  ```

If the PDF output is unreadable:
- Try a different output format:

  ```bash
  python threads_scraper_fixed.py --username USERNAME --output-format txt
  ```

- Reduce the number of posts:

  ```bash
  python threads_scraper_fixed.py --username USERNAME --max-scrolls 5
  ```

- Check the debug files:
  - `debug_False_page.png`: Screenshot of posts page
  - `debug_True_page.png`: Screenshot of replies page
  - `debug_False_page.html`: HTML source of posts page
  - `debug_True_page.html`: HTML source of replies page
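Debug artifacts like these are straightforward to produce with Selenium; the sketch below shows one way such files could be written (the helper name is illustrative, not taken from the script):

```python
# Illustrative debug capture with Selenium; the actual script may
# implement this differently.
def save_debug_artifacts(driver, is_replies_page: bool):
    """Save a screenshot and the raw HTML of the current page."""
    prefix = f"debug_{is_replies_page}_page"  # e.g. debug_False_page / debug_True_page
    driver.save_screenshot(f"{prefix}.png")
    with open(f"{prefix}.html", "w", encoding="utf-8") as f:
        f.write(driver.page_source)
```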
- Only works with public Threads accounts
- May be affected by Threads website changes
- Subject to rate limiting and anti-scraping measures
- PDF generation may fail with very large amounts of content
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is for educational purposes only. Be sure to comply with Threads' terms of service and respect rate limits when using this script.