ScrapeWeb is a GUI-based web scraper for Windows, built with Electron and Puppeteer. It extracts blog post titles and URLs using custom CSS selectors and saves the results as JSON, Markdown, or both.
- 🖥️ Modern cross-platform desktop interface
- 🔍 Scrapes blog titles and links with custom CSS selectors
- 📁 File browser to choose output location
- 🧾 Export to JSON, Markdown, or both
- 🧠 Auto-extracts metadata:
  - Page title
  - Meta author
  - Meta description
  - `og:updated_time` (if available)
- 📊 Taskbar-integrated progress bar
- 🔃 Supports dynamic JavaScript-rendered pages
- 📌 Timestamped filenames for each scrape session
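Timestamped filenames keep successive scrapes from overwriting each other. This README does not specify the exact naming pattern ScrapeWeb uses, so the following is only a minimal sketch under an assumed `prefix_YYYY-MM-DD_HH-MM-SS.ext` scheme; `timestampedFilename` and `pad2` are hypothetical helpers, not part of the app.

```typescript
// Hypothetical naming scheme: ScrapeWeb's actual pattern is not documented in this README.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Builds "<prefix>_YYYY-MM-DD_HH-MM-SS.<ext>" from the local time of `when`.
// Hyphens replace colons because Windows does not allow ":" in filenames.
function timestampedFilename(prefix: string, ext: "json" | "md", when: Date = new Date()): string {
  const date = `${when.getFullYear()}-${pad2(when.getMonth() + 1)}-${pad2(when.getDate())}`;
  const time = `${pad2(when.getHours())}-${pad2(when.getMinutes())}-${pad2(when.getSeconds())}`;
  return `${prefix}_${date}_${time}.${ext}`;
}

// Usage: reuse one Date so a "both" export yields matching .json and .md names.
const now = new Date(2025, 3, 10, 9, 5, 7); // 10 April 2025, 09:05:07 local time (months are 0-based)
console.log(timestampedFilename("scrape", "json", now)); // scrape_2025-04-10_09-05-07.json
console.log(timestampedFilename("scrape", "md", now)); // scrape_2025-04-10_09-05-07.md
```

Passing the timestamp in, rather than calling `new Date()` per file, is the design choice that keeps both exported files of a single session under the same stem.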
Clone this repository and install dependencies:

```bash
git clone https://github.com/yourusername/ScrapeWeb.git
cd ScrapeWeb
npm install
```

Run the app:

```bash
npm start
```

Build the Windows executable:

```bash
npm run dist
```

The resulting `.exe` is created in the `dist/` folder.
- Launch ScrapeWeb
- Enter the blog page URL
- Input:
  - Title Selector: the CSS selector for the blog post title (e.g., `.post-title`, `a.headline`)
  - Link Selector: the CSS selector for the anchor or link (e.g., `a`)
- Select output format: JSON, Markdown, or both
- Choose an output folder
- Click Scrape Now
- The app will generate a timestamped file in the selected location
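How ScrapeWeb combines the two selectors is not described here. The sketch below assumes the simplest reading: the i-th title match is paired with the i-th link match. `pairPosts` and `Post` are hypothetical names. In a Puppeteer script, the two input arrays could be collected with `page.$$eval`, which runs a function over all elements matching a selector; the raw `href` attribute may be relative, so the sketch resolves it against the page URL.

```typescript
// Hypothetical helper; ScrapeWeb's real pairing logic is not shown in this README.
// In Puppeteer the inputs might be gathered like this (not executed here):
//   const titles = await page.$$eval(titleSel, els => els.map(el => el.textContent ?? ""));
//   const hrefs  = await page.$$eval(linkSel, els => els.map(el => el.getAttribute("href") ?? ""));
interface Post {
  title: string;
  url: string;
}

function pairPosts(titles: string[], hrefs: string[], pageUrl: string): Post[] {
  const posts: Post[] = [];
  // Pair by position: the i-th title match goes with the i-th link match (an assumption).
  const count = Math.min(titles.length, hrefs.length);
  for (let i = 0; i < count; i++) {
    // Collapse newlines and runs of spaces that often appear in scraped text.
    const title = titles[i].replace(/\s+/g, " ").trim();
    const href = hrefs[i].trim();
    if (!title || !href) continue; // skip entries with no visible title or no link
    let url: string;
    try {
      // Resolve relative hrefs such as "/blog/post-1" against the page URL.
      url = new URL(href, pageUrl).toString();
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    posts.push({ title, url });
  }
  return posts;
}

// Usage with made-up inputs: two posts are returned; the third is skipped (empty title).
const posts = pairPosts(
  ["  Post 1 ", "Post\n  2", ""],
  ["/blog/post-1", "https://example.com/blog/post-2", "/blog/post-3"],
  "https://example.com/blog/",
);
console.log(posts);
```

Pairing by index is fragile when a page has more title matches than link matches; a scraper could instead query the link inside each title's container, but that depends on the site's markup.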
Example selectors for The Hacker News (https://thehackernews.com/):
- Title Selector: `h2.home-title`
- Link Selector: `a.story-link`
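The Markdown and JSON examples that follow share one data shape. As a minimal sketch, assuming the field names from the JSON example, here is one way that shape could be rendered into both layouts; `ScrapeResult`, `toMarkdown`, `toJson`, and `escapeLinkText` are hypothetical and not ScrapeWeb's actual code.

```typescript
// Hypothetical renderers; field names follow the JSON example in this README.
interface Post {
  title: string;
  url: string;
}

interface ScrapeResult {
  url: string;
  title: string;
  author: string;
  description: string;
  updated: string; // presumably the og:updated_time value, or "" when absent
  posts: Post[];
}

// Escape square brackets so a post title cannot break the [text](url) link syntax.
function escapeLinkText(text: string): string {
  return text.replace(/[\[\]]/g, "\\$&");
}

// Mirrors the Markdown layout shown in the example output.
function toMarkdown(r: ScrapeResult): string {
  const lines = [
    `# ${r.title}`,
    `**URL:** ${r.url}`,
    `**Author:** ${r.author}`,
    `**Updated:** ${r.updated}`,
    `**Desc:** ${r.description}`,
    "## Posts",
    ...r.posts.map((p, i) => `${i + 1}. [${escapeLinkText(p.title)}](${p.url})`),
  ];
  // Drop trailing spaces left behind by empty metadata fields.
  return lines.map((line) => line.replace(/\s+$/, "")).join("\n") + "\n";
}

// Mirrors the JSON layout shown in the example output, with 2-space indentation.
function toJson(r: ScrapeResult): string {
  // Build the object explicitly so keys are emitted in the documented order.
  const ordered = {
    url: r.url,
    title: r.title,
    author: r.author,
    description: r.description,
    updated: r.updated,
    posts: r.posts.map((p) => ({ title: p.title, url: p.url })),
  };
  return JSON.stringify(ordered, null, 2);
}

// Usage with made-up data:
const sample: ScrapeResult = {
  url: "https://example.com/blog/",
  title: "Example Blog",
  author: "",
  description: "",
  updated: "",
  posts: [{ title: "Post 1", url: "https://example.com/blog/post-1" }],
};
console.log(toMarkdown(sample));
console.log(toJson(sample));
```

Keeping one `ScrapeResult` object and rendering it twice means a "both" export writes the same data to each file.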
Markdown Output:

```markdown
# The Hacker News | #1 Trusted Source for Cybersecurity News
**URL:** https://thehackernews.com/
**Author:**
**Updated:**
**Desc:** The Hacker News is the top cybersecurity news platform, delivering real-time updates, threat intelligence, data breach reports, expert analysis, and actionable insights for infosec professionals and decision-makers.
## Posts
1. [Chinese Smishing Kit Powers Widespread Toll Fraud Campaign Targeting U.S. Users in 8 States](https://thehackernews.com/2025/04/chinese-smishing-kit-behind-widespread.html)
2. [Multi-Stage Malware Attack Uses .JSE and PowerShell to Deploy Agent Tesla and XLoader](https://thehackernews.com/2025/04/multi-stage-malware-attack-uses-jse-and.html)
...
```

JSON Output:
```json
{
  "url": "https://...",
  "title": "The Hacker News...",
  "author": "",
  "description": "",
  "updated": "",
  "posts": [
    { "title": "Post 1", "url": "https://..." },
    { "title": "Post 2", "url": "https://..." }
  ]
}
```

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).
© 2025 Garrett Spear. Free to use, modify, and distribute under the terms of GPLv3.

