ubikcan/rss-reader-old

GitHub-Hosted RSS Feed Aggregator

This repository scaffolds an automated RSS aggregator that fetches public RSS feeds and (later) scrapes sites without feeds. Use scripts/generate_feeds.py to fetch RSS sources defined in feeds.yaml and write output XML files to output/.
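The fetch-and-write flow described above could look roughly like the following. This is a minimal sketch, not the contents of scripts/generate_feeds.py: the function names `fetch_url` and `write_feeds`, the `name` and `url` keys on each source, and the `<name>.xml` file naming are all assumptions made for illustration, and the sources are passed in as plain dictionaries rather than parsed from feeds.yaml.

```python
from pathlib import Path
from typing import Callable, Iterable, List
from urllib.request import urlopen


def fetch_url(url: str) -> bytes:
    """Download a feed over HTTP and return the raw response bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def write_feeds(
    sources: Iterable[dict],
    output_dir: str = "output",
    fetch: Callable[[str], bytes] = fetch_url,
) -> List[Path]:
    """Fetch each source and write it to <output_dir>/<name>.xml.

    The "name" and "url" keys are hypothetical; the real feeds.yaml
    schema may differ.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sources:
        target = out / f"{source['name']}.xml"
        target.write_bytes(fetch(source["url"]))
        written.append(target)
    return written
```

Passing the downloader in as the `fetch` argument keeps the loop testable without network access, for example `write_feeds(sources, fetch=lambda url: b"<rss/>")`.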

Quick start

  1. Create a Python virtual environment and install dependencies:
python -m venv .venv
# on Windows:
.venv\Scripts\activate
# on macOS/Linux:
source .venv/bin/activate
pip install -r requirements.txt
  2. Run the generator:
python scripts/generate_feeds.py

Next steps: add site scraping logic, a GitHub Actions workflow, and a GitHub Pages landing page.

Manual scraping

You can configure manual scraping for sources that do not publish an RSS feed. In feeds.yaml, set scrape: true for the source and provide selectors (CSS selectors) that locate the items on the page. See the commented example at the bottom of feeds.yaml.

Example selectors keys:

  • list: CSS selector matching each article container (required)
  • title: selector for title within the container
  • link: selector for link element (href will be joined to the source URL)
  • content: selector for content/teaser text
  • date: selector for published date text
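The selector keys above might be applied as in the sketch below. It is illustrative only: the `SOURCE` dictionary mirrors what one parsed feeds.yaml entry could look like (the `name` and `url` keys and all selector values are assumptions), and `extract_items` is a hypothetical helper, not a function from this repository. It uses BeautifulSoup's `select` and `select_one` with the built-in `html.parser`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical shape of one parsed feeds.yaml entry; values are made up.
SOURCE = {
    "name": "example-site",
    "url": "https://example.com/news/",
    "scrape": True,
    "selectors": {
        "list": "article.post",   # required: one match per article container
        "title": "h2",
        "link": "a",
        "content": "p.teaser",
        "date": "time",
    },
}


def _text(container, selector):
    """Return the stripped text of the first match, or None if absent."""
    if not selector:
        return None
    node = container.select_one(selector)
    return node.get_text(strip=True) if node else None


def extract_items(html, source):
    """Apply the source's selectors to an HTML page and return item dicts."""
    selectors = source["selectors"]
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for container in soup.select(selectors["list"]):
        link_sel = selectors.get("link")
        link_node = container.select_one(link_sel) if link_sel else None
        href = link_node.get("href") if link_node else None
        items.append({
            "title": _text(container, selectors.get("title")),
            # Relative hrefs are resolved against the source URL.
            "link": urljoin(source["url"], href) if href else None,
            "content": _text(container, selectors.get("content")),
            "date": _text(container, selectors.get("date")),
        })
    return items
```

Only `list` is treated as mandatory here; a missing optional selector or an unmatched element yields `None` for that field instead of raising.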

When scraping is enabled, the generator uses requests and BeautifulSoup to extract items and emits an RSS XML file for that source.
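Emitting the RSS file from scraped items could be done with the standard library alone. The sketch below assumes each item is a dictionary with `title`, `link`, `content`, and `date` keys; `build_rss` is a hypothetical helper, and the generator in this repository may build its XML differently.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, items):
    """Serialize scraped items as a minimal RSS 2.0 document (bytes)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Scraped items from {link}"

    # Map RSS element names to the (assumed) item dictionary keys.
    fields = (
        ("title", "title"),
        ("link", "link"),
        ("description", "content"),
        # RSS 2.0 expects RFC 822 dates; the scraped text is passed
        # through unchanged here and may need converting first.
        ("pubDate", "date"),
    )
    for item in items:
        node = ET.SubElement(channel, "item")
        for tag, key in fields:
            value = item.get(key)
            if value:  # skip fields the scraper could not find
                ET.SubElement(node, tag).text = value

    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)
```

The returned bytes can be written directly to a file under output/, for example with `Path("output/example-site.xml").write_bytes(build_rss(...))`.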
