capjamesg/getsitemap

getsitemap

getsitemap is a Python library that retrieves every URL listed in the sitemaps of a website.

This project may be useful if you are building a search crawler or a validator that checks the status code of every URL in a sitemap.
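As a rough illustration of the status code validator use case, here is a minimal sketch that uses only the Python standard library. The names `fetch_status` and `find_broken_urls` are hypothetical helpers written for this example and are not part of getsitemap.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is on the exception.
        return error.code


def find_broken_urls(
    urls: Iterable[str],
    get_status: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Map each URL whose status code is 400 or above to that code."""
    broken = {}
    for url in urls:
        status = get_status(url)
        if status >= 400:
            broken[url] = status
    return broken
```

The `urls` argument could be the URLs returned by one of the getsitemap functions shown in the Quickstart, assuming they are flattened into a list of strings first. Connection failures (`urllib.error.URLError`) are not caught in this sketch and would need handling in a real validator.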

You can read the documentation for this project on Read the Docs.

Installation 💻

To get started, install getsitemap with pip:

pip install getsitemap

Quickstart ⚡

Get all URLs in a single sitemap

import getsitemap

urls = getsitemap.get_individual_sitemap("https://jamesg.blog/sitemap.xml")

print(urls)

Get all URLs recursively in all sitemaps

import getsitemap

all_urls = getsitemap.retrieve_sitemap_urls("https://sitemap")

print(all_urls)
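To show what separates reading a single sitemap from a recursive crawl, here is a minimal sketch based on the sitemaps.org XML format, in which a sitemap index lists other sitemaps and a URL set lists pages. It is not how getsitemap is implemented. `parse_sitemap` and `collect_urls` are hypothetical names, and `fetch` is an assumed callable that returns the XML text for a URL.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set, Tuple

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str) -> Tuple[str, List[str]]:
    """Return ("index" or "urlset", list of <loc> values) for one document."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == f"{SITEMAP_NS}sitemapindex" else "urlset"
    # Both <sitemap> and <url> entries carry their address in a <loc> child.
    locations = [
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text
    ]
    return kind, locations


def collect_urls(
    sitemap_url: str,
    fetch: Callable[[str], str],
    seen: Optional[Set[str]] = None,
) -> List[str]:
    """Follow sitemap indexes recursively and return every page URL found."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:
        # Guard against sitemaps that reference each other.
        return []
    seen.add(sitemap_url)

    kind, locations = parse_sitemap(fetch(sitemap_url))
    if kind == "urlset":
        return locations

    urls: List[str] = []
    for child_sitemap in locations:
        urls.extend(collect_urls(child_sitemap, fetch, seen))
    return urls
```

Calling `parse_sitemap` once corresponds to reading a single sitemap, while `collect_urls` keeps following index entries until it reaches URL sets.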

Code Quality

This library uses tox, pytest, and flake8 to check code quality.

To run code quality checks, run the following command:

tox

License 👩

This project is licensed under the MIT License.

Contributing 🛠️

We would love to have your help in improving getsitemap. Have an idea for a new feature or a bug to fix? Leave information in a GitHub Issue to start a discussion!

If you have

Contributors 💻

  • capjamesg
