getsitemap is a Python library that retrieves every URL listed in the sitemaps of a website.
This project may be useful if you are building a search crawler or a tool that validates the status codes of the URLs in a sitemap.
You can read the documentation for this project on Read the Docs.
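
For readers unfamiliar with the format, the sketch below shows roughly what pulling URLs out of one sitemap document involves. It uses only the Python standard library and is not getsitemap's implementation; the function name `extract_locs` and the `EXAMPLE` document are hypothetical and exist only for illustration. It assumes the sitemap follows the sitemaps.org XML schema.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(sitemap_xml):
    """Hypothetical helper: split <loc> entries into page URLs and nested sitemap URLs."""
    root = ET.fromstring(sitemap_xml)
    # <urlset> documents list pages under <url><loc>.
    pages = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    # <sitemapindex> documents list further sitemaps under <sitemap><loc>.
    nested = [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return {"pages": pages, "sitemaps": nested}


# Illustrative sitemap document (not taken from a real site).
EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(extract_locs(EXAMPLE))
```

A crawler would typically repeat this step for each nested sitemap it finds in a sitemap index, which is the kind of work a library such as getsitemap saves you from writing by hand.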
To get started, install getsitemap with pip:
pip install getsitemap

To retrieve the URLs listed in a single sitemap:

import getsitemap
urls = getsitemap.get_individual_sitemap("https://jamesg.blog/sitemap.xml")
print(urls)

To retrieve the URLs from all of the sitemaps on a website:

import getsitemap
all_urls = getsitemap.retrieve_sitemap_urls("https://sitemap")
print(all_urls)

This library uses tox, pytest, and flake8 to ensure code quality.
To run code quality checks, run the following command:
tox

This project is licensed under an MIT License.
We would love to have your help in improving getsitemap. Have an idea for a new feature or a bug to fix? Leave information in a GitHub Issue to start a discussion!
If you have
- capjamesg