Add a new web source type that can:
- Accept a valid URL,
- Fetch the page using Python packages like `trafilatura` or `readability-lxml`,
- Strip the boilerplate,
- Ingest the cleaned text.
This would enable companies to ingest their own content from their blog posts, documentation sites, articles, etc.
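
As a rough illustration of the four steps above, here is a minimal sketch of what such a source type might look like. The names `WebDocument`, `is_valid_url`, and `load_web_source` are hypothetical and not part of any existing codebase; the sketch assumes `trafilatura` is installed and uses its `fetch_url` and `extract` functions as the default fetcher and boilerplate stripper. The fetch and extract steps are passed in as callables so that `readability-lxml` or another library could be swapped in.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlparse


@dataclass
class WebDocument:
    """Hypothetical container for the cleaned text handed to ingestion."""
    url: str
    text: str


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs with a host part."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse raises on some malformed inputs, e.g. a broken IPv6 host.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def _fetch_with_trafilatura(url: str) -> Optional[str]:
    # Imported lazily so the module loads even if trafilatura is absent.
    import trafilatura
    return trafilatura.fetch_url(url)  # returns None on failure


def _extract_with_trafilatura(html: str) -> Optional[str]:
    import trafilatura
    return trafilatura.extract(html)  # returns None if nothing is extracted


def load_web_source(
    url: str,
    fetch: Callable[[str], Optional[str]] = _fetch_with_trafilatura,
    extract: Callable[[str], Optional[str]] = _extract_with_trafilatura,
) -> WebDocument:
    """Validate the URL, fetch the page, strip boilerplate, return clean text."""
    if not is_valid_url(url):
        raise ValueError(f"Not a valid http(s) URL: {url!r}")

    html = fetch(url)
    if not html:
        raise RuntimeError(f"Could not fetch {url!r}")

    text = extract(html)
    if not text:
        raise RuntimeError(f"No main content found at {url!r}")

    # The returned document is what a downstream ingestion step would consume.
    return WebDocument(url=url, text=text.strip())
```

Calling `load_web_source("https://example.com/blog/post")` would then yield a `WebDocument` whose `text` can go through the same ingestion path as other source types. How the result is chunked and stored is left open here, since that depends on the existing pipeline.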