Blazingly Fast HTML Content Extractor

Intended for future use in the slopless search engine project.

goals

Extract main content from HTML pages with high accuracy
Normalize encoding to UTF-8 and unescape HTML entities
Split content into semantically meaningful chunks
Detect and handle different languages
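
A minimal sketch of two of the goals above (entity unescaping and chunking), using only the standard library. None of this is taken from the crate's source: the function names `unescape_entities`, `decode_entity` and `chunk_paragraphs`, the small entity table, and the choice of blank-line paragraph boundaries as a stand-in for "semantically meaningful" are all assumptions made for illustration.

```rust
/// Decode one entity body (the text between '&' and ';').
/// Hypothetical helper: covers a few named entities plus numeric forms only.
fn decode_entity(name: &str) -> Option<char> {
    match name {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        "quot" => Some('"'),
        "apos" => Some('\''),
        "nbsp" => Some('\u{a0}'),
        _ => {
            // Numeric references: "#233" (decimal) or "#xE9" (hexadecimal).
            let digits = name.strip_prefix('#')?;
            let code = match digits.strip_prefix(|c: char| c == 'x' || c == 'X') {
                Some(hex) => u32::from_str_radix(hex, 16).ok()?,
                None => digits.parse::<u32>().ok()?,
            };
            char::from_u32(code)
        }
    }
}

/// Replace recognized entities; unknown or malformed ones are left untouched.
fn unescape_entities(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let tail = &rest[amp..];
        // Only accept a ';' that appears within a short window after '&'.
        let decoded = tail[1..]
            .find(';')
            .filter(|&end| end <= 10)
            .and_then(|end| decode_entity(&tail[1..1 + end]).map(|c| (c, end + 2)));
        match decoded {
            Some((c, consumed)) => {
                out.push(c);
                rest = &tail[consumed..];
            }
            None => {
                // Not an entity we know: keep the '&' literally and move on.
                out.push('&');
                rest = &tail[1..];
            }
        }
    }
    out.push_str(rest);
    out
}

/// Greedily pack blank-line-separated paragraphs into chunks of at most
/// `max_chars` characters. A single paragraph longer than the budget is
/// kept whole as its own chunk rather than being cut mid-sentence.
fn chunk_paragraphs(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for para in text.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        let projected = current.chars().count() + 1 + para.chars().count();
        // Flush the current chunk when adding this paragraph would overflow it.
        if !current.is_empty() && projected > max_chars {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push('\n');
        }
        current.push_str(para);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

In a pipeline like the one the goals describe, `unescape_entities` would run on the extracted text before `chunk_paragraphs`, so that chunk sizes are measured on the decoded characters. A full implementation would need the complete HTML named character reference table and a better notion of semantic boundaries than blank lines.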

things to try
