fix(mastodon): replace hand-rolled HTML sanitizer (CRITICAL UTF-8 panic) #27

Open
mduongvandinh wants to merge 2 commits into RightNow-AI:main from mduongvandinh:fix/mastodon-html-sanitizer

@mduongvandinh

Summary

  • Fix CRITICAL bug: strip_html_tags() panics on multi-byte UTF-8 content (emoji, CJK)
  • Replace hand-rolled HTML entity decoder with html-escape crate for comprehensive entity support

Bug Detail

The old code sliced html[result.len()..] while iterating html by char. result.len() is the byte length of the output built so far, not the iterator's position in html. When result contained multi-byte chars (e.g. the emoji 🦀 is 4 bytes), that byte offset no longer corresponded to the current position in html, so the slice could start inside a multi-byte character or past the end of html and panic.
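The mismatch can be reproduced without panicking by using `str::get`, which returns `None` where the `[..]` slice would panic. This is only an illustration of the failure mode; `tail_at_output_len` is a hypothetical helper and the inputs are made up, since the old function body is not shown here.

```rust
/// Hypothetical helper (not from the PR): uses the output's byte length as an
/// offset into the input, like the old pattern, but through the non-panicking
/// `str::get`. `&html[result.len()..]` would panic wherever this returns None.
fn tail_at_output_len<'a>(html: &'a str, result: &str) -> Option<&'a str> {
    html.get(result.len()..)
}

/// One crab emoji is a single `char` but four bytes in UTF-8.
fn crab_sizes() -> (usize, usize) {
    let crab = "🦀";
    (crab.chars().count(), crab.len())
}
```

With `html = "<p>🦀 hi</p>"` and `result = "🦀"`, the offset is 4, which falls inside the emoji (bytes 3 to 6 of `html`), so `tail_at_output_len` returns `None`. With ASCII-only content the same offset always lands on a char boundary, which would explain why the bug only showed up on emoji and CJK text.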

Changes

  • Replace strip_html_tags() with safe char-based state machine (no source string back-indexing)
  • Add html-escape = "0.2" dependency for proper entity decoding (named + numeric + hex)
  • Add block-level tag support for </div> and </li> (in addition to existing <br>, </p>)
  • Add 4 new tests: emoji, CJK, numeric entities, basic tags
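A char-based state machine of the kind described above might look roughly like the sketch below. It is not the PR's actual code: the function names, the exact set of tags treated as line breaks, and the handling of an unterminated `<` are assumptions made for illustration. Entity decoding is left out, since the PR delegates that to the html-escape crate.

```rust
/// Hypothetical sketch of a char-based tag stripper. It only ever walks
/// `html.chars()` and pushes into owned Strings, so it never slices the
/// source by byte offset.
fn strip_tags_sketch(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut tag = String::new(); // contents of the tag currently being read
    let mut in_tag = false;

    for ch in html.chars() {
        match (in_tag, ch) {
            (false, '<') => {
                in_tag = true;
                tag.clear();
            }
            (false, _) => out.push(ch),
            (true, '>') => {
                in_tag = false;
                if is_line_break_tag(&tag) {
                    out.push('\n');
                }
            }
            // Inside a tag: collect its text. An unterminated '<' therefore
            // swallows the rest of the input (an assumption of this sketch).
            (true, _) => tag.push(ch),
        }
    }
    out
}

/// Assumed rule: <br> (including <br/> and <br />), </p>, </div> and </li>
/// become a newline; every other tag is dropped silently.
fn is_line_break_tag(tag: &str) -> bool {
    let t = tag.trim().trim_end_matches('/').trim_end().to_ascii_lowercase();
    let name = t.split_whitespace().next().unwrap_or("");
    matches!(name, "br" | "/p" | "/div" | "/li")
}
```

Because every step consumes one `char` and appends to `out`, multi-byte input such as emoji or CJK passes through unchanged. Whether the real implementation trims the trailing newline left by a closing `</p>` is not stated in the PR description.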

Test plan

  • All 15 mastodon tests pass (cargo test -p openfang-channels)
  • Clippy clean (cargo clippy -p openfang-channels --all-targets -- -D warnings)
  • No regressions — all pre-existing tests still pass

Files changed

  • crates/openfang-channels/src/mastodon.rs (+54, -21)
  • crates/openfang-channels/Cargo.toml (+1)

The previous strip_html_tags() used html[result.len()..] byte indexing
on a char-iterated string, causing panics on multi-byte UTF-8 content
(emoji, CJK). Replace with a safe char-based tag stripper and use
html_escape::decode_html_entities for comprehensive entity decoding
including numeric entities (&#8217;, &#x2019;).

Move html-escape from direct crate dependency to workspace-level
declaration in root Cargo.toml, per CONTRIBUTING.md convention.