-
Notifications
You must be signed in to change notification settings - Fork 301
Fix HTML entity encoding bypass in @mention sanitization #15014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,137 @@ | ||||||
| # HTML Entity Encoding Bypass Fix for @mention Sanitization | ||||||
|
|
||||||
| ## Problem | ||||||
|
|
||||||
| The safe-outputs sanitization system had a vulnerability where HTML entities could bypass @mention detection. If entities were decoded after @mention neutralization, an attacker could use entity-encoded @ symbols to trigger unwanted user notifications. | ||||||
|
|
||||||
| ### Attack Vectors | ||||||
|
|
||||||
| 1. Named entity: `@user` → `@user` | ||||||
| 2. Decimal entity: `@user` → `@user` | ||||||
| 3. Hexadecimal entity: `@user` or `@user` → `@user` | ||||||
| 4. Double-encoded: `@user`, `@user`, `@user` → `@user` | ||||||
| 5. Mixed encoding: `@user` → `@user` | ||||||
| 6. Fully encoded: `@user` → `@user` | ||||||
|
|
||||||
| ## Solution | ||||||
|
|
||||||
| Added `decodeHtmlEntities()` function in `actions/setup/js/sanitize_content_core.cjs` that: | ||||||
|
|
||||||
| 1. **Decodes named entities**: `@` → `@` (case-insensitive) | ||||||
| 2. **Decodes decimal entities**: `&#NNN;` → corresponding Unicode character | ||||||
| 3. **Decodes hexadecimal entities**: `&#xHHH;` or `&#XHHH;` → corresponding Unicode character | ||||||
| 4. **Handles double-encoding**: `@`, `@`, `@` | ||||||
| 5. **Validates code points**: Only accepts valid Unicode range (0x0 - 0x10FFFF) | ||||||
|
|
||||||
| ### Integration | ||||||
|
|
||||||
| The `decodeHtmlEntities()` function is integrated into `hardenUnicodeText()` at **Step 2**, ensuring HTML entities are decoded **before** @mention detection occurs: | ||||||
|
|
||||||
| ```javascript | ||||||
| function hardenUnicodeText(text) { | ||||||
| // Step 1: Normalize Unicode (NFC) | ||||||
| result = result.normalize("NFC"); | ||||||
|
|
||||||
| // Step 2: Decode HTML entities (CRITICAL - must be early) | ||||||
| result = decodeHtmlEntities(result); | ||||||
|
|
||||||
| // Step 3: Strip zero-width characters | ||||||
| // Step 4: Remove bidirectional overrides | ||||||
| // Step 5: Convert full-width ASCII | ||||||
|
|
||||||
| return result; | ||||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| ### Sanitization Pipeline | ||||||
|
|
||||||
| ``` | ||||||
| Input Text | ||||||
| ↓ | ||||||
| hardenUnicodeText() | ||||||
| ├─ Unicode normalization (NFC) | ||||||
| ├─ HTML entity decoding ← decodeHtmlEntities() | ||||||
| ├─ Zero-width character removal | ||||||
| ├─ Bidirectional control removal | ||||||
| └─ Full-width ASCII conversion | ||||||
| ↓ | ||||||
| ANSI escape sequence removal | ||||||
| ↓ | ||||||
| neutralizeMentions() or neutralizeAllMentions() | ||||||
| ↓ | ||||||
| Other sanitization steps | ||||||
| ↓ | ||||||
| Output (safe text) | ||||||
| ``` | ||||||
|
|
||||||
| ## Test Coverage | ||||||
|
|
||||||
| Comprehensive test suite in `actions/setup/js/sanitize_content.test.cjs` covers: | ||||||
|
|
||||||
| - ✅ Named entity decoding (`@`) | ||||||
| - ✅ Double-encoded named entities (`@`) | ||||||
| - ✅ Decimal entity decoding (`@`) | ||||||
| - ✅ Double-encoded decimal entities (`@`) | ||||||
| - ✅ Hexadecimal entity decoding (lowercase `@`, uppercase `@`) | ||||||
| - ✅ Double-encoded hex entities (`@`, `@`) | ||||||
| - ✅ Multiple encoded mentions in one string | ||||||
| - ✅ Mixed encoded and normal mentions | ||||||
| - ✅ Org/team mentions with entities | ||||||
| - ✅ General entity decoding (non-@ characters) | ||||||
| - ✅ Invalid code point handling | ||||||
| - ✅ Malformed entity handling | ||||||
| - ✅ Case-insensitive named entities | ||||||
| - ✅ Interaction with other sanitization steps | ||||||
| - ✅ Allowed aliases with encoded mentions | ||||||
|
|
||||||
| Total: 25+ test cases | ||||||
|
|
||||||
| ## Examples | ||||||
|
|
||||||
| ```javascript | ||||||
| // Named entity | ||||||
| sanitizeContent("@pelikhan") | ||||||
| // → "`@pelikhan`" | ||||||
|
|
||||||
| // Decimal entity | ||||||
| sanitizeContent("@pelikhan") | ||||||
| // → "`@pelikhan`" | ||||||
|
|
||||||
| // Hexadecimal entity | ||||||
| sanitizeContent("@pelikhan") | ||||||
| // → "`@pelikhan`" | ||||||
|
|
||||||
| // Mixed encoding in username | ||||||
| sanitizeContent("@user") | ||||||
| // → "`@user`" | ||||||
|
|
||||||
| // Fully encoded | ||||||
| sanitizeContent("@user") | ||||||
| // → "`@user`" | ||||||
|
|
||||||
| // Double-encoded | ||||||
| sanitizeContent("@pelikhan") | ||||||
| // → "`@pelikhan`" | ||||||
| ``` | ||||||
|
|
||||||
| ## Security Impact | ||||||
|
|
||||||
| - **Risk Level**: MEDIUM → **RESOLVED** | ||||||
| - **Attack Surface**: Entity-encoded @ symbols could bypass mention detection | ||||||
| - **Fix**: All HTML entity encoding variants now decoded before @mention processing | ||||||
| - **Coverage**: Universal - applies to both `sanitizeContent()` and `sanitizeIncomingText()` | ||||||
|
|
||||||
| ## Files Modified | ||||||
|
|
||||||
| - `actions/setup/js/sanitize_content_core.cjs` - Added `decodeHtmlEntities()` function and integrated into `hardenUnicodeText()` | ||||||
| - `actions/setup/js/sanitize_content.test.cjs` - Added 25+ test cases for HTML entity decoding | ||||||
| - Exported `decodeHtmlEntities` from module for potential standalone use | ||||||
|
|
||||||
| ## Defense in Depth | ||||||
|
|
||||||
| This fix follows defense-in-depth principles: | ||||||
| 1. **Early decoding**: Entities decoded at Step 2 of Unicode hardening | ||||||
| 2. **Comprehensive coverage**: Handles all entity types and double-encoding | ||||||
| 3. **Validation**: Rejects invalid Unicode code points | ||||||
|
||||||
| 3. **Validation**: Rejects invalid Unicode code points | |
| 3. **Validation**: Leaves invalid Unicode entities undecoded (preserves original text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
hardenUnicodeText()example snippet is not valid JavaScript as written: it usesresultbefore initialization/declaring it, and it ignores thetextparameter. Since this is documentation meant to illustrate the actual implementation, update the snippet to match the real flow (let result = text;, then normalize/decode, etc.) so readers don’t copy a broken example.