From 4ab20ec64fed50da067d85c210d49d58e2807424 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Wed, 11 Feb 2026 22:04:11 +0000
Subject: [PATCH 1/2] Initial plan

From ad55d918f9e1a410b0319a6f7276a656c937c0f7 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Wed, 11 Feb 2026 22:43:18 +0000
Subject: [PATCH 2/2] Add comprehensive documentation for HTML entity encoding bypass fix

Document the decodeHtmlEntities() implementation that prevents @mention
bypass attacks via entity-encoded @ symbols. Covers attack vectors,
solution details, test coverage, and security impact.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
---
 .github/workflows/ai-moderator.lock.yml      |   2 +-
 scratchpad/html-entity-mention-bypass-fix.md | 137 +++++++++++++++++++
 2 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 scratchpad/html-entity-mention-bypass-fix.md

diff --git a/.github/workflows/ai-moderator.lock.yml b/.github/workflows/ai-moderator.lock.yml
index 08cc0136f31..1f9298904cb 100644
--- a/.github/workflows/ai-moderator.lock.yml
+++ b/.github/workflows/ai-moderator.lock.yml
@@ -1025,7 +1025,7 @@ jobs:
         env:
           GH_AW_RATE_LIMIT_MAX: "5"
           GH_AW_RATE_LIMIT_WINDOW: "60"
-          GH_AW_RATE_LIMIT_EVENTS: "workflow_dispatch,issues,issue_comment"
+          GH_AW_RATE_LIMIT_EVENTS: "issues,issue_comment,workflow_dispatch"
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |
diff --git a/scratchpad/html-entity-mention-bypass-fix.md b/scratchpad/html-entity-mention-bypass-fix.md
new file mode 100644
index 00000000000..595cd04b8c8
--- /dev/null
+++ b/scratchpad/html-entity-mention-bypass-fix.md
@@ -0,0 +1,137 @@
+# HTML Entity Encoding Bypass Fix for @mention Sanitization
+
+## Problem
+
+The safe-outputs sanitization system had a vulnerability where HTML entities could bypass @mention detection. If entities were decoded after @mention neutralization, an attacker could use entity-encoded @ symbols to trigger unwanted user notifications.
+
+### Attack Vectors
+
+1. Named entity: `&commat;user` → `@user`
+2. Decimal entity: `&#64;user` → `@user`
+3. Hexadecimal entity: `&#x40;user` or `&#X40;user` → `@user`
+4. Double-encoded: `&amp;commat;user`, `&amp;#64;user`, `&amp;#x40;user` → `@user`
+5. Mixed encoding: `@user` → `@user`
+6. Fully encoded: `@user` → `@user`
+
+## Solution
+
+Added `decodeHtmlEntities()` function in `actions/setup/js/sanitize_content_core.cjs` that:
+
+1. **Decodes named entities**: `&commat;` → `@` (case-insensitive)
+2. **Decodes decimal entities**: `&#NNN;` → corresponding Unicode character
+3. **Decodes hexadecimal entities**: `&#xHHH;` or `&#XHHH;` → corresponding Unicode character
+4. **Handles double-encoding**: `&amp;commat;`, `&amp;#64;`, `&amp;#x40;`
+5. **Validates code points**: Only accepts valid Unicode range (0x0 - 0x10FFFF)
+
+### Integration
+
+The `decodeHtmlEntities()` function is integrated into `hardenUnicodeText()` at **Step 2**, ensuring HTML entities are decoded **before** @mention detection occurs:
+
+```javascript
+function hardenUnicodeText(text) {
+  // Step 1: Normalize Unicode (NFC)
+  result = result.normalize("NFC");
+
+  // Step 2: Decode HTML entities (CRITICAL - must be early)
+  result = decodeHtmlEntities(result);
+
+  // Step 3: Strip zero-width characters
+  // Step 4: Remove bidirectional overrides
+  // Step 5: Convert full-width ASCII
+
+  return result;
+}
+```
+
+### Sanitization Pipeline
+
+```
+Input Text
+    ↓
+hardenUnicodeText()
+  ├─ Unicode normalization (NFC)
+  ├─ HTML entity decoding ← decodeHtmlEntities()
+  ├─ Zero-width character removal
+  ├─ Bidirectional control removal
+  └─ Full-width ASCII conversion
+    ↓
+ANSI escape sequence removal
+    ↓
+neutralizeMentions() or neutralizeAllMentions()
+    ↓
+Other sanitization steps
+    ↓
+Output (safe text)
+```
+
+## Test Coverage
+
+Comprehensive test suite in `actions/setup/js/sanitize_content.test.cjs` covers:
+
+- ✅ Named entity decoding (`&commat;`)
+- ✅ Double-encoded named entities (`&amp;commat;`)
+- ✅ Decimal entity decoding (`&#64;`)
+- ✅ Double-encoded decimal entities (`&amp;#64;`)
+- ✅ Hexadecimal entity decoding (lowercase `&#x40;`, uppercase `&#X40;`)
+- ✅ Double-encoded hex entities (`&amp;#x40;`, `&amp;#X40;`)
+- ✅ Multiple encoded mentions in one string
+- ✅ Mixed encoded and normal mentions
+- ✅ Org/team mentions with entities
+- ✅ General entity decoding (non-@ characters)
+- ✅ Invalid code point handling
+- ✅ Malformed entity handling
+- ✅ Case-insensitive named entities
+- ✅ Interaction with other sanitization steps
+- ✅ Allowed aliases with encoded mentions
+
+Total: 25+ test cases
+
+## Examples
+
+```javascript
+// Named entity
+sanitizeContent("&commat;pelikhan")
+// → "`@pelikhan`"
+
+// Decimal entity
+sanitizeContent("&#64;pelikhan")
+// → "`@pelikhan`"
+
+// Hexadecimal entity
+sanitizeContent("&#x40;pelikhan")
+// → "`@pelikhan`"
+
+// Mixed encoding in username
+sanitizeContent("@user")
+// → "`@user`"
+
+// Fully encoded
+sanitizeContent("@user")
+// → "`@user`"
+
+// Double-encoded
+sanitizeContent("&amp;#64;pelikhan")
+// → "`@pelikhan`"
+```
+
+## Security Impact
+
+- **Risk Level**: MEDIUM → **RESOLVED**
+- **Attack Surface**: Entity-encoded @ symbols could bypass mention detection
+- **Fix**: All HTML entity encoding variants now decoded before @mention processing
+- **Coverage**: Universal - applies to both `sanitizeContent()` and `sanitizeIncomingText()`
+
+## Files Modified
+
+- `actions/setup/js/sanitize_content_core.cjs` - Added `decodeHtmlEntities()` function and integrated into `hardenUnicodeText()`
+- `actions/setup/js/sanitize_content.test.cjs` - Added 25+ test cases for HTML entity decoding
+- Exported `decodeHtmlEntities` from module for potential standalone use
+
+## Defense in Depth
+
+This fix follows defense-in-depth principles:
+1. **Early decoding**: Entities decoded at Step 2 of Unicode hardening
+2. **Comprehensive coverage**: Handles all entity types and double-encoding
+3. **Validation**: Rejects invalid Unicode code points
+4. **Universal application**: Applies to all content sanitization flows
+5. **Test coverage**: Extensive test suite validates all attack vectors
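
Note for readers of this patch: the series only adds documentation, so the body of `decodeHtmlEntities()` is not shown anywhere in the diff. The sketch below illustrates how a decoder with the five properties listed under "Solution" could be written. It is not the code in `sanitize_content_core.cjs`. The names `NAMED_ENTITIES`, `decodeOnce`, and `decodeHtmlEntitiesSketch`, the small entity table, and the choice to handle double-encoding by running a second pass are all assumptions made for illustration.

```javascript
// Hypothetical sketch; not the implementation in sanitize_content_core.cjs.
// The entity table is an assumed, deliberately small subset.
const NAMED_ENTITIES = { commat: "@", amp: "&", lt: "<", gt: ">", quot: '"' };

// Decode one level of entity encoding.
function decodeOnce(text) {
  return text
    // Named entities, matched case-insensitively (&commat; or &COMMAT;).
    .replace(/&([a-z]+);/gi, (match, name) => {
      const key = name.toLowerCase();
      return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, key)
        ? NAMED_ENTITIES[key]
        : match; // unknown names are left untouched
    })
    // Decimal (&#64;) and hexadecimal (&#x40; or &#X40;) numeric entities.
    .replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
      const isHex = body[0] === "x" || body[0] === "X";
      const codePoint = parseInt(isHex ? body.slice(1) : body, isHex ? 16 : 10);
      // Leave the entity as-is when the code point is outside 0x0 - 0x10FFFF.
      return codePoint >= 0 && codePoint <= 0x10ffff
        ? String.fromCodePoint(codePoint)
        : match;
    });
}

// Assumption: double-encoding is handled by decoding a second time,
// stopping early once a pass changes nothing.
function decodeHtmlEntitiesSketch(text, maxPasses = 2) {
  let result = text;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = decodeOnce(result);
    if (next === result) break;
    result = next;
  }
  return result;
}

console.log(decodeHtmlEntitiesSketch("&commat;user"));     // @user
console.log(decodeHtmlEntitiesSketch("&#X40;user"));       // @user
console.log(decodeHtmlEntitiesSketch("&amp;commat;user")); // @user
console.log(decodeHtmlEntitiesSketch("&#x110000;"));       // &#x110000;
```

Under these assumptions the decoder has to run before `neutralizeMentions()`, as the pipeline in the patch describes. If it ran afterwards, `&#64;user` would pass through mention detection as harmless text and only later turn into a live `@user`.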