fix: clean email formatting for Discord messages by riderx · Pull Request #1441 · Cap-go/capgo

riderx · 2026-01-15T00:00:13Z

Summary

Fixes email-to-Discord transformation to remove MIME artifacts and properly handle attachments. Emails now display clean text without boundaries, headers, and HTML tags. Also prevents infinite recursion in nested multipart email parsing.

Changes

- Strip MIME boundaries (--xxxxx) and Content-* headers from email bodies
- Decode HTML entities and properly convert HTML to text
- Extract boundary from Content-Type header for accurate MIME parsing
- Detect inline images with Content-ID headers (Gmail)
- Add recursion depth limit (max 10) to prevent stack overflow

Test plan

- Send email with multipart/mixed content (text + HTML + attachment)
- Verify Discord message shows clean text without MIME artifacts
- Verify attached images appear in Discord thread
- Test reply handling for email threads

Checklist

- Code builds without errors
- Requires manual testing with live email flow

Summary by CodeRabbit

New Features
- Broader attachment handling including audio, video and application files; improved inline image handling.
- Better handling of encoded subjects and HTML entities for accurate display.
Bug Fixes
- Cleaner email bodies: removes MIME boundaries, headers and leaked HTML; collapses excess whitespace.
- More reliable multipart parsing and boundary detection with added guardrails and parsing robustness.

…achment detection
- Extract boundary from Content-Type header for proper MIME parsing
- Add cleanEmailBody() to strip MIME boundaries, headers, and HTML tags
- Decode HTML entities in email body text
- Improve inline image detection with Content-ID support
- Add RFC 2047 decoding for encoded subject lines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Pass nested boundary explicitly to recursive calls
- Add depth limit (max 10) to prevent stack overflow
- Fixes RangeError: Maximum call stack size exceeded

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai · 2026-01-15T00:00:29Z

Caution

Review failed

The pull request is closed.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds RFC2047 subject decoding, header-derived boundary-aware multipart parsing with recursion limits and richer attachment detection, plus new HTML-entity decoding and MIME-artifact cleaning used when formatting email bodies for Discord.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Email parser core `cloudflare_workers/email/email-parser.ts` | Adds RFC2047 subject decoding, header boundary extraction, charset-aware/base64 decoding helpers, boundary-aware `parseEmailBodyAndAttachments(rawEmail, headerBoundary?, depth)` with max-depth guard, enhanced multipart and attachment detection, and additional parsing/logging. |
| Discord formatting & sanitization `cloudflare_workers/email/discord.ts` | Adds `cleanEmailBody` and `decodeHtmlEntities`; derive bodyText robustly from text/html, detect MIME boundaries, strip HTML when needed, clean MIME headers/boundaries, decode entities, and truncate/fallback for empty messages; integrates helpers into post/format flows. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Raw Email Source
    participant Parser as Email Parser
    participant Boundary as Boundary Extractor
    participant Multipart as Multipart Handler
    participant Formatter as Discord Formatter

    Client->>Parser: submit rawEmail
    Parser->>Boundary: parse Content-Type header
    Boundary-->>Parser: headerBoundary
    Parser->>Parser: decode RFC2047 subject
    Parser->>Multipart: parse body with headerBoundary, depth=0
    Multipart->>Multipart: split parts, detect nested boundaries
    Multipart-->>Parser: text/html, text/plain, attachments
    Parser->>Formatter: body parts + attachments
    Formatter->>Formatter: select text/html vs text/plain, detect MIME artifacts
    Formatter->>Formatter: cleanEmailBody + decodeHtmlEntities
    Formatter->>External: post formatted message to Discord

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

fix: improve email encoding and HTML rendering #1435: Modifies overlapping email parsing and HTML-to-text decoding paths (charset-aware/base64 decoding and Turndown/entity decoding), likely touching the same code areas.

Poem

🐰
Boundaries found and headers unspun,
Encoded subjects now read in the sun,
MIME bits scrubbed and entities freed,
Threads sleep tidy — a rabbit's good deed. 🥕✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: cleaning email formatting for Discord messages by removing MIME artifacts. |
| Description check | ✅ Passed | The description covers the main objectives, includes a detailed changes section, provides a test plan with steps, and has a checklist with relevant items completed. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68e804b and 0e36412.

📒 Files selected for processing (2)

cloudflare_workers/email/discord.ts
cloudflare_workers/email/email-parser.ts

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 68e804b03b

chatgpt-codex-connector · 2026-01-15T00:02:06Z

+      else if (encoding.toUpperCase() === 'Q') {
+        // Quoted-printable encoding
+        const decoded = encoded
+          .replace(/_/g, ' ')
+          .replace(/=([0-9A-Fa-f]{2})/g, (_: string, hex: string) => String.fromCharCode(Number.parseInt(hex, 16)))
+        return decoded


Decode Q-encoded RFC2047 subjects as UTF-8 bytes

This Q-decoding branch converts each =XX byte directly with String.fromCharCode, which treats bytes as Unicode code points rather than decoding the byte sequence using the declared charset. For UTF-8 subjects like =?UTF-8?Q?Fran=C3=A7ois?=, this yields mojibake (FranÃ§ois) instead of the intended text. Any RFC2047 Q-encoded subject with multibyte UTF-8 (or other multibyte charsets) will be displayed incorrectly in Discord.
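
One way to address this, sketched under assumptions rather than taken from the PR: collect the Q-encoded payload into raw bytes first, then decode those bytes with the declared charset via `TextDecoder`. The helper name `decodeQEncodedWord` and the UTF-8 fallback for unknown charset labels are hypothetical.

```typescript
// Hypothetical helper: decodes the payload of an RFC 2047 Q-encoded word
// (the text between "?Q?" and "?=") using the declared charset.
function decodeQEncodedWord(encoded: string, charset: string): string {
  const bytes: number[] = []
  for (let i = 0; i < encoded.length; i++) {
    const ch = encoded[i]
    const hex = encoded.slice(i + 1, i + 3)
    if (ch === '=' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      // =XX is one raw byte, not a Unicode code point
      bytes.push(Number.parseInt(hex, 16))
      i += 2
    }
    else if (ch === '_') {
      // Underscore stands for a space in Q-encoding
      bytes.push(0x20)
    }
    else {
      // Remaining characters are expected to be plain ASCII
      bytes.push(ch.charCodeAt(0) & 0xFF)
    }
  }

  const data = new Uint8Array(bytes)
  try {
    return new TextDecoder(charset.toLowerCase()).decode(data)
  }
  catch {
    // Assumption: fall back to UTF-8 when the charset label is not supported
    return new TextDecoder('utf-8').decode(data)
  }
}
```

With this approach `decodeQEncodedWord('Fran=C3=A7ois', 'UTF-8')` yields `François`, because the two bytes `C3 A7` are decoded together as one UTF-8 sequence.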

chatgpt-codex-connector · 2026-01-15T00:02:06Z

+  // Strip any remaining HTML tags
+  cleaned = stripTagsSafely(cleaned)
+


Avoid stripping plain-text angle-bracket content

cleanEmailBody always runs stripTagsSafely even when the body came from the plain-text part, so any legitimate text enclosed in <...> (email addresses like <support@example.com>, XML snippets, code, or Markdown) gets removed entirely. Because cleanEmailBody is applied to email.body.text in both thread posts and embeds, plain-text emails can lose meaningful content that previously rendered correctly.
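
One possible shape for a fix is to strip tags only when the body was derived from the HTML part. The sketch below is illustrative only: `BodySource`, `cleanBody`, and the naive `stripTags` stand-in (which is not the project's `stripTagsSafely`) are all assumptions.

```typescript
type BodySource = 'plain' | 'html'

// Naive stand-in for a tag stripper (assumption, not the project's stripTagsSafely):
// removes only things shaped like HTML tags, e.g. <b>, </div>, <br/>, <a href="x">.
function stripTags(input: string): string {
  return input.replace(/<\/?[a-z][a-z0-9]*(\s[^<>]*)?\/?>/gi, '')
}

function cleanBody(text: string, source: BodySource): string {
  // Only HTML-derived bodies get tag stripping; plain text keeps its <...> content
  const stripped = source === 'html' ? stripTags(text) : text
  // Collapse runs of spaces and tabs, then trim the ends
  return stripped.replace(/[ \t]+/g, ' ').trim()
}
```

Called as `cleanBody(email.body.text, 'plain')`, an address such as `<support@example.com>` survives, while the HTML path still has leftover markup removed.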

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@cloudflare_workers/email/discord.ts`:
- Around line 429-440: The decodeHtmlEntities function can double-unescape
sequences like "&amp;lt;" because &amp; is decoded before other entities; update
decodeHtmlEntities so it decodes numeric/hex entities and named entities like
&lt;, &gt;, &quot;, &#39;, &nbsp; first and perform the &amp; → & replacement
last (i.e., move the .replace(/&amp;/g, '&') to the end), and ensure callers
(stripHtml, cleanEmailBody, stripTagsSafely) continue to call decodeHtmlEntities
only once per processing step to avoid repeated decoding.

In `@cloudflare_workers/email/email-parser.ts`:
- Around line 74-96: The Q-encoding branch in decodeRfc2047 decodes hex escapes
into individual JS code units which breaks multi-byte charsets like UTF-8;
update the Q branch in decodeRfc2047 to parse =HH sequences into a byte array
(preserve non-escaped ASCII bytes), then use TextDecoder with the specified
charset (fallback to 'utf-8') to decode the resulting Uint8Array into a proper
string; ensure charset is normalized (e.g., lowercased) and reuse the existing
decodeBase64Utf8 approach or its charset handling conventions so Q-decoded
multi-byte sequences are decoded correctly for charsets such as 'utf-8'.
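
For the entity-decoding issue, reordering the replacements covers the `&amp;lt;` case. A single-pass replacement is another way to guarantee each entity is decoded exactly once, since the output of one replacement is never rescanned. The following is a minimal sketch, not the PR's code; `decodeEntitiesSinglePass` and the small entity table are illustrative assumptions.

```typescript
// Illustrative subset of named entities; a real table would be larger.
// Assumption: &nbsp; is mapped to a regular space for chat display.
const NAMED_ENTITIES = new Map<string, string>([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['quot', '"'],
  ['apos', '\''],
  ['nbsp', ' '],
])

function decodeEntitiesSinglePass(text: string): string {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#\d+|[a-zA-Z]+);/g, (match: string, body: string) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X'
      const code = Number.parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10)
      // Leave out-of-range references untouched instead of throwing
      return code <= 0x10FFFF ? String.fromCodePoint(code) : match
    }
    const named = NAMED_ENTITIES.get(body)
    // Unknown names are kept verbatim
    return named === undefined ? match : named
  })
}
```

Because the regex consumes `&amp;` as one match and continues scanning after it, `&amp;lt;` becomes the literal text `&lt;` rather than `<`.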

🧹 Nitpick comments (2)

cloudflare_workers/email/email-parser.ts (1)
64-69: Consider making the boundary regex more robust.

The current regex can truncate some boundaries. RFC 2046 allows a quoted boundary to contain spaces, and [^"\s;]+ stops at the first whitespace character, so boundary="foo bar" would be captured as foo. (Characters such as =, :, and . are already matched by that class.)
♻️ Suggested improvement
 function extractBoundaryFromHeader(contentType: string): string | undefined {
   if (!contentType) return undefined
 
-  const boundaryMatch = contentType.match(/boundary="?([^"\s;]+)"?/i)
+  // Handle both quoted and unquoted boundaries per RFC 2046
+  const quotedMatch = contentType.match(/boundary="([^"]+)"/i)
+  if (quotedMatch) return quotedMatch[1]
+  
+  const unquotedMatch = contentType.match(/boundary=([^\s;]+)/i)
+  return unquotedMatch?.[1]
-  return boundaryMatch?.[1]
 }
cloudflare_workers/email/discord.ts (1)
290-302: Duplicated MIME detection logic - consider extracting to a helper.

The same MIME boundary detection and fallback logic appears in both postToThread (lines 216-222) and formatEmailForDiscord (lines 294-300). Consider extracting this to a shared helper function.
♻️ Suggested refactor
/**
 * Extracts clean body text from email, handling MIME artifacts
 */
function extractCleanBodyText(email: ParsedEmail): string {
  let bodyText = email.body.text || ''
  
  // If text body is empty or looks like it contains MIME boundaries, try HTML
  const hasMimeBoundaries = bodyText.includes('--') && bodyText.includes('Content-Type:')
  if (!bodyText || hasMimeBoundaries) {
    bodyText = stripHtml(email.body.html || '')
  }
  
  // Clean up any MIME boundaries, headers, and HTML that leaked through
  return cleanEmailBody(bodyText)
}
Then use extractCleanBodyText(email) in both functions.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f40d2a9 and 68e804b.

📒 Files selected for processing (2)

cloudflare_workers/email/discord.ts
cloudflare_workers/email/email-parser.ts

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{vue,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Run bun lint to lint Vue, TypeScript, and JavaScript files; use bun lint:fix to auto-fix issues

Files:

cloudflare_workers/email/discord.ts
cloudflare_workers/email/email-parser.ts

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use single quotes and no semicolons per @antfu/eslint-config rules

Files:

cloudflare_workers/email/discord.ts
cloudflare_workers/email/email-parser.ts

**/*.{ts,tsx,js,jsx,vue}

📄 CodeRabbit inference engine (AGENTS.md)

Run bun lint or lint/format command before validating any backend or frontend task to ensure consistent formatting

Files:

cloudflare_workers/email/discord.ts
cloudflare_workers/email/email-parser.ts

🧬 Code graph analysis (1)

cloudflare_workers/email/email-parser.ts (1)

cloudflare_workers/email/types.ts (1)

EmailAttachment (50-55)

🪛 GitHub Check: CodeQL

cloudflare_workers/email/discord.ts

[failure] 430-433: Double escaping or unescaping
This replacement may produce '&' characters that are double-unescaped here.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Run tests

🔇 Additional comments (8)

cloudflare_workers/email/email-parser.ts (5)

29-46: LGTM! Good additions for subject decoding and boundary extraction.

The RFC 2047 decoding for subject lines and extracting boundary from the Content-Type header are solid improvements. The logging provides useful debugging information for troubleshooting email parsing issues.

164-169: Good recursion depth limit to prevent stack overflow.

The MAX_DEPTH = 10 limit is a sensible safeguard against maliciously crafted emails with deeply nested multipart structures. The warning log helps with debugging when this limit is hit.
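
As an illustration of the guard described here, the sketch below recurses into nested multipart parts and stops once the depth limit is reached. It is a simplified stand-in, not the PR's `parseEmailBodyAndAttachments`: `splitParts` and `collectPlainText` are hypothetical names, headers are assumed to be unfolded, and only text/plain content is collected.

```typescript
const MAX_DEPTH = 10

// Splits a multipart body into parts by matching whole delimiter lines
function splitParts(body: string, boundary: string): string[] {
  const parts: string[] = []
  let current: string[] | null = null
  for (const rawLine of body.split(/\r?\n/)) {
    const line = rawLine.replace(/\s+$/, '')
    const isClose = line === `--${boundary}--`
    if (isClose || line === `--${boundary}`) {
      if (current)
        parts.push(current.join('\n'))
      if (isClose)
        return parts
      current = []
    }
    else if (current) {
      current.push(rawLine)
    }
  }
  if (current)
    parts.push(current.join('\n'))
  return parts
}

function collectPlainText(body: string, boundary: string, depth = 0): string[] {
  if (depth >= MAX_DEPTH) {
    console.warn(`Multipart nesting deeper than ${MAX_DEPTH}, skipping nested parts`)
    return []
  }
  const texts: string[] = []
  for (const part of splitParts(body, boundary)) {
    // Headers and content are separated by the first blank line
    const split = part.indexOf('\n\n')
    const headers = split === -1 ? part : part.slice(0, split)
    const content = split === -1 ? '' : part.slice(split + 2)
    const nested = /content-type:\s*multipart\//i.test(headers)
      ? headers.match(/boundary="?([^"\s;]+)"?/i)
      : null
    if (nested) {
      // Pass the nested boundary explicitly and count the extra level
      texts.push(...collectPlainText(content, nested[1], depth + 1))
    }
    else if (/content-type:\s*text\/plain/i.test(headers)) {
      texts.push(content.trim())
    }
  }
  return texts
}
```

Because every recursive call increases `depth` by one, the call stack is bounded by `MAX_DEPTH` regardless of how deeply the message nests.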

212-227: Nested multipart parsing correctly passes boundary and increments depth.

The explicit boundary passing and depth increment ensure proper parsing of nested structures while preventing infinite recursion. This addresses the PR objective of preventing stack overflow.

229-238: Good improvement for inline image detection via Content-ID.

Detecting inline images using Content-ID headers (common in Gmail) ensures attachments render correctly in Discord threads. The fallback logic for images without explicit disposition is sensible.
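
A rough sketch of how such a classification could look is shown below. It is not the PR's implementation: `classifyPart`, `PartKind`, and the exact precedence of the checks are assumptions, and folded (multi-line) headers are not handled.

```typescript
type PartKind = 'attachment' | 'inline-image' | 'body'

// Hypothetical classifier working on the raw header block of one MIME part
function classifyPart(rawHeaders: string): PartKind {
  const header = (name: string): string | undefined => {
    // Match a single unfolded header line, case-insensitively
    const m = rawHeaders.match(new RegExp(`^${name}:[ \\t]*(.+)$`, 'im'))
    return m ? m[1].trim() : undefined
  }

  const type = (header('Content-Type') ?? 'text/plain').toLowerCase()
  const disposition = (header('Content-Disposition') ?? '').toLowerCase()
  const contentId = header('Content-ID')

  if (disposition.startsWith('attachment'))
    return 'attachment'

  if (type.startsWith('image/')) {
    // A Content-ID marks an image referenced from the HTML body via cid:
    // images with no disposition, or an inline one, are treated the same way here
    if (contentId || disposition === '' || disposition.startsWith('inline'))
      return 'inline-image'
  }
  return 'body'
}
```

For example, a part with `Content-Type: image/png` and a `Content-ID` header but no `Content-Disposition` is classified as `inline-image`, so it can still be forwarded to the Discord thread.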

248-268: Prefer first text/plain and text/html parts - good approach.

Setting body text/html only if not already set ensures the first (typically most relevant) content part is used, which aligns with RFC 2046 recommendations for multipart/alternative.

cloudflare_workers/email/discord.ts (3)

213-223: MIME artifact detection is a reasonable heuristic.

The check for -- combined with Content-Type: is a sensible way to detect when MIME boundaries have leaked into the text body. The fallback to HTML conversion and subsequent cleanup ensures cleaner Discord messages.

384-424: cleanEmailBody function is well-structured for MIME artifact removal.

The sequential cleanup of MIME boundaries, headers, HTML tags, and whitespace normalization addresses the PR objectives effectively. The character-safe stripTagsSafely usage avoids regex-based sanitization vulnerabilities.
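
To make the cleanup order concrete, here is a simplified sketch of the boundary and header stripping. It is a heuristic illustration, not the PR's `cleanEmailBody`: `stripMimeArtifacts` is a hypothetical name, the minimum boundary length of 6 characters is an assumption, and folded header continuation lines are not handled.

```typescript
// Hypothetical, simplified take on the MIME cleanup described above
function stripMimeArtifacts(body: string): string {
  return body
    .split(/\r?\n/)
    // Drop delimiter lines such as "--000000000000abc123" or "--000000000000abc123--"
    .filter(line => !/^--[\w'()+,.\/:=?-]{6,}\s*$/.test(line))
    // Drop leaked part headers such as "Content-Type: text/plain; charset=UTF-8"
    .filter(line => !/^(Content-[\w-]+|MIME-Version):/i.test(line))
    .join('\n')
    // Collapse the runs of blank lines left behind by the removed lines
    .replace(/\n{3,}/g, '\n\n')
    .trim()
}
```

The length requirement keeps a conventional signature separator (`-- ` followed by a newline) intact, although a long run of dashes used as a horizontal rule would still be removed by this heuristic.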

451-473: Good integration of entity decoding in HTML-to-Markdown conversion.

Calling decodeHtmlEntities after Turndown conversion and in the fallback path ensures consistent handling of HTML entities across all code paths.

- Move &amp; decoding to end of decodeHtmlEntities to prevent double-unescaping
- Fix RFC 2047 Q-encoding to properly decode multi-byte UTF-8 charsets
- Remove stripTagsSafely from cleanEmailBody to preserve <email@example.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

sonarqubecloud · 2026-01-15T01:52:20Z

Quality Gate failed

Failed conditions
3 Security Hotspots

See analysis details on SonarQube Cloud

riderx and others added 2 commits January 14, 2026 23:57

fix: prevent infinite recursion in nested multipart email parsing

68e804b

github-advanced-security AI found potential problems Jan 15, 2026

Comment thread cloudflare_workers/email/discord.ts Fixed

chatgpt-codex-connector Bot reviewed Jan 15, 2026

coderabbitai Bot reviewed Jan 15, 2026

Comment thread cloudflare_workers/email/discord.ts

Comment thread cloudflare_workers/email/email-parser.ts

riderx merged commit 6230214 into main Jan 15, 2026
8 of 9 checks passed

riderx deleted the riderx/fix-email-discord branch January 15, 2026 01:47

This was referenced Jan 15, 2026

Handle large email attachments via R2 storage #1443

Merged

feat: add security email filter to email worker #1446

Merged

riderx merged 3 commits into main from riderx/fix-email-discord
