Skip to content

fix: clean email formatting for Discord messages#1441

Merged
riderx merged 3 commits into
mainfrom
riderx/fix-email-discord
Jan 15, 2026
Merged

fix: clean email formatting for Discord messages#1441
riderx merged 3 commits into
mainfrom
riderx/fix-email-discord

Conversation

@riderx
Copy link
Copy Markdown
Member

@riderx riderx commented Jan 15, 2026

Summary

Fixes email-to-Discord transformation to remove MIME artifacts and properly handle attachments. Emails now display clean text without boundaries, headers, and HTML tags. Also prevents infinite recursion in nested multipart email parsing.

Changes

  • Strip MIME boundaries (--xxxxx) and Content-* headers from email bodies
  • Decode HTML entities and properly convert HTML to text
  • Extract boundary from Content-Type header for accurate MIME parsing
  • Detect inline images with Content-ID headers (Gmail)
  • Add recursion depth limit (max 10) to prevent stack overflow

Test plan

  1. Send email with multipart/mixed content (text + HTML + attachment)
  2. Verify Discord message shows clean text without MIME artifacts
  3. Verify attached images appear in Discord thread
  4. Test reply handling for email threads

Checklist

  • Code builds without errors
  • Requires manual testing with live email flow

Summary by CodeRabbit

  • New Features

    • Broader attachment handling including audio, video and application files; improved inline image handling.
    • Better handling of encoded subjects and HTML entities for accurate display.
  • Bug Fixes

    • Cleaner email bodies: removes MIME boundaries, headers and leaked HTML; collapses excess whitespace.
    • More reliable multipart parsing and boundary detection with added guardrails and parsing robustness.

✏️ Tip: You can customize this high-level summary in your review settings.

riderx and others added 2 commits January 14, 2026 23:57
…achment detection

- Extract boundary from Content-Type header for proper MIME parsing
- Add cleanEmailBody() to strip MIME boundaries, headers, and HTML tags
- Decode HTML entities in email body text
- Improve inline image detection with Content-ID support
- Add RFC 2047 decoding for encoded subject lines

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Pass nested boundary explicitly to recursive calls
- Add depth limit (max 10) to prevent stack overflow
- Fixes RangeError: Maximum call stack size exceeded

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 15, 2026

Caution

Review failed

The pull request is closed.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds RFC2047 subject decoding, header-derived boundary-aware multipart parsing with recursion limits and richer attachment detection, plus new HTML-entity decoding and MIME-artifact cleaning used when formatting email bodies for Discord.

Changes

Cohort / File(s) Summary
Email parser core
cloudflare_workers/email/email-parser.ts
Adds RFC2047 subject decoding, header boundary extraction, charset-aware/base64 decoding helpers, boundary-aware parseEmailBodyAndAttachments(rawEmail, headerBoundary?, depth) with max-depth guard, enhanced multipart and attachment detection, and additional parsing/logging.
Discord formatting & sanitization
cloudflare_workers/email/discord.ts
Adds cleanEmailBody and decodeHtmlEntities; derive bodyText robustly from text/html, detect MIME boundaries, strip HTML when needed, clean MIME headers/boundaries, decode entities, and truncate/fallback for empty messages; integrates helpers into post/format flows.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Raw Email Source
    participant Parser as Email Parser
    participant Boundary as Boundary Extractor
    participant Multipart as Multipart Handler
    participant Formatter as Discord Formatter

    Client->>Parser: submit rawEmail
    Parser->>Boundary: parse Content-Type header
    Boundary-->>Parser: headerBoundary
    Parser->>Parser: decode RFC2047 subject
    Parser->>Multipart: parse body with headerBoundary, depth=0
    Multipart->>Multipart: split parts, detect nested boundaries
    Multipart-->>Parser: text/html, text/plain, attachments
    Parser->>Formatter: body parts + attachments
    Formatter->>Formatter: select text/html vs text/plain, detect MIME artifacts
    Formatter->>Formatter: cleanEmailBody + decodeHtmlEntities
    Formatter->>External: post formatted message to Discord
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰
Boundaries found and headers unspun,
Encoded subjects now read in the sun,
MIME bits scrubbed and entities freed,
Threads sleep tidy — a rabbit's good deed. 🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: cleaning email formatting for Discord messages by removing MIME artifacts.
Description check ✅ Passed The description covers the main objectives, includes a detailed changes section, provides a test plan with steps, and has a checklist with relevant items completed.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings


📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68e804b and 0e36412.

📒 Files selected for processing (2)
  • cloudflare_workers/email/discord.ts
  • cloudflare_workers/email/email-parser.ts

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread cloudflare_workers/email/discord.ts Fixed
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 68e804b03b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +83 to +88
else if (encoding.toUpperCase() === 'Q') {
// Quoted-printable encoding
const decoded = encoded
.replace(/_/g, ' ')
.replace(/=([0-9A-Fa-f]{2})/g, (_: string, hex: string) => String.fromCharCode(Number.parseInt(hex, 16)))
return decoded
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Decode Q-encoded RFC2047 subjects as UTF-8 bytes

This Q-decoding branch converts each =XX byte directly with String.fromCharCode, which treats bytes as Unicode code points rather than decoding the byte sequence using the declared charset. For UTF-8 subjects like =?UTF-8?Q?Fran=C3=A7ois?=, this yields mojibake (François) instead of the intended text. Any RFC2047 Q-encoded subject with multibyte UTF-8 (or other multibyte charsets) will be displayed incorrectly in Discord.

Useful? React with 👍 / 👎.

Comment thread cloudflare_workers/email/discord.ts Outdated
Comment on lines +403 to +405
// Strip any remaining HTML tags
cleaned = stripTagsSafely(cleaned)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid stripping plain-text angle-bracket content

cleanEmailBody always runs stripTagsSafely even when the body came from the plain-text part, so any legitimate text enclosed in <...> (email addresses like <support@example.com>, XML snippets, code, or Markdown) gets removed entirely. Because cleanEmailBody is applied to email.body.text in both thread posts and embeds, plain-text emails can lose meaningful content that previously rendered correctly.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@cloudflare_workers/email/discord.ts`:
- Around line 429-440: The decodeHtmlEntities function can double-unescape
sequences like "&amp;lt;" because &amp; is decoded before other entities; update
decodeHtmlEntities so it decodes numeric/hex entities and named entities like
&lt;, &gt;, &quot;, &#39;, &nbsp; first and perform the &amp; → & replacement
last (i.e., move the .replace(/&amp;/g, '&') to the end), and ensure callers
(stripHtml, cleanEmailBody, stripTagsSafely) continue to call decodeHtmlEntities
only once per processing step to avoid repeated decoding.

In `@cloudflare_workers/email/email-parser.ts`:
- Around line 74-96: The Q-encoding branch in decodeRfc2047 decodes hex escapes
into individual JS code units which breaks multi-byte charsets like UTF-8;
update the Q branch in decodeRfc2047 to parse =HH sequences into a byte array
(preserve non-escaped ASCII bytes), then use TextDecoder with the specified
charset (fallback to 'utf-8') to decode the resulting Uint8Array into a proper
string; ensure charset is normalized (e.g., lowercased) and reuse the existing
decodeBase64Utf8 approach or its charset handling conventions so Q-decoded
multi-byte sequences are decoded correctly for charsets such as 'utf-8'.
🧹 Nitpick comments (2)
cloudflare_workers/email/email-parser.ts (1)

64-69: Consider making the boundary regex more robust.

The current regex may fail to capture boundaries containing certain characters. RFC 2046 allows boundaries to contain characters like =, :, and . which are not matched by [^"\s;]+.

♻️ Suggested improvement
 function extractBoundaryFromHeader(contentType: string): string | undefined {
   if (!contentType) return undefined
 
-  const boundaryMatch = contentType.match(/boundary="?([^"\s;]+)"?/i)
+  // Handle both quoted and unquoted boundaries per RFC 2046
+  const quotedMatch = contentType.match(/boundary="([^"]+)"/i)
+  if (quotedMatch) return quotedMatch[1]
+  
+  const unquotedMatch = contentType.match(/boundary=([^\s;]+)/i)
+  return unquotedMatch?.[1]
-  return boundaryMatch?.[1]
 }
cloudflare_workers/email/discord.ts (1)

290-302: Duplicated MIME detection logic - consider extracting to a helper.

The same MIME boundary detection and fallback logic appears in both postToThread (lines 216-222) and formatEmailForDiscord (lines 294-300). Consider extracting this to a shared helper function.

♻️ Suggested refactor
/**
 * Extracts clean body text from email, handling MIME artifacts
 */
function extractCleanBodyText(email: ParsedEmail): string {
  let bodyText = email.body.text || ''
  
  // If text body is empty or looks like it contains MIME boundaries, try HTML
  const hasMimeBoundaries = bodyText.includes('--') && bodyText.includes('Content-Type:')
  if (!bodyText || hasMimeBoundaries) {
    bodyText = stripHtml(email.body.html || '')
  }
  
  // Clean up any MIME boundaries, headers, and HTML that leaked through
  return cleanEmailBody(bodyText)
}

Then use extractCleanBodyText(email) in both functions.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f40d2a9 and 68e804b.

📒 Files selected for processing (2)
  • cloudflare_workers/email/discord.ts
  • cloudflare_workers/email/email-parser.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{vue,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Run bun lint to lint Vue, TypeScript, and JavaScript files; use bun lint:fix to auto-fix issues

Files:

  • cloudflare_workers/email/discord.ts
  • cloudflare_workers/email/email-parser.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use single quotes and no semicolons per @antfu/eslint-config rules

Files:

  • cloudflare_workers/email/discord.ts
  • cloudflare_workers/email/email-parser.ts
**/*.{ts,tsx,js,jsx,vue}

📄 CodeRabbit inference engine (AGENTS.md)

Run bun lint or lint/format command before validating any backend or frontend task to ensure consistent formatting

Files:

  • cloudflare_workers/email/discord.ts
  • cloudflare_workers/email/email-parser.ts
🧬 Code graph analysis (1)
cloudflare_workers/email/email-parser.ts (1)
cloudflare_workers/email/types.ts (1)
  • EmailAttachment (50-55)
🪛 GitHub Check: CodeQL
cloudflare_workers/email/discord.ts

[failure] 430-433: Double escaping or unescaping
This replacement may produce '&' characters that are double-unescaped here.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (8)
cloudflare_workers/email/email-parser.ts (5)

29-46: LGTM! Good additions for subject decoding and boundary extraction.

The RFC 2047 decoding for subject lines and extracting boundary from the Content-Type header are solid improvements. The logging provides useful debugging information for troubleshooting email parsing issues.


164-169: Good recursion depth limit to prevent stack overflow.

The MAX_DEPTH = 10 limit is a sensible safeguard against maliciously crafted emails with deeply nested multipart structures. The warning log helps with debugging when this limit is hit.


212-227: Nested multipart parsing correctly passes boundary and increments depth.

The explicit boundary passing and depth increment ensure proper parsing of nested structures while preventing infinite recursion. This addresses the PR objective of preventing stack overflow.


229-238: Good improvement for inline image detection via Content-ID.

Detecting inline images using Content-ID headers (common in Gmail) ensures attachments render correctly in Discord threads. The fallback logic for images without explicit disposition is sensible.


248-268: Prefer first text/plain and text/html parts - good approach.

Setting body text/html only if not already set ensures the first (typically most relevant) content part is used, which aligns with RFC 2046 recommendations for multipart/alternative.

cloudflare_workers/email/discord.ts (3)

213-223: MIME artifact detection is a reasonable heuristic.

The check for -- combined with Content-Type: is a sensible way to detect when MIME boundaries have leaked into the text body. The fallback to HTML conversion and subsequent cleanup ensures cleaner Discord messages.


384-424: cleanEmailBody function is well-structured for MIME artifact removal.

The sequential cleanup of MIME boundaries, headers, HTML tags, and whitespace normalization addresses the PR objectives effectively. The character-safe stripTagsSafely usage avoids regex-based sanitization vulnerabilities.


451-473: Good integration of entity decoding in HTML-to-Markdown conversion.

Calling decodeHtmlEntities after Turndown conversion and in the fallback path ensures consistent handling of HTML entities across all code paths.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment thread cloudflare_workers/email/discord.ts
Comment thread cloudflare_workers/email/email-parser.ts
- Move &amp; decoding to end of decodeHtmlEntities to prevent double-unescaping
- Fix RFC 2047 Q-encoding to properly decode multi-byte UTF-8 charsets
- Remove stripTagsSafely from cleanEmailBody to preserve <email@example.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@riderx riderx merged commit 6230214 into main Jan 15, 2026
8 of 9 checks passed
@riderx riderx deleted the riderx/fix-email-discord branch January 15, 2026 01:47
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
3 Security Hotspots

See analysis details on SonarQube Cloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants