
Handle large email attachments via R2 storage#1443

Merged
riderx merged 2 commits into main from riderx/email-large-attachments on Jan 15, 2026

Member

@riderx riderx commented Jan 15, 2026

Summary (AI generated)

Added R2 storage support for large email attachments (≥25MB). Files that exceed Discord's 25MB limit are now uploaded to R2 with time-limited private URLs instead of being silently discarded. Small attachments continue to be uploaded directly to Discord.

Test plan (AI generated)

  • Send an email with a file >25MB to the email worker
  • Verify the file is uploaded to R2
  • Verify a private signed URL is generated and displayed in Discord
  • Verify the URL works and serves the correct file
  • Verify the URL expires after 7 days
  • Test with mixed attachments (some <25MB, some >25MB)

Checklist

  • Code follows project style
  • Manual testing completed with test file

Summary by CodeRabbit

  • New Features

    • Improved email→Discord attachment handling: files ≤25MB attach directly; larger files are uploaded to secure storage and shared as expiring download links included in messages and embeds.
    • Added a public file-serving endpoint for those expiring links so recipients can download large attachments.
  • Chores

    • Added storage binding and a configurable base URL for expiring file links.


Contributor

coderabbitai Bot commented Jan 15, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Splits email attachments by size: small (≤25MB) sent directly to Discord; large (>25MB) uploaded to R2 with 7-day TTL and served via a new /files/{fileKey}/{filename} route. Discord messages/embed content now include R2 links for large attachments.
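
The size split described in the walkthrough can be sketched as a small pure function. This is an illustration only: `AttachmentLike`, `splitBySize`, and the byte interpretation of 25MB (25 * 1024 * 1024) are assumptions and are not taken from the PR, whose real logic lives in `processAttachmentsForDiscord`.

```typescript
// Hypothetical shape; the real EmailAttachment type lives in types.ts and may differ.
interface AttachmentLike {
  filename: string
  size: number // bytes
}

// Assumption: 25MB is read as 25 * 1024 * 1024 bytes.
const DISCORD_MAX_BYTES = 25 * 1024 * 1024

interface SplitResult<T> {
  small: T[]
  large: T[]
}

function splitBySize<T extends AttachmentLike>(attachments: T[], limit: number = DISCORD_MAX_BYTES): SplitResult<T> {
  const result: SplitResult<T> = { small: [], large: [] }
  for (const attachment of attachments) {
    // At or below the limit: attach directly to the Discord message.
    // Above the limit: route to R2 and share a link instead.
    if (attachment.size <= limit)
      result.small.push(attachment)
    else
      result.large.push(attachment)
  }
  return result
}
```

Keeping the comparison in one place makes the boundary case (a file of exactly 25MB) explicit, which matters because the PR description and the walkthrough word the threshold slightly differently.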

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| R2 storage module (cloudflare_workers/email/r2-storage.ts) | New module providing uploadLargeAttachment and serveR2File. Generates unique file keys, uploads to R2 with metadata, writes KV metadata with expiry TTL, returns public /files/... URLs, serves files with expiry checks and cleanup. |
| Discord attachment flow (cloudflare_workers/email/discord.ts) | Added ProcessedAttachments type and processAttachmentsForDiscord(env, attachments, emailMessageId). createForumThread and postToThread now call processor; small attachments attached to Discord, large ones uploaded to R2 and returned as r2Links. formatEmailForDiscord signature updated to accept optional r2Links and embeds include large-attachment links. Multipart upload helper updated to upload provided attachments array; logging adjusted. |
| HTTP routing / serve endpoint (cloudflare_workers/email/index.ts) | Added route handling for GET /files/{fileKey}/{filename} and import of serveR2File to return R2-stored files (200), 404 for not-found, and 410 for expired files (triggers cleanup). |
| Types & config (cloudflare_workers/email/types.ts, cloudflare_workers/email/wrangler.jsonc) | Added EMAIL_ATTACHMENTS: R2Bucket and optional EMAIL_WORKER_URL?: string to Env. Wrangler moved attachment bucket into top-level r2_buckets binding and removed several per-environment env stanzas. |
| Index routing changes (cloudflare_workers/email/index.ts, fetch flow) | Import serveR2File and dispatch /files/{fileKey}/{filename} requests to serve files from R2 instead of returning 404. |

Sequence Diagram(s)

sequenceDiagram
    participant Sender as Email Sender
    participant Worker as Cloudflare Worker
    participant R2 as R2 Storage
    participant KV as KV Store
    participant Discord as Discord API

    Sender->>Worker: Incoming email + attachments
    Worker->>Worker: processAttachmentsForDiscord(classify by size)
    alt Large attachments (>25MB)
        Worker->>R2: uploadLargeAttachment(attachment, emailMessageId)
        R2-->>Worker: stored & returns /files/{fileKey}/{filename} URL
        Worker->>KV: write metadata with TTL
        KV-->>Worker: ack
    end
    alt Small attachments (≤25MB)
        Worker->>Worker: keep for direct multipart upload
    end
    Worker->>Worker: formatEmailForDiscord(email, r2Links)
    Worker->>Discord: create thread message (attachments + r2 links in content/embeds)
    Discord-->>Worker: thread created
sequenceDiagram
    participant Client as Browser/User
    participant Worker as Cloudflare Worker
    participant KV as KV Store
    participant R2 as R2 Storage

    Client->>Worker: GET /files/{fileKey}/{filename}
    Worker->>KV: fetch metadata for fileKey
    alt Metadata not found
        Worker-->>Client: 404 Not Found
    else Metadata found
        Worker->>KV: check expiry
        alt Expired
            Worker->>R2: delete object
            R2-->>Worker: deleted
            Worker->>KV: delete metadata
            KV-->>Worker: deleted
            Worker-->>Client: 410 Gone
        else Valid
            Worker->>R2: fetch object
            R2-->>Worker: file content
            Worker-->>Client: 200 + file (Content-Type, Content-Disposition, Cache-Control, X-Expires-At)
        end
    end
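
The branching in the download diagram can be sketched as a pure decision function, separate from any R2 or KV calls. The metadata fields `originalFilename`, `contentType`, and `size` appear in the review diffs; `expiresAt` as epoch milliseconds, the `ServeDecision` type, and `decideServe` are assumptions made for illustration.

```typescript
// Hypothetical metadata shape; expiresAt and its unit are assumptions.
interface StoredFileMetadata {
  originalFilename: string
  contentType: string
  size: number
  expiresAt: number // epoch milliseconds (assumed representation)
}

type ServeDecision =
  | { status: 404 }
  | { status: 410, cleanup: true }
  | { status: 200, r2Key: string }

function decideServe(fileKey: string, metadata: StoredFileMetadata | null, now: number): ServeDecision {
  // No KV entry for this key: the link is unknown.
  if (metadata === null)
    return { status: 404 }
  // Entry exists but is past its expiry: report Gone and ask the caller to clean up.
  if (now >= metadata.expiresAt)
    return { status: 410, cleanup: true }
  // Valid entry: build the R2 key from the stored filename, not from the URL.
  return { status: 200, r2Key: `attachments/${fileKey}/${metadata.originalFilename}` }
}
```

The sketch does not cover the case where metadata is valid but the R2 object itself is missing, which the summary table also maps to a 404.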

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped the bytes from Discord to R2,

Big carrots cached with TTL for a few,
Small snacks stay inline in the message stream,
Links trail like breadcrumbs down the thread,
I tidy expired bites when morning's due.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: implementing R2 storage for large email attachments, which is the primary objective of the PR. |
| Description check | ✅ Passed | The description covers the summary and test plan sections from the template, with a checklist partially completed. However, the Screenshots section is missing, though it is noted as skippable for backend changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

🧹 Recent nitpick comments
cloudflare_workers/email/r2-storage.ts (2)

69-72: Sanitize filename in Content-Disposition header.

If attachment.filename contains double quotes, backslashes, or non-ASCII characters, the header value could be malformed or cause unexpected behavior. Consider sanitizing the filename for header use.

♻️ Proposed fix
+function sanitizeFilenameForHeader(filename: string): string {
+  // Remove/replace characters that could break Content-Disposition header
+  return filename.replace(/["\\]/g, '_').replace(/[^\x20-\x7E]/g, '_')
+}
+
 // In uploadLargeAttachment:
       httpMetadata: {
         contentType: attachment.contentType,
-        contentDisposition: `attachment; filename="${attachment.filename}"`,
+        contentDisposition: `attachment; filename="${sanitizeFilenameForHeader(attachment.filename)}"`,
       },
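
To see what the proposed sanitizer does to awkward names, the function from the suggestion can be run on a couple of sample inputs. The sample filenames are invented for illustration.

```typescript
// Copied from the proposed fix so the example is self-contained.
function sanitizeFilenameForHeader(filename: string): string {
  // Replace quotes and backslashes, then anything outside printable ASCII.
  return filename.replace(/["\\]/g, '_').replace(/[^\x20-\x7E]/g, '_')
}

// Double quotes would otherwise terminate the quoted filename early.
const quoted = sanitizeFilenameForHeader('report "final".pdf')

// Non-ASCII letters are replaced, so the result is lossy.
const accented = sanitizeFilenameForHeader('r\u00E9sum\u00E9.pdf')

const header = `attachment; filename="${quoted}"`
```

Here `quoted` becomes `report _final_.pdf` and `accented` becomes `r_sum_.pdf`. The header stays well-formed, but accented names lose characters; if preserving them matters, the `filename*` parameter defined in RFC 6266 is one option to consider.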

136-143: Consider using R2 object size for Content-Length.

metadata.size relies on the value stored during upload, but object.size from the R2 response reflects the actual stored file size. Using the R2 object's size is more accurate and resilient to any metadata inconsistencies.

♻️ Proposed fix
     headers.set('Content-Type', metadata.contentType || 'application/octet-stream')
     headers.set('Content-Disposition', `attachment; filename="${metadata.originalFilename}"`)
-    headers.set('Content-Length', metadata.size.toString())
+    headers.set('Content-Length', object.size.toString())
     headers.set('Cache-Control', 'private, max-age=3600')

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81638c6 and eb5bdbd.

📒 Files selected for processing (2)
  • cloudflare_workers/email/r2-storage.ts
  • cloudflare_workers/email/types.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • cloudflare_workers/email/types.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{vue,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Run bun lint to lint Vue, TypeScript, and JavaScript files; use bun lint:fix to auto-fix issues

Files:

  • cloudflare_workers/email/r2-storage.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use single quotes and no semicolons per @antfu/eslint-config rules

Files:

  • cloudflare_workers/email/r2-storage.ts
**/*.{ts,tsx,js,jsx,vue}

📄 CodeRabbit inference engine (AGENTS.md)

Run bun lint or lint/format command before validating any backend or frontend task to ensure consistent formatting

Files:

  • cloudflare_workers/email/r2-storage.ts
🧠 Learnings (2)
📚 Learning: 2025-12-23T02:53:12.055Z
Learnt from: CR
Repo: Cap-go/capgo PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-23T02:53:12.055Z
Learning: Applies to cloudflare_workers/files/index.ts : Files Worker (port 8789) handles file upload/download operations

Applied to files:

  • cloudflare_workers/email/r2-storage.ts
📚 Learning: 2025-12-23T02:53:12.055Z
Learnt from: CR
Repo: Cap-go/capgo PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-23T02:53:12.055Z
Learning: Applies to cloudflare_workers/{api,plugin,files}/index.ts : Cloudflare Workers are split across three ports: API Worker (8787), Plugin Worker (8788), Files Worker (8789); see routing in `cloudflare_workers/{api,plugin,files}/index.ts`

Applied to files:

  • cloudflare_workers/email/r2-storage.ts
🧬 Code graph analysis (1)
cloudflare_workers/email/r2-storage.ts (1)
cloudflare_workers/email/types.ts (2)
  • Env (1-25)
  • EmailAttachment (54-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run tests
🔇 Additional comments (7)
cloudflare_workers/email/r2-storage.ts (7)

1-14: LGTM!

The constants and interface are well-defined. The expiry calculation is correct, and the metadata interface captures all necessary fields for tracking file lifecycle.
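
As a rough sketch of the expiry arithmetic mentioned here, assuming the 7-day lifetime from the PR description: the constant names and `computeExpiresAt` are hypothetical, and whether the PR passes the value to KV as `expirationTtl` (the Workers KV option that takes seconds) is not confirmed by the source.

```typescript
// 7 days, taken from the PR description; names are hypothetical.
const EXPIRY_DAYS = 7

// Seconds form, suitable for a TTL expressed in seconds.
const EXPIRY_SECONDS = EXPIRY_DAYS * 24 * 60 * 60

// Millisecond timestamp at which a file uploaded at uploadedAtMs stops being served.
function computeExpiresAt(uploadedAtMs: number): number {
  return uploadedAtMs + EXPIRY_SECONDS * 1000
}
```

Storing an explicit expiry timestamp alongside a TTL lets the worker answer 410 for a recently expired link rather than relying only on the KV entry disappearing.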


16-21: LGTM!

crypto.randomUUID() is appropriate for generating unique file keys in the Cloudflare Workers environment.


86-91: Good implementation of the base URL fix.

Using getBaseUrl(env) with EMAIL_WORKER_URL fallback addresses the earlier concern about hardcoded production URLs in non-prod environments.


102-106: Good fix: Using metadata.originalFilename for consistent lookups.

The _filename parameter is intentionally unused, and using metadata.originalFilename from KV ensures consistent R2 key construction, avoiding URL-encoding mismatches. This properly addresses the previous review feedback.
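
The mismatch this fix avoids can be reproduced with plain string handling. The sketch below assumes a filename that contains a literal `%25` and a link that carried it unencoded; the key layout `attachments/{fileKey}/{originalFilename}` comes from the review comment, while `buildR2Key` and the sample values are hypothetical.

```typescript
// Key layout taken from the review comment: attachments/{fileKey}/{originalFilename}
function buildR2Key(fileKey: string, filename: string): string {
  return `attachments/${fileKey}/${filename}`
}

const fileKey = 'example-key' // hypothetical; the PR generates keys with crypto.randomUUID()
const originalFilename = '100%25 coverage.txt' // a literal "%25" in the name

// Key written at upload time.
const storedKey = buildR2Key(fileKey, originalFilename)

// Lookup built from the decoded URL segment: "%25" collapses to "%".
const fromUrl = buildR2Key(fileKey, decodeURIComponent(originalFilename))

// Lookup built from the filename kept in KV metadata.
const fromMetadata = buildR2Key(fileKey, originalFilename)

const urlLookupMatches = fromUrl === storedKey
const metadataLookupMatches = fromMetadata === storedKey
```

With these inputs the URL-derived key no longer equals the stored key, while the metadata-derived key does, which is why the lookup is independent of how the link was encoded or decoded.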


154-168: LGTM!

The cleanup function correctly deletes both the R2 object and KV metadata. The fix to pass metadata.originalFilename from serveR2File ensures consistent cleanup behavior.


170-176: Good fix for environment-specific URL configuration.

Using EMAIL_WORKER_URL from the environment allows non-prod deployments to generate correct URLs. This properly addresses the previous review feedback about hardcoded production URLs.
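
A minimal sketch of an environment-driven base URL, assuming the production URL from the earlier hardcoded version is kept as the fallback. `EnvLike`, `buildFileUrl`, the trailing-slash trimming, and the use of `encodeURIComponent` on the filename are assumptions and may not match the merged code.

```typescript
// Minimal stand-in for the worker's Env; only the field discussed here.
interface EnvLike {
  EMAIL_WORKER_URL?: string
}

// Production URL from the original hardcoded getBaseUrl().
const DEFAULT_BASE_URL = 'https://email.capgo.app'

function getBaseUrl(env: EnvLike): string {
  const configured = env.EMAIL_WORKER_URL?.trim()
  // Fall back to production when unset or empty; strip trailing slashes
  // so "/files/..." can be appended safely (assumed behaviour).
  return (configured || DEFAULT_BASE_URL).replace(/\/+$/, '')
}

function buildFileUrl(env: EnvLike, fileKey: string, filename: string): string {
  return `${getBaseUrl(env)}/files/${fileKey}/${encodeURIComponent(filename)}`
}
```

For example, a local run could set `EMAIL_WORKER_URL` to the dev server address so that generated links point at the instance that actually holds the file.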


178-187: LGTM!

Simple and readable file size formatting. The KB/MB range is appropriate for email attachment sizes.
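
For readers without the file open, a formatter of this kind might look like the sketch below. It is not the PR's implementation: the thresholds, the one-decimal rounding, and the unit labels are assumptions.

```typescript
// Hypothetical formatter covering bytes, KB, and MB only.
function formatFileSize(bytes: number): string {
  if (bytes < 1024)
    return `${bytes} B`
  if (bytes < 1024 * 1024)
    return `${(bytes / 1024).toFixed(1)} KB`
  return `${(bytes / (1024 * 1024)).toFixed(1)} MB`
}
```

Because the same helper is reported as duplicated in discord.ts, a shared module exporting one version would keep the displayed sizes consistent between embeds and logs.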


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

"env": {
"prod": {
"name": "capgo_email-prod",
"vars": {

P2 Badge Restore non-prod envs in email wrangler config

The email worker README still documents wrangler deploy --env preprod (Deploy section), but the updated wrangler.jsonc now only defines env.prod. That means wrangler deploy --env preprod (and any alpha/local deployments) will fail with “environment not found,” breaking staging workflows and scripts that rely on those environments.


Comment thread cloudflare_workers/email/r2-storage.ts Outdated
Comment on lines +173 to +175
function getBaseUrl(_env: Env): string {
return 'https://email.capgo.app'
}

P2 Badge Avoid hardcoded prod base URL for R2 links

getBaseUrl() always returns https://email.capgo.app, so attachments uploaded in preprod/dev (or when running wrangler dev) will generate links that point to prod, where the file doesn’t exist. Recipients will see 404s even though the upload succeeded. Derive the base URL from the request host or an env var so non-prod deployments generate usable links.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@cloudflare_workers/email/r2-storage.ts`:
- Around line 154-168: cleanupExpiredFile currently builds the R2 key from the
URL-provided filename which can mismatch stored metadata; change callers (e.g.,
serveR2File) to pass the metadata.originalFilename instead of the URL filename,
and update cleanupExpiredFile to accept and use that originalFilename when
constructing r2Key (`attachments/${fileKey}/${originalFilename}`) for
env.EMAIL_ATTACHMENTS.delete (and still delete `r2:${fileKey}` from
env.EMAIL_THREAD_MAPPING); ensure any other callers of cleanupExpiredFile are
updated to pass the correct originalFilename from the file metadata.
- Around line 126-133: The fetch path may use a URL-decoded filename that
doesn't match the stored R2 key created in uploadLargeAttachment; change the key
construction in the download/serve flow to use the stored metadata
originalFilename (e.g. metadata.originalFilename) or the same source used when
uploading (instead of the incoming filename param) so the lookup via
EMAIL_ATTACHMENTS.get(...) uses the exact key created by uploadLargeAttachment
(r2Key should be built from fileKey and the stored attachment filename/metadata,
not the request filename).
🧹 Nitpick comments (3)
cloudflare_workers/email/r2-storage.ts (3)

67-78: Consider sanitizing the filename before using it in the R2 key.

The attachment.filename is used directly in the R2 key path without sanitization. While R2 keys are not filesystem paths, filenames from email attachments could contain unexpected characters (e.g., slashes, null bytes) that might cause issues or confusion. Consider sanitizing or validating the filename.

♻️ Suggested sanitization
+/**
+ * Sanitizes filename for safe use in R2 keys
+ */
+function sanitizeFilename(filename: string): string {
+  return filename
+    .replace(/[/\\]/g, '_')  // Replace path separators
+    .replace(/\0/g, '')       // Remove null bytes
+    .substring(0, 255)        // Limit length
+}
+
 // Upload to R2 with metadata
-const r2Key = `attachments/${fileKey}/${attachment.filename}`
+const r2Key = `attachments/${fileKey}/${sanitizeFilename(attachment.filename)}`
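
Running the suggested helper on a few hostile inputs shows what it changes. The function body is copied from the suggestion; the sample inputs are invented.

```typescript
// Copied from the suggested sanitization so the example is self-contained.
function sanitizeFilename(filename: string): string {
  return filename
    .replace(/[/\\]/g, '_') // Replace path separators
    .replace(/\0/g, '') // Remove null bytes
    .substring(0, 255) // Limit length
}

// Path separators are flattened, so the name cannot add extra key segments.
const traversal = sanitizeFilename('../../etc/passwd')

// Null bytes are dropped entirely.
const nullByte = sanitizeFilename('a\0b.txt')

// Overlong names are truncated to 255 characters.
const longName = sanitizeFilename('x'.repeat(300))
```

If the key is sanitized at upload time, the serve and cleanup paths need to rebuild it from the same sanitized value, otherwise lookups would miss the stored object.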

173-175: Hardcoded base URL may complicate local development.

The base URL is hardcoded to 'https://email.capgo.app'. Consider deriving it from the request URL or environment configuration to support local development and testing scenarios.


180-186: Duplicate formatFileSize function.

This function is also defined in discord.ts (lines 414-418). Consider extracting it to a shared utility module to avoid duplication.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6230214 and 1c1df69.

📒 Files selected for processing (5)
  • cloudflare_workers/email/discord.ts
  • cloudflare_workers/email/index.ts
  • cloudflare_workers/email/r2-storage.ts
  • cloudflare_workers/email/types.ts
  • cloudflare_workers/email/wrangler.jsonc
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{vue,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Run bun lint to lint Vue, TypeScript, and JavaScript files; use bun lint:fix to auto-fix issues

Files:

  • cloudflare_workers/email/index.ts
  • cloudflare_workers/email/types.ts
  • cloudflare_workers/email/r2-storage.ts
  • cloudflare_workers/email/discord.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use single quotes and no semicolons per @antfu/eslint-config rules

Files:

  • cloudflare_workers/email/index.ts
  • cloudflare_workers/email/types.ts
  • cloudflare_workers/email/r2-storage.ts
  • cloudflare_workers/email/discord.ts
**/*.{ts,tsx,js,jsx,vue}

📄 CodeRabbit inference engine (AGENTS.md)

Run bun lint or lint/format command before validating any backend or frontend task to ensure consistent formatting

Files:

  • cloudflare_workers/email/index.ts
  • cloudflare_workers/email/types.ts
  • cloudflare_workers/email/r2-storage.ts
  • cloudflare_workers/email/discord.ts
🧠 Learnings (4)
📚 Learning: 2025-12-23T02:53:12.055Z
Learnt from: CR
Repo: Cap-go/capgo PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-23T02:53:12.055Z
Learning: Applies to cloudflare_workers/files/index.ts : Files Worker (port 8789) handles file upload/download operations

Applied to files:

  • cloudflare_workers/email/index.ts
  • cloudflare_workers/email/r2-storage.ts
📚 Learning: 2025-12-23T02:53:12.055Z
Learnt from: CR
Repo: Cap-go/capgo PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-23T02:53:12.055Z
Learning: Applies to cloudflare_workers/{api,plugin,files}/index.ts : Cloudflare Workers are split across three ports: API Worker (8787), Plugin Worker (8788), Files Worker (8789); see routing in `cloudflare_workers/{api,plugin,files}/index.ts`

Applied to files:

  • cloudflare_workers/email/index.ts
  • cloudflare_workers/email/r2-storage.ts
📚 Learning: 2025-12-23T02:53:12.055Z
Learnt from: CR
Repo: Cap-go/capgo PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-23T02:53:12.055Z
Learning: Applies to cloudflare_workers/api/index.ts : API Worker (port 8787) routes: `/bundle`, `/app`, `/device`, `/channel`, `/private/*`, `/triggers`

Applied to files:

  • cloudflare_workers/email/index.ts
📚 Learning: 2025-12-23T02:53:12.055Z
Learnt from: CR
Repo: Cap-go/capgo PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-23T02:53:12.055Z
Learning: Applies to cloudflare_workers/plugin/index.ts : Plugin Worker (port 8788) routes: `/updates`, `/channel_self`, `/stats`

Applied to files:

  • cloudflare_workers/email/index.ts
🧬 Code graph analysis (3)
cloudflare_workers/email/index.ts (1)
cloudflare_workers/email/r2-storage.ts (1)
  • serveR2File (102-149)
cloudflare_workers/email/r2-storage.ts (1)
cloudflare_workers/email/types.ts (2)
  • Env (1-24)
  • EmailAttachment (53-58)
cloudflare_workers/email/discord.ts (3)
cloudflare_workers/email/types.ts (3)
  • EmailAttachment (53-58)
  • Env (1-24)
  • DiscordMessage (70-76)
cloudflare_workers/email/r2-storage.ts (1)
  • uploadLargeAttachment (27-97)
cloudflare_workers/email/index.ts (1)
  • email (22-165)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (java-kotlin)
🔇 Additional comments (11)
cloudflare_workers/email/types.ts (1)

5-7: LGTM!

The R2Bucket type binding is correctly added to the Env interface, properly aligning with the wrangler.jsonc configuration.

cloudflare_workers/email/wrangler.jsonc (1)

42-47: R2 bucket binding defined at both top-level and environment-specific level.

The r2_buckets configuration appears both at the top level (lines 56-61) and under env.prod (lines 42-47). This is valid for Wrangler configurations where the top-level applies to local development/default, while env.prod applies to production deployments. Ensure this duplication is intentional and that the same bucket should be used across all environments.

Also applies to: 56-61

cloudflare_workers/email/r2-storage.ts (2)

1-14: LGTM!

Constants and interface are well-defined. The 7-day expiry aligns with the Discord thread archive duration as documented.


16-21: LGTM!

Using crypto.randomUUID() is appropriate for generating unique file keys in Cloudflare Workers.

cloudflare_workers/email/index.ts (2)

6-6: LGTM!

Import added correctly.


175-181: Route handler looks correct.

The file serving route is properly integrated. The decodeURIComponent is necessary for handling URL-encoded filenames. Note that security around the filename parameter is addressed by the suggested fix in r2-storage.ts to use metadata.originalFilename instead of the URL parameter.
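
One way to picture the route matching is a small parser for `GET /files/{fileKey}/{filename}`. This is a sketch rather than the code in index.ts: `parseFileRoute`, the regular expression, and the choice to return null on malformed percent-encoding are all assumptions.

```typescript
interface FileRoute {
  fileKey: string
  filename: string
}

function parseFileRoute(method: string, pathname: string): FileRoute | null {
  if (method !== 'GET')
    return null
  // Exactly two non-empty segments after /files/.
  const match = /^\/files\/([^/]+)\/([^/]+)$/.exec(pathname)
  const fileKey = match?.[1]
  const rawName = match?.[2]
  if (fileKey === undefined || rawName === undefined)
    return null
  try {
    return { fileKey, filename: decodeURIComponent(rawName) }
  }
  catch {
    // decodeURIComponent throws URIError on malformed input such as a lone "%".
    return null
  }
}
```

Since the decoded filename is only informational once the lookup uses the stored metadata, treating an undecodable segment as "no match" is a conservative choice; a handler could equally answer 400.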

cloudflare_workers/email/discord.ts (5)

15-60: LGTM with observation on error handling.

The attachment classification and processing logic is correct. Note that if an R2 upload fails (lines 53-55), the large attachment is silently dropped rather than causing a failure. This is acceptable for a non-critical feature, but users won't be notified if a large attachment failed to upload.


72-82: LGTM!

Clean refactoring to check for empty attachments array.


145-156: LGTM!

The integration of processAttachmentsForDiscord is well-implemented, with proper logging of attachment counts.


259-276: LGTM!

Good implementation for reply messages. The R2 link formatting with the expiry notice is helpful for users.


337-387: LGTM!

The updated formatEmailForDiscord function cleanly separates small attachments (uploaded to Discord) from large attachments (R2 links) in the embed. The expiry notice is consistently shown for R2 links.


Comment thread cloudflare_workers/email/r2-storage.ts Outdated
Comment thread cloudflare_workers/email/r2-storage.ts
- Files ≥25MB uploaded to R2 instead of Discord
- Serve files via worker with expiring signed URLs
- Links expire after 7 days (matching thread archive)
- Small attachments still uploaded directly to Discord
- Simplified to prod environment only

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@riderx riderx force-pushed the riderx/email-large-attachments branch from 1c1df69 to 81638c6 Compare January 15, 2026 03:14
- Use metadata.originalFilename for R2 key lookup (avoid URL encoding mismatch)
- Pass originalFilename to cleanupExpiredFile for consistent cleanup
- Add EMAIL_WORKER_URL env var for configurable base URL

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@sonarqubecloud

@riderx riderx merged commit f0947b0 into main Jan 15, 2026
10 of 11 checks passed
@riderx riderx deleted the riderx/email-large-attachments branch January 15, 2026 23:26