fix(storage): add R2 bucket migration workflow #2333

Open
riderx wants to merge 5 commits into main from codex/r2-bucket-migration

Member

@riderx riderx commented May 23, 2026

Summary (AI generated)

  • Add split upload/download/fallback R2 bucket bindings for the files Worker.
  • Add matching Supabase S3 bucket routing via S3_UPLOAD_BUCKET, S3_DOWNLOAD_BUCKET, and S3_FALLBACK_BUCKET.
  • Add scripts/migrate_r2_bucket.mjs to copy only active DB-referenced bundle and manifest objects to a clean R2 bucket using server-side copy, resumable worker cursors, verification, and failed CSV output.
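
The bucket routing in the second bullet could be resolved along these lines. This is a minimal sketch, not the code in this PR: the `resolveBucketRouting` helper, the `BucketRouting` shape, and the legacy `S3_BUCKET` default are assumptions made for illustration. Only the three `S3_*_BUCKET` variable names come from the summary above.

```typescript
// Hypothetical sketch: derive upload and download bucket names from env vars.
// S3_UPLOAD_BUCKET, S3_DOWNLOAD_BUCKET and S3_FALLBACK_BUCKET are named in the
// PR summary; using S3_BUCKET as a legacy default is an assumption.
type EnvVars = Record<string, string | undefined>

interface BucketRouting {
  upload: string
  // Buckets to try when reading, in order, with duplicates removed.
  download: string[]
}

// Treat unset, empty, and whitespace-only values as "not configured".
function nonEmpty(value: string | undefined): string | null {
  const trimmed = value?.trim()
  return trimmed ? trimmed : null
}

function resolveBucketRouting(env: EnvVars): BucketRouting {
  const legacy = nonEmpty(env.S3_BUCKET)
  const upload = nonEmpty(env.S3_UPLOAD_BUCKET) ?? legacy
  if (!upload)
    throw new Error('No upload bucket configured')

  const candidates = [
    nonEmpty(env.S3_DOWNLOAD_BUCKET) ?? legacy,
    nonEmpty(env.S3_FALLBACK_BUCKET),
  ]
  const download = [...new Set(candidates.filter((b): b is string => b !== null))]
  // With no download bucket configured at all, read from the upload bucket.
  if (download.length === 0)
    download.push(upload)
  return { upload, download }
}
```

Under these assumptions, a staged migration would point the upload and download variables at the clean bucket while the fallback variable keeps naming the old bucket, so reads that miss in the clean bucket can still be served.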

Motivation (AI generated)

R2 contains significantly more data than the active bundle storage counted by the database, which indicates stale/orphaned objects left behind by previous cleanup failures. We need a safe migration path that first stops new writes to the dirty bucket, copies only the keys still referenced by Capgo, verifies the clean bucket, and then switches downloads, all without downloading objects locally.

Business Impact (AI generated)

This reduces storage waste and gives Capgo an operationally safe way to move production traffic to a clean R2 bucket while preserving download availability during the migration. It also reduces future debugging risk by making upload/download bucket routing explicit and reversible during staged migrations.

Test Plan (AI generated)

  • bun run lint:backend
  • bun run typecheck
  • bun scripts/migrate_r2_bucket.mjs --help
  • bun scripts/migrate_r2_bucket.mjs --phase deploy-upload --source-bucket capgo --target-bucket capgo-clean-test --target prod
  • bun scripts/migrate_r2_bucket.mjs --phase deploy-download --source-bucket capgo --target-bucket capgo-clean-test --target prod
  • bun scripts/migrate_r2_bucket.mjs --phase create-bucket --source-bucket capgo --target-bucket capgo-clean-test --target prod
  • bun scripts/migrate_r2_bucket.mjs --phase deploy-final --source-bucket capgo --target-bucket capgo-clean-test --target prod

Generated with AI

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for multiple attachment buckets (upload, download, and fallback) enabling better resource management and redundancy.
    • Added migration tool (admin:migrate-r2-bucket) to facilitate data transfer between buckets.
  • Chores

    • Updated bucket configuration to support new bucket bindings across all environments.

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@coderabbitai
Contributor

coderabbitai Bot commented May 23, 2026

Warning

Review limit reached

@riderx, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 3 reviews/hour. Refill in 13 minutes and 48 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ce4d33b4-15d3-4d18-a605-3740194056c7

📥 Commits

Reviewing files that changed from the base of the PR and between afdbf51 and e88d5f2.

📒 Files selected for processing (5)
  • .github/workflows/tests.yml
  • package.json
  • scripts/check-supabase-migration-order.sh
  • scripts/migrate_r2_bucket.mjs
  • supabase/functions/_backend/utils/s3.ts
📝 Walkthrough

This PR introduces multi-bucket R2 attachment support for Capgo. It expands Wrangler config to bind dedicated upload, download, and fallback R2 buckets per environment; updates file handlers to read from multiple download buckets; refactors S3 utilities to operate across bucket collections; and adds a staged data migration script with planning, copying, verification, and deployment phases.

Changes

Multi-bucket R2 and S3 attachment support

| Layer | File(s) | Summary |
| --- | --- | --- |
| Bucket configuration and type contracts | supabase/functions/_backend/utils/cloudflare.ts, supabase/functions/_backend/files/buckets.ts, cloudflare_workers/files/wrangler.jsonc | Cloudflare Worker Bindings type makes attachment bucket bindings optional. New AttachmentBucketBindings interface and getAttachmentUploadBucket(), getAttachmentDownloadBuckets() helpers select upload and download buckets with fallback chains. Wrangler config adds ATTACHMENT_UPLOAD_BUCKET, ATTACHMENT_DOWNLOAD_BUCKET, and ATTACHMENT_FALLBACK_BUCKET bindings to all environments (prod, preprod, alpha, local), each pointing to the same per-environment bucket/preview name. |
| R2 file attachment reads from multiple download buckets | supabase/functions/_backend/files/files.ts, supabase/functions/_backend/files/preview.ts | File and preview handlers now derive a list of download buckets, construct RetryBucket instances for retry behavior, and iterate through buckets and candidate keys to fetch attachments. Added headFirstExistingAttachmentCandidateInBuckets() helper to find first existing object across multiple bucket readers. Cache restoration detects object existence across retry buckets and uploads to chosen upload bucket when missing. Range/HEAD checks and main object retrieval paths iterate buckets first, then candidate keys. |
| R2 file attachment uploads using selected upload bucket | supabase/functions/_backend/files/uploadHandler.ts | Upload handler resolves bucket via getAttachmentUploadBucket() and throws explicit error when binding is missing, replacing direct env.ATTACHMENT_BUCKET usage. |
| S3 backend multi-bucket operations | supabase/functions/_backend/utils/s3.ts | initS3 now accepts optional bucketName parameter. Upload and delete flows are bucket-aware: getUploadUrl selects upload bucket; deleteObject presigns and executes DELETEs concurrently across all mutation buckets, treating 2xx and 404 as successful. moveObjectToTrash and deleteObjectsWithPrefix compute copy/delete operations across mutation buckets with missing-object resilience. getSignedUrl uses findExistingObject() to locate the actual bucket containing the object, then presigns a GET. Size-resolution logic (getSizeFromRangeFallback, getSizeForKey, getSize) threads bucket parameter through and iterates download buckets + manifest candidate keys. getObject fetches via presigned GETs across download buckets and candidate keys, returning first successful response. |
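
The "buckets first, then candidate keys" lookup described in the walkthrough can be sketched as below. This is an illustration under stated assumptions, not the PR's `headFirstExistingAttachmentCandidateInBuckets()`: the `ObjectReader` interface, the `headFirstExisting` name, and the in-memory `memoryReader` stand-in are all hypothetical.

```typescript
// Hypothetical minimal reader; a real R2 binding or S3 client would be adapted to it.
interface ObjectReader {
  name: string
  // Resolves to metadata when the object exists, or null when it does not.
  head: (key: string) => Promise<{ size: number } | null>
}

interface FoundObject {
  bucket: string
  key: string
  size: number
}

// Try every candidate key in the first bucket before moving to the next bucket,
// so the preferred bucket wins whenever it holds any form of the key.
async function headFirstExisting(
  readers: ObjectReader[],
  candidateKeys: string[],
): Promise<FoundObject | null> {
  for (const reader of readers) {
    for (const key of candidateKeys) {
      const meta = await reader.head(key)
      if (meta)
        return { bucket: reader.name, key, size: meta.size }
    }
  }
  return null
}

// In-memory stand-in for a bucket, used only for demonstration.
function memoryReader(name: string, objects: Map<string, number>): ObjectReader {
  return {
    name,
    head: async (key) => {
      const size = objects.get(key)
      // A stored size of 0 is still an existing object.
      return size === undefined ? null : { size }
    },
  }
}
```

Calling `headFirstExisting([clean, dirty], keys)` with the clean bucket first means the old bucket is only consulted when no candidate key exists in the clean one, which matches the fallback role described for it in this PR.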

Data migration from source to target R2 bucket

| Layer | File(s) | Summary |
| --- | --- | --- |
| Migration script CLI, environment, and DB setup | scripts/migrate_r2_bucket.mjs (lines 1–279) | Parses CLI flags (--phase, --apply, --workers, --batch-size, --skip-existing, --verify-after-copy, multipart params); loads target-specific .env files; validates and selects DB connection URL (refusing local URLs for default prod); creates Postgres pool with sizing from worker/batch settings and SSL for prod. |
| Migration script infrastructure and helpers | scripts/migrate_r2_bucket.mjs (lines 280–416) | Lazy S3/R2 client init with endpoint/credentials; path sanitization; temp/output directory + state/report/failed-csv setup; CSV/error serialization; storage-path encoding/decoding and candidate key generation; DB query retry wrapper with backoff and error classification. |
| Migration script DB discovery | scripts/migrate_r2_bucket.mjs (lines 417–589) | DB query builders for two object kinds (versions, manifest); range/page SQL; worker range splitting; resumable state init/load/update with cursor/done semantics; progress logging fields. |
| Migration script object copy and multipart ops | scripts/migrate_r2_bucket.mjs (lines 593–722) | HEAD checks for target size-based match; small-object server-side copy; multipart create + concurrent part copy + completion with abort-on-error. |
| Migration script per-row processing and worker loops | scripts/migrate_r2_bucket.mjs (lines 724–867) | Failure record construction; copy flow with optional skip-existing; candidate-source attempts with fallback for missing vs too-large; optional post-copy verification; batch processing; per-kind concurrent worker loops over DB pages with persisted state updates. |
| Migration script phase orchestration | scripts/migrate_r2_bucket.mjs (lines 868–925) | Plan phase reporting ID ranges and byte totals; copy/verify phase setup with state handling per mode; CSV header writing; sequential worker set execution with progress flushing. |
| Migration script deployment and finalization | scripts/migrate_r2_bucket.mjs (lines 926–1161) | Bucket creation/checking; Wrangler config generation with R2 binding substitution; files worker deployment; Supabase S3 secret setting (prod only); "all" phase with pre-deploy verification gate and post-deploy keyspace verification; deploy-final phase; lifecycle cleanup with JSON report writing, state persistence, DB pool close, and summary printing. Supports dry-run mode throughout. |
| Migration script npm command | package.json | Adds admin:migrate-r2-bucket npm script entry invoking bun scripts/migrate_r2_bucket.mjs. |
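
The worker range splitting and cursor/done semantics mentioned for the DB discovery layer might look roughly like this. It is a sketch under assumptions, not the script's implementation: integer row IDs, inclusive ranges, and the `WorkerState`, `splitIdRange`, and `advance` names are all hypothetical.

```typescript
// Hypothetical per-worker state that could be persisted between runs.
interface WorkerState {
  worker: number
  startId: number // inclusive lower bound of this worker's range
  endId: number // inclusive upper bound of this worker's range
  cursor: number // last processed id; startId - 1 means nothing processed yet
  done: boolean
}

// Split [minId, maxId] into contiguous, non-overlapping ranges, one per worker.
// Earlier workers absorb the remainder so range sizes differ by at most one.
function splitIdRange(minId: number, maxId: number, workers: number): WorkerState[] {
  if (workers < 1 || maxId < minId)
    return []
  const total = maxId - minId + 1
  const count = Math.min(workers, total)
  const base = Math.floor(total / count)
  const extra = total % count
  const states: WorkerState[] = []
  let start = minId
  for (let worker = 0; worker < count; worker++) {
    const size = base + (worker < extra ? 1 : 0)
    const end = start + size - 1
    states.push({ worker, startId: start, endId: end, cursor: start - 1, done: false })
    start = end + 1
  }
  return states
}

// Advance the cursor after a page. Assumes keyset paging of the form
// "id > cursor AND id <= endId ORDER BY id LIMIT pageSize" (an assumption).
function advance(state: WorkerState, pageIds: number[], pageSize: number): WorkerState {
  const cursor = pageIds.length > 0 ? Math.max(...pageIds) : state.cursor
  const done = pageIds.length < pageSize || cursor >= state.endId
  return { ...state, cursor, done }
}
```

Because each state records only a cursor and a done flag, a restarted run can reload the states and continue from `cursor` rather than rescanning rows that were already handled.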

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • Cap-go/capgo#2297: Both PRs modify moveObjectToTrash in S3 utilities; this PR extends trash-handling to be bucket-aware with non-fatal missing-object behavior.
  • Cap-go/capgo#2320: Both PRs update manifest storage candidate key handling in S3 size and object lookup paths.
  • Cap-go/capgo#2292: Both PRs refactor object size-lookup and range fallback logic in S3 utilities used by manifest retry flows.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 1.05% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the main change: adding an R2 bucket migration workflow, which is the primary feature introduced in this PR. |
| Description check | ✅ Passed | The description includes a summary of changes, motivation, business impact, and test plan with specific testing steps executed. However, it is AI-generated and lacks a proper manual test reproduction section. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread scripts/migrate_r2_bucket.mjs Fixed
Comment thread scripts/migrate_r2_bucket.mjs Fixed
Comment thread scripts/migrate_r2_bucket.mjs Fixed
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@package.json`:
- Line 99: The new npm script "admin:migrate-r2-bucket" in package.json is
inserted between Stripe backfill scripts, breaking logical grouping; relocate
the "admin:migrate-r2-bucket" entry so it sits with other admin/backfill scripts
(immediately after the admin backfill block) or move it to follow all
Stripe-related scripts (i.e., after the Stripe scripts block) to keep related
scripts grouped together and preserve ordering consistency.

In `@scripts/migrate_r2_bucket.mjs`:
- Around line 836-841: processRows currently starts up to the whole batch
concurrently via Promise.all(rows.map(...)), which can overwhelm S3; change it
to run row tasks with a concurrency limiter (reuse the existing
mapWithConcurrency helper used elsewhere) so verifyRow/copyRow are executed with
a fixed parallelism (e.g., 10-50) instead of all at once, then collect failures,
call appendFailedRows, increment report.batches and call logProgress as before;
specifically replace Promise.all(rows.map(...)) with mapWithConcurrency(rows,
concurrency, row => mode === 'verify' ? verifyRow(row) : copyRow(row)) and keep
the rest of the function unchanged.
- Around line 1097-1115: The two functions runKindWorkersWithMode and
runKindWorkers contain duplicate looping and paging logic; consolidate them by
refactoring runKindWorkers to accept optional parameters (e.g., mode,
limitRecords, persistState) or an options object and move the shared logic
(building pageQuery via buildPageQuery, the paging loop, calling processRows,
updating workerState.cursor/done) into that single function; then reimplement
runKindWorkersWithMode as a simple wrapper that calls runKindWorkers with the
appropriate options (ensuring it passes mode and any special behavior for record
limiting/state persistence) so there is one maintained implementation for
pagination and worker state handling.
- Around line 857-863: runKindWorkers currently relies on the global phase
variable to choose mode; change it to accept an explicit mode parameter (e.g.,
mode) and use that instead of phase when calling processRows (replace phase ===
'verify' ? 'verify' : 'copy' with mode === 'verify' ? 'verify' : 'copy'). Update
runKindWorkers signature and all callers (including any places that call it
indirectly like runKindWorkersWithMode) to pass the appropriate mode value, and
preserve existing workerState and saveState behavior unchanged.
- Around line 272-278: The production DB pool currently sets ssl: target ===
'prod' ? { rejectUnauthorized: false } : undefined which disables TLS
certificate verification; update the pool creation (symbol: pool) to not disable
verification for prod — either remove rejectUnauthorized or set it to true
(e.g., ssl: target === 'prod' ? { rejectUnauthorized: true } : undefined), and
if disabling was intentional for Supabase compatibility, add a concise inline
comment next to the pool creation explaining the reason and risk so reviewers
understand why rejectUnauthorized is false.
- Around line 1042-1045: The deployFilesWorker function hardcodes the --env=prod
flag causing local deployments to target production; update deployFilesWorker
(and the runCommand invocation) to conditionally include the env flag based on
the stage/target: if stage equals 'prod' (or the canonical production
identifier) include ['--env', 'prod'] (or '--env=prod'), otherwise omit the
--env flag or pass ['--env', stage] for non-prod targets (e.g., 'local' should
not pass '--env=prod'). Modify the arguments built for runCommand in
deployFilesWorker so tempConfig and stage are preserved but the env flag is only
added when appropriate.
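
For reference, a concurrency limiter of the kind the `processRows` item asks for could look like the sketch below. The script's actual `mapWithConcurrency` is not shown in this conversation, so this signature and implementation are assumptions, written only to show the fixed-parallelism behavior being requested.

```typescript
// Hypothetical sketch of a concurrency-limited map; the real helper in
// scripts/migrate_r2_bucket.mjs may differ in signature and error handling.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length)
  let next = 0

  // Each lane repeatedly claims the next unprocessed index until none remain.
  const runLane = async (): Promise<void> => {
    while (true) {
      const index = next++
      if (index >= items.length)
        return
      results[index] = await fn(items[index] as T, index)
    }
  }

  // Never start more lanes than there are items, and always start at least one.
  const lanes = Math.max(1, Math.min(concurrency, items.length))
  await Promise.all(Array.from({ length: lanes }, () => runLane()))
  return results
}
```

With a helper shaped like this, `processRows` would pass `rows`, a fixed limit, and a callback that picks `verifyRow` or `copyRow` by mode, as the item above suggests. Note that in this sketch a rejected callback rejects the whole call while the other lanes keep draining, so callbacks that should not abort the batch need to catch their own errors.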

In `@supabase/functions/_backend/utils/s3.ts`:
- Around line 61-63: getOptionalEnv calls getEnv which throws for missing keys,
causing optional lookups to fail; change getOptionalEnv (params: c: Context,
key: string) to call getEnv inside a try/catch (or check existence without
throwing) and return null when getEnv would throw or returns empty/whitespace,
so optional bucket/env lookups can fall back gracefully; refer to getOptionalEnv
and getEnv to locate the code to modify.
- Line 248: The exists check currently treats zero-byte objects as missing by
requiring contentLength > 0; change the logic so zero-byte objects count as
existing—update the assignment where exists is computed (referencing exists,
response.status and contentLength) to treat any 200 response as existing (e.g.,
remove the contentLength > 0 requirement or use contentLength >= 0) so valid
zero-byte objects are not marked missing.
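
The two `s3.ts` items above can be illustrated with a small standalone sketch. It does not use the project's real `getEnv` or Hono `Context`: `getEnvStrict`, `EnvSource`, and `interpretHead` are hypothetical stand-ins, and the only behavior taken from the comments is that `getEnv` throws on missing keys and that a 200 response should count as existing even at zero bytes.

```typescript
// Hypothetical stand-in for the project's env source.
type EnvSource = Record<string, string | undefined>

// Stand-in for getEnv, which (per the review comment) throws on missing keys.
function getEnvStrict(env: EnvSource, key: string): string {
  const value = env[key]
  if (value === undefined)
    throw new Error(`Missing env: ${key}`)
  return value
}

// Optional lookup: missing, empty, or whitespace-only values become null
// instead of propagating the exception.
function getOptionalEnv(env: EnvSource, key: string): string | null {
  try {
    const value = getEnvStrict(env, key).trim()
    return value.length > 0 ? value : null
  }
  catch {
    return null
  }
}

// Existence depends on the status alone; the size is reported separately,
// so a zero-byte object is still treated as existing.
function interpretHead(
  status: number,
  contentLengthHeader: string | null,
): { exists: boolean, size: number | null } {
  const hasHeader = contentLengthHeader !== null && contentLengthHeader.trim() !== ''
  const parsed = hasHeader ? Number(contentLengthHeader) : Number.NaN
  const size = Number.isFinite(parsed) && parsed >= 0 ? parsed : null
  return { exists: status === 200, size }
}
```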
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9275db5d-f985-41fb-b092-b845fd5693ff

📥 Commits

Reviewing files that changed from the base of the PR and between 9318f92 and afdbf51.

📒 Files selected for processing (9)
  • cloudflare_workers/files/wrangler.jsonc
  • package.json
  • scripts/migrate_r2_bucket.mjs
  • supabase/functions/_backend/files/buckets.ts
  • supabase/functions/_backend/files/files.ts
  • supabase/functions/_backend/files/preview.ts
  • supabase/functions/_backend/files/uploadHandler.ts
  • supabase/functions/_backend/utils/cloudflare.ts
  • supabase/functions/_backend/utils/s3.ts

Comment thread package.json Outdated
Comment thread scripts/migrate_r2_bucket.mjs
Comment thread scripts/migrate_r2_bucket.mjs
Comment thread scripts/migrate_r2_bucket.mjs Outdated
Comment thread scripts/migrate_r2_bucket.mjs
Comment thread scripts/migrate_r2_bucket.mjs
Comment thread supabase/functions/_backend/utils/s3.ts
Comment thread supabase/functions/_backend/utils/s3.ts Outdated
@codspeed-hq
Contributor

codspeed-hq Bot commented May 23, 2026

Merging this PR will not alter performance

✅ 43 untouched benchmarks
⏩ 2 skipped benchmarks (see footnote 1)


Comparing codex/r2-bucket-migration (e88d5f2) with main (875fd87)

Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  deployStage('upload')
}
else if (phase === 'deploy-download') {
  deployStage('download')

Could we put the same verification guard on the standalone download switch?

--phase all --apply runs runVerifyAfterAll() before deployStage('download'), but --phase deploy-download --apply reaches this branch and switches the worker/Supabase bucket config immediately. Since deploy-download is exposed as a standalone phase, an operator can promote the target bucket before the target has passed the full DB-key verification. The source fallback reduces outage risk, but it also means an incomplete target could be silently promoted and rely on fallback.

Could deploy-download either run the same verification gate before applying, require a persisted successful verify result for this source/target pair, or require an explicit force flag with a warning?

Assisted by Codesx.
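
One way the guard requested in this comment could be shaped is sketched below. It is only an illustration of the options listed (a persisted successful verify result for the same source/target pair, or an explicit force flag with a warning); the `VerifyRecord` shape, the `checkDownloadSwitch` name, and the `force` input are hypothetical and not part of the script.

```typescript
// Hypothetical record of the last verify run, as it might be persisted in state.
interface VerifyRecord {
  sourceBucket: string
  targetBucket: string
  failed: number
  completed: boolean
}

interface GateInput {
  sourceBucket: string
  targetBucket: string
  lastVerify: VerifyRecord | null
  force: boolean // hypothetical explicit override flag
}

type GateDecision =
  | { allowed: true, warning?: string }
  | { allowed: false, reason: string }

// Allow the download switch only after a clean, completed verify of the same
// source/target pair, unless the operator explicitly forces it.
function checkDownloadSwitch(input: GateInput): GateDecision {
  const v = input.lastVerify
  const verified = v !== null
    && v.sourceBucket === input.sourceBucket
    && v.targetBucket === input.targetBucket
    && v.completed
    && v.failed === 0
  if (verified)
    return { allowed: true }
  if (input.force)
    return { allowed: true, warning: 'Switching downloads without a successful verify; reads may rely on the fallback bucket.' }
  return { allowed: false, reason: 'No successful verify recorded for this source/target pair.' }
}
```

In this sketch the deploy-download branch would call the check before `deployStage('download')` and stop when `allowed` is false, printing the warning when the override is used.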
