fix(backend): retry manifest size lookup#2292
Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughAdds S3 sizing/fallback refactors, trigger retry logic for missing manifest sizes, queue consumer tuning (batch/concurrency and queue headers), a backfill CLI to populate missing sizes, and unit tests validating retry and queue behavior. ChangesManifest file size handling and queue optimization
Sequence DiagramsequenceDiagram
participant QC as Queue Consumer
participant Tuning as getQueueBatchSize
participant Pool as mapWithConcurrency
participant Helper as http_post_helper
participant Trigger as on_manifest_create
participant S3 as S3/R2 Storage
QC->>Tuning: getQueueBatchSize(manifest)
Tuning-->>QC: capped batch (e.g., 100)
QC->>Pool: process messages with bounded workers (e.g., 10)
Pool->>Helper: dispatch POST with QueueMessageMetadata headers
Helper->>Trigger: invoke on_manifest_create
Trigger->>S3: getSize(s3_path) via statObject (HEAD)
alt HEAD returns size
S3-->>Trigger: object size
else HEAD insufficient
S3->>Trigger: Range GET fallback (bytes=0-0)
S3-->>Trigger: size or 0
end
Trigger->>Trigger: shouldRetryManifestSizeLookup?
alt missing and no valid persisted file_size
Trigger-->>Helper: respond 503 manifest_size_not_found (retain for retry)
else valid or existing fileSize present
Trigger-->>Helper: respond 200 success (update or keep)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
supabase/functions/_backend/triggers/queue_consumer.ts (1)
220-225: ⚡ Quick winKeep the new queue tuning logs structured.
These new
cloudlog()calls droprequestIdand the batch metrics into an unstructured string, which makes the new concurrency/budget behavior harder to trace in production.Proposed fix
- cloudlog(`[${queueName}] Processing ${messagesToProcess.length} messages and skipping ${messagesToSkip.length} messages with concurrency ${processConcurrency}.`) + cloudlog({ + requestId: c.get('requestId'), + message: `[${queueName}] Processing queue batch.`, + processingCount: messagesToProcess.length, + skippedCount: messagesToSkip.length, + concurrency: processConcurrency, + }) // Archive messages after the configured retry budget is exhausted. if (messagesToSkip.length > 0) { - cloudlog(`[${queueName}] Archiving ${messagesToSkip.length} messages that exceeded the retry budget.`) + cloudlog({ + requestId: c.get('requestId'), + message: `[${queueName}] Archiving messages that exceeded the retry budget.`, + archiveCount: messagesToSkip.length, + }) await archive_queue_messages(c, db, queueName, messagesToSkip.map(msg => msg.msg_id)) }As per coding guidelines, "All endpoints must receive Hono
Context<MiddlewareKeyVariables>object and usec.get('requestId')for structured logging withcloudlog()."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@supabase/functions/_backend/triggers/queue_consumer.ts` around lines 220 - 225, The new cloudlog calls that log concurrency and archiving (using getQueueHttpConcurrency, processConcurrency, messagesToProcess, messagesToSkip, queueName) must be converted to structured logs that include the Hono requestId and batch metrics: obtain requestId via c.get('requestId') and pass it plus metrics (e.g., queueName, processConcurrency, messagesToProcess.length, messagesToSkip.length, and any batch id) as structured fields to cloudlog instead of interpolated strings; update the cloudlog invocations around where processConcurrency is computed and where archiving is logged to use the context-derived requestId and explicit key/value fields so production tracing remains consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/backfill_manifest_file_sizes.mjs`:
- Around line 52-54: The current getArgValue logic returns the next argv token
even if it's another flag (e.g., "--apply"), causing silent no-ops; update
getArgValue to validate the candidate value: after finding const index =
process.argv.indexOf(name) ensure process.argv[index + 1] exists and that it
does NOT start with '--' (or otherwise indicate a flag), and if the check fails
return undefined or throw a clear error; reference the getArgValue function and
the process.argv[index + 1] lookup so you replace the unconditional return with
a guarded check that rejects missing/flag tokens.
- Around line 141-160: The current code always calls parseObjectSizeFromHeaders
even for failed (4xx/5xx) or partial responses, which lets error XML bodies or
206 responses without Content-Range produce bogus sizes; change the logic to
only derive size for successful range responses: check response.status first and
if status is >= 400 return without a size, and only call
parseObjectSizeFromHeaders when response.status === 206 AND contentRange exists
(do not accept Content-Length as the full object size for 206 if Content-Range
is missing); ensure you still cancel response.body and return
contentLength/contentRange/method/status/statusText but leave size undefined
when not safely derivable (use the existing parseObjectSizeFromHeaders,
contentRange, contentLength and response.status symbols to locate and adjust the
logic).
- Around line 24-31: The current loop calling config({ path: resolve(__dirname,
envPath) }) loads env files in an order where earlier files win; change the
loading so later files override earlier ones by either reversing the envPath
array order or (preferably) passing override: true for subsequent loads; update
the loop that iterates over envPath (the config call) to call config({ path:
resolve(__dirname, envPath), override: true }) for files that should take
precedence (e.g., .env.local and cloudflare .envs) so .local values override
base .env values.
In `@supabase/functions/_backend/utils/s3.ts`:
- Around line 218-242: When doing the ranged GET fallback, don't parse
Content-Length/Content-Range for non-success responses: after awaiting
fetch(url...) check res.ok (or res.status) and if the response is not
successful, cancel res.body, log the failure via the existing cloudlog call
(including status/statusText/requestId/fileId/reason) and return a sentinel
(e.g. undefined/null) instead of calling parseObjectSizeFromHeaders; only call
parseObjectSizeFromHeaders and return size when res.ok is true. Ensure you use
the same local symbols (res, parseObjectSizeFromHeaders, cloudlog, fileId,
reason, c.get('requestId')) so the change is localized.
---
Nitpick comments:
In `@supabase/functions/_backend/triggers/queue_consumer.ts`:
- Around line 220-225: The new cloudlog calls that log concurrency and archiving
(using getQueueHttpConcurrency, processConcurrency, messagesToProcess,
messagesToSkip, queueName) must be converted to structured logs that include the
Hono requestId and batch metrics: obtain requestId via c.get('requestId') and
pass it plus metrics (e.g., queueName, processConcurrency,
messagesToProcess.length, messagesToSkip.length, and any batch id) as structured
fields to cloudlog instead of interpolated strings; update the cloudlog
invocations around where processConcurrency is computed and where archiving is
logged to use the context-derived requestId and explicit key/value fields so
production tracing remains consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 68a850c4-d1ac-498f-b4a1-f16cbac57b41
📒 Files selected for processing (6)
scripts/backfill_manifest_file_sizes.mjssupabase/functions/_backend/triggers/on_manifest_create.tssupabase/functions/_backend/triggers/queue_consumer.tssupabase/functions/_backend/utils/s3.tstests/backend-alert-resilience.unit.test.tstests/queue-consumer-message-shape.unit.test.ts
Merging this PR will not alter performance
Comparing Footnotes
|
|



Summary (AI generated)
on_manifest_createqueue batches/concurrency to reduce large-manifest storage bursts.Motivation (AI generated)
Manifest rows could stay at
file_size = 0because the worker returned success when R2/S3 metadata lookup returned no size. Large manifests amplified this by letting the queue consumer fan out hundreds of storage lookups at once.Business Impact (AI generated)
This restores reliable manifest size metadata, improves billing/storage accounting accuracy, and gives support/ops a repair path for affected historical bundles without trusting client-provided file sizes.
Test Plan (AI generated)
bun run lint:backendbun run lintbun run cli:lintbun run typecheckbunx vitest run tests/backend-alert-resilience.unit.test.ts tests/queue-consumer-message-shape.unit.test.tsbun scripts/backfill_manifest_file_sizes.mjs --helpbunx eslint --no-ignore scripts/backfill_manifest_file_sizes.mjsgit diff --checkSummary by CodeRabbit
New Features
Bug Fixes
Tests
Chores