fix(script): fan out full manifest backfill batches#2318
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthrough
ChangesStorage concurrency refactor
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Merging this PR will not alter performance
Comparing Footnotes
|
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/backfill_manifest_file_sizes.mjs`:
- Line 654: Update the help text for the --concurrency option (the string
currently saying "Backward-compatible no-op; each worker checks its full
1000-row batch at once.") to also explain the rationale for its removal—e.g.,
note that storage requests now use full-batch parallelism for maximum throughput
and that concurrency was previously limited per-worker—so the help message
clarifies why --concurrency is a no-op.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 74782cf1-1b67-4e27-a7c4-d5051b9e01fd
📒 Files selected for processing (1)
scripts/backfill_manifest_file_sizes.mjs
|
hadessunxy-code
left a comment
There was a problem hiding this comment.
I checked the backfill change and the decoded-path fallback looks sensible, but I would be careful with the new fan-out behavior before merging.
--all defaults to 8 workers and each worker now Promise.alls a full 1000-row batch, so the default path can issue up to ~8,000 storage metadata requests at once. If R2/S3 starts returning transient 429/5xx responses, the retry loop will also sleep and retry those failed calls in large synchronized waves. That is a pretty big behavior change from the previous --concurrency cap, and it also makes the still-accepted --concurrency flag unable to protect production runs.
I would either keep a real storage concurrency limiter around the per-row getObjectSizeWithRetry calls, or make the full-batch fan-out opt-in / bounded by a new explicit high-watermark. That keeps the legacy backfill safe for production operators while still allowing the faster mode when the storage backend can absorb it.



Summary (AI generated)
--concurrencyaccepted as a backward-compatible no-op so older commands still run.Motivation (AI generated)
The backfill needs to process millions of manifest rows quickly. The DB/API candidate read remains bounded at 1000 rows, but storage metadata checks should use the full batch capacity per worker instead of a smaller extra throttle.
Business Impact (AI generated)
This increases production backfill throughput so missing manifest file sizes can be corrected faster, while preserving indexed manifest.id scans and bulk DB updates.
Test Plan (AI generated)
Summary by CodeRabbit