Problem
The nightly cron only captures campaigns sorted by `newest`, so any campaign that was already live before the system was first deployed is never ingested (it will have scrolled far past page 10 in the `newest` listing). Users who set keyword alerts on day 1 will get no matches for campaigns that have been running for weeks.
Expected Behaviour
A one-time (or periodic catch-up) backfill run that fetches campaigns across all sort orders and deeper page depths to seed the database with historically active campaigns.
Proposed Fix
- Add a `/admin/backfill` endpoint (or a CLI flag) that triggers a deep crawl across the `magic`, `end_date`, and `most_backed` sort orders (see the sketch after this list)
- Run once after deploy; subsequent nightly crons maintain freshness
- Rate-limit to avoid hammering ScrapingBee (the existing `RateLimiter` can be reused)
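
A minimal sketch of the deep crawl the endpoint (or CLI flag) would trigger, assuming hypothetical `fetch_listing_page`, `upsert_campaigns`, and `rate_limiter.acquire()` helpers; none of these names come from the codebase, and Python is also an assumption:

```python
import itertools

# Illustrative constants -- the real values presumably live in the crawler's config.
CATEGORIES = ["games", "technology"]          # stand-in for the 15 category slugs
SORTS = ["magic", "end_date", "most_backed"]  # the three sort orders named above
MAX_PAGES = 25                                # deeper than the nightly page-10 cutoff

def run_backfill(fetch_listing_page, upsert_campaigns, rate_limiter):
    """Deep crawl of every category x sort x page combination.

    fetch_listing_page(category, sort, page), upsert_campaigns(campaigns),
    and rate_limiter.acquire() are assumed interfaces; the existing nightly
    crawler presumably has equivalents that could be passed in here.
    """
    for category, sort in itertools.product(CATEGORIES, SORTS):
        for page in range(1, MAX_PAGES + 1):
            rate_limiter.acquire()             # reuse the existing RateLimiter
            campaigns = fetch_listing_page(category, sort, page)
            if not campaigns:                  # listing exhausted before MAX_PAGES
                break
            upsert_campaigns(campaigns)        # should be idempotent; nightly cron re-sees newest pages
```

Keeping the upsert step idempotent matters here, since the nightly cron will keep re-ingesting the newest pages after the backfill has run.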
Notes
- One-time cost estimate: 15 categories × 3 sorts × 25 pages = 1,125 ScrapingBee requests × 5 credits = 5,625 credits (< 3% of monthly allowance)
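
For reference, the estimate above is just the product of the crawl dimensions, so it is easy to re-check if the page depth or sort list changes (figures taken from this issue, not from config):

```python
categories, sorts, pages = 15, 3, 25
credits_per_request = 5

requests = categories * sorts * pages       # 1,125 ScrapingBee requests
credits = requests * credits_per_request    # 5,625 credits, one-time cost
print(requests, credits)
```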