Summary
The async-refresh handler (#257, PR #260) and the scheduler use last_collection_started_at as a single in-flight flag. Two callers can race and clear each other's marker:
T+0 cron-scheduler: ClearCollectionStarted (cleans up its previous run)
T+1 user clicks Refresh → handler: MarkCollectionStarted (ok=true, started_at = T+1)
T+2 cron-scheduler: enters NEW run, MarkCollectionStarted (ok=false, no-op)
T+3 cron-scheduler completes earlier batch (?), defer fires:
ClearCollectionStarted ← wipes the user-triggered run's marker
T+4 user-triggered run still runs in the background, frontend stops
polling because last_collection_started_at is null
The current safeguard is the 5-min auto-recovery window in MarkCollectionStarted: even if a marker is wiped, a fresh refresh after 5 min works again. So the practical impact is bounded — frontend banner clears too early, but the underlying state self-heals.
Proposal
Replace the single boolean with an owner token (UUID) per collection:
MarkCollectionStarted returns (token uuid.UUID, ok bool, err error).
ClearCollectionStarted(ctx, token) does a compare-and-clear: only clears the marker if the persisted owner matches token. Mismatches are silent no-ops.
- Defer-clear in
scheduler.CollectRecommendations and the rollback in handler_recommendations_refresh.go::asyncInvokeSelf failure path both pass their own token.
Scope
- New column
last_collection_owner_id UUID on recommendations_state (migration 000048).
StoreInterface signature change: MarkCollectionStarted and ClearCollectionStarted both gain a token param.
- 5 mock impls in
internal/{analytics,api,purchase,scheduler,server} packages need the new signature.
- Callers in
internal/api/handler_recommendations_refresh.go and internal/scheduler/scheduler.go thread the token through.
Why deferred from #260
CR pass-1 raised this as priority/major on PR #260. The bookkeeping refactor is bigger than a CR-pass-1 fix should be (touches a migration, the StoreInterface, all 5 mocks, and both call sites), and the practical impact is bounded by the existing 5-min recovery window. Tracking here so it's not lost.
Why this matters
When CUDly grows beyond a single user / single account, concurrent refresh clicks become more common. The current behaviour silently confuses the polling banner; an owner-token model makes it deterministic.
Summary
The async-refresh handler (#257, PR #260) and the scheduler use
last_collection_started_atas a single in-flight flag. Two callers can race and clear each other's marker:The current safeguard is the 5-min auto-recovery window in
MarkCollectionStarted: even if a marker is wiped, a fresh refresh after 5 min works again. So the practical impact is bounded — frontend banner clears too early, but the underlying state self-heals.Proposal
Replace the single boolean with an owner token (UUID) per collection:
MarkCollectionStartedreturns(token uuid.UUID, ok bool, err error).ClearCollectionStarted(ctx, token)does a compare-and-clear: only clears the marker if the persisted owner matchestoken. Mismatches are silent no-ops.scheduler.CollectRecommendationsand the rollback inhandler_recommendations_refresh.go::asyncInvokeSelffailure path both pass their own token.Scope
last_collection_owner_id UUIDonrecommendations_state(migration 000048).StoreInterfacesignature change:MarkCollectionStartedandClearCollectionStartedboth gain a token param.internal/{analytics,api,purchase,scheduler,server}packages need the new signature.internal/api/handler_recommendations_refresh.goandinternal/scheduler/scheduler.gothread the token through.Why deferred from #260
CR pass-1 raised this as
priority/majoron PR #260. The bookkeeping refactor is bigger than a CR-pass-1 fix should be (touches a migration, theStoreInterface, all 5 mocks, and both call sites), and the practical impact is bounded by the existing 5-min recovery window. Tracking here so it's not lost.Why this matters
When CUDly grows beyond a single user / single account, concurrent refresh clicks become more common. The current behaviour silently confuses the polling banner; an owner-token model makes it deterministic.