fix(onboarding): revert #2291 trigger that wipes production data #2296
PR #2291 added a trigger that flips `apps.need_onboarding` to FALSE on the first real bundle upload, which cascades into `clear_onboarding_app_data()` and destroys all channels, bundles, devices, deploy history, and daily metrics for the app. Any app where `need_onboarding` was still TRUE -- which is the case for every app provisioned via the dashboard or CI without ever running `capgo init` -- was silently armed for irreversible data loss on its next upload after the migration deployed. At least one production incident has been reported (#2295).

This change:

- Adds a forward migration that drops the new trigger `complete_onboarding_after_first_upload` on `app_versions` and its underlying function. The pre-existing `cleanup_onboarding_app_data_on_complete` trigger on `apps` stays in place; it is safe on its own because the only remaining caller of the flag transition is an explicit `capgo init`.
- Restores `cli/src/init/command.ts` to its pre-#2291 behavior so `capgo init` once again completes pending onboarding unconditionally, matching the now-restored backend contract.
- Removes `tests/onboarding-first-upload.test.ts`, which tested the removed trigger.
- Removes the dropped function from the security-definer hardening test's expected list.

The two-arg `clear_onboarding_app_data(uuid, bigint)` overload that #2291 introduced is left in the database; it is dormant without the trigger that called it, and a forward-only migration should avoid unnecessary churn.

Refs: #2295, #2291
📝 Walkthrough

Reverts a DB trigger/function that auto-cleared onboarding on first upload, replaces legacy CLI onboarding wrappers with a spinner-wrapped helper calling `completePendingOnboardingApp`, removes the onboarding-first-upload integration tests, and updates security-definer test fixtures.

Changes: Onboarding Auto-Complete Revert and CLI Refactor
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 SQLFluff (4.2.1), `supabase/migrations/20260519065534_revert_complete_onboarding_after_first_upload_trigger.sql`: User Error: No dialect was specified. You must configure a dialect or specify one on the command line using `--dialect` after the command.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c2e917b97b
    if (useExistingApp === true) {
      if (existingApp.need_onboarding)
        await maybeCompletePendingOnboardingAppForLegacyBackend(supabase, organization, appId)
      await completeExistingAppPendingOnboarding(supabase, organization, appId)
Do not auto-complete armed existing apps
For an existing production app that still has need_onboarding = true (the exact armed state described in the new migration comments for apps created via dashboard/CI without capgo init), choosing “use existing” now unconditionally flips the flag. The migration only drops the first-upload trigger and explicitly leaves cleanup_onboarding_app_data_on_complete active, so this update still fires the cleanup path and deletes channels/bundles/devices/metrics for that app. This keeps a user-triggered data-loss path open for the affected apps; avoid calling completePendingOnboardingApp here unless the app is known to be an empty onboarding placeholder or after the flag has been safely backfilled.
…ion apps

The earlier commit on this branch closed the auto-fire path (the upload-driven trigger from PR #2291) but left a user-triggered path open: `capgo init` against an existing app with `need_onboarding = TRUE` still calls `completePendingOnboardingApp`, which flips the flag and fires `cleanup_onboarding_app_data_on_complete`, which calls `clear_onboarding_app_data` -- still wiping the app.

Because the exact cohort described in #2295 is "apps provisioned via dashboard or CI without ever running `capgo init`," and the natural follow-up for those users is to run `capgo init` and pick "use existing app," that path needs a real safety check, not a circular "the only caller is `capgo init` so it's fine" argument.

Add a production-safety guard at the top of `clear_onboarding_app_data(uuid, bigint)`. It refuses (RAISE WARNING plus early RETURN) when any of four independent indicators of real production usage exists:

* any row in `public.devices`
* any row in `public.channel_devices`
* any row in `public.deploy_history` beyond the preserved version
* any channel whose `version` is set and != the preserved version

This is defense-in-depth at the function level: every present and future caller of either overload (`uuid` or `uuid, bigint`) is now protected, regardless of how the flag transition got triggered. Genuine onboarding placeholder apps -- the original intent -- pass the guard and are cleaned up as before.

Refs: #2295
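The four indicators named in this commit message can be modeled as a pure predicate. This is a minimal in-memory sketch and not the migration's PL/pgSQL: the `AppSnapshot` shape, its field names, and the function `productionUsageIndicators` are hypothetical, and "beyond the preserved version" is interpreted here as "any `deploy_history` row that references a different version".

```typescript
// Hypothetical model of the guard; the real check runs inside
// clear_onboarding_app_data(uuid, bigint) in the database.
// Every type and field name below is an illustrative assumption.

interface AppSnapshot {
  deviceCount: number // rows in public.devices for the app
  channelDeviceCount: number // rows in public.channel_devices for the app
  deployHistoryVersionIds: number[] // version id of each public.deploy_history row
  channelVersionIds: (number | null)[] // `version` of each channel, null if unset
}

// Returns the indicators of real production usage that are present.
// An empty array means the app looks like a genuine onboarding placeholder.
function productionUsageIndicators(
  app: AppSnapshot,
  preservedVersionId: number | null,
): string[] {
  const reasons: string[] = []
  if (app.deviceCount > 0)
    reasons.push('devices')
  if (app.channelDeviceCount > 0)
    reasons.push('channel_devices')
  if (app.deployHistoryVersionIds.some(id => id !== preservedVersionId))
    reasons.push('deploy_history')
  if (app.channelVersionIds.some(id => id !== null && id !== preservedVersionId))
    reasons.push('channels')
  return reasons
}

// Mirrors "RAISE WARNING plus early RETURN": report and skip instead of deleting.
function shouldRunCleanup(app: AppSnapshot, preservedVersionId: number | null): boolean {
  const reasons = productionUsageIndicators(app, preservedVersionId)
  if (reasons.length > 0) {
    console.warn(`refusing onboarding cleanup, production usage found: ${reasons.join(', ')}`)
    return false
  }
  return true
}
```

Because the indicators are checked independently, a single device row is enough to block the cleanup, while an app with no devices, no history, and channels that are unset or point at the preserved version still passes.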



TL;DR
Stops the production data-loss cascade introduced by #2291. Two changes in this PR:

1. Drops the `complete_onboarding_after_first_upload` trigger via a new forward migration. This closes the auto-fire path (every bundle upload).
2. Adds a production-safety guard inside `clear_onboarding_app_data` itself. This closes every other path -- including `capgo init` -> "use existing app" against a long-lived `need_onboarding = TRUE` app, which would otherwise still wipe data even with the upload-side trigger gone.

Tracking: #2295.
What broke
The trigger added in #2291 (migration `20260518131054_complete_onboarding_after_first_upload.sql`, deployed as part of `capgo-12.140.2`) fires on the first real bundle upload and flips `apps.need_onboarding` from `TRUE` → `FALSE`. That flip is observed by the pre-existing `cleanup_onboarding_app_data_on_complete` trigger, which calls `clear_onboarding_app_data(app_uuid, preserve_id)`.

That function unconditionally `DELETE`s from:

- `channel_devices`, `channels`, `devices`
- `app_versions` (except the just-uploaded id + `builtin`/`unknown`), `app_versions_meta`
- `deploy_history`, `build_requests`
- `daily_version`, `daily_bandwidth`, `daily_storage`, `daily_mau`, `daily_build_time`

The only guard before this PR was `WHERE need_onboarding IS TRUE`.

Because `need_onboarding` was historically only cleared by `capgo init`, every app provisioned via the dashboard or CI without an explicit `capgo init` was silently armed. At least one production app (issue #2295) was wiped on its next routine upload.
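The cascade described above can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: the `App` shape is hypothetical, only three of the affected tables are modeled, the `builtin`/`unknown` rows are left out, and the upload trigger is reduced to "fires on any upload while the flag is still set".

```typescript
// Toy model of the pre-fix trigger chain. Function names echo the PR text,
// but the data shapes and bodies are simplified assumptions, not the real SQL.

interface App {
  needOnboarding: boolean
  bundles: number[] // stand-in for app_versions ids
  devices: string[]
  channels: string[]
}

// Stand-in for clear_onboarding_app_data before this PR: no production guard.
function clearOnboardingAppDataUnguarded(app: App, preserveId: number): void {
  app.bundles = app.bundles.filter(id => id === preserveId)
  app.devices = []
  app.channels = []
}

// Stand-in for cleanup_onboarding_app_data_on_complete: reacts to TRUE -> FALSE.
function setNeedOnboarding(app: App, value: boolean, preserveId: number): void {
  const wasPending = app.needOnboarding
  app.needOnboarding = value
  if (wasPending && !value)
    clearOnboardingAppDataUnguarded(app, preserveId)
}

// Stand-in for the #2291 trigger complete_onboarding_after_first_upload.
function uploadBundle(app: App, bundleId: number, uploadTriggerEnabled: boolean): void {
  app.bundles.push(bundleId)
  if (uploadTriggerEnabled && app.needOnboarding)
    setNeedOnboarding(app, false, bundleId)
}

// An "armed" app: real usage, but the flag was never cleared by `capgo init`.
function makeArmedApp(): App {
  return { needOnboarding: true, bundles: [1, 2], devices: ['d1'], channels: ['production'] }
}

const withTrigger = makeArmedApp()
uploadBundle(withTrigger, 3, true) // devices and channels emptied, only bundle 3 kept

const withoutTrigger = makeArmedApp()
uploadBundle(withoutTrigger, 3, false) // data intact, flag still TRUE
```

In this model, dropping the upload trigger stops the automatic wipe but leaves the app armed: any later call that flips the flag still reaches the unguarded cleanup.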
Fix shape
Forward-only revert -- the destructive migration has already been applied on production databases, so a `git revert` of the original commit would not actually deactivate the trigger on prod. Everything in this PR runs forward:

1. New migration `20260519065534_revert_complete_onboarding_after_first_upload_trigger.sql`:
   - `DROP TRIGGER` + `DROP FUNCTION` for the items added by `20260518131054`. Deactivates the upload-side trip-wire.
   - `CREATE OR REPLACE FUNCTION clear_onboarding_app_data(uuid, bigint)` with a production-safety guard prepended. Refuses to delete (RAISE WARNING + early RETURN) when any of the following is true for the target app:
     - any row in `public.devices`
     - any row in `public.channel_devices`
     - any row in `public.deploy_history` beyond the preserved version
     - any channel whose `version` is set and ≠ the preserved version
   - The `clear_onboarding_app_data(uuid)` wrapper that #2291 introduced is left in place; it delegates to the two-arg overload and so inherits the guard automatically.
2. `cli/src/init/command.ts` is restored to its pre-#2291 state so `capgo init` unconditionally completes pending onboarding again. The `CAPGO_CLI_COMPLETE_ONBOARDING` env-var gate is removed. Safe to do because the guard in #1 now makes that path non-destructive for real apps.
3. `tests/onboarding-first-upload.test.ts` is removed -- it tested the now-dropped trigger.
4. `tests/security-definer-execute-hardening.test.ts` no longer references `complete_onboarding_after_first_upload()` (still lists both `clear_onboarding_app_data` overloads, which remain).

Why a function-level guard (and not e.g. just a backfill)
The guard is defense-in-depth. A one-shot `UPDATE apps SET need_onboarding = FALSE WHERE …` backfill (issue #2295's suggestion #4) would also disarm currently-armed apps, but it doesn't protect against future code paths that might flip the flag again. The guard makes the cleanup function itself safe to call against any app -- the strongest invariant we can express in the schema.

A backfill is still a reasonable follow-up (cleans up the `need_onboarding = TRUE` flag state on real apps so they look correct in queries), but it's no longer blocking for data safety. Left for a separate PR.
What this does NOT do
- Restore data that was already wiped. That requires per-account point-in-time backup restore.
- Backfill `apps.need_onboarding = FALSE` on apps that still have it set. With the guard, those apps are safe even if the flag flips -- but they remain in an unusual flag state. Follow-up PR.
- Re-land the first-upload auto-complete. A re-implementation (re-enable an auto-complete path, this time with the guard already in place) can land separately once we are out of incident mode.
Test plan
- `bun run supabase:db:reset` -- migrations apply cleanly end to end.
- `SELECT tgname FROM pg_trigger WHERE tgname = 'complete_onboarding_after_first_upload'` returns 0 rows; `SELECT proname FROM pg_proc WHERE proname = 'complete_onboarding_after_first_upload'` returns 0 rows.
- Real app: seed an app with `need_onboarding = TRUE` plus 1 row in `devices`. Flip the flag to `FALSE`. Confirm the trigger fires, `clear_onboarding_app_data` raises a `WARNING`, and no rows are deleted from `channels`, `app_versions`, `devices`, `daily_version`, etc.
- Placeholder app (`need_onboarding = TRUE`, no devices, no deploy_history, only builtin/unknown bundles): flip the flag. Confirm cleanup runs as before and `builtin`/`unknown` rows are recreated.
- Upload a bundle to an app with `need_onboarding = TRUE`. Confirm the flag is NOT flipped (no auto-trigger) and no data is deleted.
- Run `capgo init` against an app with pending onboarding and confirm it still transitions the app to a complete state.
- `bun run lint:backend`
- `bun run cli:lint && bun run cli:typecheck && bun run cli:build`
- `bun run supabase:with-env -- bunx vitest run tests/security-definer-execute-hardening.test.ts`

Refs
- `capgo-12.140.2`