Skip to content

fix(onboarding): revert #2291 trigger that wipes production data#2296

Merged
WcaleNieWolny merged 2 commits into
mainfrom
revert-2291-onboarding-data-loss
May 19, 2026
Merged

fix(onboarding): revert #2291 trigger that wipes production data#2296
WcaleNieWolny merged 2 commits into
mainfrom
revert-2291-onboarding-data-loss

Conversation

@WcaleNieWolny
Copy link
Copy Markdown
Contributor

@WcaleNieWolny WcaleNieWolny commented May 19, 2026

TL;DR

Stops the production data-loss cascade introduced by #2291. Two changes
in this PR:

  1. Drops the complete_onboarding_after_first_upload trigger via a new
    forward migration. This closes the auto-fire path (every bundle
    upload).
  2. Adds a production-safety guard inside clear_onboarding_app_data
    itself. This closes every other path -- including
    capgo init -> "use existing app" against a long-lived
    need_onboarding = TRUE app, which would otherwise still wipe data
    even with the upload-side trigger gone.

Tracking: #2295.

What broke

The trigger added in #2291 (migration
20260518131054_complete_onboarding_after_first_upload.sql,
deployed as part of capgo-12.140.2) fires on the first real bundle
upload and flips apps.need_onboarding from TRUEFALSE. That
flip is observed by the pre-existing cleanup_onboarding_app_data_on_complete
trigger, which calls clear_onboarding_app_data(app_uuid, preserve_id).
That function unconditionally DELETEs from:

  • channel_devices, channels, devices
  • app_versions (except the just-uploaded id + builtin/unknown),
    app_versions_meta
  • deploy_history, build_requests
  • daily_version, daily_bandwidth, daily_storage, daily_mau,
    daily_build_time

The only guard before this PR was WHERE need_onboarding IS TRUE.
Because need_onboarding was historically only cleared by capgo init,
every app provisioned via the dashboard or CI without an explicit
capgo init was silently armed
. At least one production app
(issue #2295) was wiped on its next routine upload.

Fix shape

Forward-only revert -- the destructive migration has already been
applied on production databases, so a git revert of the original
commit would not actually deactivate the trigger on prod. Everything in
this PR runs forward:

  1. New migration
    20260519065534_revert_complete_onboarding_after_first_upload_trigger.sql:
    • DROP TRIGGER + DROP FUNCTION for the items added by
      20260518131054. Deactivates the upload-side trip-wire.
    • CREATE OR REPLACE FUNCTION clear_onboarding_app_data(uuid, bigint)
      with a production-safety guard prepended. Refuses to delete
      (RAISE WARNING + early RETURN) when any of the following is
      true for the target app:
      • any row in public.devices
      • any row in public.channel_devices
      • any row in public.deploy_history beyond the preserved
        version
      • any channel whose version is set and ≠ the preserved version
    • The single-arg clear_onboarding_app_data(uuid) wrapper that
      [codex] Complete onboarding after first upload #2291 introduced is left in place; it delegates to the two-arg
      overload and so inherits the guard automatically.
  2. CLI restore: cli/src/init/command.ts is restored to its
    pre-[codex] Complete onboarding after first upload #2291 state so capgo init unconditionally completes pending
    onboarding again. The CAPGO_CLI_COMPLETE_ONBOARDING env-var gate
    is removed. Safe to do because the guard in Launch paid offers #1 now makes that path
    non-destructive for real apps.
  3. Test cleanup:
    • tests/onboarding-first-upload.test.ts is removed -- it tested
      the now-dropped trigger.
    • tests/security-definer-execute-hardening.test.ts no longer
      references complete_onboarding_after_first_upload() (still
      lists both clear_onboarding_app_data overloads, which remain).

Why a function-level guard (and not e.g. just a backfill)

The guard is defense-in-depth. A one-shot UPDATE apps SET need_onboarding = FALSE WHERE … backfill (issue #2295's suggestion #4)
would also disarm currently-armed apps, but it doesn't protect against
future code paths that might flip the flag again. The guard makes the
cleanup function itself safe to call against any app -- the strongest
invariant we can express in the schema.

A backfill is still a reasonable follow-up (cleans up the
need_onboarding = TRUE flag state on real apps so they look correct
in queries), but it's no longer blocking for data safety. Left for a
separate PR.

What this does NOT do

  • It does not restore data already deleted by the cascade. That
    requires per-account point-in-time backup restore.
  • It does not backfill apps.need_onboarding = FALSE on apps that
    still have it set. With the guard, those apps are safe even if the
    flag flips -- but they remain in an unusual flag state. Follow-up PR.
  • It does not redesign the onboarding-completion flow. A safer
    re-implementation (re-enable an auto-complete path, this time with
    the guard already in place) can land separately once we are out of
    incident mode.

Test plan

  • bun run supabase:db:reset -- migrations apply cleanly end to end.
  • After reset, confirm in the running DB:
    SELECT tgname FROM pg_trigger WHERE tgname = 'complete_onboarding_after_first_upload' returns 0 rows;
    SELECT proname FROM pg_proc WHERE proname = 'complete_onboarding_after_first_upload' returns 0 rows.
  • Manual: seed an app with need_onboarding = TRUE plus 1 row in
    devices. Flip the flag to FALSE. Confirm the trigger fires,
    clear_onboarding_app_data raises a WARNING, and no rows are
    deleted from channels, app_versions, devices,
    daily_version, etc.
  • Manual: seed a genuine onboarding placeholder app
    (need_onboarding = TRUE, no devices, no deploy_history, only
    builtin/unknown bundles). Flip the flag. Confirm cleanup runs as
    before and builtin/unknown rows are recreated.
  • Manual: upload a bundle to an app with need_onboarding = TRUE.
    Confirm the flag is NOT flipped (no auto-trigger) and no data
    is deleted.
  • Manual: run capgo init against an app with pending onboarding
    and confirm it still transitions the app to a complete state.
  • bun run lint:backend
  • bun run cli:lint && bun run cli:typecheck && bun run cli:build
  • bun run supabase:with-env -- bunx vitest run tests/security-definer-execute-hardening.test.ts

Refs

Summary by CodeRabbit

  • Refactor

    • Streamlined CLI onboarding completion flow with a new completion path and simplified UX.
  • Revert

    • Reverted automatic onboarding completion on first bundle upload and added a production-safety guard to prevent unintended data deletions.
  • Tests

    • Removed the onboarding-first-upload test suite and updated security-related test fixtures to account for the adjusted onboarding cleanup procedure.

Review Change Stack

PR #2291 added a trigger that flips `apps.need_onboarding` to FALSE on
the first real bundle upload, which cascades into
`clear_onboarding_app_data()` and destroys all channels, bundles,
devices, deploy history, and daily metrics for the app.

Any app where `need_onboarding` was still TRUE -- which is the case for
every app provisioned via the dashboard or CI without ever running
`capgo init` -- was silently armed for irreversible data loss on its
next upload after the migration deployed. At least one production
incident has been reported (#2295).

This change:

- Adds a forward migration that drops the new trigger
  `complete_onboarding_after_first_upload` on `app_versions` and its
  underlying function. The pre-existing
  `cleanup_onboarding_app_data_on_complete` trigger on `apps` stays
  in place; it is safe on its own because the only remaining caller
  of the flag transition is an explicit `capgo init`.
- Restores `cli/src/init/command.ts` to its pre-#2291 behavior so
  `capgo init` once again completes pending onboarding unconditionally,
  matching the now-restored backend contract.
- Removes `tests/onboarding-first-upload.test.ts`, which tested the
  removed trigger.
- Removes the dropped function from the security-definer hardening
  test's expected list.

The two-arg `clear_onboarding_app_data(uuid, bigint)` overload that
#2291 introduced is left in the database; it is dormant without the
trigger that called it, and a forward-only migration should avoid
unnecessary churn.

Refs: #2295, #2291
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4094955a-0c08-4c36-8bd9-f9d6c885f776

📥 Commits

Reviewing files that changed from the base of the PR and between c2e917b and f3746ac.

📒 Files selected for processing (1)
  • supabase/migrations/20260519065534_revert_complete_onboarding_after_first_upload_trigger.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • supabase/migrations/20260519065534_revert_complete_onboarding_after_first_upload_trigger.sql

📝 Walkthrough

Walkthrough

Reverts a DB trigger/function that auto-cleared onboarding on first upload, replaces legacy CLI onboarding wrappers with a spinner-wrapped helper calling completePendingOnboardingApp, removes the onboarding-first-upload integration tests, and updates security-definer test fixtures.

Changes

Onboarding Auto-Complete Revert and CLI Refactor

Layer / File(s) Summary
Database schema and cleanup function
supabase/migrations/20260519065534_revert_complete_onboarding_after_first_upload_trigger.sql
Drops the complete_onboarding_after_first_upload trigger and its function, and replaces public.clear_onboarding_app_data with a guarded implementation that aborts on production signals and otherwise deletes onboarding-related rows, reinserts preserved app_versions, and updates public.apps caches and metrics cache.
CLI onboarding completion refactor
cli/src/init/command.ts
Removes legacy-backend gating wrappers and introduces completeExistingAppPendingOnboarding(...), which calls completePendingOnboardingApp(...) wrapped in a spinner; pending-app preparation and existing-app "use existing" onboarding branches now use this helper.
Test updates and removals
tests/security-definer-execute-hardening.test.ts, tests/onboarding-first-upload.test.ts
Adds public.cleanup_onboarding_app_data_on_complete() to SERVICE_ONLY_PROCS in the security-definer test. The integration test file tests/onboarding-first-upload.test.ts that asserted the previous auto-complete-on-first-upload behavior was removed.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Cap-go/capgo#1966: Overlaps with security-definer grant hardening and the SERVICE_ONLY_PROCS test fixture changes.
  • Cap-go/capgo#2291: Previously introduced the auto-complete-on-first-upload trigger and related CLI gating that this PR reverts.

Poem

🐰 A trigger leapt when bundles flew,

but data slipped away, askew.
We rolled it back, then patched the CLI,
a helper spins to finish nigh.
Hoppity hops—onboarding true.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the main change: reverting the trigger from #2291 that was causing production data loss. It directly matches the PR's primary objective.
Description check ✅ Passed The description provides comprehensive context: it clearly explains what broke, the fix shape with specific migration details, what is NOT done, and a detailed test plan with multiple manual verification steps.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch revert-2291-onboarding-data-loss

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 SQLFluff (4.2.1)
supabase/migrations/20260519065534_revert_complete_onboarding_after_first_upload_trigger.sql

User Error: No dialect was specified. You must configure a dialect or specify one on the command line using --dialect after the command. Available dialects:
ansi, athena, bigquery, clickhouse, databricks, db2, doris, duckdb, exasol, flink, greenplum, hive, impala, mariadb, materialize, mysql, oracle, postgres, redshift, snowflake, soql, sparksql, sqlite, starrocks, teradata, trino, tsql, vertica


Comment @coderabbitai help to get the list of available commands and usage tips.

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq Bot commented May 19, 2026

Merging this PR will not alter performance

✅ 43 untouched benchmarks
⏩ 2 skipped benchmarks1


Comparing revert-2291-onboarding-data-loss (f3746ac) with main (478d3d9)2

Open in CodSpeed

Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (4f09f51) during the generation of this report, so 478d3d9 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2e917b97b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cli/src/init/command.ts
if (useExistingApp === true) {
if (existingApp.need_onboarding)
await maybeCompletePendingOnboardingAppForLegacyBackend(supabase, organization, appId)
await completeExistingAppPendingOnboarding(supabase, organization, appId)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Do not auto-complete armed existing apps

For an existing production app that still has need_onboarding = true (the exact armed state described in the new migration comments for apps created via dashboard/CI without capgo init), choosing “use existing” now unconditionally flips the flag. The migration only drops the first-upload trigger and explicitly leaves cleanup_onboarding_app_data_on_complete active, so this update still fires the cleanup path and deletes channels/bundles/devices/metrics for that app. This keeps a user-triggered data-loss path open for the affected apps; avoid calling completePendingOnboardingApp here unless the app is known to be an empty onboarding placeholder or after the flag has been safely backfilled.

Useful? React with 👍 / 👎.

…ion apps

The earlier commit on this branch closed the auto-fire path (the
upload-driven trigger from PR #2291) but left a user-triggered path
open: `capgo init` against an existing app with `need_onboarding = TRUE`
still calls `completePendingOnboardingApp`, which flips the flag and
fires `cleanup_onboarding_app_data_on_complete`, which calls
`clear_onboarding_app_data` -- still wiping the app.

Because the exact cohort described in #2295 is "apps provisioned via
dashboard or CI without ever running `capgo init`," and the natural
follow-up for those users is to run `capgo init` and pick "use existing
app," that path needs a real safety check, not a circular "the only
caller is `capgo init` so it's fine" argument.

Add a production-safety guard at the top of
`clear_onboarding_app_data(uuid, bigint)`. It refuses (RAISE WARNING
plus early RETURN) when any of four independent indicators of real
production usage exists:

  * any row in `public.devices`
  * any row in `public.channel_devices`
  * any row in `public.deploy_history` beyond the preserved version
  * any channel whose `version` is set and != the preserved version

This is defense-in-depth at the function level: every present and
future caller of either overload (`uuid` or `uuid, bigint`) is now
protected, regardless of how the flag transition got triggered. Genuine
onboarding placeholder apps -- the original intent -- pass the guard
and are cleaned up as before.

Refs: #2295
@WcaleNieWolny WcaleNieWolny merged commit bba5f71 into main May 19, 2026
33 of 35 checks passed
@WcaleNieWolny WcaleNieWolny deleted the revert-2291-onboarding-data-loss branch May 19, 2026 07:13
@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant