feat(gastown): add debug dry-run endpoint with event draining #1370
Moves useFeatureFlagVariantKey('button-vs-card') from ClawDashboard
(which renders for all users including those with existing instances)
to CreateInstanceCard (which only renders for users who haven't
provisioned yet). This scopes the experiment exposure to users who
can actually see the create CTA, avoiding population dilution.
…nt (#1338)

## Summary

Evaluates the `button-vs-card` PostHog feature flag in `CreateInstanceCard` so the SDK attaches `$feature/button-vs-card` to subsequent events (including `claw_create_instance_clicked`). Without this, the cloud app's PostHog SDK never evaluates the flag, so the experiment gets 0 conversions even though users are clicking.

The flag is evaluated in `CreateInstanceCard` (not `ClawDashboard`) so only users who can actually see the create CTA are marked as exposed. `ClawDashboard` also renders for users with existing instances, mid-onboarding, or viewing settings; evaluating there would dilute the experiment population.

## Verification

- [x] Verified formatting with `oxfmt` on changed files
- [x] Typecheck passes (no new errors from this change)
- [x] Confirmed `useFeatureFlagVariantKey` is exported by `posthog-js/react` (v1.360.2)

## Visual Changes

N/A

## Reviewer Notes

- No UI or behavior changes. The hook return value is intentionally unused; the sole purpose is flag evaluation so PostHog auto-attaches `$feature/button-vs-card` to tracked events.
- Users who reach `CreateInstanceCard` without coming through the landing page will get a variant assigned by the cloud app SDK. This is expected: PostHog uses the same hash for the same distinct_id, so the variant will be consistent.
Add a debug endpoint that runs the reconciler against current live state and returns the actions it would emit without applying them. This enables inspecting what the reconciler thinks should happen at any given moment.

- Add debugDryRun() method to TownDO that calls reconciler.reconcile() and returns actions + metrics without calling applyAction()
- Add POST /debug/towns/:townId/reconcile-dry-run route following the same unauthenticated debug pattern as GET /debug/towns/:townId/status
- Response includes actions array, actionsEmitted count, actionsByType breakdown, and pendingEventCount
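The response fields listed above (actions, actionsEmitted, actionsByType, pendingEventCount) could be assembled roughly as follows. This is a minimal sketch, not the PR's code: the `Action` shape, the `summarizeDryRun` helper, and the `DryRunSummary` type are hypothetical names introduced for illustration, and the real `Action` type is likely richer.

```typescript
// Hypothetical minimal action shape; the real Action type in the PR is not shown here.
type Action = { type: string };

// Hypothetical response shape mirroring the fields named in the commit message.
type DryRunSummary = {
  actions: Action[];
  actionsEmitted: number;
  actionsByType: Record<string, number>;
  pendingEventCount: number;
};

// Build the summary from the actions the reconciler returned.
// Nothing here applies an action; it only counts them.
function summarizeDryRun(actions: Action[], pendingEventCount: number): DryRunSummary {
  const actionsByType: Record<string, number> = {};
  for (const action of actions) {
    actionsByType[action.type] = (actionsByType[action.type] ?? 0) + 1;
  }
  return {
    actions,
    actionsEmitted: actions.length,
    actionsByType,
    pendingEventCount,
  };
}
```

Keeping the summary a pure function of the reconciler output makes it easy to check that the counts always agree with the returned `actions` array.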
Wrap debugDryRun() in a SQLite savepoint so it can drain and apply pending town_events (Phase 0) before running reconcile (Phase 1), matching the real alarm loop behavior. The savepoint is rolled back in a finally block so the endpoint remains fully side-effect-free. Adds eventsDrained to the returned metrics.
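The savepoint-and-rollback pattern described above can be sketched as a small helper. This is an illustration under assumptions, not the PR's implementation: `SqlExec` and `withRolledBackSavepoint` are hypothetical names, and it assumes the SQL handle accepts `SAVEPOINT`, `ROLLBACK TO SAVEPOINT`, and `RELEASE SAVEPOINT` statements directly, which may not hold for every runtime.

```typescript
// Hypothetical minimal SQL handle; the PR's real `this.sql` type is not shown here.
interface SqlExec {
  exec(statement: string): void;
}

// Run `body` inside a named savepoint and always undo its writes afterwards.
// The value computed by `body` is returned, but nothing it wrote persists.
function withRolledBackSavepoint<T>(sql: SqlExec, name: string, body: () => T): T {
  sql.exec(`SAVEPOINT ${name}`);
  try {
    return body();
  } finally {
    // ROLLBACK TO undoes the writes but leaves the savepoint on the stack,
    // so RELEASE is still needed to remove it.
    sql.exec(`ROLLBACK TO SAVEPOINT ${name}`);
    sql.exec(`RELEASE SAVEPOINT ${name}`);
  }
}
```

Because the rollback lives in `finally`, it runs both when the body returns normally and when it throws, which is what keeps a dry run side-effect-free even on failure.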
  return c.json({ alarmStatus, agentMeta, beadSummary });
});

app.post('/debug/towns/:townId/reconcile-dry-run', async c => {
WARNING: This debug route is publicly reachable and returns raw reconcile actions
Because this route is registered before any auth middleware, anyone who knows a townId can trigger a full dry-run reconcile and read the returned Action[]. That response can include internal bead/agent IDs, PR URLs, nudge or mayor messages, and it also lets unauthenticated callers force the worker through the full drain/apply/reconcile path. Please gate this endpoint behind the same auth as other town operator routes, or keep it behind perimeter-only protection before merging.
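One way to gate a debug route like this is a fail-closed token check that runs before the handler. The sketch below is only an illustration: the bearer-token scheme, the `isAuthorizedDebugRequest` name, and the idea of a single shared operator token are assumptions, since the auth used by the other town operator routes is not shown in this conversation.

```typescript
// Hypothetical guard for a debug route. The header format and shared-token
// scheme are illustrative assumptions, not the repository's actual auth.
function isAuthorizedDebugRequest(
  authorizationHeader: string | undefined,
  expectedToken: string,
): boolean {
  // Fail closed: if no token is configured, nobody is authorized.
  if (expectedToken.length === 0) return false;

  const prefix = "Bearer ";
  if (authorizationHeader === undefined || !authorizationHeader.startsWith(prefix)) {
    return false;
  }
  const presented = authorizationHeader.slice(prefix.length);
  if (presented.length !== expectedToken.length) return false;

  // Compare every character instead of returning at the first mismatch.
  let diff = 0;
  for (let i = 0; i < presented.length; i++) {
    diff |= presented.charCodeAt(i) ^ expectedToken.charCodeAt(i);
  }
  return diff === 0;
}
```

A route handler would call this first and return a 401 or 403 before invoking the dry run, so unauthenticated callers can neither read the actions nor force the drain/apply/reconcile path.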
Code Review Summary
Status: 2 Issues Found | Recommendation: Address before merge
Other Observations (not in diff): issues found in unchanged code that cannot receive inline comments.
Files Reviewed (1 files)
Reviewed by gpt-5.4-20260305 · 264,919 tokens
…e/head' into gt/toast/c127ebe8
// Phase 0: Drain and apply pending events (same as real alarm loop)
const pending = events.drainEvents(this.sql);
for (const event of pending) {
  reconciler.applyEvent(this.sql, event);
WARNING: Dry run no longer matches the real alarm loop when an event is bad
debugDryRun() now applies each drained event without the per-event try/catch that the real alarm loop uses in cloudflare-gastown/src/dos/Town.do.ts:2938. If one pending event throws here, the whole endpoint returns an error and you never see the reconcile actions for the remaining queue, even though the actual alarm tick would log the failure, skip that event, and continue. That makes this preview unreliable exactly when the queue contains the malformed event you're trying to inspect.
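The behavior the reviewer describes (log the failure, skip that event, continue) could look roughly like this in a dry run. This is a sketch under assumptions: `TownEvent`, `ApplyFailure`, and `applyEventsTolerantly` are hypothetical names, and the real alarm loop at the referenced line is not reproduced here.

```typescript
// Hypothetical minimal event shape; the real town_events row type is not shown here.
type TownEvent = { id: string };

// Hypothetical record of an event that failed to apply.
type ApplyFailure = { eventId: string; message: string };

// Apply each event in order. A throwing event is recorded and skipped
// so the remaining events still get applied.
function applyEventsTolerantly(
  events: TownEvent[],
  applyEvent: (event: TownEvent) => void,
): { applied: number; failures: ApplyFailure[] } {
  let applied = 0;
  const failures: ApplyFailure[] = [];
  for (const event of events) {
    try {
      applyEvent(event);
      applied += 1;
    } catch (err) {
      failures.push({
        eventId: event.id,
        message: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return { applied, failures };
}
```

Returning the failures alongside the reconcile actions would keep errors visible in a debug response while still previewing what the rest of the queue would produce.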
c8a756f into convoy/reconciler-phase-5-debug-endpoints-grafa/4763028e/head
#1373)

* fix: skip container_status events for running containers (#1368)

  Filter out 'running' status in the alarm pre-phase before calling upsertContainerStatus(). Running is the steady-state for healthy agents and a no-op in applyEvent(), so recording it just bloats the event table (~720 events/hour/agent). Non-running statuses (stopped, error, unknown) still get inserted for reconciler detection.

* feat(gastown): add POST /debug/reconcile-dry-run endpoint (#1367)

  Add a debug endpoint that runs the reconciler against current live state and returns the actions it would emit without applying them. This enables inspecting what the reconciler thinks should happen at any given moment.

  - Add debugDryRun() method to TownDO that calls reconciler.reconcile() and returns actions + metrics without calling applyAction()
  - Add POST /debug/towns/:townId/reconcile-dry-run route following the same unauthenticated debug pattern as GET /debug/towns/:townId/status
  - Response includes actions array, actionsEmitted count, actionsByType breakdown, and pendingEventCount

* feat(gastown): add debug dry-run endpoint with event draining (#1370)

* feat(claw): evaluate button-vs-card feature flag for PostHog experiment tracking

* fix(claw): move button-vs-card flag eval to CreateInstanceCard

  Moves useFeatureFlagVariantKey('button-vs-card') from ClawDashboard (which renders for all users including those with existing instances) to CreateInstanceCard (which only renders for users who haven't provisioned yet). This scopes the experiment exposure to users who can actually see the create CTA, avoiding population dilution.

* feat(gastown): add POST /debug/reconcile-dry-run endpoint

  Add a debug endpoint that runs the reconciler against current live state and returns the actions it would emit without applying them. This enables inspecting what the reconciler thinks should happen at any given moment.

  - Add debugDryRun() method to TownDO that calls reconciler.reconcile() and returns actions + metrics without calling applyAction()
  - Add POST /debug/towns/:townId/reconcile-dry-run route following the same unauthenticated debug pattern as GET /debug/towns/:townId/status
  - Response includes actions array, actionsEmitted count, actionsByType breakdown, and pendingEventCount

* fix(gastown): drain pending events in debugDryRun() before reconciling

  Wrap debugDryRun() in a SQLite savepoint so it can drain and apply pending town_events (Phase 0) before running reconcile (Phase 1), matching the real alarm loop behavior. The savepoint is rolled back in a finally block so the endpoint remains fully side-effect-free. Adds eventsDrained to the returned metrics.

---------

Co-authored-by: kiloconnect[bot] <240665456+kiloconnect[bot]@users.noreply.github.com>
Co-authored-by: Pedro Heyerdahl <pedro@kilocode.ai>
Co-authored-by: Pedro Heyerdahl <61753986+pedroheyerdahl@users.noreply.github.com>

* feat(gastown): add POST /debug/replay-events endpoint for event replay debugging

  Adds debugReplayEvents(from, to) method to Town.do.ts that queries all town_events in a time range (regardless of processed_at), applies them to reconstruct state transitions, runs the reconciler, and returns the computed actions and a state snapshot. Uses a SQLite SAVEPOINT that is rolled back so the endpoint remains fully side-effect-free.

  Route: POST /debug/towns/:townId/replay-events
  Body: { from: ISO, to: ISO }
  Response: { eventsReplayed, actions, stateSnapshot }

* feat(gastown): emit reconciler metrics to Analytics Engine and add Grafana dashboard panels (#1372)

  - Extend writeEvent() to support double3-double10 fields for reconciler metrics
  - Emit reconciler_tick event after each alarm tick with all 9 metrics
  - Add Reconciler row to Grafana dashboard with 6 panels:
    1. Events drained per tick (timeseries)
    2. Actions emitted per tick by type (stacked bar)
    3. Side effects attempted/succeeded/failed (timeseries)
    4. Invariant violations (stat with >0 alert threshold)
    5. Reconciler wall clock time (timeseries with >500ms threshold)
    6. Pending event queue depth (gauge with >50 threshold)

* fix(gastown): add replay caveat and fix Grafana pending-events gauge query

  Add a caveat comment and response field to debugReplayEvents explaining that events are re-applied on top of live state, not from a pre-window snapshot; results are approximate, useful for debugging event flow but not faithful historical reconstruction.

  Fix the Grafana 'Pending Event Queue Depth' gauge to show the latest row's double8 value instead of averaging across the time window.

* feat(gastown): add Sentry capture for reconciler invariant violations

  Each invariant violation now triggers Sentry.captureMessage with structured context (invariant number, message, townId) as both extra data and tags. Existing analytics event emission is preserved. Added TODO for future auto-recovery of invariant #7 (working agent with no hook).

---------

Co-authored-by: kiloconnect[bot] <240665456+kiloconnect[bot]@users.noreply.github.com>
Co-authored-by: Pedro Heyerdahl <pedro@kilocode.ai>
Co-authored-by: Pedro Heyerdahl <61753986+pedroheyerdahl@users.noreply.github.com>
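The container_status filtering described in the squashed commit message above (skip 'running', keep stopped, error, and unknown) amounts to a small predicate applied before the insert. The sketch below is an illustration only: `ContainerObservation` and `filterRecordableStatuses` are hypothetical names, and the real code calls upsertContainerStatus(), which is not reproduced here.

```typescript
// Status values named in the commit message.
type ContainerStatus = "running" | "stopped" | "error" | "unknown";

// Hypothetical observation shape for one agent's container.
type ContainerObservation = { agentId: string; status: ContainerStatus };

// 'running' is the steady state and a no-op when applied,
// so only non-running observations are worth recording as events.
function shouldRecordContainerStatus(status: ContainerStatus): boolean {
  return status !== "running";
}

// Keep only the observations that should become container_status events.
function filterRecordableStatuses(
  observations: ContainerObservation[],
): ContainerObservation[] {
  return observations.filter((o) => shouldRecordContainerStatus(o.status));
}
```

Filtering before the write, rather than ignoring the event later, is what avoids the event-table growth the commit message mentions.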
Refinery code review passed. All quality gates pass.
Summary
Adds a `POST /debug/towns/:townId/reconcile-dry-run` endpoint that executes the reconciler's Phase 0 (drain events → apply → mark processed) and Phase 1 (reconcile) against current state, returning the actions it would emit without applying them. The entire operation runs inside a SQLite SAVEPOINT that is rolled back in a `finally` block, keeping the endpoint fully side-effect-free.

This gives operators a way to preview what the next alarm tick would do, including the effect of pending unprocessed events, without triggering any side effects.
Verification
- … `Pick<ReconcilerMetrics, ...>` fields all exist on the type

Visual Changes
N/A
Reviewer Notes
- … `GET /debug/towns/:townId/status` endpoint
- … `applyEvent` failure will propagate up and trigger the savepoint rollback. This is intentional for a debug tool where you want errors to be visible
- … `Action` type import was added alongside the existing `ApplyActionContext` import