chore(gastown): promote gastown-staging to main #2974
Open
…e-dev

git-token-service's wrangler env.dev overrides the worker name to 'git-token-service-dev', but gastown's env.dev.services binding was still referencing the base 'git-token-service' name. Wrangler's local dev registry does exact-name matching, so the binding showed as [not connected] whenever both workers were running side by side. Every other consumer in the repo (cloud-agent-next, security-sync, security-auto-analysis) already uses 'git-token-service-dev' in their env.dev block; gastown was the outlier.
Code Review Summary
Status: No Issues Found | Recommendation: Merge
Files Reviewed (9 files)
Reviewed by gpt-5.5-20260423 · 550,886 tokens
…2999)

When a user changes the mayor's model in town settings, updateAgentModel restarts the SDK server with new KILO_CONFIG_CONTENT and resumes the existing session from kilo.db. Commit 9785570 intentionally stopped sending any session.prompt on resume to avoid duplicating the MAYOR_STARTUP_PROMPT, but that also dropped the model param, so the resumed session kept its prior per-session model until the user ran /model manually.

Extract the fresh vs. resumed session-prompt logic into applyModelToSession and on resume send a noReply:true prompt carrying only the new model param. This updates the SDK server's per-session model without replaying the startup prompt. Errors on the resume path are swallowed so the hot-swap still succeeds; the SDK server fell back to the config-loaded model at startup, which was already updated.

Add container tests covering both fresh and resumed paths.

Co-authored-by: John Fawcett <john@kilcoode.ai>
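The fresh vs. resumed split described in the commit message above can be sketched roughly as follows. This is not the code from the PR: the `SessionClient` and `PromptRequest` shapes, the `opts` parameter, and the choice to let fresh-path errors propagate are all assumptions made for illustration; only the name `applyModelToSession`, the `noReply: true` prompt on resume, and the swallowed resume-path error come from the commit message.

```typescript
// Hypothetical request/client shapes; the real SDK types are not shown in this PR.
interface PromptRequest {
  sessionId: string;
  model: string;
  noReply?: boolean;
  text?: string;
}

interface SessionClient {
  prompt(req: PromptRequest): Promise<void>;
}

async function applyModelToSession(
  client: SessionClient,
  sessionId: string,
  model: string,
  opts: { resumed: boolean; startupPrompt: string },
): Promise<void> {
  if (!opts.resumed) {
    // Fresh session: send the startup prompt together with the model.
    // Assumption: errors here propagate to the caller.
    await client.prompt({ sessionId, model, text: opts.startupPrompt });
    return;
  }
  try {
    // Resumed session: carry only the new model and request no reply,
    // so the startup prompt is not replayed.
    await client.prompt({ sessionId, model, noReply: true });
  } catch (err) {
    // Swallowed on purpose: the hot-swap should still succeed, since the
    // server already loaded the updated model from its config at startup.
    console.warn("applyModelToSession: resume prompt failed", err);
  }
}
```

Keeping both paths in one function makes it straightforward to test each with a fake client that records the requests it receives.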
…vents (#3047)

Two independent bugs compose to flood production logs every alarm tick with 'Bead <id> not found' errors:

1. deleteBead / deleteBeads did not clean up the town_events queue, leaving bead_cancelled and container_status rows pointing at deleted beads/agents.
2. applyEvent threw on missing beads and the drain loop never marked the failing event processed, so it retried forever.

Fix 1: purge town_events rows (by bead_id OR agent_id, since agents are beads) from deleteBead and the deleteBeads bulk path.
Fix 2a: reconciler.applyEvent('bead_cancelled') checks for the target bead up front and returns (with a warn) when it's missing, instead of throwing.
Fix 2b: the Town.do.ts drain loop recognises 'Bead/Agent <uuid> not found' terminal errors, logs them at warn, and marks the offending event processed so it stops retrying.

Adds debug RPCs (debugTownEvents, debugInsertTownEvent, debugRecordContainerStatus) and integration coverage in event-cleanup.test.ts.

Co-authored-by: John Fawcett <john@kilcoode.ai>
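To illustrate Fix 2b from the commit message above, here is a minimal sketch of a drain loop that treats "not found" errors as terminal. The `TownEvent` shape, the `drainEvents` and `isTerminalError` names, and the exact regular expression are hypothetical; the actual Town.do.ts loop is not shown in this PR page and may classify errors differently.

```typescript
// Hypothetical event shape; the real town_events row has more columns.
interface TownEvent {
  id: number;
  type: string;
  processed: boolean;
}

// Matches "Bead <uuid> not found" and "Agent <uuid> not found".
const NOT_FOUND =
  /\b(?:Bead|Agent) [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} not found\b/i;

function isTerminalError(err: unknown): boolean {
  return err instanceof Error && NOT_FOUND.test(err.message);
}

function drainEvents(
  events: TownEvent[],
  applyEvent: (event: TownEvent) => void,
): void {
  for (const event of events) {
    if (event.processed) continue;
    try {
      applyEvent(event);
      event.processed = true;
    } catch (err) {
      if (isTerminalError(err)) {
        // The target was deleted, so a retry can never succeed:
        // log at warn and mark the event processed.
        console.warn(`event ${event.id}: terminal error, dropping`, err);
        event.processed = true;
      } else {
        // Possibly transient: leave unprocessed so the next tick retries.
        console.error(`event ${event.id}: applyEvent failed`, err);
      }
    }
  }
}
```

Matching on the message text is brittle if the wording ever changes; a dedicated error class would be a sturdier signal, though the commit message describes recognising the message pattern.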
…#3055)

* feat(gastown-container): add crash visibility + per-agent start mutex

Diagnostic changes to investigate frequent container restarts for town 4d82f099-ccb7-4eaf-8676-73562e0a27eb (~1.5–2 min boot-hydration loops).

- main.ts: add unhandledRejection listener that logs full error/stack without exiting (Bun/Node silently drop rejections without a handler, making fire-and-forget failures like void saveDbSnapshot()/void subscribeToEvents() invisible). Include uptime and active-agent count for correlation.
- main.ts: improve uncaughtException log with name/uptime/agent count.
- main.ts: 30s periodic container.memory_usage log (rss/heap/external) so OOM-class failures (external SIGKILL from Cloudflare Containers runtime when the memory ceiling is hit) become observable; these leave no exception behind.
- main.ts: wrap bootHydration() in try/catch so a rare synchronous throw before the first await doesn't crash the process.
- process-manager.ts: add per-agentId mutex for startAgent. Production logs show two /agents/start requests for the same agentId logged at the same millisecond; both pass the re-entrancy check before either commits a 'starting' record, then race on startupAbortController, session creation, idle timers, and SDK sessionCount. Serialising per agentId makes the re-entrant path observe a consistent snapshot.
- process-manager.test.ts: three tests for the mutex: same-id serialisation, different-id concurrency, lock release on throw.

* fix(container): replace Promise.withResolvers with explicit new Promise

Promise.withResolvers is a newer API not available on older Bun runtimes. Since process-manager.ts is imported during container startup, a missing global would throw before crash handlers are registered and prevent the control server from starting. Use the same explicit new Promise pattern as the existing sdkServerLock.

* feat(gastown/container): include townId in crash and memory logs

Per review feedback, attach the container's GASTOWN_TOWN_ID to unhandled_rejection, uncaught_exception, cold_start, memory_usage, and boot_hydration_failed log entries so production crash logs can be correlated with a specific town without needing to also have an agent registered.

---------

Co-authored-by: John Fawcett <john@kilcoode.ai>
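A per-key mutex of the kind described in the commit message above can be sketched as follows, using an explicit `new Promise` rather than `Promise.withResolvers`. The names `agentLocks` and `withAgentLock` are hypothetical and the real process-manager.ts implementation may differ; this only shows one common way to serialise async work per agentId while leaving different ids concurrent.

```typescript
// Hypothetical module-level lock table: agentId -> tail of that agent's queue.
const agentLocks = new Map<string, Promise<void>>();

async function withAgentLock<T>(
  agentId: string,
  fn: () => Promise<T>,
): Promise<T> {
  const previous = agentLocks.get(agentId) ?? Promise.resolve();

  // Explicit new Promise pattern (no Promise.withResolvers): the executor
  // runs synchronously, so `release` is assigned before it is used.
  let release!: () => void;
  const current = new Promise<void>((resolve) => {
    release = resolve;
  });

  // The next caller for this agentId waits until we release.
  const tail = previous.then(() => current);
  agentLocks.set(agentId, tail);

  await previous;
  try {
    return await fn();
  } finally {
    // Release even when fn throws, so later callers are not blocked.
    release();
    // Drop the entry if nobody queued behind us.
    if (agentLocks.get(agentId) === tail) agentLocks.delete(agentId);
  }
}
```

In this sketch a `startAgent(agentId, ...)` body would be wrapped as `withAgentLock(agentId, () => doStart(...))`, where `doStart` is a placeholder for the existing start logic, so that two same-millisecond requests for one agent run one after the other.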
chore(gastown): fix format and lint CI failures on staging Co-authored-by: John Fawcett <john@kilcoode.ai>
…dler (#3074) Co-authored-by: John Fawcett <john@kilcoode.ai>
Summary
Batch promote of gastown-staging to main. Four commits since the last promotion, each addressing a distinct gastown issue: 21a14d04, c532f40e, db584066, 47e95a8b

Constituent changes
fix(gastown): point dev GIT_TOKEN_SERVICE binding at git-token-service-dev (21a14d04, no PR; direct push)

Local-dev binding fix. services/git-token-service/wrangler.jsonc overrides its worker name to git-token-service-dev in env.dev, but gastown's env.dev.services binding still referenced the base git-token-service name. Wrangler's local dev registry does exact-name matching, so the binding showed [not connected] whenever both workers ran side by side. Production binding (top-level services) is untouched; only the env.dev block was changed.

fix(gastown): push new model onto resumed mayor session on hot-swap (#2999)
Regression where changing the mayor's model in town settings persisted the config but the running mayor kept using the previous model, so users had to run /model manually. Root cause: a prior fix (9785570b9) skipped the session.prompt(...) call on resumed sessions to avoid a duplicate startup turn, but that call was also responsible for pushing the new model field onto the session. Fix sends a session.prompt with noReply: true and the new model on resume, so the model updates without injecting a synthetic user turn.

fix(gastown): stop reconciler log spam from orphaned bead_cancelled events (#3047)
Production logs were filling with "reconciler: applyEvent failed … Bead <id> not found" repeating every alarm tick forever. Two cooperating bugs: (1) deleteBead cascaded cleanup to satellite tables but not the town_events reconciler queue, leaving orphan events; (2) the drain loop in Town.do.ts intentionally never marked failed events as processed, expecting retries to eventually succeed, but for a deleted bead they never can. Fix cleans up town_events in deleteBead/deleteBeads, pre-checks bead existence in applyEvent, and classifies "Bead/Agent … not found" errors as terminal in the drain loop.

feat(gastown-container): add crash visibility + per-agent start mutex (#3055)
Investigation hooks for repeated container restarts seen on a specific town (~1.5–2 min cadence). Adds an unhandledRejection listener with full stack logging (no process.exit), periodic RSS memory logging via the heartbeat path, and a per-agentId mutex in startAgent that fixes a real concurrency race exposed by duplicate /agents/start log lines arriving in the same millisecond for the same mayor.

Verification
… gastown-staging independently; see PR links above for per-change verification details.
21a14d04 was manually verified locally (wrangler dev --env dev for both workers, binding now connects).

Reviewer notes
… services/gastown (container + DO/worker code) and the env.dev block of services/gastown/wrangler.jsonc. No cross-service changes.