Persist a __ps_name fallback so cold-alarm wakes recover this.name #391
Merged
Reintroduces a one-time-per-DO `__ps_name` storage record written during initialization whenever `ctx.id.name` is observed. This restores the safety net 0.5.0 dropped: when an alarm fires on a cold DO instance after a dev-server restart, workerd does not propagate the name into `ctx.id.name`, so without a fallback `this.name` throws inside `onStart()` and frameworks like Cloudflare Agents surface the error reported in #390.

Adds a self-contained reproduction at `fixtures/alarm-restart-e2e/` that runs three Durable Objects side-by-side under `vite dev` + `@cloudflare/vite-plugin`: a raw `DurableObject`, `partyserver@0.5.3` (aliased as `partyserver-stock`), and the workspace partyserver. The fixture's README walks through the cold-alarm-after-restart procedure and shows that stock partyserver throws while the workspace version recovers via the fallback record.

Also adds a "Raw runtime contract" describe block in the partyserver test suite that pins workerd's behavior for `ctx.id.name` in `alarm()` on a warm DO under `compatibility_date < 2026-03-15`.

Made-with: Cursor
🦋 Changeset detected

Latest commit: f83d968

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Summary

Reintroduces a one-time-per-DO `__ps_name` storage record written during initialization whenever `ctx.id.name` is observed. This restores the safety net 0.5.0 dropped: when an alarm fires on a cold DO instance after a dev-server restart, workerd does not propagate the name into `ctx.id.name`, so without a fallback `this.name` throws inside `onStart()` and frameworks like Cloudflare Agents surface the error reported in #390.

Closes #390.

What's actually going on

The `compatibility_date` framing in the original issue (and in the public DO docs at https://developers.cloudflare.com/durable-objects/api/id/#name) doesn't quite match the runtime contract. We tested empirically and:

- `ctx.id.name` IS available in `alarm()` on a warm DO instance, regardless of compatibility date.
- `ctx.id.name` is NOT available in `alarm()` on a cold DO instance, i.e. when the alarm wakes a fresh DO instance whose alarm record was written across a dev-server / isolate restart. This applies even to alarms scheduled by current workerd (1.20260424.1) under any compatibility date we tried.
- `ctx.id.name` stays `undefined` for the lifetime of that instance; even subsequent `idFromName(...)` fetches see it as `undefined`.

The agents SDK's `Agent.alarm()` calls `super.alarm()` first (partyserver's), which runs `onStart()`, which calls `this.mcp.restoreConnectionsFromStorage(this.name)`. With stock 0.5.x there's no fallback for `this.name` after the restart, so `onStart()` throws and the agent never recovers.

The fix

In `Server.#ensureInitialized()`, before `onStart()` runs, persist `ctx.id.name` to a `__ps_name` storage record (only if not already present). The next time the same DO is woken, including by a cold alarm, partyserver's existing `#hydrateNameFromLegacyStorage()` will recover the name even when `ctx.id.name` is `undefined`. One storage `put` per DO lifetime, idempotent.

Also rewords the error message thrown from `Server.name` to mention the actual failure mode (cold alarm, no fallback present) instead of the misleading "alarm scheduled before 2026-03-15" line, and updates the README + 0.5.0 changelog framing accordingly.

Reproduction fixture

`fixtures/alarm-restart-e2e/` runs three Durable Objects side by side under `vite dev` + `@cloudflare/vite-plugin`:

- `RawAlarm`: `DurableObject` (no PartyServer)
- `StockAlarm`: `Server` from `partyserver@0.5.3` (aliased as `partyserver-stock`)
- `FixedAlarm`: `Server` from this workspace's local `partyserver`

The fixture's README walks through the cold-alarm-after-restart procedure. Result on `workerd@1.20260424.1`, `compatibility_date: "2026-01-28"`, for DOs addressed via `idFromName(...)`:

- `RawAlarm`: `ctx.id.name = "<room>"`; `ctx.id.name = undefined`; `ctx.id.name = undefined` (instance is permanently tainted)
- `StockAlarm` (0.5.3): `this.name = "<room>"`, no fallback written; `Server.fetch` returns 500 "Cannot determine the name"
- `FixedAlarm` (workspace): `this.name = "<room>"`, fallback written; `ctx.id.name = undefined` BUT `this.name` recovered from `__ps_name`; `this.name` still recovered

Stock matches the user-reported failure exactly. The workspace version recovers cleanly.

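For readers who want the mechanism in miniature, here is a rough sketch of the write-once fallback described under "The fix", run against an in-memory store. Everything except the `__ps_name` key and the "Cannot determine the name" message is an assumption made for illustration: `NameStorage`, `MemoryStorage`, `SketchServer`, and `demo` are hypothetical names, and the real logic lives in `Server.#ensureInitialized()` and `#hydrateNameFromLegacyStorage()`, whose internals are not shown in this PR.

```typescript
// Hypothetical stand-ins: they model only the behaviour this PR describes.
// They are NOT partyserver's real internals or the Durable Object API.
interface NameStorage {
  get(key: string): Promise<string | undefined>;
  put(key: string, value: string): Promise<void>;
}

class MemoryStorage implements NameStorage {
  private readonly data = new Map<string, string>();
  puts = 0; // counts writes so "one put per DO lifetime" can be checked

  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }

  async put(key: string, value: string): Promise<void> {
    this.puts += 1;
    this.data.set(key, value);
  }
}

const NAME_KEY = "__ps_name"; // record name taken from the PR

class SketchServer {
  private readonly idName: string | undefined; // stands in for ctx.id.name
  private readonly storage: NameStorage;
  private resolvedName: string | undefined;

  constructor(idName: string | undefined, storage: NameStorage) {
    this.idName = idName;
    this.storage = storage;
  }

  // Must run before any user hook (e.g. onStart()) reads `name`.
  async ensureInitialized(): Promise<void> {
    if (this.idName !== undefined) {
      this.resolvedName = this.idName;
      // Write the fallback only if it is not already present.
      if ((await this.storage.get(NAME_KEY)) === undefined) {
        await this.storage.put(NAME_KEY, this.idName);
      }
      return;
    }
    // Cold alarm wake: no name on the id, so fall back to the stored record.
    this.resolvedName = await this.storage.get(NAME_KEY);
  }

  get name(): string {
    if (this.resolvedName === undefined) {
      throw new Error("Cannot determine the name");
    }
    return this.resolvedName;
  }
}

async function demo(): Promise<{
  coldName: string;
  puts: number;
  throwsWithoutFallback: boolean;
}> {
  const storage = new MemoryStorage();

  // Two wakes with the name present: only the first one writes the record.
  await new SketchServer("room-1", storage).ensureInitialized();
  await new SketchServer("room-1", storage).ensureInitialized();

  // Cold alarm wake on the same storage: the name comes from the record.
  const cold = new SketchServer(undefined, storage);
  await cold.ensureInitialized();

  // Same wake with no record written (the stock 0.5.x situation).
  const noFallback = new SketchServer(undefined, new MemoryStorage());
  await noFallback.ensureInitialized();
  let throwsWithoutFallback = false;
  try {
    void noFallback.name;
  } catch {
    throwsWithoutFallback = true;
  }

  return { coldName: cold.name, puts: storage.puts, throwsWithoutFallback };
}
```

Calling `demo()` walks through the same three situations as the fixture results: a wake where the name is present, a cold wake that recovers the name from the record, and a cold wake with no record, where reading `name` throws.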
Test plan

- `npm run check:test -w partyserver` (62 tests pass, including the new "Raw runtime contract" describe block that pins workerd's `ctx.id.name` behavior for `idFromName`-addressed DOs in `alarm()` on a warm instance)
- `npm run check:format`, `npm run check:lint`, `npm run check:type` all clean
- `fixtures/alarm-restart-e2e/` confirms the bug repros against `partyserver@0.5.3` and is fixed against the workspace partyserver

Followups

- `ctx.id.name` in cold-alarm wakes deserves a clarification (or a fix) on the workerd side. Filed a draft for the DO team summarizing what we observed; happy to push that conversation forward in a separate thread.
- `cf_chat_recovery` path in agents will continue to work, but the mechanism is "frameworks below partyserver write a fallback record on every fetch". If the runtime ever lifts this restriction, we can drop the fallback again.

Made with Cursor