Persist a __ps_name fallback so cold-alarm wakes recover this.name #391

Merged
threepointone merged 1 commit into main from fix-name-fallback-for-old-alarm-compat
Apr 27, 2026

@threepointone
Collaborator

Summary

Reintroduces a one-time-per-DO __ps_name storage record written during initialization whenever ctx.id.name is observed. This restores the safety net 0.5.0 dropped: when an alarm fires on a cold DO instance after a dev-server restart, workerd does not propagate the name into ctx.id.name, so without a fallback this.name throws inside onStart() and frameworks like Cloudflare Agents surface the error reported in #390.

Closes #390.

What's actually going on

The compatibility_date framing in the original issue (and in the public DO docs at https://developers.cloudflare.com/durable-objects/api/id/#name) doesn't quite match the runtime contract. We tested empirically and found:

  • ctx.id.name IS available in alarm() on a warm DO instance, regardless of compatibility date.
  • ctx.id.name is NOT available in alarm() on a cold DO instance — i.e. when the alarm wakes a fresh DO instance whose alarm record was written across a dev-server / isolate restart. This applies even to alarms scheduled by current workerd (1.20260424.1) under any compatibility date we tried.
  • Once a DO instance is born nameless via cold alarm, ctx.id.name stays undefined for the lifetime of that instance — even subsequent idFromName(...) fetches see it as undefined.
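
To make the three observations concrete, here is a minimal TypeScript sketch of a probe in the spirit of the fixture's RawAlarm. It is not the fixture's code: `NameProbe`, `IdLike`, and `StateLike` are hypothetical stand-ins for the runtime types, and the harness simply feeds in the `ctx.id.name` values we observed rather than running workerd.

```typescript
// Structural stand-ins for the runtime types (assumptions, not workerd's real types).
interface IdLike {
  readonly name?: string;
}
interface StateLike {
  readonly id: IdLike;
}

interface Observation {
  entry: "fetch" | "alarm";
  name: string | undefined;
}

// Hypothetical probe: records what ctx.id.name looks like at each entry point.
class NameProbe {
  readonly observations: Observation[] = [];
  readonly ctx: StateLike;

  constructor(ctx: StateLike) {
    this.ctx = ctx;
  }

  fetch(): void {
    this.observations.push({ entry: "fetch", name: this.ctx.id.name });
  }

  alarm(): void {
    this.observations.push({ entry: "alarm", name: this.ctx.id.name });
  }
}

// Warm instance: created by a named fetch, then its alarm fires in the same instance.
const warmProbe = new NameProbe({ id: { name: "room-1" } });
warmProbe.fetch();
warmProbe.alarm();

// Cold instance: woken by the alarm after a restart. The harness models the
// observed behavior by supplying no name, and the instance stays nameless.
const coldProbe = new NameProbe({ id: {} });
coldProbe.alarm();
coldProbe.fetch();

console.log(warmProbe.observations, coldProbe.observations);
```

The point of the model is only that the name is a property of the instance's `ctx.id`, fixed when the instance is created, so a later named fetch cannot repair an instance that was born from a cold alarm.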

The agents SDK's Agent.alarm() calls super.alarm() first (partyserver's), which runs onStart(), which calls this.mcp.restoreConnectionsFromStorage(this.name). With stock 0.5.x there's no fallback for this.name after the restart, so onStart() throws and the agent never recovers.
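
The call chain above can be modeled with a toy sketch. The class names `ServerModel` and `AgentModel`, the field names, and the error text are all hypothetical; only the ordering (alarm, then `onStart()`, then a read of `this.name`) is taken from the description.

```typescript
// Toy model of the call chain; bodies are illustrative, not partyserver or agents code.
class ServerModel {
  recoveredName: string | undefined;

  constructor(name?: string) {
    this.recoveredName = name;
  }

  // Throws when no name is available, like Server.name with no fallback.
  get name(): string {
    if (this.recoveredName === undefined) {
      throw new Error("name unavailable (hypothetical message)");
    }
    return this.recoveredName;
  }

  onStart(): void {}

  alarm(): void {
    // Initialization runs before any alarm work.
    this.onStart();
  }
}

class AgentModel extends ServerModel {
  restoredFor: string[] = [];
  alarmWorkRan = false;

  onStart(): void {
    // Stands in for this.mcp.restoreConnectionsFromStorage(this.name).
    this.restoredFor.push(this.name);
  }

  alarm(): void {
    super.alarm(); // throws here on a nameless instance
    this.alarmWorkRan = true;
  }
}

// Named instance: the alarm completes.
const namedAgent = new AgentModel("room-1");
namedAgent.alarm();

// Nameless instance: the throw in onStart() aborts the alarm before any work runs.
const namelessAgent = new AgentModel();
let alarmThrew = false;
try {
  namelessAgent.alarm();
} catch {
  alarmThrew = true;
}
console.log({ alarmThrew, alarmWorkRan: namelessAgent.alarmWorkRan });
```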

The fix

In Server.#ensureInitialized(), before onStart() runs, persist ctx.id.name to a __ps_name storage record (only if not already present). The next time the same DO is woken — including by a cold alarm — partyserver's existing #hydrateNameFromLegacyStorage() will recover the name even when ctx.id.name is undefined. One storage put per DO lifetime, idempotent.
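
A minimal sketch of the write-once fallback, assuming an async key-value storage surface. `persistNameFallback`, `resolveName`, `MemoryStorage`, and the interfaces are hypothetical helpers for illustration; the real logic lives in `Server.#ensureInitialized()` and `#hydrateNameFromLegacyStorage()`, whose bodies are not reproduced here. Only the `__ps_name` key comes from this PR.

```typescript
// Minimal async key-value surface, loosely modeled on DO storage get/put (assumption).
interface StorageLike {
  get(key: string): Promise<string | undefined>;
  put(key: string, value: string): Promise<void>;
}
interface StateLike {
  readonly id: { readonly name?: string };
  readonly storage: StorageLike;
}

const NAME_KEY = "__ps_name";

// In-memory storage that counts writes, so the "one put per DO" claim is visible.
class MemoryStorage implements StorageLike {
  puts = 0;
  private data = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }

  async put(key: string, value: string): Promise<void> {
    this.puts++;
    this.data.set(key, value);
  }
}

// Hypothetical helper: write the fallback once, only when a name is visible
// and no record exists yet.
async function persistNameFallback(ctx: StateLike): Promise<void> {
  const name = ctx.id.name;
  if (name === undefined) return;
  if ((await ctx.storage.get(NAME_KEY)) !== undefined) return;
  await ctx.storage.put(NAME_KEY, name);
}

// Hypothetical helper: prefer ctx.id.name, then the fallback record, else throw.
async function resolveName(ctx: StateLike): Promise<string> {
  const name = ctx.id.name ?? (await ctx.storage.get(NAME_KEY));
  if (name === undefined) {
    throw new Error("no ctx.id.name and no fallback record");
  }
  return name;
}

async function demo(): Promise<{ recovered: string; puts: number }> {
  const storage = new MemoryStorage();

  // Session A: a named fetch initializes the DO (twice, to show idempotence).
  const warm: StateLike = { id: { name: "room-1" }, storage };
  await persistNameFallback(warm);
  await persistNameFallback(warm);

  // Session B: a cold alarm wakes a nameless instance over the same storage.
  const cold: StateLike = { id: {}, storage };
  await persistNameFallback(cold); // no-op: nothing to persist
  return { recovered: await resolveName(cold), puts: storage.puts };
}

demo().then((result) => console.log(result));
```

Checking for an existing record before writing trades one extra read per initialization for a single write over the DO's lifetime.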

Also rewords the error message thrown from Server.name to mention the actual failure mode (cold alarm, no fallback present) instead of the misleading "alarm scheduled before 2026-03-15" line, and updates the README + 0.5.0 changelog framing accordingly.

Reproduction fixture

fixtures/alarm-restart-e2e/ runs three Durable Objects side by side under vite dev + @cloudflare/vite-plugin:

| DO | Extends |
| --- | --- |
| RawAlarm | DurableObject (no PartyServer) |
| StockAlarm | Server from partyserver@0.5.3 (aliased as partyserver-stock) |
| FixedAlarm | Server from this workspace's local partyserver |

The fixture's README walks through the cold-alarm-after-restart procedure. Result on workerd@1.20260424.1, compatibility_date: "2026-01-28":

| DO | session A: fetch | session B: cold alarm | session B: subsequent fetch via idFromName(...) |
| --- | --- | --- | --- |
| RawAlarm | `ctx.id.name = "<room>"` | `ctx.id.name = undefined` | `ctx.id.name = undefined` (instance is permanently tainted) |
| StockAlarm (0.5.3) | `this.name = "<room>"`, no fallback written | Server.fetch returns 500 "Cannot determine the name" | still 500 |
| FixedAlarm (workspace) | `this.name = "<room>"`, fallback written | `ctx.id.name = undefined` BUT `this.name` recovered from `__ps_name` | `this.name` still recovered |

Stock matches the user-reported failure exactly. The workspace version recovers cleanly.

Test plan

  • npm run check:test -w partyserver (62 tests pass, including the new "Raw runtime contract" describe block that pins workerd's ctx.id.name behavior for idFromName-addressed DOs in alarm() on a warm instance)
  • npm run check:format, npm run check:lint, npm run check:type all clean
  • Manual reproduction via fixtures/alarm-restart-e2e/ confirms the bug repros against partyserver@0.5.3 and is fixed against the workspace partyserver
  • Sanity check downstream agents SDK once this is published

Followups

  • The runtime contract for ctx.id.name in cold-alarm wakes deserves a clarification (or a fix) on the workerd side. Filed a draft for the DO team summarizing what we observed; happy to push that conversation forward in a separate thread.
  • The user-reported cf_chat_recovery path in agents will continue to work, but the mechanism is "frameworks below partyserver write a fallback record on every fetch". If the runtime ever lifts this restriction, we can drop the fallback again.

Made with Cursor

Reintroduces a one-time-per-DO `__ps_name` storage record written
during initialization whenever `ctx.id.name` is observed. This
restores the safety net 0.5.0 dropped: when an alarm fires on a cold
DO instance after a dev-server restart, workerd does not propagate
the name into `ctx.id.name`, so without a fallback `this.name` throws
inside `onStart()` and frameworks like Cloudflare Agents surface the
error reported in #390.

Adds a self-contained reproduction at
`fixtures/alarm-restart-e2e/` that runs three Durable Objects
side-by-side under `vite dev` + `@cloudflare/vite-plugin`: a raw
`DurableObject`, `partyserver@0.5.3` (aliased as `partyserver-stock`),
and the workspace partyserver. The fixture's README walks through
the cold-alarm-after-restart procedure and shows that stock
partyserver throws while the workspace version recovers via the
fallback record.

Also adds a "Raw runtime contract" describe block in the partyserver
test suite that pins workerd's behavior for `ctx.id.name` in
`alarm()` on a warm DO under `compatibility_date < 2026-03-15`.

Made-with: Cursor
@changeset-bot

changeset-bot Bot commented Apr 27, 2026

🦋 Changeset detected

Latest commit: f83d968

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:

| Name | Type |
| --- | --- |
| partyserver | Patch |

@pkg-pr-new

pkg-pr-new Bot commented Apr 27, 2026

hono-party

npm i https://pkg.pr.new/cloudflare/partykit/hono-party@391

partyfn

npm i https://pkg.pr.new/cloudflare/partykit/partyfn@391

partyserver

npm i https://pkg.pr.new/cloudflare/partykit/partyserver@391

partysocket

npm i https://pkg.pr.new/cloudflare/partykit/partysocket@391

partysub

npm i https://pkg.pr.new/cloudflare/partykit/partysub@391

partysync

npm i https://pkg.pr.new/cloudflare/partykit/partysync@391

partytracks

npm i https://pkg.pr.new/cloudflare/partykit/partytracks@391

partywhen

npm i https://pkg.pr.new/cloudflare/partykit/partywhen@391

y-partyserver

npm i https://pkg.pr.new/cloudflare/partykit/y-partyserver@391

commit: f83d968

@threepointone threepointone merged commit 6273c96 into main Apr 27, 2026
6 checks passed
@threepointone threepointone deleted the fix-name-fallback-for-old-alarm-compat branch April 27, 2026 16:56
@github-actions github-actions Bot mentioned this pull request Apr 27, 2026

Development

Successfully merging this pull request may close these issues.

Alarms on fresh 0.5.x DOs with pre-2026-03-15 compat date fail to recover this.name