
Complete the WebSocket close handshake in webSocketClose#393

Merged
threepointone merged 1 commit into main from fix/close-handshake-389
Apr 27, 2026


@threepointone
Collaborator

Summary

Fixes #389 — clients that initiate a clean WebSocket close get stuck in CLOSING and time out with 1006 (abnormal closure) because PartyServer's webSocketClose never reciprocates the peer's Close frame.

The Hibernation API documents this contract explicitly:

You must call ws.close(code, reason) inside this handler to complete the WebSocket close handshake. Failing to reciprocate the close will result in 1006 errors on the client, representing an abnormal closure per the WebSocket specification.
DurableObject.webSocketClose docs

PartyServer's wrapper was forwarding to user onClose but never calling ws.close(), so every clean client-initiated disconnect ended up dangling.
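A minimal sketch of the omission next to the documented contract. It assumes a structural `SocketLike` type standing in for the runtime's `WebSocket` and a plain `OnClose` callback type (both hypothetical, chosen so the sketch runs outside workerd); it is a simplification, not PartyServer's actual source:

```typescript
// Structural stand-in for the runtime WebSocket (an assumption, not the real type).
interface SocketLike {
  close(code?: number, reason?: string): void;
}

type OnClose = (
  ws: SocketLike,
  code: number,
  reason: string,
  wasClean: boolean,
) => void | Promise<void>;

// Before (simplified): forwards to the user's handler and nothing else, so the
// peer's Close frame is never answered and the client stays in CLOSING.
function webSocketCloseBefore(
  onClose: OnClose,
  ws: SocketLike,
  code: number,
  reason: string,
  wasClean: boolean,
): void | Promise<void> {
  return onClose(ws, code, reason, wasClean);
}

// Documented contract: after handling the close, reciprocate with
// ws.close(code, reason) to complete the handshake.
async function webSocketCloseAfter(
  onClose: OnClose,
  ws: SocketLike,
  code: number,
  reason: string,
  wasClean: boolean,
): Promise<void> {
  await onClose(ws, code, reason, wasClean);
  ws.close(code, reason);
}
```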

The non-hibernating path (#attachSocketEventHandlers) had the same omission. It was masked on compat dates >= 2026-04-07 by the runtime's web_socket_auto_reply_to_close flag, but broken on older compat dates.
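For the non-hibernating path, one way to make the close listener reciprocate however `onClose` fails is to run it inside an async wrapper, where a synchronous throw and a rejected promise both reach the same `catch`. This is a sketch under assumptions: `attachCloseListener`, `reciprocate`, `logError`, and the structural types are hypothetical, the close event is assumed to carry `code`, `reason`, and `wasClean`, and reserved-code handling is left out:

```typescript
// Structural stand-ins (assumptions) so the sketch runs outside workerd.
interface SocketLike extends EventTarget {
  close(code?: number, reason?: string): void;
}

interface CloseInfo {
  code: number;
  reason: string;
  wasClean: boolean;
}

type OnClose = (
  ws: SocketLike,
  code: number,
  reason: string,
  wasClean: boolean,
) => void | Promise<void>;

// Best-effort reciprocation: anything close() throws synchronously is swallowed.
function reciprocate(ws: SocketLike, code: number, reason: string): void {
  try {
    ws.close(code, reason);
  } catch {
    // Already closed, oversize reason, etc.
  }
}

// Hypothetical wiring for the accept() (non-hibernating) path.
function attachCloseListener(
  ws: SocketLike,
  onClose: OnClose,
  logError: (err: unknown) => void,
): void {
  ws.addEventListener("close", (event) => {
    const { code, reason, wasClean } = event as unknown as CloseInfo;
    void (async () => {
      try {
        // Inside an async function, a synchronous throw from onClose and a
        // rejected promise both land in this catch.
        await onClose(ws, code, reason, wasClean);
      } catch (err) {
        logError(err);
      } finally {
        // Always reciprocate, even after a throwing onClose.
        reciprocate(ws, code, reason);
      }
    })();
  });
}
```

Per the description above, on compat dates where the runtime has already auto-replied, this extra `close()` is a silent no-op.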

Changes

  • packages/partyserver/src/index.ts

    • New closeQuietly(ws, code, reason) helper that wraps ws.close() in a try/catch and normalizes the reserved close codes 1005, 1006, and 1015 to 1000, so the framework's reciprocation can never throw InvalidAccessError.
    • Server.webSocketClose now awaits onClose and reciprocates via closeQuietly in a finally. The headline fix.
    • #attachSocketEventHandlers's close listener restructured to handle both synchronous and asynchronous onClose throws, then unconditionally reciprocate. On compat dates >= 2026-04-07 this is a silent no-op (runtime already auto-replied); on older compat dates it's the only way the client gets a clean close.
  • packages/partyserver/src/tests/ — 10 new tests under a Close handshake (#389) describe block, plus 6 fixture DOs (hibernating + in-memory variants of three scenarios):

    Scenario Hibernating In-Memory
    Client-initiated clean close completes the handshake (the literal #389 repro)
    onClose receives the peer's code/reason/wasClean
    Throwing onClose still completes the handshake (regression guard for sync vs async throws)
    User calling connection.close() from onClose is idempotent with framework reciprocation
    Reserved close codes don't crash reciprocation
    Server-initiated close still works
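A sketch of the first two source changes as described for this PR (reserved codes normalized to 1000; the follow-up commit further down changes that branch). `SocketLike`, `RESERVED_CLOSE_CODES`, and `ServerSketch` are hypothetical names for illustration; the real `Server` class in packages/partyserver/src/index.ts does considerably more:

```typescript
// Structural stand-in for the runtime WebSocket (an assumption).
interface SocketLike {
  close(code?: number, reason?: string): void;
}

// Codes the runtime may report but that must never be sent in a Close frame.
const RESERVED_CLOSE_CODES = new Set([1005, 1006, 1015]);

// Normalize reserved codes to 1000 and swallow anything close() throws synchronously.
function closeQuietly(ws: SocketLike, code: number, reason: string): void {
  const safeCode = RESERVED_CLOSE_CODES.has(code) ? 1000 : code;
  try {
    ws.close(safeCode, reason);
  } catch {
    // Best-effort: reciprocation must never take the handler down.
  }
}

// Hypothetical minimal server shape showing only the close path.
abstract class ServerSketch {
  abstract onClose(
    ws: SocketLike,
    code: number,
    reason: string,
    wasClean: boolean,
  ): void | Promise<void>;

  async webSocketClose(
    ws: SocketLike,
    code: number,
    reason: string,
    wasClean: boolean,
  ): Promise<void> {
    try {
      // Awaited so async cleanup in onClose finishes before the handshake completes.
      await this.onClose(ws, code, reason, wasClean);
    } finally {
      // Runs even if onClose throws; the error still propagates afterwards.
      closeQuietly(ws, code, reason);
    }
  }
}
```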

Why this is safe

  • Calling close() on an already-closed socket is a documented silent no-op (changelog), so the new reciprocation doesn't conflict with user code that already calls connection.close() from onClose.
  • closeQuietly's try/catch covers the remaining edge cases (oversize reason, future runtime invariants).
  • The fix is in the framework's wrapper only; user-facing APIs are unchanged.
  • Sister packages (y-partyserver, partysub, partysync, partywhen) extend Server but don't override webSocketClose/#attachSocketEventHandlers, so they pick up the fix transparently. y-partyserver's existing awareness cleanup in onClose benefits directly: onClose is now awaited before reciprocation, so awareness state is fully cleaned up before the handshake completes.

Verification

The tests fail loudly without the fix — verified by stashing the source change and re-running:

× client-initiated close completes the handshake (issue #389)
  Timed out after 2000ms waiting for client close event. readyState=2
  (this means the server-side WebSocket never reciprocated the peer's Close frame)

7 of the 10 new tests fail without the fix; all 10 pass with it.
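The failure message above comes from a test that waits for the client's `close` event against a deadline. A hypothetical helper in that spirit, assuming the client exposes `readyState` and `EventTarget`-style listeners (the actual utilities in packages/partyserver/src/tests/ may differ):

```typescript
// Assumed client shape: an EventTarget with a WebSocket-style readyState.
interface ClientLike extends EventTarget {
  readonly readyState: number;
}

// Resolve with the close event, or reject with the current readyState if it
// does not arrive in time (readyState 2 means the client is stuck in CLOSING).
function waitForClose(ws: ClientLike, timeoutMs = 2000): Promise<Event> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      ws.removeEventListener("close", onClose);
      reject(
        new Error(
          `Timed out after ${timeoutMs}ms waiting for client close event. readyState=${ws.readyState}`,
        ),
      );
    }, timeoutMs);

    function onClose(event: Event): void {
      clearTimeout(timer);
      resolve(event);
    }

    ws.addEventListener("close", onClose, { once: true });
  });
}
```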

Test plan

Made with Cursor

When a client initiated a clean close, PartyServer's `webSocketClose`
forwarded to user `onClose` but never reciprocated the peer's Close
frame, leaving the client in CLOSING until it timed out and reported
1006 (abnormal closure). The Hibernation API contract requires the
application to reciprocate on every compat date; the standard
`accept()` API requires it on compat dates before 2026-04-07 (where
the runtime's `web_socket_auto_reply_to_close` flag isn't yet active).

Both the hibernating and non-hibernating paths now reciprocate via a
shared `closeQuietly` helper in a `finally` block, after `onClose` has
run (and even when `onClose` throws synchronously or asynchronously).
Reserved close codes (1005, 1006, 1015) are normalized to 1000 so the
reciprocation can never throw `InvalidAccessError`. Calling `close()`
on an already-closed socket is a silent no-op, so user code that
already calls `connection.close()` from `onClose` is unaffected.

Adds 10 regression tests (5 hibernating + 4 non-hibernating + 1
cross-cutting) covering the headline #389 repro, peer code/reason
delivery to onClose, throwing-onClose recovery, idempotent
reciprocation when user code closes from onClose, and reserved-code
normalization. The tests fail loudly without the fix with a clear
"server-side WebSocket never reciprocated the peer's Close frame"
timeout message.

Fixes #389

Made-with: Cursor
@changeset-bot

changeset-bot Bot commented Apr 27, 2026

🦋 Changeset detected

Latest commit: 5fe2f55

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name         Type
partyserver  Patch


@pkg-pr-new

pkg-pr-new Bot commented Apr 27, 2026


hono-party

npm i https://pkg.pr.new/cloudflare/partykit/hono-party@393

partyfn

npm i https://pkg.pr.new/cloudflare/partykit/partyfn@393

partyserver

npm i https://pkg.pr.new/cloudflare/partykit/partyserver@393

partysocket

npm i https://pkg.pr.new/cloudflare/partykit/partysocket@393

partysub

npm i https://pkg.pr.new/cloudflare/partykit/partysub@393

partysync

npm i https://pkg.pr.new/cloudflare/partykit/partysync@393

partytracks

npm i https://pkg.pr.new/cloudflare/partykit/partytracks@393

partywhen

npm i https://pkg.pr.new/cloudflare/partykit/partywhen@393

y-partyserver

npm i https://pkg.pr.new/cloudflare/partykit/y-partyserver@393

commit: 5fe2f55

threepointone merged commit 5335251 into main Apr 27, 2026
6 checks passed
threepointone deleted the fix/close-handshake-389 branch April 27, 2026 23:46
github-actions Bot mentioned this pull request Apr 27, 2026
threepointone added a commit that referenced this pull request Apr 28, 2026
Follow-up to #393. The close handshake fix that landed in 0.5.4 normalized
the reserved synthetic codes (1005 NoStatusReceived, 1006 AbnormalClosure,
1015 TLSHandshake) to 1000 and reciprocated anyway, on the theory that
calling `ws.close(...)` was always safe. In practice it isn't: those
codes are precisely the runtime's signal that there was no real Close
frame from the peer — the underlying transport is already gone. The
reciprocating `ws.close(...)` succeeds synchronously but schedules an
outbound write on a dead transport, which the runtime later rejects
asynchronously with `Network connection lost` / `WebSocket peer
disconnected`. That rejection can't be observed from `closeQuietly`'s
synchronous try/catch (the call returns void, no Promise to attach a
`.catch` to), so it surfaces as an unhandled promise rejection in
tests and production logs.

Surfaced by Cloudflare Agents sub-agent routing, where the WebSocket
pair is tunneled across Durable Object RPC boundaries: the runtime
delivered `webSocketClose(ws, 1005, "", true)` reliably, our
reciprocation tried to write a Close frame back through an
already-severed RPC link, and vitest reported `Errors 11` on an
otherwise-passing 21-test suite.

The fix is to recognize the reserved-code shape and skip the
reciprocation entirely. There is no peer to acknowledge to. Both the
hibernating `webSocketClose` path and the non-hibernating
`#attachSocketEventHandlers` close listener pick this up via the
shared `closeQuietly` helper.
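A sketch of the revised `closeQuietly` as this commit describes it, again using a structural `SocketLike` stand-in for the runtime `WebSocket` (hypothetical; not the literal source). The reserved-code check now short-circuits instead of rewriting the code:

```typescript
// Structural stand-in for the runtime WebSocket (an assumption).
interface SocketLike {
  close(code?: number, reason?: string): void;
}

// Synthetic codes the runtime reports when no real Close frame arrived.
const RESERVED_CLOSE_CODES = new Set([1005, 1006, 1015]);

function closeQuietly(ws: SocketLike, code: number, reason: string): void {
  if (RESERVED_CLOSE_CODES.has(code)) {
    // No peer Close frame to acknowledge: the transport is already gone, and a
    // reciprocating close() would only schedule a write that fails later.
    return;
  }
  try {
    // Echo the peer's real code and reason to complete the handshake.
    ws.close(code, reason);
  } catch {
    // Best-effort: already-closed sockets, oversize reasons, etc.
  }
}
```

The try/catch still only covers synchronous throws; skipping the call is what avoids the asynchronous rejection.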

Adds one regression test under `Close handshake (#389) > hibernating`
that exercises a client `ws.close()` with no code (the cleanest way
to drive a code-1005 arrival on the server in the in-process test
runner) and asserts that `onClose` still runs with the reserved code
while the framework no longer attempts a reciprocation.

User-visible behavior change: a client that calls `ws.close()` with
no code on a server running a compatibility date `< 2026-04-07`
(where the runtime's `web_socket_auto_reply_to_close` flag isn't
yet active) will now observe a non-clean close instead of the
previously-fabricated 1000 reciprocation. Clients that pass an
explicit close code, and any client on compatibility dates
`>= 2026-04-07` (auto-reply does the work), are unaffected.
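The behavior change reduces to a small decision table. The function below is only an illustrative model of the paragraph above, not code from the PR; `clientCloseOutcome` and `AUTO_REPLY_DATE` are hypothetical names, and compatibility dates are assumed to be ISO `YYYY-MM-DD` strings, which compare correctly as plain strings:

```typescript
type Outcome = "clean" | "non-clean";

// First compatibility date on which web_socket_auto_reply_to_close is active.
const AUTO_REPLY_DATE = "2026-04-07";

// Model of what a client observes after calling ws.close(clientCode).
function clientCloseOutcome(compatDate: string, clientCode?: number): Outcome {
  if (compatDate >= AUTO_REPLY_DATE) {
    return "clean"; // the runtime auto-replies to the Close frame
  }
  if (clientCode === undefined) {
    return "non-clean"; // arrives as 1005, so the framework no longer reciprocates
  }
  return "clean"; // the framework echoes the explicit code
}
```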

Verification:
- `npm run check:test -w partyserver`: 73/73 pass.
- `npm run check:type`, `check:lint`, `check:format`: clean.
- Repro suite in cloudflare/agents (examples/assistant): 21/21
  pass with 0 errors. Was 21/21 + 11 unhandled rejection errors
  before this fix.

Made-with: Cursor
Development

Successfully merging this pull request may close these issues.

PartyServer WebSockets do not complete client-initiated close handshake
