
[miniflare] Close all open handles on dispose to prevent process hangs #13515

Merged: petebacondarwin merged 14 commits into main from fix/close-runtime-dispatcher-pool on Apr 17, 2026

@petebacondarwin (Contributor) commented Apr 15, 2026

Supersedes #13520, #13521, #13522, #13523.

Problem

The start-worker-node-test fixture (which uses node --test to exercise the unstable_startWorker API) was hanging indefinitely in CI. Unlike vitest-based tests, node --test runs each test file as a child process and waits for it to exit naturally. If any event-loop handles remain open after tests complete, the process never exits and CI hangs forever.

The root cause was not a single leak but a cascade of unclosed resources across Miniflare.dispose() and the wrangler DevEnv teardown path. Each fix in this PR addresses a distinct handle that could keep the Node.js event loop alive after disposal.
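
When chasing a hang like this, it can help to see which kinds of handles are still alive after teardown. The following is a minimal sketch, assuming a Node.js version that provides `process.getActiveResourcesInfo()` (documented as experimental). The helper name `countActiveResources` is hypothetical and not part of this PR.

```typescript
// Hypothetical diagnostic helper (not part of this PR): count the resources
// that are currently keeping the Node.js event loop alive, grouped by type.
export function countActiveResources(): Map<string, number> {
	const counts = new Map<string, number>();
	// Returns one string per active resource, e.g. "Timeout" or "TCPSocketWrap".
	for (const type of process.getActiveResourcesInfo()) {
		counts.set(type, (counts.get(type) ?? 0) + 1);
	}
	return counts;
}
```

Calling it after `dispose()` resolves and logging the result gives a rough picture of what is left: socket, server, or timer entries that outlive disposal are candidates for the kind of leak described above.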

Changes

miniflare (patch)

Undici Pool leaks:

  • Close the #runtimeDispatcher (undici Pool) in dispose() after the runtime is killed. Without this, lingering TCP sockets in the pool keep the event loop alive indefinitely.
  • Close the old #runtimeDispatcher Pool when the runtime entry URL changes during config updates, preventing socket leaks on every setOptions() call.
  • Close the #devRegistryDispatcher Pool in dispose() — same class of leak as #runtimeDispatcher.
  • Close the old #devRegistryDispatcher Pool before replacement during config updates — without this, every call to #assembleAndUpdateConfig() leaked the previous Pool's TCP sockets.
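
The two call sites (replacement during a config update, and final disposal) can be sketched as follows. This is an illustration under assumptions, not Miniflare's code: `ClosablePool` stands in for the part of undici's `Pool` that matters here (a promise-returning `close()`), and `RuntimeClient`, `setEntryUrl` and the injected factory are hypothetical names.

```typescript
// Stand-in for undici's Pool: only the promise-returning close() is needed.
interface ClosablePool {
	close(): Promise<void>;
}

// Hypothetical owner of a pool; names are illustrative, not Miniflare's.
export class RuntimeClient {
	#pool: ClosablePool | undefined;
	#entryUrl: string | undefined;
	readonly #createPool: (url: string) => ClosablePool;

	constructor(createPool: (url: string) => ClosablePool) {
		this.#createPool = createPool;
	}

	// Called on every config update.
	setEntryUrl(url: string): void {
		if (url === this.#entryUrl) return;
		// Fire-and-forget: the old pool is being discarded, so nothing needs
		// to wait for its sockets before the replacement is created.
		void this.#pool?.close().catch(() => {});
		this.#pool = this.#createPool(url);
		this.#entryUrl = url;
	}

	async dispose(): Promise<void> {
		// Awaited: disposal should not resolve while sockets are still open.
		await this.#pool?.close();
		this.#pool = undefined;
		this.#entryUrl = undefined;
	}
}
```

The asymmetry is deliberate in this sketch: only `dispose()` needs to guarantee that no sockets remain once it resolves.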

WebSocket and HTTP server leaks:

  • Close WebSocketServer instances (#liveReloadServer, #webSocketServer) in dispose() so connected client sockets don't keep the event loop alive. These use noServer: true so they don't own an HTTP server, but connected WebSocket clients still hold open sockets.
  • Force-close connections in InspectorProxyController.dispose() by calling server.closeAllConnections() before server.close(), so active keep-alive/WebSocket connections don't prevent the close callback from firing.
  • Close InspectorProxy runtime WebSocket on dispose instead of relying on process death to break the connection.
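
The `closeAllConnections()` then `close()` ordering for an HTTP server can be sketched as below. This is a minimal illustration assuming Node.js 18.2 or later (where `http.Server#closeAllConnections()` is available). The helper name `closeHttpServer` is hypothetical and is not the `InspectorProxyController` code.

```typescript
import type { Server } from "node:http";

// Hypothetical helper: resolves once the server has fully closed.
export function closeHttpServer(server: Server): Promise<void> {
	return new Promise((resolve, reject) => {
		// Destroy the HTTP connections that are still open, including idle
		// keep-alive ones, so they cannot delay the close callback.
		server.closeAllConnections();
		// Stop listening; the callback fires once no connections remain.
		server.close((err) => (err ? reject(err) : resolve()));
	});
}
```

Without the first call, a client that holds a keep-alive connection open can delay the close callback for as long as the connection lives, depending on the Node.js version.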

Hyperdrive proxy:

  • Fix HyperdriveProxyController.dispose(): add missing return in .map() callback so Promise.allSettled actually waits for net.Server instances to close.
  • Don't block dispose on the net.Server.close() callback. close() only invokes it once every existing connection has ended, so TCP proxy servers with lingering sockets (e.g. from in-progress TLS negotiation) would block indefinitely. Just call close() to stop accepting new connections without awaiting the callback.
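
The missing-return bug and the later non-blocking variant can be illustrated with a small sketch. The `ClosableServer` interface is a stand-in for `net.Server`, and all function names are hypothetical; this is not the `HyperdriveProxyController` code.

```typescript
// Stand-in for net.Server: close() reports completion through a callback.
interface ClosableServer {
	close(callback?: (err?: Error) => void): unknown;
}

function closeServer(server: ClosableServer): Promise<void> {
	return new Promise((resolve, reject) => {
		server.close((err) => (err ? reject(err) : resolve()));
	});
}

// Bug: the block-bodied callback returns nothing, so map() yields
// undefined[] and allSettled() resolves without waiting for any server.
export async function disposeWithoutReturn(servers: ClosableServer[]): Promise<void> {
	await Promise.allSettled(
		servers.map((server) => {
			closeServer(server); // promise is dropped here
		})
	);
}

// With the return in place, allSettled() waits for every close callback.
export async function disposeWithReturn(servers: ClosableServer[]): Promise<void> {
	await Promise.allSettled(
		servers.map((server) => {
			return closeServer(server);
		})
	);
}

// Non-blocking variant: stop accepting new connections, wait for nothing.
export function disposeWithoutWaiting(servers: ClosableServer[]): void {
	for (const server of servers) {
		server.close();
	}
}
```

The second and third functions trade off differently: waiting guarantees the servers are closed but can hang on a lingering socket, while not waiting always returns promptly.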

IPC channel / exit-hook:

  • Replace the third-party exit-hook package with a lightweight inline implementation that removes all process listeners when the last callback is unregistered. The original package registers a permanent process.on('message') listener that refs the IPC channel handle, preventing node --test child processes from exiting after tests complete.
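
A minimal sketch of the idea follows, assuming only the `exit`, `SIGINT` and `SIGTERM` events need handling. It is not the module added in this PR: the function names and the chosen events are assumptions, and no `message` listener is registered at all.

```typescript
type ExitCallback = () => void;

const callbacks = new Set<ExitCallback>();

// Run each callback at most once, even if a signal handler triggers "exit".
function runCallbacks(): void {
	const pending = [...callbacks];
	callbacks.clear();
	for (const callback of pending) callback();
}

const onExit = (): void => runCallbacks();
const onSigint = (): void => { runCallbacks(); process.exit(130); };
const onSigterm = (): void => { runCallbacks(); process.exit(143); };

// Hypothetical API: register a callback and get back an unregister function.
export function exitHook(callback: ExitCallback): () => void {
	if (callbacks.size === 0) {
		// First registration: attach the process listeners lazily.
		process.on("exit", onExit);
		process.on("SIGINT", onSigint);
		process.on("SIGTERM", onSigterm);
	}
	callbacks.add(callback);
	return () => {
		if (callbacks.delete(callback) && callbacks.size === 0) {
			// Last callback gone: detach everything so nothing stays registered.
			process.off("exit", onExit);
			process.off("SIGINT", onSigint);
			process.off("SIGTERM", onSigterm);
		}
	};
}
```

The point of the design is that the listener count returns to its starting value once every caller has unregistered, so a disposed instance leaves nothing behind on `process`.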

Miscellaneous:

  • Clear ProxyClientBridge finalization batch setTimeout on dispose.
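
A pending `setTimeout` is itself a handle that keeps the event loop alive until it fires. A minimal sketch of clearing a batching timer on dispose is shown below. `BatchFinalizer` and its members are hypothetical names, not the `ProxyClientBridge` implementation.

```typescript
// Hypothetical batching helper: collects items and flushes them after a delay.
export class BatchFinalizer<T> {
	#pending: T[] = [];
	#timer: ReturnType<typeof setTimeout> | undefined;
	readonly #flush: (batch: T[]) => void;
	readonly #delayMs: number;

	constructor(flush: (batch: T[]) => void, delayMs = 1000) {
		this.#flush = flush;
		this.#delayMs = delayMs;
	}

	add(item: T): void {
		this.#pending.push(item);
		// Schedule at most one flush per batch.
		if (this.#timer === undefined) {
			this.#timer = setTimeout(() => {
				this.#timer = undefined;
				this.#flush(this.#pending.splice(0));
			}, this.#delayMs);
		}
	}

	dispose(): void {
		// Cancel the scheduled flush so the timer cannot keep the process alive.
		clearTimeout(this.#timer);
		this.#timer = undefined;
		this.#pending.length = 0;
	}
}
```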

wrangler (patch)

  • Fix esbuild cleanup race in BundlerController: the cleanup function returned by runBuild() now stops esbuild immediately if the build has completed, or kicks off the cleanup chain without awaiting it if the build is still in flight. The previous approach awaited buildPromise in the cleanup function, which could block the entire DevEnv teardown if the initial build was still in flight.
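
The shape of such a non-blocking cleanup function can be sketched as follows. This is an illustration under assumptions, not wrangler's `runBuild()`: `startBuildWithCleanup` and the `Stop` type are hypothetical, and the build is modelled as a promise that resolves to a stop function.

```typescript
type Stop = () => Promise<void>;

// Hypothetical sketch: start a build and return a synchronous cleanup
// function that never waits for the build to finish.
export function startBuildWithCleanup(startBuild: () => Promise<Stop>): () => void {
	let stop: Stop | undefined;
	let cleanedUp = false;

	const buildPromise = startBuild().then((resolvedStop) => {
		stop = resolvedStop;
		return resolvedStop;
	});
	// Assumption: build failures are reported elsewhere; avoid an unhandled rejection.
	buildPromise.catch(() => {});

	return function cleanup(): void {
		if (cleanedUp) return;
		cleanedUp = true;
		if (stop !== undefined) {
			// Build already completed: stop immediately.
			void stop().catch(() => {});
			return;
		}
		// Build still in flight: chain the stop call without blocking teardown.
		void buildPromise.then((resolvedStop) => resolvedStop()).catch(() => {});
	};
}
```

Because `cleanup()` returns `void` rather than a promise, a caller cannot accidentally await the in-flight build during teardown.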

Fixture changes

  • Disable persistence (persist: false) in start-worker-node-test to avoid SQLITE_BUSY_RECOVERY errors when back-to-back test runs contend on the same .wrangler/state directory.
  • Set --test-timeout=15000 on the fixture to fail fast on genuine hangs while giving sufficient headroom for Windows CI, where workerd process teardown and TCP socket cleanup can take ~7s.

Verification

  • 65+ consecutive runs of start-worker-node-test passed without hanging
  • miniflare test suite: 730/730 tests passed
  • miniflare-node-test fixture: 6/6 passed

  • Tests
    • Automated tests not possible - manual testing has been completed as follows: 65+ consecutive runs of the start-worker-node-test and miniflare-node-test fixtures, plus the full miniflare test suite
    • Tests included/updated
  • Public documentation
    • Documentation not necessary because: internal implementation fix with no user-facing API change

changeset-bot bot commented Apr 15, 2026

🦋 Changeset detected

Latest commit: e82c46d

The changes in this PR will be included in the next version bump.

@github-project-automation github-project-automation bot moved this to Untriaged in workers-sdk Apr 15, 2026
@workers-devprod workers-devprod requested review from a team and jamesopstad and removed request for a team April 15, 2026 13:13
workers-devprod commented Apr 15, 2026

Codeowners approval required for this PR:

  • ✅ @cloudflare/wrangler

github-actions bot commented Apr 15, 2026

Changeset Review

Files Reviewed

  • .changeset/close-runtime-dispatcher-pool.md
  • .changeset/fix-esbuild-cleanup-race.md

Summary

✅ All changesets look good

Review Details

close-runtime-dispatcher-pool.md

| Criteria | Status |
| --- | --- |
| Version Type | patch appropriate for bug fix |
| Changelog Quality | ✅ Clear description of the cleanup issue with specific resources listed |
| Markdown Headers | ✅ No h1/h2/h3 headers |
| Analytics | ✅ N/A - no analytics changes |
| Dependabot | ✅ N/A |
| Experimental Features | ✅ N/A |

fix-esbuild-cleanup-race.md

| Criteria | Status |
| --- | --- |
| Version Type | patch appropriate for bug fix |
| Changelog Quality | ✅ Clear explanation of the race condition and fix |
| Markdown Headers | ✅ No h1/h2/h3 headers |
| Analytics | ✅ N/A - no analytics changes |
| Dependabot | ✅ N/A |
| Experimental Features | ✅ N/A |

ask-bonk bot commented Apr 15, 2026

The function is async, so void is used intentionally to fire-and-forget the old pool close (no-floating-promises lint rule). This is reasonable — waiting for the old pool to close isn't necessary before creating the new one, and the old pool is being discarded anyway.

Both changes are sound:

  1. Line 2349 (void this.#runtimeDispatcher?.close()): Fire-and-forget close of the old pool when the entry URL changes. The old pool reference is immediately overwritten on the next line, so awaiting is unnecessary. void satisfies the no-floating-promises lint rule.

  2. Line 3021 (await this.#runtimeDispatcher?.close()): Awaited close during dispose(), correctly placed after this.#runtime?.dispose() so the workerd process is already killed and in-flight connections will terminate immediately.

The changeset is correctly formatted as a patch bump.

LGTM

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

pkg-pr-new bot commented Apr 15, 2026

create-cloudflare

npm i https://pkg.pr.new/create-cloudflare@13515

@cloudflare/kv-asset-handler

npm i https://pkg.pr.new/@cloudflare/kv-asset-handler@13515

miniflare

npm i https://pkg.pr.new/miniflare@13515

@cloudflare/pages-shared

npm i https://pkg.pr.new/@cloudflare/pages-shared@13515

@cloudflare/unenv-preset

npm i https://pkg.pr.new/@cloudflare/unenv-preset@13515

@cloudflare/vite-plugin

npm i https://pkg.pr.new/@cloudflare/vite-plugin@13515

@cloudflare/vitest-pool-workers

npm i https://pkg.pr.new/@cloudflare/vitest-pool-workers@13515

@cloudflare/workers-editor-shared

npm i https://pkg.pr.new/@cloudflare/workers-editor-shared@13515

wrangler

npm i https://pkg.pr.new/wrangler@13515

commit: e82c46d

@workers-devprod workers-devprod left a comment

Codeowners reviews satisfied

@github-project-automation github-project-automation bot moved this from Untriaged to Approved in workers-sdk Apr 15, 2026
@petebacondarwin petebacondarwin changed the title from "[miniflare] Close undici Pool on dispose to prevent test hangs" to "[miniflare] Close all open handles on dispose to prevent process hangs" Apr 16, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

jamesopstad previously approved these changes Apr 16, 2026
@petebacondarwin petebacondarwin marked this pull request as draft April 16, 2026 10:16
@petebacondarwin (Contributor, Author) commented

Thanks for review @jamesopstad but I think there is something bad going on here - switched to draft for now

@petebacondarwin petebacondarwin force-pushed the fix/close-runtime-dispatcher-pool branch 2 times, most recently from 8d05cc9 to ba7c019 April 16, 2026 16:09
@petebacondarwin petebacondarwin marked this pull request as ready for review April 17, 2026 07:50
@petebacondarwin petebacondarwin force-pushed the fix/close-runtime-dispatcher-pool branch from 89950d7 to 71237ce April 17, 2026 07:50
Comment thread packages/miniflare/src/exit-hook.ts Outdated
@penalosa penalosa added the run-remote-tests and run-c3-frameworks-tests (Run all the C3 frameworks tests on this PR) labels Apr 17, 2026
@petebacondarwin petebacondarwin dismissed workers-devprod’s stale review April 17, 2026 11:31

Force a code-owners-plus review

Comment thread packages/miniflare/src/index.ts
petebacondarwin and others added 13 commits April 17, 2026 12:33

- Fix esbuild cleanup race in BundlerController: await build promise
  before calling stopWatching so esbuild context is always disposed
- Fix HyperdriveProxyController.dispose(): add missing return in
  .map() callback so net.Server close is actually awaited
- Close WebSocketServer instances (#liveReloadServer, #webSocketServer)
  in Miniflare.dispose() so connected client sockets are cleaned up
- Close InspectorProxy runtime WebSocket on dispose instead of relying
  on process death to break the connection
- Clear ProxyClientBridge finalizeBatch setTimeout on dispose

…ispose

Fold in fix from PR #13523: call server.closeAllConnections() before
server.close() so active HTTP keep-alive or WebSocket connections
don't prevent the close callback from firing.

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>

The previous approach awaited buildPromise in the cleanup function,
which could block the entire DevEnv teardown if the initial build
was still in flight. Instead, stop esbuild immediately if the build
has completed, or fire-and-forget the cleanup chain if it hasn't.

Use persist: false to avoid SQLite SQLITE_BUSY_RECOVERY errors
when back-to-back test runs contend on the same .wrangler/state
directory. These tests don't need persistence between runs.

net.Server.close() waits for all existing connections to end before
calling its callback. For TCP proxy servers with lingering sockets
(e.g. from in-progress TLS negotiation), this blocks dispose
indefinitely. Just call close() to stop accepting new connections
without awaiting the callback.

… IPC channel

The third-party exit-hook package registers a permanent
process.on('message') listener that is never removed. In Node.js,
a 'message' listener refs the IPC channel handle, preventing child
processes (e.g. node --test runners) from exiting after tests complete.

Replace it with a lightweight inline module that removes all process
listeners when the last callback is unregistered, allowing clean exit.

Also close #devRegistryDispatcher (undici Pool) during dispose —
same class of leaked-socket bug already fixed for #runtimeDispatcher.

During config updates, the previous #devRegistryDispatcher Pool was
silently overwritten without being closed, leaking TCP sockets. This
mirrors the existing cleanup for #runtimeDispatcher at the same call
site.

The 8s timeout was marginally too tight for Windows CI where workerd
process teardown and TCP socket cleanup can take ~7s. Bumping to 15s
gives sufficient headroom while still catching genuine hangs.

@petebacondarwin petebacondarwin force-pushed the fix/close-runtime-dispatcher-pool branch from b31f554 to 1311255 April 17, 2026 11:33
@jamesopstad jamesopstad left a comment

Approving but posted a couple of non-blocking comments from the Devin review that could be addressed here or as follow ups.

@workers-devprod workers-devprod left a comment

Codeowners reviews satisfied

@petebacondarwin petebacondarwin merged commit b35617b into main Apr 17, 2026
95 of 100 checks passed
@petebacondarwin petebacondarwin deleted the fix/close-runtime-dispatcher-pool branch April 17, 2026 14:53
@github-project-automation github-project-automation bot moved this from Approved to Done in workers-sdk Apr 17, 2026
Labels

run-c3-frameworks-tests (Run all the C3 frameworks tests on this PR)

Projects

Status: Done

5 participants