Fix zombie minidump server process blocking Windows auto-update #10528
Merged
acarl005
commented
May 9, 2026
        minidumper::LoopAction::Continue
    }
}
I'm starting a first review of this pull request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request.
Powered by Oz
Overview
This PR updates the minidump server handler to exit the server loop when the last client disconnects, addressing orphaned minidump server processes that can keep the Windows executable locked during updates.
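A minimal sketch of the handler change summarized above. The `LoopAction` enum and `ServerHandler` trait here are simplified local stand-ins (an assumption, so the snippet compiles without the `minidumper` crate); the real trait has additional required methods that are omitted. `DefaultHandler` and `ExitWhenEmptyHandler` are hypothetical names for illustration, not the types in this PR.

```rust
/// Simplified stand-in for `minidumper::LoopAction` (assumption: modeled
/// locally so the sketch builds without the crate).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum LoopAction {
    Continue,
    Exit,
}

/// Simplified stand-in for the disconnect hook on `minidumper::ServerHandler`.
/// Only the method relevant to this PR is modeled.
trait ServerHandler {
    /// The default keeps the server loop running, matching the behavior
    /// described in the PR description.
    fn on_client_disconnected(&self, _num_clients: usize) -> LoopAction {
        LoopAction::Continue
    }
}

/// Hypothetical handler that relies on the default (pre-fix behavior).
struct DefaultHandler;
impl ServerHandler for DefaultHandler {}

/// Hypothetical handler that exits once no clients remain (post-fix behavior).
struct ExitWhenEmptyHandler;
impl ServerHandler for ExitWhenEmptyHandler {
    fn on_client_disconnected(&self, num_clients: usize) -> LoopAction {
        if num_clients == 0 {
            LoopAction::Exit
        } else {
            LoopAction::Continue
        }
    }
}

fn main() {
    // Pre-fix: the last client disconnecting does not stop the loop.
    println!("{:?}", DefaultHandler.on_client_disconnected(0));
    // Post-fix: the loop is asked to exit when the client count reaches zero.
    println!("{:?}", ExitWhenEmptyHandler.on_client_disconnected(0));
    // Other clients still connected: keep serving.
    println!("{:?}", ExitWhenEmptyHandler.on_client_disconnected(1));
}
```

Returning `Exit` only at zero clients keeps the server available if more than one client were ever attached, while still letting it shut down once its last client is gone.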
Concerns
- No blocking concerns found in the reviewed diff.
Verdict
Found: 0 critical, 0 important, 0 suggestions
Approve
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
bnavetta
approved these changes
May 9, 2026
cephalonaut
pushed a commit
that referenced
this pull request
May 12, 2026
## Description

The minidump crash-reporting server can become a zombie process on Windows, holding a file lock on `warp.exe` and causing the Inno Setup auto-updater to fail with `DeleteFile failed; code 5. Access is denied.`

### Root cause

Warp spawns a child process to run the minidump server for native crash reporting. The parent keeps the server alive by sending a ping every 5 seconds; the server's `stale_timeout` (10s) is supposed to reap dead client connections when pings stop arriving.

When the parent Warp process exits (cleanly or otherwise), the stale timeout correctly removes the dead client connection, but the `minidumper` crate's [`ServerHandler::on_client_disconnected`](https://github.com/EmbarkStudios/crash-handling/blob/minidumper-0.8.3/minidumper/src/lib.rs#L57-L59) trait method defaults to `LoopAction::Continue`. Since our `Handler` did not override it, the server loop kept running indefinitely after the last client was reaped, polling IOCP for new connections that would never arrive.

To be clear: the 10-second `stale_timeout` we pass to `Server::run()` does work: it successfully detects and removes dead client connections. But `stale_timeout` only controls *connection reaping*, not *server shutdown*. After reaping, the server calls `on_client_disconnected(0)` to ask whether to exit. Since the [default implementation returns `LoopAction::Continue`](https://github.com/EmbarkStudios/crash-handling/blob/minidumper-0.8.3/minidumper/src/lib.rs#L57-L59), the server keeps its main loop running, listening on the Unix domain socket for new client connections that will never come.

### Why this is a problem on Windows specifically

On Windows, a running process holds an exclusive file lock on its executable. The Inno Setup installer detects that the main Warp instance exited (via the single-instance mutex), but cannot replace `warp.exe` because the orphaned minidump server (a *separate* process using the same binary) is still running and locking the file. This causes a `DeleteFile failed; code 5` error that blocks the entire update.

On macOS and Linux, replacing a running executable's file on disk is permitted by the OS, so this bug would not block updates on those platforms (though the zombie process itself is still undesirable).

### Fix

Override `on_client_disconnected` on the minidump server's `Handler` to return `LoopAction::Exit` when `num_clients == 0`. This ensures the server process exits promptly after its parent disconnects, releasing the file lock on `warp.exe`.

## Linked Issue

Closes #10202

- [x] The linked issue is labeled `ready-to-spec` or `ready-to-implement`.
- [ ] Where appropriate, screenshots or a short video of the implementation are included below (especially for user-visible or UI changes).

## Testing

Analyzed a Windows process memory dump (`.DMP`) of a zombie Warp process using WinDbg (`cdb.exe`) to confirm the root cause:

- Confirmed the zombie was the minidump server (command line included `minidump-server` subcommand)
- Thread stacks showed the main thread blocked in `GetQueuedCompletionStatusEx` (IOCP poll inside `minidumper::Server::run`)
- Sentry threads (`sentry-transport`, `sentry-session-flusher`) were parked, confirming the server loop never exited to drop the Sentry guard
- Process had been alive for >24 hours with its parent long gone

- [ ] I have manually tested my changes locally with `./script/run`

## Agent Mode

- [x] Warp Agent Mode - This PR was created via Warp's AI Agent Mode

Co-Authored-By: Oz <oz-agent@warp.dev>

<!-- CHANGELOG-BUG-FIX: [Windows] Fixed zombie minidump server process blocking auto-updates. -->
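The difference between connection reaping and server shutdown described under "Root cause" can be illustrated with a small simulation. This is a sketch under stated assumptions, not the `minidumper` implementation: `Client`, `run_server`, the simulated clock, and the tick interval are all hypothetical, and `LoopAction` is a local stand-in for the crate's enum. Only the 10-second stale timeout and the `num_clients == 0` check come from the description.

```rust
use std::time::Duration;

/// Local stand-in for `minidumper::LoopAction` (assumption).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum LoopAction {
    Continue,
    Exit,
}

/// Hypothetical model of a connected client: the simulated time of its last ping.
struct Client {
    last_ping: Duration,
}

/// Simulated server loop. Each tick it reaps clients whose last ping is at
/// least `stale_timeout` old, then asks the callback whether to keep going.
/// Returns the simulated time at which the loop exited, or `None` if it was
/// still running when `deadline` passed.
fn run_server(
    mut clients: Vec<Client>,
    stale_timeout: Duration,
    tick: Duration,
    deadline: Duration,
    on_client_disconnected: impl Fn(usize) -> LoopAction,
) -> Option<Duration> {
    let mut now = Duration::ZERO;
    while now <= deadline {
        let before = clients.len();
        // Connection reaping: this is all that `stale_timeout` controls.
        clients.retain(|c| now.saturating_sub(c.last_ping) < stale_timeout);
        // Server shutdown: decided only by the handler's return value.
        if clients.len() < before
            && on_client_disconnected(clients.len()) == LoopAction::Exit
        {
            return Some(now);
        }
        now += tick;
    }
    None
}

fn main() {
    let secs = Duration::from_secs;
    // The parent sent its last ping at t = 5s and then exited.
    let parent = || vec![Client { last_ping: secs(5) }];

    // Pre-fix: the client is reaped, but the loop never exits.
    let before_fix = run_server(parent(), secs(10), secs(1), secs(60), |_| {
        LoopAction::Continue
    });
    // Post-fix: exit as soon as the last client has been reaped.
    let after_fix = run_server(parent(), secs(10), secs(1), secs(60), |n| {
        if n == 0 { LoopAction::Exit } else { LoopAction::Continue }
    });

    println!("before fix: {:?}", before_fix);
    println!("after fix:  {:?}", after_fix);
}
```

In this model the client is reaped at the 15-second mark in both runs (last ping at 5s plus the 10s timeout). With the default callback the loop is still running at the deadline, while the overriding callback ends it at the moment of reaping.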
dagmfactory
pushed a commit
that referenced
this pull request
May 12, 2026
lawsmd
pushed a commit
to lawsmd/cortex
that referenced
this pull request
May 22, 2026