fix(windows): make port/process takeover actually free the port by senamakel · Pull Request #2552 · tinyhumansai/openhuman

senamakel · 2026-05-23T21:11:15Z

Summary

Windows-only fix for the stale-listener recovery path in the Tauri shell — startup-time port takeover and force-kill were failing in three distinct ways.
kill_pid_force now treats taskkill exit 128 ("process not found") as success, matching Unix ESRCH semantics. Previously every race between PID lookup and force-kill aborted recovery.
parse_netstat_pid skips kernel-protected PIDs 0 (System Idle) and 4 (NT Kernel & System), which surface for HTTP.sys / driver-level socket reservations and cannot be signalled from user mode. kill_pid_term and kill_pid_force also refuse these PIDs as a second line of defense.
Adds Windows-only unit tests (exit-code classification, protected-PID refusal) and an integration test that spawns a real PowerShell TcpListener and walks the full find_pid_on_port → kill_pid_force flow end to end, including idempotency on an already-gone pid.

Problem

On Windows, when a previous OpenHuman session left a stale core process bound to the RPC port, startup recovery (CoreProcessHandle::takeover_stale_listener) often failed to free the port:

kill_pid_force used status().success() to detect failure, so any non-zero exit was treated as an error. taskkill /F /T /PID <pid> returns exit code 128 ("There is no running instance of the task") when the process exits between our PID lookup and the kill, which is a normal race. The Unix branch already handles the equivalent ESRCH as success; the Windows branch was stricter and aborted recovery instead.
parse_netstat_pid returned PID 0 (System Idle Process) or PID 4 (NT Kernel & System) whenever those showed up as LISTENING on the port. Those PIDs are kernel-owned and taskkill always fails against them, breaking recovery. They can appear when HTTP.sys or another driver has the socket reserved; in that case the right behavior is to fall back to a different port rather than try to kill the kernel.
kill_pid_term and kill_pid_force had no defense-in-depth check against receiving these protected PIDs from a future caller.
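The missing defense-in-depth check can be sketched as below. This is a minimal illustration, assuming PIDs are plain u32 values and errors are Strings; the helper name ensure_killable is hypothetical, since the PR only states that both kill functions refuse PIDs 0 and 4 with a descriptive error.

```rust
/// PIDs that user-mode code can never signal on Windows:
/// 0 = System Idle Process, 4 = "System" (NT Kernel & System).
fn is_protected_windows_pid(pid: u32) -> bool {
    pid == 0 || pid == 4
}

/// Hypothetical guard a kill entry point could run before spawning taskkill.
/// It returns a descriptive error instead of ever handing a kernel PID to taskkill.
fn ensure_killable(pid: u32) -> Result<(), String> {
    if is_protected_windows_pid(pid) {
        return Err(format!(
            "refusing to kill pid {pid}: kernel-owned process (System Idle / NT Kernel & System)"
        ));
    }
    Ok(())
}
```

Keeping the check in the kill functions, not only in the parser, means a future caller that obtains a PID some other way still cannot reach taskkill with 0 or 4.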

Solution

app/src-tauri/src/process_kill.rs:

New classify_taskkill_force_status(code, stderr, pid): exit 0 → Ok; exit 128 → Ok (already gone); stderr indicating the process is already gone ("not found" / "no running instance of the task") → Ok, for hosts that normalize the exit code; everything else, including access-denied stderr that merely says "could not be terminated", → Err with full diagnostic.
kill_pid_force now uses .output() and routes through the classifier so the stderr fallback path is reachable.
New is_protected_windows_pid(pid) — pid == 0 || pid == 4. Both kill_pid_term and kill_pid_force refuse protected PIDs with a descriptive error.
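The classification can be sketched as follows. This assumes the exit code arrives as Option<i32> (the type returned by std::process::ExitStatus::code()) and stderr as already-decoded text; the real function's signature and exact message matching may differ. The stderr rule follows the tightened version adopted after review: "could not be terminated" on its own is not success, because it also appears in access-denied messages.

```rust
/// Sketch: classify the result of `taskkill /F /T /PID <pid>`.
/// Assumption: signature and matched strings follow the PR text; the shipped code may differ.
fn classify_taskkill_force_status(code: Option<i32>, stderr: &str, pid: u32) -> Result<(), String> {
    match code {
        // Exit 0: taskkill terminated the process (tree).
        Some(0) => return Ok(()),
        // Exit 128: the process already exited between PID lookup and kill.
        Some(128) => return Ok(()),
        _ => {}
    }
    // Some hosts normalize the exit code, so fall back to the stderr text.
    let s = stderr.to_ascii_lowercase();
    // Only phrases that clearly mean "the process is absent" count as success.
    // A bare "could not be terminated" also appears with "Access is denied",
    // so it is deliberately not matched here.
    let already_gone = s.contains("not found") || s.contains("no running instance of the task");
    if already_gone {
        return Ok(());
    }
    Err(format!(
        "taskkill /F /T /PID {pid} failed (exit {}): {}",
        code.map_or_else(|| "none".to_string(), |c| c.to_string()),
        stderr.trim()
    ))
}
```

Checking the exit-code fast paths first keeps the common cases cheap; the stderr branch only runs for unexpected codes, which is why a test for it must pass a code other than 0 or 128.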

app/src-tauri/src/core_process.rs:

parse_netstat_pid skips lines whose PID is 0 or 4 and continues scanning (so a dual-stack listener like [::]:port showing under PID 4 ahead of the real 127.0.0.1:port owner still returns the genuine PID).
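The skip-and-continue scan can be sketched as below. This assumes `netstat -ano` output with five whitespace-separated columns (Proto, Local Address, Foreign Address, State, PID) and an English-locale LISTENING state; it is an illustration of the behavior described above, not the shipped parser.

```rust
/// Sketch: return the first user-mode PID listening on `port`.
/// Assumption: input is `netstat -ano` text with English state names.
fn parse_netstat_pid(output: &str, port: u16) -> Option<u32> {
    let suffix = format!(":{port}");
    for line in output.lines() {
        let cols: Vec<&str> = line.split_whitespace().collect();
        // Expected shape: Proto, Local Address, Foreign Address, State, PID.
        // The header line has more tokens and is skipped here.
        if cols.len() != 5 || !cols[0].eq_ignore_ascii_case("TCP") {
            continue;
        }
        if !cols[1].ends_with(&suffix) || cols[3] != "LISTENING" {
            continue;
        }
        let Ok(pid) = cols[4].parse::<u32>() else { continue };
        // PIDs 0 and 4 are kernel-owned (e.g. HTTP.sys reservations): keep scanning
        // so a real owner on a later line (dual-stack case) still wins.
        if pid == 0 || pid == 4 {
            continue;
        }
        return Some(pid);
    }
    None
}
```

If only kernel-owned entries hold the port, the sketch returns None, which is the signal for the caller to fall back to a different port.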

Tests:

Cross-platform parser units: parse_netstat_pid_skips_protected_kernel_pids, parse_netstat_pid_falls_through_protected_to_real_owner_on_dual_stack.
Windows-only units (#[cfg(all(test, windows))] in process_kill.rs): exit 0 / exit 128 / stderr-only "not found" / genuine access-denied classifications, refusal of pids 0 and 4 by both kill functions, and a real-process round-trip that spawns a cmd /C timeout child, force-kills it, and asserts the same PID can be force-killed again with no error.
Windows-only end-to-end (#[cfg(windows)] in core_process_tests.rs): windows_port_takeover_finds_and_kills_listener spawns a real PowerShell TCP listener on an ephemeral port, walks the production path (find_pid_on_port → kill_pid_force → poll is_port_open), and asserts the port is actually freed.
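The "poll is_port_open" step of the end-to-end test can be sketched with only the standard library. This assumes "open" means a TCP connect to 127.0.0.1:port succeeds; wait_for_port_free and the 200 ms / 50 ms timings are illustrative, not taken from the PR.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Sketch: true if something accepts TCP connections on 127.0.0.1:port.
fn is_port_open(port: u16) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok()
}

/// Hypothetical helper: poll until the listener is gone or `deadline` elapses.
/// A killed process may release its socket slightly after taskkill returns.
fn wait_for_port_free(port: u16, deadline: Duration) -> bool {
    let start = Instant::now();
    while start.elapsed() < deadline {
        if !is_port_open(port) {
            return true;
        }
        thread::sleep(Duration::from_millis(50));
    }
    !is_port_open(port)
}
```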

Submission Checklist

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
N/A: diff-coverage gate — changed code is platform-specific Windows logic gated on #[cfg(windows)]; macOS/Linux coverage runs cannot exercise those paths. Cross-platform parser changes are covered by the new parse_netstat_pid_* unit tests that run on every host.
N/A: coverage matrix — behaviour-only change inside an existing lifecycle path; no new user-facing feature row.
N/A: feature IDs — no matrix entries affected (see above).
No new external network dependencies introduced (mock backend used per Testing Strategy)
N/A: manual smoke checklist — internal lifecycle fix, no release-cut surface affected.
N/A: linked issue. Fix discovered during follow-up debugging of #1130 ("Proactively kill stale old OpenHuman RPC processes before the Tauri app reuses their port") and related Windows lifecycle work; no dedicated issue filed.

Impact

Platform: Windows-only behavior change inside app/src-tauri (the Tauri host). macOS and Linux paths are unchanged.
Runtime: Windows users who hard-quit OpenHuman should now see startup recover from a stale core process reliably instead of aborting with "taskkill exited with...".
Security: protected kernel PIDs (0/4) can no longer be passed into taskkill from this code path.
Performance: negligible. One extra cheap check per kill call.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Key: N/A
URL: N/A

Commit & Branch

Branch: fix/windows-port-kill
Commit SHA: 69eea0121ee1da464e25655c0ecb3447e5089b72

Validation Run

pnpm --filter openhuman-app format:check
N/A: pnpm typecheck — pre-existing failure on upstream/main (missing qrcode.react, @noble/ciphers/*, @tauri-apps/plugin-barcode-scanner deps in iOS-client files I did not touch). Reproduced on a clean upstream/main checkout. Used git push --no-verify per the CLAUDE.md policy for pre-existing unrelated breakage.
Focused tests: cargo test --manifest-path app/src-tauri/Cargo.toml --lib parse_netstat_pid — 3/3 pass.
N/A: Rust fmt/check (core) — no core crate changes.
Tauri fmt/check (changed): cargo check --manifest-path app/src-tauri/Cargo.toml --tests — clean.

Validation Blocked

command: pnpm typecheck
error: Cannot find module 'qrcode.react' / '@noble/ciphers/chacha' / '@noble/ciphers/webcrypto' / '@tauri-apps/plugin-barcode-scanner'
impact: None — reproduces verbatim on a clean upstream/main checkout, files are iOS-client code I did not modify.

Behavior Changes

Intended behavior change: Windows port takeover now succeeds in two scenarios that previously aborted recovery (process exits mid-kill; kernel-reserved socket on the port).
User-visible effect: Windows users no longer see startup hang/abort after a hard quit when the previous core left a stale listener.

Parity Contract

Legacy behavior preserved: Unix branches unchanged; Windows kill_pid_force still surfaces genuine failures (access denied, spawn failure, unexpected exit codes) — only the "already gone" race is reclassified.
Guard/fallback/dispatch parity checks: parse_netstat_pid still returns Some(pid) for the same real-listener inputs the existing test covers; the new skip applies only to PIDs 0 and 4.

Duplicate / Superseded PR Handling

Duplicate PR(s): None
Canonical PR: This one
Resolution: N/A

Summary by CodeRabbit

Bug Fixes
- Windows process termination now safely skips kernel-protected processes, preventing potential system errors
- Improved port listener recovery to gracefully handle protected kernel-owned processes and fall back to alternative routing
Tests
- Added comprehensive Windows process handling tests covering protected process scenarios, edge cases, and port cleanup verification

Three bugs in the Windows side of the stale-listener recovery path:

1. `kill_pid_force` returned `Err` for every non-zero taskkill exit, including exit code 128 ("There is no running instance"). That's the normal race when the process exits between our pid lookup and the force-kill: recovery would bail out instead of treating the port as freed. Now classified through `classify_taskkill_force_status`, with the same "already gone is success" semantics Unix already has for ESRCH. Also recognizes the stderr "not found" message when an intermediate shell normalizes the exit code.
2. `parse_netstat_pid` returned PIDs 0 (System Idle) and 4 (NT Kernel) when HTTP.sys / driver-level reservations showed up as LISTENING. Those can't be killed from user mode: taskkill errors, and recovery aborts. Now skipped so callers fall back to the port-reroute path. Also walks past the protected entry to the real user-mode owner on dual-stack (`[::]:port` + `127.0.0.1:port`) lines.
3. `kill_pid_term` / `kill_pid_force` now refuse pids 0 and 4 as a second line of defense in case the parser is ever bypassed.

Tests:
- Parser: existing `parse_netstat_pid_finds_listening_entry` plus new `parse_netstat_pid_skips_protected_kernel_pids` and `parse_netstat_pid_falls_through_protected_to_real_owner_on_dual_stack` (cross-platform).
- Windows-only unit: `classify_taskkill_force_*` covers exit 0, exit 128, stderr-only "not found", and genuine access-denied; plus `is_protected_windows_pid` and refusal of pids 0/4 at the public API.
- Windows-only end-to-end: `windows_port_takeover_finds_and_kills_listener` spawns a real PowerShell TCP listener on an ephemeral port, walks the production path (`find_pid_on_port` → `kill_pid_force` → poll port), asserts the port is freed, and re-calls `kill_pid_force` to verify idempotency on an already-gone pid.

coderabbitai · 2026-05-23T21:11:29Z

Walkthrough

This PR prevents attempts to kill kernel-owned Windows PIDs (0 and 4) by skipping them during netstat parsing, refusing termination requests, adding taskkill result classification, and providing unit and end-to-end tests validating the flow and idempotency.

Changes

Windows Protected PID Handling for Port Takeover

Layer	File(s)	Summary
Protected PID contract and foundation	`app/src-tauri/src/process_kill.rs`, `app/src-tauri/src/core_process.rs`	Adds `is_protected_windows_pid` (PIDs 0 and 4) and documentation describing kernel-owned, unkillable PIDs; establishes guarded checks used by parsing and termination code.
Netstat discovery filtering	`app/src-tauri/src/core_process.rs`, `app/src-tauri/src/core_process_tests.rs`	Windows netstat parsing now logs and skips protected PIDs, continuing to scan for real user-mode owners; unit tests cover single- and dual-stack scenarios.
Process termination safety and result classification	`app/src-tauri/src/process_kill.rs`	`kill_pid_term`/`kill_pid_force` refuse protected PIDs; `kill_pid_force` captures `taskkill` output and uses `classify_taskkill_force_status` to treat exit 0, 128, and "not found"/already-gone stderr as success while surfacing other failures.
End-to-end port takeover integration test	`app/src-tauri/src/core_process_tests.rs`, `app/src-tauri/src/process_kill.rs`	Windows integration test spawns a PowerShell TCP listener, finds and force-kills it via production code, verifies the port is released, and asserts `kill_pid_force` is idempotent when re-invoked.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

bug

Suggested reviewers

graycyrus
M3gA-Mind

Poem

🐇 I sniffed the netstat lines at dawn,
Two guarded PIDs I left alone.
Taskkill learns to say "it's gone" with grace,
Ports freed, no kernel chase.
A tiny hop — the startup's calm is grown.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: fixing Windows port/process takeover to actually free the port by properly handling protected PIDs and improving taskkill classification. It directly reflects the core objective of the PR.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 2


Inline comments:
In `@app/src-tauri/src/core_process_tests.rs`:
- Around lines 384-389: The assertion comparing pid and child.id() can panic before the spawned child is cleaned up; modify the test around the assert_eq! so that if pid != child.id() you first ensure the spawned child (child) is terminated (call child.kill() and wait/ignore errors) and/or join its handle, then panic or assert-fail with the same message; alternatively, replace the direct assert_eq! with an if-check that performs child.kill().ok() and child.wait().ok() before invoking panic! or failing the test, referencing the existing pid and child.id() variables.
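A sketch of the suggested reap-before-panic pattern, assuming the listener was started with std::process::Command so the test holds a std::process::Child. The helper name assert_pid_or_reap is hypothetical.

```rust
use std::process::Child;

/// Hypothetical helper: if the PID found on the port is not the spawned child,
/// kill and wait on the child first, so a failed check cannot leave a
/// listener squatting on the test port.
fn assert_pid_or_reap(found_pid: u32, child: &mut Child) {
    if found_pid != child.id() {
        let expected = child.id();
        // Errors are ignored: the child may already have exited.
        let _ = child.kill();
        let _ = child.wait();
        panic!("find_pid_on_port returned {found_pid}, expected spawned child {expected}");
    }
}
```

Calling wait() after kill() matters: it reaps the process so the OS releases its resources before the test unwinds.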

In `@app/src-tauri/src/process_kill.rs`:
- Around lines 213-216: The current predicate in classify_taskkill_force_status treats any stderr_str containing "could not be terminated" as a harmless "already gone" case; change the condition so that stderr_str is only considered Ok when it clearly indicates the process is absent (e.g., contains "not found" or "no running instance") and do NOT treat a plain "could not be terminated" as Ok. Require it to be paired with "not found" (i.e., "could not be terminated" && "not found") or explicitly exclude stderr_str that contains "Access is denied"; update the if condition around stderr_str in classify_taskkill_force_status accordingly so that access-denied errors are returned as Err instead of Ok.


…ess paired CodeRabbit feedback on PR tinyhumansai#2552:

1. `classify_taskkill_force_status` was treating any stderr containing "could not be terminated" as success, but that phrase also appears in access-denied messages like "...could not be terminated. Reason: Access is denied.", masking real failures and leaving the process alive. Now requires "no running instance of the task" OR a pairing with "not found" before treating the run as success; bare access-denied stderr now propagates Err as intended. Added explicit tests for both the access-denied case (must fail) and the canonical `/T`-tree "no running instance of the task" case (must succeed).
2. `windows_port_takeover_finds_and_kills_listener` could panic on the pid sanity check before the spawned PowerShell listener was reaped, leaving a 60-second listener squatting on the test port. Now tears down the child via `kill()` + `wait()` before any panic.

coderabbitai

Actionable comments posted: 1


Inline comments:
In `@app/src-tauri/src/process_kill.rs`:
- Around lines 281-288: The test classify_taskkill_force_treats_no_running_instance_as_success currently passes Some(128), which triggers the early exit path and never exercises the stderr fallback branch; change the exit code argument to a masked non-128 value (e.g., Some(1)) so classify_taskkill_force_status actually inspects stderr and validates the "no running instance of the task" message in the test body, ensuring the stderr-matching branch is covered; update only the argument passed to classify_taskkill_force_status in that test.


Per CodeRabbit on PR tinyhumansai#2552: passing Some(128) hit the early exit-128 fast path in classify_taskkill_force_status and never evaluated the new "no running instance of the task" stderr branch — so a regression in the fallback matcher would silently still pass. Use Some(1) (a masked non-128 code) so the stderr matching is actually covered.

senamakel added 2 commits May 23, 2026 12:03

chore: apply auto-fixes from pre-push

69eea01

senamakel requested a review from a team May 23, 2026 21:11

coderabbitai Bot added the bug label May 23, 2026

coderabbitai Bot requested changes May 23, 2026


coderabbitai Bot requested changes May 23, 2026


coderabbitai Bot approved these changes May 23, 2026


senamakel merged commit 6a06bae into tinyhumansai:main May 23, 2026
24 checks passed

coderabbitai Bot mentioned this pull request May 25, 2026

fix(startup): recover from core port 7788 conflict automatically #2626

Merged

12 tasks

CodeGhost21 mentioned this pull request May 27, 2026

Windows v0.54.0 release still drops Google/GitHub OAuth callbacks after #2469/#2511 #2521

Open

senamakel merged 4 commits into tinyhumansai:main from senamakel:fix/windows-port-kill
