Skip to content

Fix daemon setup flow: stale cleanup, timeouts, improved init output#3

Merged
willisrocks merged 7 commits intomainfrom
fix-daemon-setup
Mar 9, 2026
Merged

Fix daemon setup flow: stale cleanup, timeouts, improved init output#3
willisrocks merged 7 commits intomainfrom
fix-daemon-setup

Conversation

@willisrocks
Copy link
Copy Markdown
Contributor

Summary

  • Stale daemon cleanupinit reads PID file, validates process is actually devproxy, sends SIGTERM/SIGKILL with proper EPERM handling
  • Daemon startup verificationinit polls IPC socket for up to 5s after spawn, shows daemon.log tail on failure
  • Improved init output — prints full DNS setup commands (dnsmasq + /etc/resolver), CA trust command with correct path, sudo guidance for port 443
  • devproxy up fails fast — uses ipc::ping_sync() with 2s timeout instead of hanging on dead daemon
  • Shared ipc::ping_sync() — synchronous ping utility used by both init and up, eliminating duplicated protocol logic
  • PID file + daemon log — daemon writes PID to daemon.pid, stderr to daemon.log (append mode) for diagnostics
  • Socket permissionschmod 0o777 on IPC socket so non-root users can connect when daemon runs as root

Test Plan

  • 25 unit tests passed, 10 e2e tests passed (7 Docker-dependent tests ignored)
  • cargo clippy --all-targets -- -D warnings: clean
  • New tests: init output verification (DNS/CA/sudo), up fails fast with dead daemon, re-init kills stale daemon, IPC timeout unit test

🤖 Generated with Claude Code

chris-4d and others added 7 commits March 9, 2026 00:30
…nd improved init output

- Kill stale daemon processes on re-init using PID file
- Write daemon PID file on startup, clean up on exit
- Verify daemon is actually alive after spawn (wait up to 5s)
- Report daemon startup failure clearly with actionable hints
- Add IPC request timeout (3s default) to prevent hangs
- Add connection timeout in `up` command to fail fast with dead daemon
- Print complete DNS setup instructions (dnsmasq + resolver on macOS)
- Print CA trust command with correct cert path and sudo note
- Add 6 new e2e tests: init output verification, up fails fast with
  dead daemon, daemon PID file, re-init stale cleanup, CA path, sudo note
- Add 1 unit test: IPC timeout on unresponsive socket

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Validate PID is a devproxy process before killing to prevent PID
  reuse races (check via ps on macOS, /proc/pid/exe on Linux)
- Handle EPERM from kill(pid, 0) when stale daemon is owned by
  another user -- warn and suggest sudo instead of silently skipping
- wait_for_daemon now sends actual IPC ping/pong instead of just
  checking socket connectivity
- up command sends real IPC ping with 2s timeout instead of relying
  on connect() + unused read/write timeouts
- test_reinit_kills_stale_daemon leaves daemon1 alive so init's
  kill_stale_daemon exercises the SIGTERM/SIGKILL path
- Check libc::kill return value in e2e test cleanup with warning
- Remove redundant port 443 note at bottom of init output
- Remove committed plan artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Document TOCTOU race between is_devproxy_process and SIGTERM as
  residual risk (PID reuse + same binary name is vanishingly unlikely)
- Check libc::kill return values for SIGTERM/SIGKILL; bail with
  actionable "sudo kill" guidance on EPERM
- Bail instead of returning Ok on alive_but_no_perms so the caller
  gets a clear error about why the new daemon will fail
- Use exact basename match (== "devproxy") instead of ends_with on
  macOS to avoid false positives like "my-devproxy"
- Replace std::mem::forget(child) in test with proper cleanup via
  original_child.wait() after init kills the process
- Simplify up.rs IPC ping to match init.rs pattern (no try_clone)
- Add cross-reference comments to raw JSON ping format pointing to
  canonical ipc.rs Request/Response protocol
- Only print CA trust instructions in "Next steps" if trust actually
  failed (skip if already trusted successfully)
- Verify socket file exists after UnixListener drop in stale daemon
  test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Only print "stale daemon killed" when SIGTERM or SIGKILL actually
  succeeded; print warning otherwise
- Extract shared ipc::ping_sync() function to eliminate duplicated
  IPC ping logic between init.rs and up.rs, preventing protocol drift
- Validate PID > 0 before calling libc::kill to prevent signaling
  the entire process group on corrupted PID file
- Restructure alive/EPERM/dead check as if/else-if/else for clarity
- DaemonGuard Drop reaps the dummy "true" child via kill+wait
- Document UnixListener drop behavior (does not remove socket file)
- Harden test_init_output_includes_ca_trust_path comment to explain
  why it works regardless of trust success/failure
- Replace unreachable unwrap_or with expect on rsplit('.').next()
- Add ping_sync unit test for nonexistent socket path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Capture daemon stderr to daemon.log and show last 10 lines on startup failure
- Only remove PID file when kill is confirmed; leave for manual cleanup otherwise
- Add proper SIGTERM/SIGKILL cleanup for new daemon in reinit test
- Parse PID as u32 for consistency with std::process::id() output
- Fix misleading test comment (hung daemon, not dead daemon)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Warn on empty or unparseable PID file content before cleanup
- Reject PID values > i32::MAX to prevent silent wrap to negative
- Use append mode for daemon.log to preserve previous daemon's output
- Eliminate unused `step` variable assignment after last usage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@willisrocks willisrocks merged commit 4543bff into main Mar 9, 2026
2 checks passed
willisrocks added a commit that referenced this pull request Mar 10, 2026
- Update docs/spec.md and skills/setup/SKILL.md to reference login
  keychain instead of system keychain (finding #1)
- Use distinct CN "devproxy Test CA" in roundtrip test to avoid
  colliding with real devproxy CAs in the login keychain (finding #3)
- Update e2e test to accept trust success (no longer requires sudo
  fallback output since login keychain trust succeeds without sudo)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants