Skip to content

Fix iOS host error screen, SPM dynamic-product link leak, --headless plumbing, and two daemon-test races#146

Merged
obj-p merged 14 commits intomainfrom
fix/host-error-and-testsupport
May 1, 2026
Merged

Fix iOS host error screen, SPM dynamic-product link leak, --headless plumbing, and two daemon-test races#146
obj-p merged 14 commits intomainfrom
fix/host-error-and-testsupport

Conversation

@obj-p
Copy link
Copy Markdown
Owner

@obj-p obj-p commented Apr 23, 2026

Summary

Fixes #85.

Bundled fixes discovered while shipping the original two. Rebased on latest main. Eight commits, five logical changes:

  1. iOS host error screen UX (a442337, 08f705e) — the red error screen rendered long dyld errors through a frame-based UILabel pinned to view.bounds.insetBy(dx: 20, dy: 20). Long messages clipped, ran under the status bar, and weren't scrollable. First pass replaced it with a UITextView (isScrollEnabled = true, safe-area-anchored, monospace 14pt). User testing showed UITextView's content-size heuristic still left long single lines extending off-screen with no visible scroll affordance, so the second commit swapped to an explicit UIScrollView + UILabel with the label's widthAnchor tied to the scroll view's frameLayoutGuide.widthAnchor — the load-bearing constraint that forces wrapping at the visible width and gives the scroll view a real vertical scroll region. Body font bumped to 15pt monospace, title at Dynamic Type headline, container accessibilityLabel for snapshot trees. Trade-off: lost text selection (UILabel isn't selectable). Long-press copy via UIMenuInteraction is a follow-up if it matters. See Sources/PreviewsIOS/IOSHostAppSource.swift.

  2. Stop linking SPM dynamic library products into the preview host (58d1358, aa8bf00, dd869a2) — SPMBuildSystem.archiveDependencyTargets walked every <Target>.build/ directory under the bin path and force-added -l<Target> to the preview link line. For internal non-product library targets that's necessary (SPM leaves loose .o files with no -module-link-name). For dynamic library products it's wrong: SPM already emits lib<Target>.dylib, ld's -l search order prefers .dylib over .a, and the blanket -l burns a load command for the dynamic product into the preview dylib even when the consumer never imports it. Autolink via -module-link-name handles real importers either way.

    The fix extracts the skip decision into nonisolated static shouldSkipDependencyTarget(targetName:consumerTargetName:binPath:fm:) that additionally returns true when lib<Target>.dylib exists in binPath. Five unit tests cover every branch of the predicate. One end-to-end integration test (dd869a2) writes a synthetic SPM Package.swift with a .library(type: .dynamic) sibling product the consumer doesn't import, runs the real SPMBuildSystem.build, and asserts both that the dylib was produced (load-bearing sanity) and that compilerFlags does not contain -l<Sibling>. Verified the test fails when the dylib check is stubbed out.

    Review nits in aa8bf00: documented the target-name-equals-product-name assumption, tightened the dylib fixture to write a Mach-O 64-bit magic header instead of an empty file, replaced the host-binary byte-search loop in IOSHostBuilderTests with Data.range(of:).

  3. Always forward --headless from CLI to daemon (80b5f55) — RunCommand only sent the headless key when the flag was true, so omitting --headless dropped the key. Both daemon paths (MCPServer.swift:268 macOS, :348 iOS) default extractOptionalBool(\"headless\") ?? true when the key is absent — so a CLI user who didn't pass --headless got a headless window anyway, with no way to request a visible simulator. Always sending the value preserves the daemon's absent-key default for MCP clients while letting the CLI honor the user's choice. Discovered while testing fix Add CI, release workflow, and Homebrew distribution #1.

  4. De-flake LogTests against cross-suite stderr interleaving (ce3ee03) — LogTests.captureStderr rebinds the process-wide fd 2 via dup2, and .serialized only orders within the suite. CI run 25141775602 captured ◇ Test \"…\" started.\\n markers from concurrent suites in the redirect window, breaking the original ^…$-anchored regex against the whole capture. The original log line was correctly formatted at the end of the capture. Fix splits the capture on \\n and matches the format pattern as a line-anchored regex against any line — keeps format pinning, drops dependence on no other test writing to stderr during the redirect. Mitigation, not a root-cause fix: captureStderr's process-global dup2 design is still racy for any future test that depends on capture exclusivity. Real fixes would either inject the Log output sink or gate capture under a process-wide lock; both are larger and out of scope here.

  5. Retry the daemon handshake when a sibling CLI kills it mid-call (b322ef7) — CI run 25142036360 hit a race in the version-mismatch restart logic from 1b27286: two concurrent CLIs against a stale daemon both probe canConnect()=true, both call openClient(), but one wins the restart lock and SIGTERMs the daemon while the other is still inside client.connect(transport:). The lock-loser's transport tears down mid-handshake and surfaces MCPError.internalError(\"Transport not connected\") to the user as exit 1 — the opposite of the feature's "transparent from the user's point of view" promise. Fix: in DaemonClient.connect(...), if openClient throws and we didn't just spawn the daemon ourselves, wait for the socket to come back up (sibling's respawn typically completes in <2s) and retry the handshake once. Skip the retry on the just-spawned branch — failure there is our own startup misbehaving and retrying masks it.

Repro for the dynamic-product leak

A user hit this on a project that transitively depends on swift-issue-reporting, which declares IssueReportingTestSupport as a .dynamic library product linking Testing.framework. The simulator runtime doesn't ship Testing.framework, so the preview host crashed on launch:

Library not loaded: @rpath/Testing.framework/Testing
Referenced from: <build-dir>/arm64-apple-ios-simulator/debug/libIssueReportingTestSupport.dylib
Reason: tried multiple paths (no such file)

Test plan

  • swift build on rebased history
  • swift test --filter BuildSystem (69 tests, 5 new)
  • swift test --filter PreviewSessionBuildContext (6 tests, 1 new — verified fails when fix is stubbed)
  • swift test --filter IOSHostBuilder (embedded source compiles, title byte-grep passes)
  • swift test --filter Log (3 tests, deflake verified locally)
  • swift test --filter VersionHandshakeTests (3 tests, all pass; concurrentClientsRestartOnce ran 5x locally without flake)
  • User retest of the error-screen UX after kill-daemon + rebuild
  • Full CI green (in progress on this PR's run)
  • User retest end-to-end against the original swift-issue-reporting project

🤖 Generated with Claude Code

obj-p added a commit that referenced this pull request Apr 23, 2026
- shouldSkipDependencyTarget: document the target-name-equals-product-
  name assumption. Renamed dynamic products would fall through to the
  archive-then-link fallback, which is what ran before this fix and
  doesn't re-trigger the original bug (blanket `-l<Target>` only
  resolves to `lib<Target>.dylib` when the names match).
- BuildSystemTests: write the four-byte Mach-O 64-bit magic
  (MH_MAGIC_64, 0xFEEDFACF LE) into the dylib fixture so the test stays
  meaningful if the predicate ever tightens from `fileExists` to a real
  Mach-O check.
- IOSHostBuilderTests: replace the hand-rolled two-loop byte search
  with `Data.range(of:) != nil`. One line, obviously correct, no
  pointer arithmetic to eyeball.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
obj-p added a commit that referenced this pull request Apr 30, 2026
- shouldSkipDependencyTarget: document the target-name-equals-product-
  name assumption. Renamed dynamic products would fall through to the
  archive-then-link fallback, which is what ran before this fix and
  doesn't re-trigger the original bug (blanket `-l<Target>` only
  resolves to `lib<Target>.dylib` when the names match).
- BuildSystemTests: write the four-byte Mach-O 64-bit magic
  (MH_MAGIC_64, 0xFEEDFACF LE) into the dylib fixture so the test stays
  meaningful if the predicate ever tightens from `fileExists` to a real
  Mach-O check.
- IOSHostBuilderTests: replace the hand-rolled two-loop byte search
  with `Data.range(of:) != nil`. One line, obviously correct, no
  pointer arithmetic to eyeball.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@obj-p obj-p force-pushed the fix/host-error-and-testsupport branch from db38b9f to 80b5f55 Compare April 30, 2026 00:56
@obj-p obj-p changed the title Fix iOS host error screen UX and stop linking SPM dynamic products into the preview Fix iOS host error screen, SPM dynamic-product link leak, --headless plumbing, and a stderr-capture flake Apr 30, 2026
@obj-p obj-p changed the title Fix iOS host error screen, SPM dynamic-product link leak, --headless plumbing, and a stderr-capture flake Fix iOS host error screen, SPM dynamic-product link leak, --headless plumbing, and two daemon-test races Apr 30, 2026
obj-p added a commit that referenced this pull request Apr 30, 2026
- shouldSkipDependencyTarget: document the target-name-equals-product-
  name assumption. Renamed dynamic products would fall through to the
  archive-then-link fallback, which is what ran before this fix and
  doesn't re-trigger the original bug (blanket `-l<Target>` only
  resolves to `lib<Target>.dylib` when the names match).
- BuildSystemTests: write the four-byte Mach-O 64-bit magic
  (MH_MAGIC_64, 0xFEEDFACF LE) into the dylib fixture so the test stays
  meaningful if the predicate ever tightens from `fileExists` to a real
  Mach-O check.
- IOSHostBuilderTests: replace the hand-rolled two-loop byte search
  with `Data.range(of:) != nil`. One line, obviously correct, no
  pointer arithmetic to eyeball.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@obj-p obj-p force-pushed the fix/host-error-and-testsupport branch from a39da71 to a1fab50 Compare April 30, 2026 12:34
obj-p and others added 9 commits April 30, 2026 23:52
The previous showError UIViewController used a frame-based UILabel that
clipped long dyld errors, ran under the status bar, and showed paths
and symbols in a proportional font. Replace with a safe-area-anchored
layout: bold title UILabel + selectable non-editable monospaced
UITextView. Vertical scrolling handles arbitrarily long error payloads,
selection replaces a copy button, Auto Layout covers rotation.

Add a byte-grep assertion to the IOSHostBuilder compile-host test so a
bad backslash escape in the embedded "\"\"\"...\"\"\" source would fail
the build instead of reaching a user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
archiveDependencyTargets iterates every <Target>.build/ directory under
the SPM bin path, archives its loose .o files, and appends -l<Target>
to the preview link line. That's the right call for internal non-product
library targets — SPM leaves them as loose .o files with no
-module-link-name, so autolink can't pull their symbols on its own —
but it is too aggressive for dynamic library products. SPM has already
emitted lib<Target>.dylib next to the .build/ dir, and ld's -l search
order prefers .dylib over .a, so the blanket -l<Target> burns a load
command for the dynamic product into the preview dylib even when the
consumer doesn't import it.

swift-issue-reporting ships IssueReportingTestSupport as a .dynamic
library product that links Testing.framework. Testing.framework is only
available inside Xcode's test runner — not in the iOS simulator runtime
— so once the preview dylib references libIssueReportingTestSupport.dylib
the preview host dies on launch with
"Library not loaded: @rpath/Testing.framework/Testing".

Extract the skip decision into a nonisolated static helper
shouldSkipDependencyTarget(targetName:consumerTargetName:binPath:fm:)
that also returns true when lib<Target>.dylib exists in binPath. Swift's
autolink via -module-link-name still handles linking from consumers that
actually import the module, so consumers that do use a dynamic product
are unaffected; consumers that don't import it stop dragging its
transitive deps in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- shouldSkipDependencyTarget: document the target-name-equals-product-
  name assumption. Renamed dynamic products would fall through to the
  archive-then-link fallback, which is what ran before this fix and
  doesn't re-trigger the original bug (blanket `-l<Target>` only
  resolves to `lib<Target>.dylib` when the names match).
- BuildSystemTests: write the four-byte Mach-O 64-bit magic
  (MH_MAGIC_64, 0xFEEDFACF LE) into the dylib fixture so the test stays
  meaningful if the predicate ever tightens from `fileExists` to a real
  Mach-O check.
- IOSHostBuilderTests: replace the hand-rolled two-loop byte search
  with `Data.range(of:) != nil`. One line, obviously correct, no
  pointer arithmetic to eyeball.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The five unit tests on shouldSkipDependencyTarget verify the predicate's
decision table. Neither they nor the host-binary title grep actually
exercise archiveDependencyTargets against real SPM output — a reviewer
correctly pointed out that A-only tests would pass for a change that
silently skips the predicate entirely.

Add a synthetic-fixture integration test: write a minimal Package.swift
at a tempdir with a sibling .library(type: .dynamic) product that the
consumer target does NOT import, run the real SPMBuildSystem.build,
and assert two load-bearing facts:

  - libSiblingDyn.dylib was actually produced in binPath (without the
    dylib there is no artifact to trigger the skip, and the `-l absent`
    assertion below would pass for the wrong reason);
  - context.compilerFlags does not contain -lSiblingDyn.

Prefer a synthetic fixture over adding swift-issue-reporting to
examples/spm: a local probe showed that plain `swift build` on a root
package that only imports IssueReporting does not emit
libIssueReportingTestSupport.dylib — SPM materialises it only when that
product is explicitly requested, which PreviewsMCP's build path does
not do. Using the real package would either fail to reproduce or
require contorting the fixture to import IssueReportingTestSupport
directly, which inverts the bug shape.

Verified by temporarily stubbing the dylib check to return false:
without the fix, compilerFlags contains `-lSiblingDyn` and the test
fails with a clear diagnostic showing the full flag list; with the
fix restored, it passes in ~2.5s on macOS.

Chosen macOS (not iOS simulator) as the build triple because the bug
mechanism is a platform-agnostic filesystem predicate over binPath.
iOS was the motivation for the bug report — Testing.framework is
absent from the simulator — but the regression we're guarding is the
linker-flag emission, which reproduces on macOS without needing the
iphonesimulator SDK, codesigning quirks, or a booted simulator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
UITextView with isScrollEnabled = true is internally a UIScrollView
and should have handled long error payloads, but user testing showed
the error still rendered unreadable — likely because UITextView's
content-size heuristic and default textContainerInset behave subtly
differently across iOS versions, and a short-but-long-line error
could extend off-screen without the user seeing any scroll
affordance.

Swap to the conservative UIScrollView + UILabel pattern the user
explicitly asked for. The label is pinned to the scroll view's
contentLayoutGuide and its widthAnchor is tied to the frameLayoutGuide
— the widthAnchor tie is the load-bearing piece: without it, long
lines extend horizontally and no vertical scroll region ever appears.
Bump body font to 15pt monospace for readability on device; keep the
title at Dynamic Type headline.

Lose selection-to-copy on the body (UILabel is not selectable by
default) — acceptable cost for predictable rendering. If selection
turns out to matter, a UIMenuInteraction on long-press is a cheaper
add than wrestling UITextView's layout again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RunCommand only sent the `headless` key when the flag was true, so
omitting `--headless` dropped the key entirely. Both daemon paths
(MCPServer.swift:268 macOS, :348 iOS) default
`extractOptionalBool("headless") ?? true` when the key is absent —
so a CLI user who did not pass `--headless` got a headless window
anyway, and there was no way to request a visible simulator from the
CLI at all. The flag was effectively a no-op.

Always sending the value keeps the MCP-side default behavior
backwards-compatible for clients that don't set headless, while
letting the CLI actually honor the user's choice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI run 25141775602 captured this on `info writes timestamped plain
message to stderr`:

    "◇ Test \"XcodeBuildSystem parseBuildSettings stops at second
    target\" started.\n◇ Test \"readPackageName returns nil when
    target is not in the manifest\" started.\n◇ Test \"ProjectInfo
    decodes from xcodebuild -workspace -list JSON\" started.\n◇ Test
    \"BazelBuildSystem constructs label for nested file in
    package\" started.\n[00:58:56.716] hello world\n"

The log line at the end is correctly formatted. The failure is that
swift-testing emits its `◇ Test "…" started.` progress markers on
stderr from concurrent suites, and `dup2(fileno(stderr))` in
`captureStderr` is process-wide — `.serialized` only orders within
the suite. The original `^…$`-anchored regex required the entire
capture to be the log line, which is unprovable when other suites
share the fd.

Match `^…$` at the line scope instead. Splits the capture on `\n`
and checks any line matches, via a small private `containsLine`
helper. Format is still pinned (line-anchored regex), but the test
no longer depends on no other test happening to write to stderr
during the redirect window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI run 25142036360 (build-and-test) hit a race in the version-mismatch
restart logic landed in 1b27286: two concurrent CLIs against a stale
daemon both probe `canConnect()=true`, both call `openClient()`, but
one wins the restart lock and SIGTERMs the daemon while the other is
still inside `client.connect(transport:)`. The lock-loser's transport
tears down mid-handshake and surfaces
`MCPError.internalError("Transport not connected")` to the user as
exit 1 — the opposite of the feature's "transparent from the user's
point of view" promise.

Failing capture from CI:

    ✘ Test "concurrent CLIs against a stale daemon restart it
      exactly once" recorded an issue at VersionHandshakeTests.swift:196:13:
      Expectation failed: (a.exitCode → 1) == 0
    ↳ A stderr: Error: Internal error: Transport not connected

Fix: in `DaemonClient.connect(...)`, if `openClient` throws and we
didn't just spawn the daemon ourselves, wait for the socket to come
back up (the sibling's respawn typically completes in <2s) and retry
the handshake once. One retry is enough — a second failure means the
daemon is genuinely gone, propagate. Skip the retry on the
just-spawned branch; failure there is our own startup misbehaving and
retrying would mask it.

Verified by running `concurrentClientsRestartOnce` 5x locally — all
green. Local scheduling doesn't reliably reproduce the CI race, so
the real proof is the next CI run on this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-ups from the PR review:

  - showError set `vc.view.accessibilityLabel = "Preview host error"`
    while also rendering a UILabel with the same text, so VoiceOver
    would announce the title twice (once from the container, once
    from the label). Drop the container-level label; the UILabel is
    already discoverable.

  - LogTests dropped the trailing-`\n` portion of the format pin when
    it switched from `^…\n$` (whole-capture) to per-line matching, so
    a regression that stopped emitting newlines wouldn't fail the
    test. Restore the terminator by matching `(?m)^<pattern>\n`
    against the raw capture instead of splitting on `\n` first —
    `split` collapses `"foo"` and `"foo\n"` to the same element, so
    the newline pin has to live in the regex. Patterns lose their
    `^…$` anchors at the call sites since the helper now applies
    line anchoring internally.

Verified all three LogTests pass against the new helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
obj-p and others added 5 commits May 1, 2026 08:29
The integration-test prompts in each example README walked from the
macOS render straight into the iOS interaction step without an explicit
preview_stop. That made it easy to leak the first session — which I did
during a /integration-test run and only caught at the end via
session_list. Adding a one-line stop instruction at the end of step 2
closes the foot-gun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The MCPIntegrationTests step on macOS CI started hanging for 1200s on
random MacOSMCPTests cases. The per-instance MCP-server stderr dump
shows the test reaches preview_stop, the daemon logs the callTool, and
then both processes go silent for ~19 minutes until Swift Testing's
.timeLimit fires. Even MCPTestServer's existing 60s pthread watchdog
produced no output, so we couldn't tell whether closePreview was
wedged, MainActor was deadlocked, or something else entirely.

Adds:

- handlePreviewStop: log entry, MainActor hop, existence check result,
  closePreview entry/exit. The hang in the silent window will now
  bracket between two of these lines.
- HostApp.closePreview: log entry, watcher/session removal, the
  orderOut(_:) call that's the only AppKit-touching point in the
  function, and exit. orderOut is the prime suspect since it's the one
  call that could synchronously dispatch to other threads.
- MCPTestServer.emitWatchdogHeartbeat: also append to the per-instance
  stderr log file (which the GH Actions "Dump MCP server stderr" step
  cats post-mortem) in addition to writing to the test-process stderr.
  In the failed run the stderr-buffered heartbeats were never flushed,
  so the file-backed copy is the more reliable channel for hangs.

This is observation infrastructure, not a fix. Once a reproducing CI
run lands, the new log lines should pinpoint where preview_stop wedges.

Refs: #156

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first instrumentation round narrowed the hang to the test side:
the daemon completes preview_stop in ~3 ms, then the test process
wedges for 1200 s before Swift Testing's .timeLimit fires. The wedge
is somewhere inside the deferred MCPTestServer.stop() — most likely
process.waitUntilExit() (SIGTERM not killing the daemon) or the
SDK's client.disconnect() — but the existing code logs nothing as it
walks those phases.

Adds a small trace helper that, for each phase of stop(), writes one
line to both the test-process stderr and the per-instance stderr log
file using a raw open(2)/write(2) pair. The raw FDs sidestep any
FileHandle / Foundation locking that the wedge might also be holding,
so even a deeply hung test should leave a partial phase trace before
GH Actions kills the step at 30 min.

The next failed CI dump should show "[stop ...] process.terminate"
or "[stop ...] process.waitUntilExit (begin)" without the matching
"(returned)" line — that names the phase that's stuck.

Refs: #156

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two rounds of CI instrumentation for #156 narrowed the hang to the MCP
Swift SDK: the daemon services a request, but the test client's
`await client.callTool(...)` continuation never resumes. With the
existing bare-callTool plumbing (no per-call timeout), one dropped
response wedges the test for the full 1200 s `.timeLimit`.

Two improvements:

1. Default timeout on bare callTool. The previous "no timeout" overload
   is now a thin wrapper over `callToolWithTimeout(timeout: .seconds(60))`,
   so any of the ~30 call sites in MacOSMCPTests that hit the SDK
   transport-drop fail in 60 s instead of 20 minutes. The thrown
   `MCPTestError.timedOut` already carries the tool name, which is
   the missing piece — previous hangs only told us "somewhere in the
   test," not which call. 60 s is wide enough for compileCombined
   (~3 s) plus headroom for variants/configure recompile chains.

2. Watchdog tail no longer self-recurses. emitWatchdogHeartbeat
   reads the per-instance log to dump the last 5 lines as context.
   After commit 990b18e started writing heartbeats to that same file,
   each subsequent tick was including prior `[MCPTestServer watchdog]`
   markers and `[/watchdog]` trailers in its own tail — by t=180s
   the dump was nested heartbeats with no useful daemon content.
   Filter those lines out before taking the suffix so each tick
   reports actual daemon activity (or its absence).

Refs: #156

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI dump from run 25227229628 attempt 3 finally pinned the wedge:

  ◇ Test "File edit triggers hot reload (literal-only fast path)" started.   ← 19:27:32
  [stop 19:27:40.862] enter                                                   ← body done in 8s
  [stop 19:27:40.862] process.terminate
  [stop 19:27:40.862] process.waitUntilExit (begin)                          ← SIGTERM sent
  ✘ ... Time limit was exceeded: 1200.000 seconds                            ← 19:47:32
  ##[error]The action 'MCP integration tests' has timed out after 30 minutes.

The daemon doesn't exit after SIGTERM. process.waitUntilExit() then
blocks for the full per-test .timeLimit (1200 s), which is what the
last four CI flakes have been hitting. The earlier hypotheses (SDK
transport drop, MainActor deadlock, stop() phase issue) were
downstream symptoms; the actual wedge is the unbounded wait for a
process that isn't going to exit on its own.

Replace the bare waitUntilExit with a 5 s polling deadline; on
expiry, send SIGKILL and then waitUntilExit (which returns instantly
after SIGKILL since the process has been forcibly reaped). The total
extra delay on the happy path is at most one 50 ms sleep tick.

This is a band-aid: the daemon ignoring SIGTERM is a real bug worth
fixing on the daemon side (likely an NSApplication runloop or signal
handler issue exposed by the post-#153 timing). But the band-aid
turns a 30-minute opaque CI hang into a ~5 s blip with a clear
"SIGTERM ignored" trace line, so the next investigation has data
to work with rather than a black box.

Refs: #156

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@obj-p obj-p merged commit cf6ceca into main May 1, 2026
30 of 32 checks passed
@obj-p obj-p deleted the fix/host-error-and-testsupport branch May 1, 2026 23:40
obj-p added a commit that referenced this pull request May 2, 2026
Investigation of #156 found that the daemon already exits promptly on
SIGTERM in every measured condition (median 50-60ms across 30 local
SIGTERM cycles in the full MacOSMCPTests suite, 0 SIGKILL escalations).
The 1200s wedge tracked by #146's polling band-aid was inside
Foundation's `Process.waitUntilExit()` Mach-port path on the test-host
side, not the daemon.

Stdio mode previously had no SIGTERM handler — `runStdio()` skipped
`DaemonLifecycle.installSignalHandlers()`, leaving the kernel default
disposition (terminate immediately). That's already fast, but yields
no observability: if a future shutdown wedge recurs, there's no way
to distinguish "daemon never received SIGTERM" from "Foundation Process
wedged on the test side."

This adds a minimal `signal()` handler that writes a stderr breadcrumb
via signal-safe `write(2)` and exits via `_exit(0)`. Deliberately not
the daemon-mode `DispatchSource(... queue: .main)` pattern: that path
is coupled to NSApplication's runloop, and queue starvation under load
was the only remaining live hypothesis for the original wedge. Stdio
mode has no on-disk state to clean up before exit (no PID file, no
socket), so signal-thread `_exit(0)` is correct.

Local verification:
- `swift test --filter MacOSMCPTests`: 7/7 pass, exit latency unchanged
- Bare-spawn SIGTERM: 4ms exit (vs 5ms before — handler is faster than
  the kernel default path because it skips dyld destructors)
- Breadcrumb appears in per-instance daemon stderr log immediately
  before `[stop ...] process exited after SIGTERM` from the test harness
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing dylib screen

1 participant