Skip to content

Use _NSGetExecutablePath for daemon self-path resolution (#100)#153

Merged
obj-p merged 1 commit intomainfrom
fix/100-self-path-nsgetexecutablepath
May 1, 2026
Merged

Use _NSGetExecutablePath for daemon self-path resolution (#100)#153
obj-p merged 1 commit intomainfrom
fix/100-self-path-nsgetexecutablepath

Conversation

@obj-p
Copy link
Copy Markdown
Owner

@obj-p obj-p commented Apr 30, 2026

Fixes #100.

Summary

DaemonClient.spawnDaemon resolved the binary to launch via ProcessInfo.processInfo.arguments[0]. argv[0] is unreliable in three ways:

  1. Relative paths. ./previewsmcp lands as argv[0] = "./previewsmcp". URL(fileURLWithPath:).standardizedFileURL resolves it against the current working directory — fine today since the CLI doesn't chdir, but a tripwire for any future code that does.
  2. PATH-resolved invocations. When the shell finds the binary via PATH, argv[0] is the bare name previewsmcp. Resolution against CWD silently fails.
  3. Caller-controlled. execve lets the parent set argv[0] to anything — wrapper scripts, launchers, misbehaving init systems. Not a security boundary, but adds weird-environment failures.

The version-mismatch respawn logic from #142 makes this load-bearing — every transparent daemon respawn re-runs spawnDaemon, so a fragile self-path shows up as a respawn loop or a spurious binaryNotFound against a CLI that's actually on disk. The existing code at DaemonClient.swift:380-385 already had a comment pointing at #100 acknowledging this.

MCPServer.swift already had a working _NSGetExecutablePath-backed helper for preview_build_info (added in #150), file-private, with a section comment that explicitly named #100 as the issue to migrate the daemon path to it. This change extracts that helper into module-internal Sources/PreviewsCLI/SelfPath.swift and points both call sites at it. The kernel records the executing binary's path at execve time; this primitive reads it back regardless of CWD or argv[0].

Stale comments in DaemonClient.swift referencing argv[0] (lines 43-45 and 422-426) updated to reflect the new authoritative resolution.

Test plan

Tests/PreviewsCLITests/SelfPathTests.swift covers the property argv[0] lacked:

  • returns an absolute, executable path — proves no relative-path output
  • returns the same value across calls — determinism
  • invariant under chdir — the property argv[0] lacks — captures path, chdir to /tmp, calls again, asserts equal

The suite is @Suite(.serialized) because the chdir test mutates process-global state.

🤖 Generated with Claude Code

`DaemonClient.spawnDaemon` resolved the binary to launch via
`ProcessInfo.processInfo.arguments[0]`. argv[0] is unreliable: it can
be relative (`./previewsmcp`, resolved against the current working
directory), bare when PATH-resolved by the shell, or rewritten by the
caller via `execve`. The version-mismatch respawn logic from #142 made
this load-bearing — every transparent daemon respawn re-runs
`spawnDaemon`, so a fragile self-path shows up as a respawn loop or a
spurious "binaryNotFound" against a CLI that's actually on disk.

`MCPServer.swift` already had a working `_NSGetExecutablePath`-backed
helper for `preview_build_info` (added in #150), with a comment
explicitly noting that #100 covers migrating the daemon path to it.
This change extracts that helper into module-internal `SelfPath.swift`
and points both call sites at it. The kernel records the executing
binary's path at `execve` time; this primitive reads it back regardless
of CWD or what the caller put in argv[0].

Tests cover the property argv[0] lacks: the resolved path is absolute,
exists, is identical across calls, and is invariant under chdir. The
suite is `.serialized` because the chdir test mutates process-global
state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@obj-p obj-p force-pushed the fix/100-self-path-nsgetexecutablepath branch from 335c845 to 7ecb3aa Compare May 1, 2026 00:53
@obj-p obj-p merged commit 8f99d86 into main May 1, 2026
7 of 8 checks passed
@obj-p obj-p deleted the fix/100-self-path-nsgetexecutablepath branch May 1, 2026 03:47
obj-p added a commit that referenced this pull request May 1, 2026
CI dump from run 25227229628 attempt 3 finally pinned the wedge:

  ◇ Test "File edit triggers hot reload (literal-only fast path)" started.   ← 19:27:32
  [stop 19:27:40.862] enter                                                   ← body done in 8s
  [stop 19:27:40.862] process.terminate
  [stop 19:27:40.862] process.waitUntilExit (begin)                          ← SIGTERM sent
  ✘ ... Time limit was exceeded: 1200.000 seconds                            ← 19:47:32
  ##[error]The action 'MCP integration tests' has timed out after 30 minutes.

The daemon doesn't exit after SIGTERM. process.waitUntilExit() then
blocks for the full per-test .timeLimit (1200 s), which is what the
last four CI flakes have been hitting. The earlier hypotheses (SDK
transport drop, MainActor deadlock, stop() phase issue) were
downstream symptoms; the actual wedge is the unbounded wait for a
process that isn't going to exit on its own.

Replace the bare waitUntilExit with a 5 s polling deadline; on
expiry, send SIGKILL and then waitUntilExit (which returns instantly
after SIGKILL since the process has been forcibly reaped). The total
extra delay on the happy path is at most one 50 ms sleep tick.

This is a band-aid: the daemon ignoring SIGTERM is a real bug worth
fixing on the daemon side (likely an NSApplication runloop or signal
handler issue exposed by the post-#153 timing). But the band-aid
turns a 30-minute opaque CI hang into a ~5 s blip with a clear
"SIGTERM ignored" trace line, so the next investigation has data
to work with rather than a black box.

Refs: #156

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
obj-p added a commit that referenced this pull request May 1, 2026
…plumbing, and two daemon-test races (#146)

* Make iOS host error screen scrollable and readable

The previous showError UIViewController used a frame-based UILabel that
clipped long dyld errors, ran under the status bar, and showed paths
and symbols in a proportional font. Replace with a safe-area-anchored
layout: bold title UILabel + selectable non-editable monospaced
UITextView. Vertical scrolling handles arbitrarily long error payloads,
selection replaces a copy button, Auto Layout covers rotation.

Add a byte-grep assertion to the IOSHostBuilder compile-host test so a
bad backslash escape in the embedded "\"\"\"...\"\"\" source would fail
the build instead of reaching a user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Stop linking SPM dynamic library products into the preview host

archiveDependencyTargets iterates every <Target>.build/ directory under
the SPM bin path, archives its loose .o files, and appends -l<Target>
to the preview link line. That's the right call for internal non-product
library targets — SPM leaves them as loose .o files with no
-module-link-name, so autolink can't pull their symbols on its own —
but it is too aggressive for dynamic library products. SPM has already
emitted lib<Target>.dylib next to the .build/ dir, and ld's -l search
order prefers .dylib over .a, so the blanket -l<Target> burns a load
command for the dynamic product into the preview dylib even when the
consumer doesn't import it.

swift-issue-reporting ships IssueReportingTestSupport as a .dynamic
library product that links Testing.framework. Testing.framework is only
available inside Xcode's test runner — not in the iOS simulator runtime
— so once the preview dylib references libIssueReportingTestSupport.dylib
the preview host dies on launch with
"Library not loaded: @rpath/Testing.framework/Testing".

Extract the skip decision into a nonisolated static helper
shouldSkipDependencyTarget(targetName:consumerTargetName:binPath:fm:)
that also returns true when lib<Target>.dylib exists in binPath. Swift's
autolink via -module-link-name still handles linking from consumers that
actually import the module, so consumers that do use a dynamic product
are unaffected; consumers that don't import it stop dragging its
transitive deps in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address code review nits on PR #146

- shouldSkipDependencyTarget: document the target-name-equals-product-
  name assumption. Renamed dynamic products would fall through to the
  archive-then-link fallback, which is what ran before this fix and
  doesn't re-trigger the original bug (blanket `-l<Target>` only
  resolves to `lib<Target>.dylib` when the names match).
- BuildSystemTests: write the four-byte Mach-O 64-bit magic
  (MH_MAGIC_64, 0xFEEDFACF LE) into the dylib fixture so the test stays
  meaningful if the predicate ever tightens from `fileExists` to a real
  Mach-O check.
- IOSHostBuilderTests: replace the hand-rolled two-loop byte search
  with `Data.range(of:) != nil`. One line, obviously correct, no
  pointer arithmetic to eyeball.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add end-to-end integration test for the dynamic-product link fix

The five unit tests on shouldSkipDependencyTarget verify the predicate's
decision table. Neither they nor the host-binary title grep actually
exercise archiveDependencyTargets against real SPM output — a reviewer
correctly pointed out that A-only tests would pass for a change that
silently skips the predicate entirely.

Add a synthetic-fixture integration test: write a minimal Package.swift
at a tempdir with a sibling .library(type: .dynamic) product that the
consumer target does NOT import, run the real SPMBuildSystem.build,
and assert two load-bearing facts:

  - libSiblingDyn.dylib was actually produced in binPath (without the
    dylib there is no artifact to trigger the skip, and the `-l absent`
    assertion below would pass for the wrong reason);
  - context.compilerFlags does not contain -lSiblingDyn.

Prefer a synthetic fixture over adding swift-issue-reporting to
examples/spm: a local probe showed that plain `swift build` on a root
package that only imports IssueReporting does not emit
libIssueReportingTestSupport.dylib — SPM materialises it only when that
product is explicitly requested, which PreviewsMCP's build path does
not do. Using the real package would either fail to reproduce or
require contorting the fixture to import IssueReportingTestSupport
directly, which inverts the bug shape.

Verified by temporarily stubbing the dylib check to return false:
without the fix, compilerFlags contains `-lSiblingDyn` and the test
fails with a clear diagnostic showing the full flag list; with the
fix restored, it passes in ~2.5s on macOS.

Chosen macOS (not iOS simulator) as the build triple because the bug
mechanism is a platform-agnostic filesystem predicate over binPath.
iOS was the motivation for the bug report — Testing.framework is
absent from the simulator — but the regression we're guarding is the
linker-flag emission, which reproduces on macOS without needing the
iphonesimulator SDK, codesigning quirks, or a booted simulator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Wrap the iOS host error body in an explicit UIScrollView + UILabel

UITextView with isScrollEnabled = true is internally a UIScrollView
and should have handled long error payloads, but user testing showed
the error still rendered unreadable — likely because UITextView's
content-size heuristic and default textContainerInset behave subtly
differently across iOS versions, and a short-but-long-line error
could extend off-screen without the user seeing any scroll
affordance.

Swap to the conservative UIScrollView + UILabel pattern the user
explicitly asked for. The label is pinned to the scroll view's
contentLayoutGuide and its widthAnchor is tied to the frameLayoutGuide
— the widthAnchor tie is the load-bearing piece: without it, long
lines extend horizontally and no vertical scroll region ever appears.
Bump body font to 15pt monospace for readability on device; keep the
title at Dynamic Type headline.

Lose selection-to-copy on the body (UILabel is not selectable by
default) — acceptable cost for predictable rendering. If selection
turns out to matter, a UIMenuInteraction on long-press is a cheaper
add than wrestling UITextView's layout again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Always forward --headless from CLI to daemon

RunCommand only sent the `headless` key when the flag was true, so
omitting `--headless` dropped the key entirely. Both daemon paths
(MCPServer.swift:268 macOS, :348 iOS) default
`extractOptionalBool("headless") ?? true` when the key is absent —
so a CLI user who did not pass `--headless` got a headless window
anyway, and there was no way to request a visible simulator from the
CLI at all. The flag was effectively a no-op.

Always sending the value keeps the MCP-side default behavior
backwards-compatible for clients that don't set headless, while
letting the CLI actually honor the user's choice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* De-flake LogTests against cross-suite stderr interleaving

CI run 25141775602 captured this on `info writes timestamped plain
message to stderr`:

    "◇ Test \"XcodeBuildSystem parseBuildSettings stops at second
    target\" started.\n◇ Test \"readPackageName returns nil when
    target is not in the manifest\" started.\n◇ Test \"ProjectInfo
    decodes from xcodebuild -workspace -list JSON\" started.\n◇ Test
    \"BazelBuildSystem constructs label for nested file in
    package\" started.\n[00:58:56.716] hello world\n"

The log line at the end is correctly formatted. The failure is that
swift-testing emits its `◇ Test "…" started.` progress markers on
stderr from concurrent suites, and `dup2(fileno(stderr))` in
`captureStderr` is process-wide — `.serialized` only orders within
the suite. The original `^…$`-anchored regex required the entire
capture to be the log line, which is unprovable when other suites
share the fd.

Match `^…$` at the line scope instead. Splits the capture on `\n`
and checks any line matches, via a small private `containsLine`
helper. Format is still pinned (line-anchored regex), but the test
no longer depends on no other test happening to write to stderr
during the redirect window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Retry the daemon handshake when a sibling CLI kills it mid-call

CI run 25142036360 (build-and-test) hit a race in the version-mismatch
restart logic landed in 1b27286: two concurrent CLIs against a stale
daemon both probe `canConnect()=true`, both call `openClient()`, but
one wins the restart lock and SIGTERMs the daemon while the other is
still inside `client.connect(transport:)`. The lock-loser's transport
tears down mid-handshake and surfaces
`MCPError.internalError("Transport not connected")` to the user as
exit 1 — the opposite of the feature's "transparent from the user's
point of view" promise.

Failing capture from CI:

    ✘ Test "concurrent CLIs against a stale daemon restart it
      exactly once" recorded an issue at VersionHandshakeTests.swift:196:13:
      Expectation failed: (a.exitCode → 1) == 0
    ↳ A stderr: Error: Internal error: Transport not connected

Fix: in `DaemonClient.connect(...)`, if `openClient` throws and we
didn't just spawn the daemon ourselves, wait for the socket to come
back up (the sibling's respawn typically completes in <2s) and retry
the handshake once. One retry is enough — a second failure means the
daemon is genuinely gone, propagate. Skip the retry on the
just-spawned branch; failure there is our own startup misbehaving and
retrying would mask it.

Verified by running `concurrentClientsRestartOnce` 5x locally — all
green. Local scheduling doesn't reliably reproduce the CI race, so
the real proof is the next CI run on this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address review nits: drop duplicate VoiceOver label, restore \n pin

Two follow-ups from the PR review:

  - showError set `vc.view.accessibilityLabel = "Preview host error"`
    while also rendering a UILabel with the same text, so VoiceOver
    would announce the title twice (once from the container, once
    from the label). Drop the container-level label; the UILabel is
    already discoverable.

  - LogTests dropped the trailing-`\n` portion of the format pin when
    it switched from `^…\n$` (whole-capture) to per-line matching, so
    a regression that stopped emitting newlines wouldn't fail the
    test. Restore the terminator by matching `(?m)^<pattern>\n`
    against the raw capture instead of splitting on `\n` first —
    `split` collapses `"foo"` and `"foo\n"` to the same element, so
    the newline pin has to live in the regex. Patterns lose their
    `^…$` anchors at the call sites since the helper now applies
    line anchoring internally.

Verified all three LogTests pass against the new helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Tell example READMEs to stop the macOS session before iOS

The integration-test prompts in each example README walked from the
macOS render straight into the iOS interaction step without an explicit
preview_stop. That made it easy to leak the first session — which I did
during a /integration-test run and only caught at the end via
session_list. Adding a one-line stop instruction at the end of step 2
closes the foot-gun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Instrument preview_stop to localize the CI hang in #156

The MCPIntegrationTests step on macOS CI started hanging for 1200s on
random MacOSMCPTests cases. The per-instance MCP-server stderr dump
shows the test reaches preview_stop, the daemon logs the callTool, and
then both processes go silent for ~19 minutes until Swift Testing's
.timeLimit fires. Even MCPTestServer's existing 60s pthread watchdog
produced no output, so we couldn't tell whether closePreview was
wedged, MainActor was deadlocked, or something else entirely.

Adds:

- handlePreviewStop: log entry, MainActor hop, existence check result,
  closePreview entry/exit. The hang in the silent window will now
  bracket between two of these lines.
- HostApp.closePreview: log entry, watcher/session removal, the
  orderOut(_:) call that's the only AppKit-touching point in the
  function, and exit. orderOut is the prime suspect since it's the one
  call that could synchronously dispatch to other threads.
- MCPTestServer.emitWatchdogHeartbeat: also append to the per-instance
  stderr log file (which the GH Actions "Dump MCP server stderr" step
  cats post-mortem) in addition to writing to the test-process stderr.
  In the failed run the stderr-buffered heartbeats were never flushed,
  so the file-backed copy is the more reliable channel for hangs.

This is observation infrastructure, not a fix. Once a reproducing CI
run lands, the new log lines should pinpoint where preview_stop wedges.

Refs: #156

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Trace each phase of MCPTestServer.stop() for #156

The first instrumentation round narrowed the hang to the test side:
the daemon completes preview_stop in ~3 ms, then the test process
wedges for 1200 s before Swift Testing's .timeLimit fires. The wedge
is somewhere inside the deferred MCPTestServer.stop() — most likely
process.waitUntilExit() (SIGTERM not killing the daemon) or the
SDK's client.disconnect() — but the existing code logs nothing as it
walks those phases.

Adds a small trace helper that, for each phase of stop(), writes one
line to both the test-process stderr and the per-instance stderr log
file using a raw open(2)/write(2) pair. The raw FDs sidestep any
FileHandle / Foundation locking that the wedge might also be holding,
so even a deeply hung test should leave a partial phase trace before
GH Actions kills the step at 30 min.

The next failed CI dump should show "[stop ...] process.terminate"
or "[stop ...] process.waitUntilExit (begin)" without the matching
"(returned)" line — that names the phase that's stuck.

Refs: #156

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Cap MCPTestServer.callTool at 60s and de-recurse the watchdog tail

Two rounds of CI instrumentation for #156 narrowed the hang to the MCP
Swift SDK: the daemon services a request, but the test client's
`await client.callTool(...)` continuation never resumes. With the
existing bare-callTool plumbing (no per-call timeout), one dropped
response wedges the test for the full 1200 s `.timeLimit`.

Two improvements:

1. Default timeout on bare callTool. The previous "no timeout" overload
   is now a thin wrapper over `callToolWithTimeout(timeout: .seconds(60))`,
   so any of the ~30 call sites in MacOSMCPTests that hit the SDK
   transport-drop fail in 60 s instead of 20 minutes. The thrown
   `MCPTestError.timedOut` already carries the tool name, which is
   the missing piece — previous hangs only told us "somewhere in the
   test," not which call. 60 s is wide enough for compileCombined
   (~3 s) plus headroom for variants/configure recompile chains.

2. Watchdog tail no longer self-recurses. emitWatchdogHeartbeat
   reads the per-instance log to dump the last 5 lines as context.
   After commit 990b18e started writing heartbeats to that same file,
   each subsequent tick was including prior `[MCPTestServer watchdog]`
   markers and `[/watchdog]` trailers in its own tail — by t=180s
   the dump was nested heartbeats with no useful daemon content.
   Filter those lines out before taking the suffix so each tick
   reports actual daemon activity (or its absence).

Refs: #156

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Bound MCPTestServer.stop's waitUntilExit, escalate to SIGKILL

CI dump from run 25227229628 attempt 3 finally pinned the wedge:

  ◇ Test "File edit triggers hot reload (literal-only fast path)" started.   ← 19:27:32
  [stop 19:27:40.862] enter                                                   ← body done in 8s
  [stop 19:27:40.862] process.terminate
  [stop 19:27:40.862] process.waitUntilExit (begin)                          ← SIGTERM sent
  ✘ ... Time limit was exceeded: 1200.000 seconds                            ← 19:47:32
  ##[error]The action 'MCP integration tests' has timed out after 30 minutes.

The daemon doesn't exit after SIGTERM. process.waitUntilExit() then
blocks for the full per-test .timeLimit (1200 s), which is what the
last four CI flakes have been hitting. The earlier hypotheses (SDK
transport drop, MainActor deadlock, stop() phase issue) were
downstream symptoms; the actual wedge is the unbounded wait for a
process that isn't going to exit on its own.

Replace the bare waitUntilExit with a 5 s polling deadline; on
expiry, send SIGKILL and then waitUntilExit (which returns instantly
after SIGKILL since the process has been forcibly reaped). The total
extra delay on the happy path is at most one 50 ms sleep tick.

This is a band-aid: the daemon ignoring SIGTERM is a real bug worth
fixing on the daemon side (likely an NSApplication runloop or signal
handler issue exposed by the post-#153 timing). But the band-aid
turns a 30-minute opaque CI hang into a ~5 s blip with a clear
"SIGTERM ignored" trace line, so the next investigation has data
to work with rather than a black box.

Refs: #156

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Self-path resolution uses argv[0]; switch to authoritative API

1 participant