Use _NSGetExecutablePath for daemon self-path resolution (#100)#153
Merged
Use _NSGetExecutablePath for daemon self-path resolution (#100)#153
Conversation
This was referenced May 1, 2026
Closed
`DaemonClient.spawnDaemon` resolved the binary to launch via `ProcessInfo.processInfo.arguments[0]`. argv[0] is unreliable: it can be relative (`./previewsmcp`, resolved against the current working directory), bare when PATH-resolved by the shell, or rewritten by the caller via `execve`. The version-mismatch respawn logic from #142 made this load-bearing — every transparent daemon respawn re-runs `spawnDaemon`, so a fragile self-path shows up as a respawn loop or a spurious "binaryNotFound" against a CLI that's actually on disk. `MCPServer.swift` already had a working `_NSGetExecutablePath`-backed helper for `preview_build_info` (added in #150), with a comment explicitly noting that #100 covers migrating the daemon path to it. This change extracts that helper into module-internal `SelfPath.swift` and points both call sites at it. The kernel records the executing binary's path at `execve` time; this primitive reads it back regardless of CWD or what the caller put in argv[0]. Tests cover the property argv[0] lacks: the resolved path is absolute, exists, is identical across calls, and is invariant under chdir. The suite is `.serialized` because the chdir test mutates process-global state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
335c845 to
7ecb3aa
Compare
obj-p
added a commit
that referenced
this pull request
May 1, 2026
CI dump from run 25227229628 attempt 3 finally pinned the wedge: ◇ Test "File edit triggers hot reload (literal-only fast path)" started. ← 19:27:32 [stop 19:27:40.862] enter ← body done in 8s [stop 19:27:40.862] process.terminate [stop 19:27:40.862] process.waitUntilExit (begin) ← SIGTERM sent ✘ ... Time limit was exceeded: 1200.000 seconds ← 19:47:32 ##[error]The action 'MCP integration tests' has timed out after 30 minutes. The daemon doesn't exit after SIGTERM. process.waitUntilExit() then blocks for the full per-test .timeLimit (1200 s), which is what the last four CI flakes have been hitting. The earlier hypotheses (SDK transport drop, MainActor deadlock, stop() phase issue) were downstream symptoms; the actual wedge is the unbounded wait for a process that isn't going to exit on its own. Replace the bare waitUntilExit with a 5 s polling deadline; on expiry, send SIGKILL and then waitUntilExit (which returns instantly after SIGKILL since the process has been forcibly reaped). The total extra delay on the happy path is at most one 50 ms sleep tick. This is a band-aid: the daemon ignoring SIGTERM is a real bug worth fixing on the daemon side (likely an NSApplication runloop or signal handler issue exposed by the post-#153 timing). But the band-aid turns a 30-minute opaque CI hang into a ~5 s blip with a clear "SIGTERM ignored" trace line, so the next investigation has data to work with rather than a black box. Refs: #156 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
obj-p
added a commit
that referenced
this pull request
May 1, 2026
…plumbing, and two daemon-test races (#146) * Make iOS host error screen scrollable and readable The previous showError UIViewController used a frame-based UILabel that clipped long dyld errors, ran under the status bar, and showed paths and symbols in a proportional font. Replace with a safe-area-anchored layout: bold title UILabel + selectable non-editable monospaced UITextView. Vertical scrolling handles arbitrarily long error payloads, selection replaces a copy button, Auto Layout covers rotation. Add a byte-grep assertion to the IOSHostBuilder compile-host test so a bad backslash escape in the embedded "\"\"\"...\"\"\" source would fail the build instead of reaching a user. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Stop linking SPM dynamic library products into the preview host archiveDependencyTargets iterates every <Target>.build/ directory under the SPM bin path, archives its loose .o files, and appends -l<Target> to the preview link line. That's the right call for internal non-product library targets — SPM leaves them as loose .o files with no -module-link-name, so autolink can't pull their symbols on its own — but it is too aggressive for dynamic library products. SPM has already emitted lib<Target>.dylib next to the .build/ dir, and ld's -l search order prefers .dylib over .a, so the blanket -l<Target> burns a load command for the dynamic product into the preview dylib even when the consumer doesn't import it. swift-issue-reporting ships IssueReportingTestSupport as a .dynamic library product that links Testing.framework. Testing.framework is only available inside Xcode's test runner — not in the iOS simulator runtime — so once the preview dylib references libIssueReportingTestSupport.dylib the preview host dies on launch with "Library not loaded: @rpath/Testing.framework/Testing". Extract the skip decision into a nonisolated static helper shouldSkipDependencyTarget(targetName:consumerTargetName:binPath:fm:) that also returns true when lib<Target>.dylib exists in binPath. Swift's autolink via -module-link-name still handles linking from consumers that actually import the module, so consumers that do use a dynamic product are unaffected; consumers that don't import it stop dragging its transitive deps in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Address code review nits on PR #146 - shouldSkipDependencyTarget: document the target-name-equals-product- name assumption. Renamed dynamic products would fall through to the archive-then-link fallback, which is what ran before this fix and doesn't re-trigger the original bug (blanket `-l<Target>` only resolves to `lib<Target>.dylib` when the names match). - BuildSystemTests: write the four-byte Mach-O 64-bit magic (MH_MAGIC_64, 0xFEEDFACF LE) into the dylib fixture so the test stays meaningful if the predicate ever tightens from `fileExists` to a real Mach-O check. - IOSHostBuilderTests: replace the hand-rolled two-loop byte search with `Data.range(of:) != nil`. One line, obviously correct, no pointer arithmetic to eyeball. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add end-to-end integration test for the dynamic-product link fix The five unit tests on shouldSkipDependencyTarget verify the predicate's decision table. Neither they nor the host-binary title grep actually exercise archiveDependencyTargets against real SPM output — a reviewer correctly pointed out that A-only tests would pass for a change that silently skips the predicate entirely. Add a synthetic-fixture integration test: write a minimal Package.swift at a tempdir with a sibling .library(type: .dynamic) product that the consumer target does NOT import, run the real SPMBuildSystem.build, and assert two load-bearing facts: - libSiblingDyn.dylib was actually produced in binPath (without the dylib there is no artifact to trigger the skip, and the `-l absent` assertion below would pass for the wrong reason); - context.compilerFlags does not contain -lSiblingDyn. Prefer a synthetic fixture over adding swift-issue-reporting to examples/spm: a local probe showed that plain `swift build` on a root package that only imports IssueReporting does not emit libIssueReportingTestSupport.dylib — SPM materialises it only when that product is explicitly requested, which PreviewsMCP's build path does not do. Using the real package would either fail to reproduce or require contorting the fixture to import IssueReportingTestSupport directly, which inverts the bug shape. Verified by temporarily stubbing the dylib check to return false: without the fix, compilerFlags contains `-lSiblingDyn` and the test fails with a clear diagnostic showing the full flag list; with the fix restored, it passes in ~2.5s on macOS. Chosen macOS (not iOS simulator) as the build triple because the bug mechanism is a platform-agnostic filesystem predicate over binPath. iOS was the motivation for the bug report — Testing.framework is absent from the simulator — but the regression we're guarding is the linker-flag emission, which reproduces on macOS without needing the iphonesimulator SDK, codesigning quirks, or a booted simulator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Wrap the iOS host error body in an explicit UIScrollView + UILabel UITextView with isScrollEnabled = true is internally a UIScrollView and should have handled long error payloads, but user testing showed the error still rendered unreadable — likely because UITextView's content-size heuristic and default textContainerInset behave subtly differently across iOS versions, and a short-but-long-line error could extend off-screen without the user seeing any scroll affordance. Swap to the conservative UIScrollView + UILabel pattern the user explicitly asked for. The label is pinned to the scroll view's contentLayoutGuide and its widthAnchor is tied to the frameLayoutGuide — the widthAnchor tie is the load-bearing piece: without it, long lines extend horizontally and no vertical scroll region ever appears. Bump body font to 15pt monospace for readability on device; keep the title at Dynamic Type headline. Lose selection-to-copy on the body (UILabel is not selectable by default) — acceptable cost for predictable rendering. If selection turns out to matter, a UIMenuInteraction on long-press is a cheaper add than wrestling UITextView's layout again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Always forward --headless from CLI to daemon RunCommand only sent the `headless` key when the flag was true, so omitting `--headless` dropped the key entirely. Both daemon paths (MCPServer.swift:268 macOS, :348 iOS) default `extractOptionalBool("headless") ?? true` when the key is absent — so a CLI user who did not pass `--headless` got a headless window anyway, and there was no way to request a visible simulator from the CLI at all. The flag was effectively a no-op. Always sending the value keeps the MCP-side default behavior backwards-compatible for clients that don't set headless, while letting the CLI actually honor the user's choice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * De-flake LogTests against cross-suite stderr interleaving CI run 25141775602 captured this on `info writes timestamped plain message to stderr`: "◇ Test \"XcodeBuildSystem parseBuildSettings stops at second target\" started.\n◇ Test \"readPackageName returns nil when target is not in the manifest\" started.\n◇ Test \"ProjectInfo decodes from xcodebuild -workspace -list JSON\" started.\n◇ Test \"BazelBuildSystem constructs label for nested file in package\" started.\n[00:58:56.716] hello world\n" The log line at the end is correctly formatted. The failure is that swift-testing emits its `◇ Test "…" started.` progress markers on stderr from concurrent suites, and `dup2(fileno(stderr))` in `captureStderr` is process-wide — `.serialized` only orders within the suite. The original `^…$`-anchored regex required the entire capture to be the log line, which is unprovable when other suites share the fd. Match `^…$` at the line scope instead. Splits the capture on `\n` and checks any line matches, via a small private `containsLine` helper. Format is still pinned (line-anchored regex), but the test no longer depends on no other test happening to write to stderr during the redirect window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Retry the daemon handshake when a sibling CLI kills it mid-call CI run 25142036360 (build-and-test) hit a race in the version-mismatch restart logic landed in 1b27286: two concurrent CLIs against a stale daemon both probe `canConnect()=true`, both call `openClient()`, but one wins the restart lock and SIGTERMs the daemon while the other is still inside `client.connect(transport:)`. The lock-loser's transport tears down mid-handshake and surfaces `MCPError.internalError("Transport not connected")` to the user as exit 1 — the opposite of the feature's "transparent from the user's point of view" promise. Failing capture from CI: ✘ Test "concurrent CLIs against a stale daemon restart it exactly once" recorded an issue at VersionHandshakeTests.swift:196:13: Expectation failed: (a.exitCode → 1) == 0 ↳ A stderr: Error: Internal error: Transport not connected Fix: in `DaemonClient.connect(...)`, if `openClient` throws and we didn't just spawn the daemon ourselves, wait for the socket to come back up (the sibling's respawn typically completes in <2s) and retry the handshake once. One retry is enough — a second failure means the daemon is genuinely gone, propagate. Skip the retry on the just-spawned branch; failure there is our own startup misbehaving and retrying would mask it. Verified by running `concurrentClientsRestartOnce` 5x locally — all green. Local scheduling doesn't reliably reproduce the CI race, so the real proof is the next CI run on this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Address review nits: drop duplicate VoiceOver label, restore \n pin Two follow-ups from the PR review: - showError set `vc.view.accessibilityLabel = "Preview host error"` while also rendering a UILabel with the same text, so VoiceOver would announce the title twice (once from the container, once from the label). Drop the container-level label; the UILabel is already discoverable. - LogTests dropped the trailing-`\n` portion of the format pin when it switched from `^…\n$` (whole-capture) to per-line matching, so a regression that stopped emitting newlines wouldn't fail the test. Restore the terminator by matching `(?m)^<pattern>\n` against the raw capture instead of splitting on `\n` first — `split` collapses `"foo"` and `"foo\n"` to the same element, so the newline pin has to live in the regex. Patterns lose their `^…$` anchors at the call sites since the helper now applies line anchoring internally. Verified all three LogTests pass against the new helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Tell example READMEs to stop the macOS session before iOS The integration-test prompts in each example README walked from the macOS render straight into the iOS interaction step without an explicit preview_stop. That made it easy to leak the first session — which I did during a /integration-test run and only caught at the end via session_list. Adding a one-line stop instruction at the end of step 2 closes the foot-gun. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Instrument preview_stop to localize the CI hang in #156 The MCPIntegrationTests step on macOS CI started hanging for 1200s on random MacOSMCPTests cases. The per-instance MCP-server stderr dump shows the test reaches preview_stop, the daemon logs the callTool, and then both processes go silent for ~19 minutes until Swift Testing's .timeLimit fires. Even MCPTestServer's existing 60s pthread watchdog produced no output, so we couldn't tell whether closePreview was wedged, MainActor was deadlocked, or something else entirely. Adds: - handlePreviewStop: log entry, MainActor hop, existence check result, closePreview entry/exit. The hang in the silent window will now bracket between two of these lines. - HostApp.closePreview: log entry, watcher/session removal, the orderOut(_:) call that's the only AppKit-touching point in the function, and exit. orderOut is the prime suspect since it's the one call that could synchronously dispatch to other threads. - MCPTestServer.emitWatchdogHeartbeat: also append to the per-instance stderr log file (which the GH Actions "Dump MCP server stderr" step cats post-mortem) in addition to writing to the test-process stderr. In the failed run the stderr-buffered heartbeats were never flushed, so the file-backed copy is the more reliable channel for hangs. This is observation infrastructure, not a fix. Once a reproducing CI run lands, the new log lines should pinpoint where preview_stop wedges. Refs: #156 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Trace each phase of MCPTestServer.stop() for #156 The first instrumentation round narrowed the hang to the test side: the daemon completes preview_stop in ~3 ms, then the test process wedges for 1200 s before Swift Testing's .timeLimit fires. The wedge is somewhere inside the deferred MCPTestServer.stop() — most likely process.waitUntilExit() (SIGTERM not killing the daemon) or the SDK's client.disconnect() — but the existing code logs nothing as it walks those phases. Adds a small trace helper that, for each phase of stop(), writes one line to both the test-process stderr and the per-instance stderr log file using a raw open(2)/write(2) pair. The raw FDs sidestep any FileHandle / Foundation locking that the wedge might also be holding, so even a deeply hung test should leave a partial phase trace before GH Actions kills the step at 30 min. The next failed CI dump should show "[stop ...] process.terminate" or "[stop ...] process.waitUntilExit (begin)" without the matching "(returned)" line — that names the phase that's stuck. Refs: #156 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Cap MCPTestServer.callTool at 60s and de-recurse the watchdog tail Two rounds of CI instrumentation for #156 narrowed the hang to the MCP Swift SDK: the daemon services a request, but the test client's `await client.callTool(...)` continuation never resumes. With the existing bare-callTool plumbing (no per-call timeout), one dropped response wedges the test for the full 1200 s `.timeLimit`. Two improvements: 1. Default timeout on bare callTool. The previous "no timeout" overload is now a thin wrapper over `callToolWithTimeout(timeout: .seconds(60))`, so any of the ~30 call sites in MacOSMCPTests that hit the SDK transport-drop fail in 60 s instead of 20 minutes. The thrown `MCPTestError.timedOut` already carries the tool name, which is the missing piece — previous hangs only told us "somewhere in the test," not which call. 60 s is wide enough for compileCombined (~3 s) plus headroom for variants/configure recompile chains. 2. Watchdog tail no longer self-recurses. emitWatchdogHeartbeat reads the per-instance log to dump the last 5 lines as context. After commit 990b18e started writing heartbeats to that same file, each subsequent tick was including prior `[MCPTestServer watchdog]` markers and `[/watchdog]` trailers in its own tail — by t=180s the dump was nested heartbeats with no useful daemon content. Filter those lines out before taking the suffix so each tick reports actual daemon activity (or its absence). Refs: #156 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Bound MCPTestServer.stop's waitUntilExit, escalate to SIGKILL CI dump from run 25227229628 attempt 3 finally pinned the wedge: ◇ Test "File edit triggers hot reload (literal-only fast path)" started. ← 19:27:32 [stop 19:27:40.862] enter ← body done in 8s [stop 19:27:40.862] process.terminate [stop 19:27:40.862] process.waitUntilExit (begin) ← SIGTERM sent ✘ ... Time limit was exceeded: 1200.000 seconds ← 19:47:32 ##[error]The action 'MCP integration tests' has timed out after 30 minutes. The daemon doesn't exit after SIGTERM. process.waitUntilExit() then blocks for the full per-test .timeLimit (1200 s), which is what the last four CI flakes have been hitting. The earlier hypotheses (SDK transport drop, MainActor deadlock, stop() phase issue) were downstream symptoms; the actual wedge is the unbounded wait for a process that isn't going to exit on its own. Replace the bare waitUntilExit with a 5 s polling deadline; on expiry, send SIGKILL and then waitUntilExit (which returns instantly after SIGKILL since the process has been forcibly reaped). The total extra delay on the happy path is at most one 50 ms sleep tick. This is a band-aid: the daemon ignoring SIGTERM is a real bug worth fixing on the daemon side (likely an NSApplication runloop or signal handler issue exposed by the post-#153 timing). But the band-aid turns a 30-minute opaque CI hang into a ~5 s blip with a clear "SIGTERM ignored" trace line, so the next investigation has data to work with rather than a black box. Refs: #156 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #100.
Summary
DaemonClient.spawnDaemonresolved the binary to launch viaProcessInfo.processInfo.arguments[0]. argv[0] is unreliable in three ways:./previewsmcplands asargv[0] = "./previewsmcp".URL(fileURLWithPath:).standardizedFileURLresolves it against the current working directory — fine today since the CLI doesn't chdir, but a tripwire for any future code that does.previewsmcp. Resolution against CWD silently fails.execvelets the parent set argv[0] to anything — wrapper scripts, launchers, misbehaving init systems. Not a security boundary, but adds weird-environment failures.The version-mismatch respawn logic from #142 makes this load-bearing — every transparent daemon respawn re-runs
spawnDaemon, so a fragile self-path shows up as a respawn loop or a spuriousbinaryNotFoundagainst a CLI that's actually on disk. The existing code atDaemonClient.swift:380-385already had a comment pointing at #100 acknowledging this.MCPServer.swiftalready had a working_NSGetExecutablePath-backed helper forpreview_build_info(added in #150), file-private, with a section comment that explicitly named #100 as the issue to migrate the daemon path to it. This change extracts that helper into module-internalSources/PreviewsCLI/SelfPath.swiftand points both call sites at it. The kernel records the executing binary's path atexecvetime; this primitive reads it back regardless of CWD or argv[0].Stale comments in
DaemonClient.swiftreferencing argv[0] (lines 43-45 and 422-426) updated to reflect the new authoritative resolution.Test plan
Tests/PreviewsCLITests/SelfPathTests.swiftcovers the property argv[0] lacked:returns an absolute, executable path— proves no relative-path outputreturns the same value across calls— determinisminvariant under chdir — the property argv[0] lacks— captures path, chdir to /tmp, calls again, asserts equalThe suite is
@Suite(.serialized)because the chdir test mutates process-global state.swift buildcleanswift test --filter SelfPath— 3/3 passswift test— pre-existing flakes confirmed unrelated by re-running each in isolation: StallTimer, AsyncProcess, IOSCLIWorkflowTests, IOSMCPTests all pass clean. Same flakes have hit other recent PRs (see Make MCP hot-reload test diagnosable; split by reload path #126, Make MCPTestServer.withTimeout starvation-immune via pthread timer #136, Eliminate iOS simulator test contention: wait for boot + distinct device per test #141).🤖 Generated with Claude Code