
Make MCPTestServer.withTimeout starvation-immune via pthread timer#136

Merged
obj-p merged 2 commits into main from fix/mcp-test-pthread-timeout
Apr 21, 2026

Conversation


@obj-p obj-p commented Apr 21, 2026

Phase 3 of the issue #135 implementation plan. Unblocks #133's CI by making the test-harness timeout primitive fire even when the Swift cooperative thread pool is pegged in the MCP SDK's busy-spin. First of three PRs; Phases 1 and 2 (daemon-side heartbeat + client-side stall detection for production callers) will follow.

Problem

`MCPTestServer.withTimeout` raced `body` against `Task.sleep(for:)` inside a `withThrowingTaskGroup`. When the MCP SDK's transport loop is busy-spinning on a wedged server — the pattern in issue #135 — the cooperative thread pool is exhausted and `Task.sleep` never resumes to throw. The 30s `callToolWithTimeout` deadline silently passes; only Swift Testing's outer `.timeLimit(.minutes(10))` catches it, by which point 10 minutes of CI have been burned with no diagnostic context.

Three CI runs hit this in the past 36 hours: 72323677364, 72328816376, 72345678664.
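A minimal sketch of that starvation-prone shape, assuming names like `MCPTestError` from the test target (the real harness code may differ):

```swift
import Foundation

enum MCPTestError: Error { case timeout }

// Sketch of the prior, starvation-prone pattern described above
// (names are assumptions, not the repository's real code).
func withTimeout<T: Sendable>(
    _ duration: Duration,
    body: @escaping @Sendable () async throws -> T
) async throws -> T {
    try await withThrowingTaskGroup(of: T.self) { group in
        group.addTask { try await body() }
        group.addTask {
            // If every cooperative thread is pinned by a busy-spin,
            // this sleep never resumes and the deadline silently passes.
            try await Task.sleep(for: duration)
            throw MCPTestError.timeout
        }
        let result = try await group.next()!
        group.cancelAll()
        return result
    }
}
```

Both racers here depend on the cooperative pool, which is exactly the weakness the fix below removes.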

Fix

`withTimeout` is restructured around a `withCheckedThrowingContinuation` with two racers:

  1. Body path: an ordinary `Task` that resumes the continuation on success/failure.
  2. Timeout path: `Thread.detachNewThread` (a real kernel thread, not GCD, not Swift concurrency) that `Thread.sleep`s for the duration, then calls `process.terminate()` on the server subprocess and resumes the continuation with `TestTimeoutSentinel`.

`OSAllocatedUnfairLock<Bool>` serializes exactly-once resume across the race. `CheckedContinuation.resume` is a synchronous call with no cooperative-pool dependency, so the caller unblocks regardless of pool state.
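Under the assumptions above (the `terminating:` parameter and exact signature are guesses; `TestTimeoutSentinel` comes from this PR's description), the restructured primitive can be sketched as:

```swift
import Foundation
import os

struct TestTimeoutSentinel: Error {}

// Sketch of the restructured withTimeout; MCPTestServer's real code
// may differ in shape and naming.
func withTimeout<T: Sendable>(
    _ duration: Duration,
    terminating process: Process,
    body: @escaping @Sendable () async throws -> T
) async throws -> T {
    let resumed = OSAllocatedUnfairLock(initialState: false)
    // Exactly one racer wins; the loser's resume becomes a no-op.
    let claimResume: @Sendable () -> Bool = {
        resumed.withLock { (done: inout Bool) -> Bool in
            if done { return false }
            done = true
            return true
        }
    }
    return try await withCheckedThrowingContinuation { continuation in
        // Racer 1: the body as an ordinary Task.
        Task {
            do {
                let value = try await body()
                if claimResume() { continuation.resume(returning: value) }
            } catch {
                if claimResume() { continuation.resume(throwing: error) }
            }
        }
        // Racer 2: a real kernel thread, immune to cooperative-pool starvation.
        Thread.detachNewThread {
            let seconds = TimeInterval(duration.components.seconds)
                + TimeInterval(duration.components.attoseconds) / 1e18
            Thread.sleep(forTimeInterval: seconds)
            process.terminate() // unwedge the busy-spin before resuming
            if claimResume() { continuation.resume(throwing: TestTimeoutSentinel()) }
        }
    }
}
```

Terminating the subprocess before resuming matters: killing the wedged server ends the SDK's busy-spin, so cooperative threads free up for the resumed caller.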

Tradeoff

The body `Task` may remain suspended forever if its pending MCP continuation is never drained (the SDK's transport EOF does not resume pendingRequests — see `Client.swift:234-276` vs `:304-342`; only explicit `disconnect()` drains them, and that's itself async, so we can't run it from the pthread). This leak is accepted because:

  • The caller has already been resumed via the continuation — the test body completes.
  • The subprocess is terminated by the pthread, so subsequent tests start clean.
  • The process exits shortly after the test run.

What this PR does NOT do

This is strictly test-harness hardening that gets CI green and unblocks #133.

Test plan

  • swift build — clean
  • swift-format lint --strict --recursive Sources/ Tests/ examples/ — clean
  • swift test --filter MacOSMCPTests — 6/6 green locally, 67.241s total
  • New regression test `withTimeoutFiresUnderStarvation`: the body occupies a cooperative thread with POSIX `sleep(60)` (no yield), the pthread fires at a 2s budget, and the test asserts elapsed ∈ [2, 5) and `subprocess.isRunning == false`. Passes in 2.013s.
  • CI on this PR — confirms nothing else regressed and Phase 3 doesn't introduce new flakes.
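The regression test described above might be sketched as follows; `MCPTestServer.start()`, its `process` property, and the `withTimeout`/`TestTimeoutSentinel` names are assumptions from this PR's description, not verified API:

```swift
import Foundation
import Testing

// Sketch of the regression test described above (assumed names).
@Test func withTimeoutFiresUnderStarvation() async throws {
    let server = try await MCPTestServer.start()
    let clock = ContinuousClock()
    let start = clock.now
    do {
        _ = try await withTimeout(.seconds(2), terminating: server.process) {
            sleep(60) // POSIX sleep: pins a cooperative thread without yielding
        }
        Issue.record("pthread timeout should have fired")
    } catch is TestTimeoutSentinel {
        // Expected: the kernel-thread racer won despite the pinned pool.
    }
    let elapsed = clock.now - start
    #expect(elapsed >= .seconds(2) && elapsed < .seconds(5))
    #expect(!server.process.isRunning)
}
```

POSIX `sleep` is the key choice: unlike `Task.sleep`, it never yields, so it models a pegged cooperative thread exactly.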

Related

🤖 Generated with Claude Code

obj-p and others added 2 commits April 21, 2026 16:25
The prior implementation raced `body` against `Task.sleep(for:)` inside a
`withThrowingTaskGroup`. When the MCP SDK's transport loop busy-spins on
a wedged server — the exact failure mode in issue #135 — the cooperative
thread pool is exhausted and `Task.sleep` never resumes to throw, so the
30s `callToolWithTimeout` deadline is silently missed. CI runs 72323677364,
72328816376, and 72345678664 all died that way: 10 minutes of no output,
then Swift Testing's outer `.timeLimit` kill, then the 15-minute GHA step
truncation — no diagnostic context produced.

This is Phase 3 of the implementation plan for issue #135.

Replace the task-group with a `withCheckedThrowingContinuation` + two
racers:

  1. Body path: ordinary Task that resumes the continuation on
     success/failure.
  2. Timeout path: `Thread.detachNewThread` (real kernel thread, not GCD,
     not Swift concurrency) that `Thread.sleep`s for the duration, then
     calls `process.terminate()` on the server subprocess and resumes
     the continuation with `TestTimeoutSentinel`.

`OSAllocatedUnfairLock<Bool>` guards exactly-once resume across the race.
`CheckedContinuation.resume` is a synchronous call with no
cooperative-pool dependency, so the caller is unblocked regardless of
pool state.

The body Task may remain suspended forever if its pending MCP
continuation is never drained — acceptable, because the caller has
already been resumed via the continuation, the subprocess has been
terminated so the next test starts clean, and the process exits
shortly after.

New regression test `withTimeoutFiresUnderStarvation` occupies a
cooperative thread with `sleep(60)` (POSIX, no async availability
annotation unlike `Thread.sleep`) as the body, and asserts the pthread
still fires within `timeout + 3s` and terminates the subprocess.
Passes in 2.013s locally against a 2s budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ol-based SDK refs

Review of #136 flagged three items, all addressed here:

- Important: the regression test's dummy `/bin/sleep` subprocess inherited
  the parent's stdout/stderr. MCPTestServer.start()'s own comment warns
  this pattern previously wedged CI runners on macOS 15. Route dummy
  stdio to `FileHandle.nullDevice` to match the hardened pattern.
- Nit: `Duration.components.seconds + .attoseconds/1e18` was duplicated
  in withTimeout and the regression test. Extract `Duration.asTimeInterval`
  (internal to the test target, colocated with `MCPTestError`).
- Nit: withTimeout's docstring referenced swift-sdk `Client.swift:234-276`
  and `:304-342`. Those line numbers rot on every SDK bump. Replace with
  the stable symbols `Client.handleMessage(_:)`, `Client.disconnect()`,
  and `pendingRequests`.
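The extracted helper might look like this sketch (the name and placement come from the review note above; the exact shape is an assumption):

```swift
import Foundation

extension Duration {
    /// Sketch of the extracted helper described above: seconds plus
    /// attoseconds/1e18, as a TimeInterval.
    var asTimeInterval: TimeInterval {
        TimeInterval(components.seconds)
            + TimeInterval(components.attoseconds) / 1e18
    }
}
```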

Behavior unchanged; all 6 MacOSMCPTests still pass in ~68s locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@obj-p obj-p merged commit 8a25f5f into main Apr 21, 2026
4 checks passed
@obj-p obj-p deleted the fix/mcp-test-pthread-timeout branch April 21, 2026 21:46
obj-p added a commit that referenced this pull request Apr 22, 2026

* Add targeted stderr instrumentation in preview_snapshot + reload path

Phase 4A of issue #135. Phases 1-3 catch the daemon post-reload wedge
(CLI stall-detects at 30s, test harness pthread-terminates at 30s),
but we still don't know WHERE the handler hangs. The server stderr
dump from PR #134 currently goes silent after "Reloaded!", leaving
root-cause analysis speculative.

This PR adds ~8 targeted `fputs` markers through `handlePreviewSnapshot`
and one after the reload's `MainActor.run` block. Each marker writes
to the server subprocess's stderr — captured in the per-instance
temp file and dumped by PR #134's post-failure CI step. When the
next flake hits, the CI log will show exactly how far the handler
got before the wedge:

  [snapshot] enter
  [snapshot] pre session-check          ← MainActor queue blocked here?
  [snapshot] post session-check
  [snapshot] post 300ms sleep           ← Task.sleep problem?
  [snapshot] main-actor capture enter   ← MainActor work starts
  [snapshot] main-actor capture done    ← Snapshot.capture returned
  [snapshot] encoding
  [snapshot] returning

Plus on the reload side, after the MainActor.run block containing
`loadPreview`:

  [reload] main-actor block returned

No behavior change. `fputs` + `fflush` is synchronous libc, no Swift
concurrency involvement, no allocations beyond the literal strings.
Verified locally: the full 7-stage sequence appears on the happy
path of `hotReloadStructural`.
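A helper with those properties might look like this sketch; the name `mark` and the returned line (for testability) are my additions — the commit only specifies `fputs` + `fflush` with no Swift-concurrency involvement:

```swift
import Foundation

/// Sketch of a marker helper consistent with the description above.
@discardableResult
func mark(_ label: String) -> String {
    let line = "[snapshot] \(label)\n"
    fputs(line, stderr)   // synchronous libc write to the captured stderr
    fflush(stderr)        // flush immediately so a later wedge can't eat it
    return line
}
```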

Phase 4B (the actual fix) is gated on seeing what a real CI failure
shows. This PR is the diagnostic prerequisite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address review: add throw-path marker, TODO tag, user-visibility note

Round-1 review of #139 flagged three optional polishes worth applying
before merge:

- The `[snapshot] main-actor capture enter` marker fires on entry but
  the `captureFailed` throw path (when `host.window(for:)` returns nil)
  exits the MainActor block without a matching marker. Added a
  do/catch around the block with a `[snapshot] main-actor capture
  threw: ...` marker. This way the CI dump shows we exited the block
  one way or another, even on error paths.

- Added a `TODO(#135-4B)` tag on the instrumentation's doc comment so
  the cleanup obligation is grep-visible when Phase 4B lands.

- Expanded the doc comment with two clarifications the reviewer
  verified: (1) markers are not user-visible because
  `DaemonClient.registerStderrLogForwarder` forwards MCP
  `LogMessageNotification` payloads, not raw daemon stderr;
  (2) macOS snapshot handler only — iOS returns early above.

No behavior change. `swift test --filter hotReloadStructural` passes
in 8.5s; manually verified the new throw-path marker does not fire
on the happy path (only on the failure path).
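The do/catch marker shape might be sketched like this; the wrapper name and `SnapshotError.captureFailed` are assumptions based on this PR's description:

```swift
import Foundation

enum SnapshotError: Error { case captureFailed } // assumed error type

/// Sketch: wrap the MainActor capture work so the CI stderr dump always
/// records how the block exited, even on the throw path.
func withCaptureMarkers<T: Sendable>(
    _ work: @escaping @MainActor @Sendable () throws -> T
) async throws -> T {
    do {
        let value = try await MainActor.run {
            fputs("[snapshot] main-actor capture enter\n", stderr); fflush(stderr)
            let v = try work()
            fputs("[snapshot] main-actor capture done\n", stderr); fflush(stderr)
            return v
        }
        return value
    } catch {
        // Matching exit marker on the error path.
        fputs("[snapshot] main-actor capture threw: \(error)\n", stderr)
        fflush(stderr)
        throw error
    }
}
```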

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Remove 300ms pre-capture Task.sleep (Phase 4B root-cause fix for #135)

Phase 4A (same PR, prior commit) added markers that caught the
post-reload wedge on the first CI run after merge. Run 72439165932's
server stderr dump showed:

  [snapshot] enter
  [snapshot] pre session-check
  [snapshot] post session-check     ← last marker before silence

The next marker `[snapshot] post 300ms sleep` never fires. The ONLY
code between those two markers is:

  guard isMacOSSession else { ... }           // no await
  try await Task.sleep(for: .milliseconds(300))  // ← wedges here

The hang is `Task.sleep`, which ultimately depends on Swift's
cooperative scheduler to resume. On a hot-reload test that polls
snapshots rapidly without swiftc breaks (the literal-only path in
particular — dozens of rapid snapshot calls per `awaitSnapshotChange`
loop), the pool accumulates enough load that the timer's continuation
stays unscheduled. Test `hotReloadLiteralOnly` hit this at around
the 8th snapshot cycle on CI run 72439165932.

This is exactly the failure mode issue #135 described — and the
reason PR #136 built a pthread-based timeout for the test harness.
Same pattern, different place.

The sleep itself is unnecessary: `Snapshot.capture` calls
`NSView.cacheDisplay(in:to:)`, which forces layout synchronously
when the view is dirty. The MCP client→server round trip already
gives the UI plenty of time to settle between view mount and
capture. Verified: all 7 MacOSMCPTests pass without the sleep —
including `hotReloadLiteralOnly` and `hotReloadStructural`, both
of which assert image bytes actually change after the reload.

The sleep was added in the initial commit (820b0cd) with the
generic comment "Wait briefly for SwiftUI to finish layout". No
evidence it was ever load-bearing; it appears to have been defensive
paranoia whose only real effect was to act as an unintended
starvation trigger.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
