Make MCPTestServer.withTimeout starvation-immune via pthread timer #136
Merged
Conversation
The prior implementation raced `body` against `Task.sleep(for:)` inside a `withThrowingTaskGroup`. When the MCP SDK's transport loop busy-spins on a wedged server — the exact failure mode in issue #135 — the cooperative thread pool is exhausted and `Task.sleep` never resumes to throw, so the 30s `callToolWithTimeout` deadline is silently missed. CI runs 72323677364, 72328816376, and 72345678664 all died that way: 10 minutes of no output, then Swift Testing's outer `.timeLimit` kill, then the 15-minute GHA step truncation — no diagnostic context produced.

This is Phase 3 of the implementation plan for issue #135. Replace the task group with a `withCheckedThrowingContinuation` plus two racers:

1. Body path: an ordinary `Task` that resumes the continuation on success or failure.
2. Timeout path: `Thread.detachNewThread` (a real kernel thread — not GCD, not Swift concurrency) that `Thread.sleep`s for the duration, then calls `process.terminate()` on the server subprocess and resumes the continuation with `TestTimeoutSentinel`.

`OSAllocatedUnfairLock<Bool>` guards exactly-once resume across the race. `CheckedContinuation.resume` is synchronous and has no cooperative-pool dependency, so the caller is unblocked regardless of pool state.

The body `Task` may remain suspended forever if its pending MCP continuation is never drained — acceptable, because the caller has already been resumed via the continuation, the subprocess has been terminated so the next test starts clean, and the process exits shortly after.

New regression test `withTimeoutFiresUnderStarvation` occupies a cooperative thread with POSIX `sleep(60)` (which, unlike `Thread.sleep`, carries no async availability annotation) as the body, and asserts the pthread timer still fires within `timeout + 3s` and terminates the subprocess. Passes in 2.013s locally against a 2s budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
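The exactly-once race described above can be sketched as follows. This is a minimal, portable sketch, not the repo's code: `NSLock` plus a flag stands in for `OSAllocatedUnfairLock<Bool>`, the `onTimeout` hook stands in for `process.terminate()`, and the function name `withPThreadTimeout` is illustrative (only `TestTimeoutSentinel` comes from the PR).

```swift
import Foundation

struct TestTimeoutSentinel: Error {}

/// Lock-guarded once-flag; stand-in for the PR's OSAllocatedUnfairLock<Bool>.
final class OnceFlag: @unchecked Sendable {
    private let lock = NSLock()
    private var fired = false
    /// Returns true exactly once, for whichever racer calls first.
    func claim() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        if fired { return false }
        fired = true
        return true
    }
}

func withPThreadTimeout<T: Sendable>(
    seconds: TimeInterval,
    onTimeout: @escaping @Sendable () -> Void,
    body: @escaping @Sendable () async throws -> T
) async throws -> T {
    try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<T, Error>) in
        let once = OnceFlag()
        // Racer 1: the body, on the cooperative pool.
        Task {
            do {
                let value = try await body()
                if once.claim() { continuation.resume(returning: value) }
            } catch {
                if once.claim() { continuation.resume(throwing: error) }
            }
        }
        // Racer 2: a detached kernel thread that a pegged cooperative
        // pool cannot starve.
        Thread.detachNewThread {
            Thread.sleep(forTimeInterval: seconds)
            guard once.claim() else { return }  // body already finished
            onTimeout()  // e.g. process.terminate() in the real harness
            continuation.resume(throwing: TestTimeoutSentinel())
        }
    }
}
```

If the timeout wins, the body `Task` may stay suspended, but `claim()` guarantees it can never resume the continuation a second time.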
…ol-based SDK refs

Review of #136 flagged three items, all addressed here:

- Important: the regression test's dummy `/bin/sleep` subprocess inherited the parent's stdout/stderr. `MCPTestServer.start()`'s own comment warns this pattern previously wedged CI runners on macOS 15. Route the dummy's stdio to `FileHandle.nullDevice` to match the hardened pattern.
- Nit: `Duration.components.seconds + .attoseconds / 1e18` was duplicated in `withTimeout` and the regression test. Extract `Duration.asTimeInterval` (internal to the test target, colocated with `MCPTestError`).
- Nit: `withTimeout`'s docstring referenced swift-sdk `Client.swift:234-276` and `:304-342`. Those line numbers rot on every SDK bump. Replace them with the stable symbols `Client.handleMessage(_:)`, `Client.disconnect()`, and `pendingRequests`.

Behavior unchanged; all 6 MacOSMCPTests still pass in ~68s locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
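The extracted helper could look like this — a sketch using the name from the review note; its actual placement next to `MCPTestError` in the test target is per the PR, and the body is exactly the expression that was previously duplicated.

```swift
import Foundation

extension Duration {
    /// Lossy conversion for APIs that still take TimeInterval,
    /// e.g. Thread.sleep(forTimeInterval:). Combines the whole-second
    /// and attosecond components (1 attosecond = 1e-18 s).
    var asTimeInterval: TimeInterval {
        let parts = components
        return TimeInterval(parts.seconds) + TimeInterval(parts.attoseconds) / 1e18
    }
}
```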
This was referenced Apr 21, 2026
obj-p added a commit that referenced this pull request Apr 22, 2026
Phase 4A (same PR, prior commit) added markers that caught the
post-reload wedge on the first CI run after merge. Run 72439165932's
server stderr dump showed:
[snapshot] enter
[snapshot] pre session-check
[snapshot] post session-check ← last marker before silence
The next marker `[snapshot] post 300ms sleep` never fires. The ONLY
code between those two markers is:
guard isMacOSSession else { ... } // no await
try await Task.sleep(for: .milliseconds(300)) // ← wedges here
The hang is `Task.sleep`, which ultimately depends on Swift's
cooperative scheduler to resume. On a hot-reload test that polls
snapshots rapidly without swiftc breaks (the literal-only path in
particular — dozens of rapid snapshot calls per `awaitSnapshotChange`
loop), the pool accumulates enough load that the timer's continuation
stays unscheduled. Test `hotReloadLiteralOnly` hit this at around
the 8th snapshot cycle on CI run 72439165932.
This is exactly the failure mode issue #135 described — and the
reason PR #136 built a pthread-based timeout for the test harness.
Same pattern, different place.
The sleep itself is unnecessary: `Snapshot.capture` calls
`NSView.cacheDisplay(in:to:)`, which forces layout synchronously
when the view is dirty. The MCP client→server round trip already
gives the UI plenty of time to settle between view mount and
capture. Verified: all 7 MacOSMCPTests pass without the sleep —
including `hotReloadLiteralOnly` and `hotReloadStructural`, both
of which assert image bytes actually change after the reload.
The sleep was added in the initial commit (820b0cd) with the
generic comment "Wait briefly for SwiftUI to finish layout". No
evidence it was ever load-bearing; it appears to have been defensive
paranoia whose only real effect was to serve as an unintended
starvation trigger.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
obj-p added a commit that referenced this pull request Apr 22, 2026
) * Add targeted stderr instrumentation in preview_snapshot + reload path

Phase 4A of issue #135. Phases 1-3 catch the daemon post-reload wedge (CLI stall-detects at 30s, test harness pthread-terminates at 30s), but we still don't know WHERE the handler hangs. The server stderr dump from PR #134 currently goes silent after "Reloaded!", leaving root-cause analysis speculative.

This PR adds ~8 targeted `fputs` markers through `handlePreviewSnapshot` and one after the reload's `MainActor.run` block. Each marker writes to the server subprocess's stderr — captured in the per-instance temp file and dumped by PR #134's post-failure CI step. When the next flake hits, the CI log will show exactly how far the handler got before the wedge:

[snapshot] enter
[snapshot] pre session-check ← MainActor queue blocked here?
[snapshot] post session-check
[snapshot] post 300ms sleep ← Task.sleep problem?
[snapshot] main-actor capture enter ← MainActor work starts
[snapshot] main-actor capture done ← Snapshot.capture returned
[snapshot] encoding
[snapshot] returning

Plus on the reload side, after the MainActor.run block containing `loadPreview`:

[reload] main-actor block returned

No behavior change. `fputs` + `fflush` is synchronous libc, no Swift concurrency involvement, no allocations beyond the literal strings. Verified locally: the full 7-stage sequence appears on the happy path of `hotReloadStructural`. Phase 4B (the actual fix) is gated on seeing what a real CI failure shows. This PR is the diagnostic prerequisite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address review: add throw-path marker, TODO tag, user-visibility note

Round-1 review of #139 flagged three optional polishes worth applying before merge:

- The `[snapshot] main-actor capture enter` marker fires on entry, but the `captureFailed` throw path (when `host.window(for:)` returns nil) exits the MainActor block without a matching marker. Added a do/catch around the block with a `[snapshot] main-actor capture threw: ...` marker, so the CI dump shows we exited the block one way or another, even on error paths.
- Added a `TODO(#135-4B)` tag on the instrumentation's doc comment so the cleanup obligation is grep-visible when Phase 4B lands.
- Expanded the doc comment with two clarifications the reviewer verified: (1) markers are not user-visible, because `DaemonClient.registerStderrLogForwarder` forwards MCP `LogMessageNotification` payloads, not raw daemon stderr; (2) macOS snapshot handler only — iOS returns early above.

No behavior change. `swift test --filter hotReloadStructural` passes in 8.5s; manually verified the new throw-path marker does not fire on the happy path (only on the failure path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Remove 300ms pre-capture Task.sleep (Phase 4B root-cause fix for #135)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
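The marker style Phase 4A describes can be sketched as below. The `mark` helper name and its `stream` parameter are ours, added so the sketch is self-contained and testable; the real instrumentation writes the literal strings straight to stderr.

```swift
import Foundation

/// Synchronous libc marker in the Phase 4A style: fputs + fflush, no Swift
/// concurrency involved, flushed immediately so a later wedge cannot
/// swallow the line. `stream` defaults to stderr; tests can pass a file.
func mark(_ text: String, to stream: UnsafeMutablePointer<FILE> = stderr) {
    fputs(text + "\n", stream)
    fflush(stream)
}
```

Because the write and flush are plain blocking libc calls, the marker lands in the stderr capture even when the cooperative pool is wedged.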
Phase 3 of the issue #135 implementation plan. Unblocks #133's CI by making the test-harness timeout primitive fire even when the Swift cooperative thread pool is pegged in the MCP SDK's busy-spin. First of three PRs; Phases 1 and 2 (daemon-side heartbeat + client-side stall detection for production callers) will follow.
Problem
`MCPTestServer.withTimeout` raced `body` against `Task.sleep(for:)` inside a `withThrowingTaskGroup`. When the MCP SDK's transport loop is busy-spinning on a wedged server — the pattern in issue #135 — the cooperative thread pool is exhausted and `Task.sleep` never resumes to throw. The 30s `callToolWithTimeout` deadline silently passes; only Swift Testing's outer `.timeLimit(.minutes(10))` catches it, by which point 10 minutes of CI have been burned with no diagnostic context. Three CI runs hit this in the past 36 hours: 72323677364, 72328816376, 72345678664.
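For contrast, the starvation-prone shape reads roughly like this — a reconstruction from the description above, not the repo's exact code (the sentinel name here is illustrative):

```swift
import Foundation

struct TimeoutSentinel: Error {}

/// Reconstruction of the prior shape: BOTH arms need the cooperative pool
/// to make progress, so when the pool is pegged the Task.sleep arm can
/// never resume to throw, and the deadline is silently missed.
func withTaskGroupTimeout<T: Sendable>(
    _ duration: Duration,
    body: @escaping @Sendable () async throws -> T
) async throws -> T {
    try await withThrowingTaskGroup(of: T.self) { group in
        group.addTask { try await body() }
        group.addTask {
            try await Task.sleep(for: duration)
            throw TimeoutSentinel()  // never reached if the pool is starved
        }
        let result = try await group.next()!  // first finisher wins
        group.cancelAll()                     // loser is cancelled
        return result
    }
}
```

The race works whenever the pool is healthy — the test below only exercises a body that yields — which is exactly why the bug survived until CI pegged the pool.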
Fix
`withTimeout` is restructured around a `withCheckedThrowingContinuation` with two racers:

1. Body path: an ordinary `Task` that resumes the continuation on success/failure.
2. Timeout path: `Thread.detachNewThread` (a real kernel thread — not GCD, not Swift concurrency) that `Thread.sleep`s for `duration`, then calls `process.terminate()` on the server subprocess and resumes the continuation with `TestTimeoutSentinel`.

`OSAllocatedUnfairLock<Bool>` serializes exactly-once resume across the race. `CheckedContinuation.resume` is synchronous and has no cooperative-pool dependency, so the caller unblocks regardless of pool state.

Tradeoff
The body `Task` may remain suspended forever if its pending MCP continuation is never drained (the SDK's transport EOF does not resume `pendingRequests` — see `Client.swift:234-276` vs `:304-342`; only explicit `disconnect()` drains them, and that's itself async, so we can't run it from the pthread). This leak is accepted because:

- the caller has already been resumed via the continuation;
- the subprocess has been terminated, so the next test starts clean;
- the process exits shortly after.

What this PR does NOT do
- Does not touch `DaemonClient` — CLI and MCP agents still lack stall detection (Phase 2 is the PR after this one).

This is strictly test-harness hardening that gets CI green and unblocks #133.
Test plan
- `swift build` — clean
- `swift-format lint --strict --recursive Sources/ Tests/ examples/` — clean
- `swift test --filter MacOSMCPTests` — 6/6 green locally, 67.241s total
- `withTimeoutFiresUnderStarvation`: body occupies a cooperative thread with POSIX `sleep(60)` (no yield), the pthread fires at the 2s budget, and the test asserts elapsed ∈ [2, 5) and subprocess `.isRunning == false`. Passes in 2.013s.

Related
- `previews mcp logs` subcommand and promote Debugging section #133 — blocked on CI; this unblocks it.
- `MCPTestServer.swift`.

🤖 Generated with Claude Code