
Add watchdog diagnostic for MCP integration test hangs #134

Merged
obj-p merged 3 commits into main from fix/mcp-integration-test-flake
Apr 22, 2026

Conversation


obj-p (Owner) commented Apr 21, 2026

Summary

Five MacOSMCPTests failures in the past 36 hours, across four different tests, have the same signature: `Time limit was exceeded: 600.000 seconds`, zero intermediate output, and the step killed by the 15-minute GHA timeout. The tests' internal timeouts (30s/90s with `Issue.record`-on-throw) never fire because the Swift concurrency cooperative pool is starved, so their `Task.sleep`-based timers can't resume.
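
For context, those internal timeouts are the usual race-a-timer-against-the-body pattern. A minimal sketch of that shape, with hypothetical names (`withTimeout`, `TimeoutError`) that are not the project's actual API:

```swift
struct TimeoutError: Error {}

func withTimeout<T: Sendable>(
    seconds: Double,
    _ body: @escaping @Sendable () async throws -> T
) async throws -> T {
    try await withThrowingTaskGroup(of: T.self) { group in
        group.addTask { try await body() }
        group.addTask {
            // The timer lives on the cooperative pool. If the pool is starved,
            // this continuation never resumes and the timeout never fires.
            try await Task.sleep(for: .seconds(seconds))
            throw TimeoutError()
        }
        let result = try await group.next()!
        group.cancelAll()
        return result
    }
}
```

Under starvation the sleeping child task never resumes, so the `Issue.record` path that would dump diagnostics never runs.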

This PR adds a GCD-based watchdog on every `MCPTestServer` that survives cooperative-pool starvation and makes the next failure actually diagnosable.

Evidence

| Run | PR | Test that timed out silently |
| --- | --- | --- |
| 72166579769 | #130 attempt | `preview_variants captures multiple configurations` |
| 72170918105 | #131 attempt | `structuredContent payloads decode for each migrated tool` |
| 72234336170 | #132 attempt | `preview_variants captures multiple configurations` |
| 72323677364 | #133 run 1 | `File edit triggers hot reload (structural recompile path)` |
| 72328816376 | #133 run 2 | `File edit triggers hot reload (structural recompile path)` |

All five land on the same `MacOSMCPTests.swift` test-function decorator line, never reach the test body's `Issue.record` paths, and get killed by the 15-minute step timeout. Not reproducible locally — all 12 MCP tests pass in ~65s.

What this changes

Watchdog in `MCPTestServer`

  • `DispatchSourceTimer` on `DispatchQueue.global(qos: .utility)` fires every 60s.
  • Handler reads the captured per-instance stderr log and writes `[MCPTestServer watchdog t=Xs] alive — server stderr tail: …` to the test process's own stderr via `fputs` + `fflush` (blocking I/O, thread-pool-independent).
  • `stop()` cancels the timer; `[weak server]` handler is a no-op if the server has been released.

On the happy path, zero heartbeats — tests that complete in under 60s never emit. On a hang, one line per minute gives a minute-by-minute trail of what the server subprocess was doing.
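
A minimal sketch of this watchdog shape, assuming a per-instance `stderrLogURL` and a `startWatchdog()`/`stop()` pair (illustrative names and structure, not the actual `MCPTestServer` code):

```swift
import Foundation

final class WatchdogSketch {
    private var timer: DispatchSourceTimer?
    private let startedAt = Date()
    private let stderrLogURL: URL

    init(stderrLogURL: URL) {
        self.stderrLogURL = stderrLogURL
    }

    func startWatchdog() {
        let source = DispatchSource.makeTimerSource(queue: .global(qos: .utility))
        source.schedule(deadline: .now() + 60, repeating: 60)
        source.setEventHandler { [weak self] in
            guard let self else { return }  // no-op once the server is released
            let elapsed = Int(Date().timeIntervalSince(self.startedAt))
            let tail = (try? String(contentsOf: self.stderrLogURL, encoding: .utf8))?
                .split(separator: "\n").suffix(5).joined(separator: "\n") ?? ""
            // Blocking libc I/O: does not depend on the Swift cooperative pool.
            fputs("[MCPTestServer watchdog t=\(elapsed)s] alive — server stderr tail: \(tail)\n", stderr)
            fflush(stderr)
        }
        source.resume()
        timer = source
    }

    func stop() {
        timer?.cancel()
        timer = nil
    }
}
```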

CI post-failure dump

`build-and-test` now has a "Dump MCP server stderr on failure" step mirroring the existing CLI-step equivalent. It concatenates every `previewsmcp-test-*.log` (the per-instance stderr files that `MCPTestServer` creates) plus the daemon's `serve.log` into the job logs.

AGENTS.md

One-line note under "CI-specific concerns" documenting the watchdog and the "do not replace with `Task.sleep`" invariant.

Deliberately out of scope

Not attempting to fix the starvation itself. The pattern (5 different tests hit, all at 600s exactly) points to a systemic issue — likely the MCP SDK's `AsyncThrowingStream` busy-spin mentioned in `MCPTestServer.stop()`'s existing comment, or an accumulation across serialized tests. But (a) it doesn't reproduce locally, (b) any speculative fix can't be validated without CI cycles, and (c) the diagnostic unlocks targeted investigation once we see what the server was actually doing.

Test plan

  • `swift build` — clean
  • `swift test --filter "MCPIntegrationTests" --skip "IOSMCPTests"` — 12/12 pass, 65s total. No heartbeats emitted (all tests < 60s). Expected.
  • CI on this PR — confirms watchdog doesn't break anything and the post-failure dump step works when a test legitimately fails.
  • Next CI flake on any PR — heartbeat output in job logs (`[MCPTestServer watchdog t=60s] ...`) and per-instance stderr dump in post-failure step.

Blocks

Intended to merge before PR #133, which is currently held: the hot-reload flake is unrelated to that PR's code but blocks its CI.

🤖 Generated with Claude Code

obj-p and others added 3 commits April 21, 2026 21:33
Every few PRs lately, a MacOSMCPTests case has timed out at exactly 600s
on CI with zero output: the Swift Testing `.timeLimit(.minutes(10))`
fires after ten minutes of silence, then the GHA step kill truncates
the run three minutes later. The tests' own internal timeouts never get
a chance to dump the server stderr they were designed to capture:
when the cooperative thread pool starves, the `Task.sleep(for:)` inside
the timeout primitive can't resume, so `Issue.record` paths never run and
CI failure logs contain nothing actionable (see runs 72166579769,
72170918105, 72234336170, 72323677364, 72328816376 — five failures
across four different MCP tests in 36 hours).

This change makes the next occurrence debuggable without attempting a
speculative fix for the underlying starvation. Each `MCPTestServer`
now schedules a `DispatchSourceTimer` on a libdispatch queue (not the
Swift cooperative pool) that fires every 60s and writes the server
subprocess's elapsed runtime plus the tail of its captured stderr to
the test process's own stderr via a blocking `fputs`. Tests that
complete in under 60s are silent on the happy path; a hanging test
produces one heartbeat per minute with concrete context for post-mortem.

CI also gets a new post-failure step that dumps every
`previewsmcp-test-*.log` under `$TMPDIR` / `/tmp`, mirroring the
existing daemon-log dump on the CLI step so the per-server stderr
files survive as job logs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first attempt used DispatchSource.makeTimerSource with qos .utility.
CI run 72345678664 hit the exact hang this watchdog was built for —
MacOSMCPTests.hotReloadStructural timed out silently at 600s — and the
watchdog emitted no heartbeats. libdispatch's utility-QoS worker
threads are the first to starve when a busy-spin pegs the cores; no
scheduling guarantees kept the timer alive.

A detached Thread (`Thread.detachNewThread`) is a real kernel thread
scheduled by the OS rather than libdispatch, and `Thread.sleep` blocks
in the kernel instead of suspending a cooperative task. That sidesteps
QoS starvation entirely. The cancellation flag moves from
`DispatchSourceTimer.cancel()` to an `OSAllocatedUnfairLock<Bool>` the
thread polls between sleeps.
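
A minimal sketch of that shape, reusing the heartbeat format from the
first attempt (names and structure are illustrative, not the actual
implementation; `OSAllocatedUnfairLock` requires macOS 13+):

```swift
import Foundation
import os

final class ThreadWatchdogSketch {
    private let cancelled = OSAllocatedUnfairLock(initialState: false)
    private let startedAt = Date()

    func start(stderrLogURL: URL) {
        let cancelled = self.cancelled
        let startedAt = self.startedAt
        // A detached Thread is a real kernel thread scheduled by the OS,
        // so it keeps running even when libdispatch QoS tiers are starved.
        Thread.detachNewThread {
            while true {
                Thread.sleep(forTimeInterval: 60)  // kernel-level sleep, no cooperative task
                if cancelled.withLock({ $0 }) { return }
                let elapsed = Int(Date().timeIntervalSince(startedAt))
                let tail = (try? String(contentsOf: stderrLogURL, encoding: .utf8))?
                    .split(separator: "\n").suffix(5).joined(separator: "\n") ?? ""
                fputs("[MCPTestServer watchdog t=\(elapsed)s] alive — server stderr tail: \(tail)\n", stderr)
                fflush(stderr)
            }
        }
    }

    func stop() {
        cancelled.withLock { $0 = true }
    }
}
```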

The post-failure stderr dump step added in the same PR worked as
intended — it's what revealed the actual root cause: the daemon
successfully compiled and reloaded ("Compiled:" + "Reloaded!" in
server stderr) but MCPTestServer's subsequent preview_snapshot
callTool never returned. That's a separate bug in the MCP client /
transport layer and will be investigated with the better diagnostic
now in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
obj-p force-pushed the fix/mcp-integration-test-flake branch from de1255a to 23adda3 on April 22, 2026 at 01:36
obj-p merged commit f0fbad2 into main on Apr 22, 2026
4 checks passed
obj-p deleted the fix/mcp-integration-test-flake branch on April 22, 2026 at 03:37
obj-p added a commit that referenced this pull request Apr 22, 2026

* Add targeted stderr instrumentation in preview_snapshot + reload path

Phase 4A of issue #135. Phases 1-3 catch the daemon post-reload wedge
(CLI stall-detects at 30s, test harness pthread-terminates at 30s),
but we still don't know WHERE the handler hangs. The server stderr
dump from PR #134 currently goes silent after "Reloaded!", leaving
root-cause analysis speculative.

This PR adds ~8 targeted `fputs` markers through `handlePreviewSnapshot`
and one after the reload's `MainActor.run` block. Each marker writes
to the server subprocess's stderr — captured in the per-instance
temp file and dumped by PR #134's post-failure CI step. When the
next flake hits, the CI log will show exactly how far the handler
got before the wedge:

  [snapshot] enter
  [snapshot] pre session-check          ← MainActor queue blocked here?
  [snapshot] post session-check
  [snapshot] post 300ms sleep           ← Task.sleep problem?
  [snapshot] main-actor capture enter   ← MainActor work starts
  [snapshot] main-actor capture done    ← Snapshot.capture returned
  [snapshot] encoding
  [snapshot] returning

Plus on the reload side, after the MainActor.run block containing
`loadPreview`:

  [reload] main-actor block returned

No behavior change. `fputs` + `fflush` is synchronous libc, no Swift
concurrency involvement, no allocations beyond the literal strings.
Verified locally: the full 7-stage sequence appears on the happy
path of `hotReloadStructural`.
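
The marker mechanism itself is tiny; a sketch of the kind of helper
involved (the helper name is hypothetical, the marker strings mirror
the ones above):

```swift
import Foundation

// Illustrative sketch only, not the project's exact code.
@inline(__always)
func snapshotMark(_ message: String) {
    // Synchronous libc write to the server subprocess's stderr: no Swift
    // concurrency hops, so markers land even when the cooperative pool wedges.
    fputs("[snapshot] \(message)\n", stderr)
    fflush(stderr)
}

// Usage inside the handler (hypothetical structure):
// snapshotMark("enter")
// snapshotMark("pre session-check")
// ...
// snapshotMark("returning")
```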

Phase 4B (the actual fix) is gated on seeing what a real CI failure
shows. This PR is the diagnostic prerequisite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address review: add throw-path marker, TODO tag, user-visibility note

Round-1 review of #139 flagged three optional polishes worth applying
before merge:

- The `[snapshot] main-actor capture enter` marker fires on entry but
  the `captureFailed` throw path (when `host.window(for:)` returns nil)
  exits the MainActor block without a matching marker. Added a
  do/catch around the block with a `[snapshot] main-actor capture
  threw: ...` marker. This way the CI dump shows we exited the block
  one way or another, even on error paths.

- Added a `TODO(#135-4B)` tag on the instrumentation's doc comment so
  the cleanup obligation is grep-visible when Phase 4B lands.

- Expanded the doc comment with two clarifications the reviewer
  verified: (1) markers are not user-visible because
  `DaemonClient.registerStderrLogForwarder` forwards MCP
  `LogMessageNotification` payloads, not raw daemon stderr;
  (2) macOS snapshot handler only — iOS returns early above.

No behavior change. `swift test --filter hotReloadStructural` passes
in 8.5s; manually verified the new throw-path marker does not fire
on the happy path (only on the failure path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Remove 300ms pre-capture Task.sleep (Phase 4B root-cause fix for #135)

Phase 4A (same PR, prior commit) added markers that caught the
post-reload wedge on the first CI run after merge. Run 72439165932's
server stderr dump showed:

  [snapshot] enter
  [snapshot] pre session-check
  [snapshot] post session-check     ← last marker before silence

The next marker `[snapshot] post 300ms sleep` never fires. The ONLY
code between those two markers is:

  guard isMacOSSession else { ... }           // no await
  try await Task.sleep(for: .milliseconds(300))  // ← wedges here

The hang is `Task.sleep`, which ultimately depends on Swift's
cooperative scheduler to resume. On a hot-reload test that polls
snapshots rapidly without swiftc breaks (the literal-only path in
particular — dozens of rapid snapshot calls per `awaitSnapshotChange`
loop), the pool accumulates enough load that the timer's continuation
stays unscheduled. Test `hotReloadLiteralOnly` hit this at around
the 8th snapshot cycle on CI run 72439165932.

This is exactly the failure mode issue #135 described — and the
reason PR #136 built a pthread-based timeout for the test harness.
Same pattern, different place.

The sleep itself is unnecessary: `Snapshot.capture` calls
`NSView.cacheDisplay(in:to:)`, which forces layout synchronously
when the view is dirty. The MCP client→server round trip already
gives the UI plenty of time to settle between view mount and
capture. Verified: all 7 MacOSMCPTests pass without the sleep —
including `hotReloadLiteralOnly` and `hotReloadStructural`, both
of which assert image bytes actually change after the reload.
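
For reference, a synchronous capture along these lines needs no
pre-capture delay because `cacheDisplay(in:to:)` drives layout and
drawing itself; this is a hedged sketch, not the project's
`Snapshot.capture`:

```swift
import AppKit

@MainActor
func captureSnapshot(of view: NSView) -> Data? {
    let bounds = view.bounds
    guard let rep = view.bitmapImageRepForCachingDisplay(in: bounds) else { return nil }
    // Forces a synchronous layout + draw pass into the bitmap rep.
    view.cacheDisplay(in: bounds, to: rep)
    return rep.representation(using: .png, properties: [:])
}
```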

The sleep was added in the initial commit (820b0cd) with the
generic comment "Wait briefly for SwiftUI to finish layout". No
evidence it was ever load-bearing; it appears to have been defensive
paranoia that became load-bearing only as an unintended starvation
trigger.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
