Add daemon-global 2s heartbeat notification (Phase 1 of issue #135)#137
Merged
Add daemon-global 2s heartbeat notification (Phase 1 of issue #135)#137
Conversation
Phase 1 of the implementation plan for issue #135. The MCP daemon now emits an unconditional `LogMessageNotification` with `logger: "heartbeat"` every 2 seconds for the lifetime of each connected transport. This is the liveness signal that Phase 2's client-side stall detector will key off of to distinguish "daemon is busy" from "daemon is wedged" — and crucially, it fires whether or not any tool call is in flight, which is necessary because the FileWatcher hot-reload path is entirely outside any request scope (exactly where issue #135's wedge lives). Key design decisions: - **LogMessageNotification, not ProgressNotification.** Per MCP spec, `ProgressNotification` requires a `progressToken` from an in-flight request's `_meta`. An unsolicited heartbeat has no such token; emitting one is out-of-spec and will cause strict agents to log warnings or render spurious progress UI. `LogMessageNotification.logger` is a free-form discriminator; clients filter on `logger == "heartbeat"`. - **Detached Task, not inherited isolation.** `runStdio` and the daemon connection handler run on specific actors (@mainactor in one case). Inheriting isolation into the heartbeat Task would serialize it with all other actor work. Detaching frees it to interleave with message handling. - **Wait on `server.waitUntilCompleted()`.** `server.start(transport:)` returns immediately after spawning its internal receive Task; it does NOT block until disconnect. Without the explicit wait, the defer that cancels the heartbeat fires right after the first tick — that's why the first-attempt test saw exactly one heartbeat emitted before the task was cancelled. - **Filter on the client.** `DaemonClient.registerStderrLogForwarder` now drops `logger == "heartbeat"` messages so they don't spam the CLI's stderr. The notifications still reach the client's message handler — Phase 2 will attach a stall-timer bump there. Test coverage: new `daemonEmitsHeartbeat` integration test idles for 5s without issuing any tool calls and asserts ≥2 heartbeats arrive. Passes in 5.166s locally. Full macOS MCP suite (7 tests) passes in 72s. Out of scope: client-side stall detection (Phase 2), daemon-side root cause fix for the post-reload wedge (Phase 4). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-1 review of #137 flagged three nits and one FYI: - Test window 5s was tight against the T+2s first-ping timing. Bumped to 6s so even with a ~500ms subprocess-boot delay the 6s idle still guarantees two pings. - Added a `Timing contract` paragraph in runMCPServer's doc comment explicitly noting the first ping fires at T+2s, not T+0, so Phase 2's stall detector knows to grant grace on connect. - Renamed `MCPTestServer.heartbeatCount` property to `observedHeartbeatCount()` method — the value is inherently racy, a method signature signals "this is a snapshot; next call may differ" better than a property accessor. FYI on `.debug` level filtering by MCP clients (Phase 2 needs to call `setLoggingLevel(.debug)` during handshake or it'll see zero heartbeats) tracked in two places: the implementation plan under "Gotchas discovered during Phase 1" and as a comment on issue #135 so it's visible from the PR review chain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Phase 1 of the issue #135 implementation plan. Adds a daemon-global
LogMessageNotificationheartbeat that Phase 2's client-side stall detector will key off of. Unlocks the Phase 2 PR (client stall detection); deliberately does not change any client-side behavior yet.Problem
Per issue #135, the MCP daemon can become non-responsive after hot-reload and there's no way for a client to tell "daemon is busy with a long operation" from "daemon is wedged." The MCP swift-sdk's
Client.callToolhas no built-in timeout — a call against a wedged daemon hangs forever.Any heartbeat-based liveness signal has to fire whether or not a tool is in flight, because the failing path in #135 is
HostApp.swift's FileWatcher reload callback — it runs entirely outside any request scope. Scoping heartbeats to individual tool handlers would miss exactly the silence we need to detect.Fix
`runMCPServer` (new helper in `MCPServer.swift`) spawns a detached Task that calls `server.log(level: .debug, logger: "heartbeat", data: .string("alive"))` every 2 seconds for the lifetime of a connected transport. Both stdio (`ServeCommand.runStdio`) and Unix-socket (`DaemonListener.handleConnection`) entry points route through this helper.
`DaemonClient.registerStderrLogForwarder` filters `logger == "heartbeat"` so the CLI's stderr stays clean. Notifications still reach the client's handler chain — Phase 2 attaches a stall-timer bump there.
Design choices explained
LogMessageNotification, not ProgressNotification. Per MCP spec 2025-11-25, `ProgressNotification` requires a `progressToken` from an in-flight request's `_meta`. Unsolicited progress pings are out-of-spec; strict clients log warnings or render spurious progress UI. `LogMessageNotification.logger` is a free-form discriminator, perfect for a heartbeat channel.
`Task.detached`, not inherited isolation. `runStdio` runs on `@MainActor`; the daemon handler runs on its own Task. Inheriting either would serialize the heartbeat with all other actor work. Detaching frees it to interleave with message handling.
`await server.waitUntilCompleted()` after `server.start`. The SDK's `server.start(transport:)` returns immediately after spawning its internal receive Task; it does NOT block until disconnect. Without the explicit wait, the `defer { heartbeat.cancel() }` fires right after the first tick — this bit me during development, the first-attempt test saw exactly one heartbeat emitted before the Task was cancelled.
What this PR doesn't do
Test plan
Related
🤖 Generated with Claude Code