Skip to content

Add daemon-global 2s heartbeat notification (Phase 1 of issue #135)#137

Merged
obj-p merged 2 commits intomainfrom
feat/daemon-heartbeat
Apr 22, 2026
Merged

Add daemon-global 2s heartbeat notification (Phase 1 of issue #135)#137
obj-p merged 2 commits intomainfrom
feat/daemon-heartbeat

Conversation

@obj-p
Copy link
Copy Markdown
Owner

@obj-p obj-p commented Apr 21, 2026

Phase 1 of the issue #135 implementation plan. Adds a daemon-global LogMessageNotification heartbeat that Phase 2's client-side stall detector will key off of. Unlocks the Phase 2 PR (client stall detection); deliberately does not change any client-side behavior yet.

Problem

Per issue #135, the MCP daemon can become non-responsive after hot-reload and there's no way for a client to tell "daemon is busy with a long operation" from "daemon is wedged." The MCP swift-sdk's Client.callTool has no built-in timeout — a call against a wedged daemon hangs forever.

Any heartbeat-based liveness signal has to fire whether or not a tool is in flight, because the failing path in #135 is HostApp.swift's FileWatcher reload callback — it runs entirely outside any request scope. Scoping heartbeats to individual tool handlers would miss exactly the silence we need to detect.

Fix

`runMCPServer` (new helper in `MCPServer.swift`) spawns a detached Task that calls `server.log(level: .debug, logger: "heartbeat", data: .string("alive"))` every 2 seconds for the lifetime of a connected transport. Both stdio (`ServeCommand.runStdio`) and Unix-socket (`DaemonListener.handleConnection`) entry points route through this helper.

`DaemonClient.registerStderrLogForwarder` filters `logger == "heartbeat"` so the CLI's stderr stays clean. Notifications still reach the client's handler chain — Phase 2 attaches a stall-timer bump there.

Design choices explained

  • LogMessageNotification, not ProgressNotification. Per MCP spec 2025-11-25, `ProgressNotification` requires a `progressToken` from an in-flight request's `_meta`. Unsolicited progress pings are out-of-spec; strict clients log warnings or render spurious progress UI. `LogMessageNotification.logger` is a free-form discriminator, perfect for a heartbeat channel.

  • `Task.detached`, not inherited isolation. `runStdio` runs on `@MainActor`; the daemon handler runs on its own Task. Inheriting either would serialize the heartbeat with all other actor work. Detaching frees it to interleave with message handling.

  • `await server.waitUntilCompleted()` after `server.start`. The SDK's `server.start(transport:)` returns immediately after spawning its internal receive Task; it does NOT block until disconnect. Without the explicit wait, the `defer { heartbeat.cancel() }` fires right after the first tick — this bit me during development, the first-attempt test saw exactly one heartbeat emitted before the Task was cancelled.

What this PR doesn't do

Test plan

  • `swift build` — clean
  • `swift-format lint --strict --recursive Sources/ Tests/` — clean
  • `swift test --filter MacOSMCPTests` — 7/7 pass in 72.3s, including the new `daemonEmitsHeartbeat` at 5.17s
  • New test: spins up an MCPTestServer, idles 5s without any tool calls, asserts ≥2 heartbeats arrived on the client's handler. Covers the exact silence scenario this change addresses.
  • CI confirms the full suite still passes and no regressions in daemon-mode / stdio-mode behavior.

Related

🤖 Generated with Claude Code

Phase 1 of the implementation plan for issue #135. The MCP daemon now
emits an unconditional `LogMessageNotification` with `logger: "heartbeat"`
every 2 seconds for the lifetime of each connected transport. This is
the liveness signal that Phase 2's client-side stall detector will key
off of to distinguish "daemon is busy" from "daemon is wedged" — and
crucially, it fires whether or not any tool call is in flight, which is
necessary because the FileWatcher hot-reload path is entirely outside
any request scope (exactly where issue #135's wedge lives).

Key design decisions:

- **LogMessageNotification, not ProgressNotification.** Per MCP spec,
  `ProgressNotification` requires a `progressToken` from an in-flight
  request's `_meta`. An unsolicited heartbeat has no such token; emitting
  one is out-of-spec and will cause strict agents to log warnings or
  render spurious progress UI. `LogMessageNotification.logger` is a
  free-form discriminator; clients filter on `logger == "heartbeat"`.

- **Detached Task, not inherited isolation.** `runStdio` and the daemon
  connection handler run on specific actors (@mainactor in one case).
  Inheriting isolation into the heartbeat Task would serialize it with
  all other actor work. Detaching frees it to interleave with message
  handling.

- **Wait on `server.waitUntilCompleted()`.** `server.start(transport:)`
  returns immediately after spawning its internal receive Task; it does
  NOT block until disconnect. Without the explicit wait, the defer that
  cancels the heartbeat fires right after the first tick — that's why
  the first-attempt test saw exactly one heartbeat emitted before the
  task was cancelled.

- **Filter on the client.** `DaemonClient.registerStderrLogForwarder`
  now drops `logger == "heartbeat"` messages so they don't spam the
  CLI's stderr. The notifications still reach the client's message
  handler — Phase 2 will attach a stall-timer bump there.

Test coverage: new `daemonEmitsHeartbeat` integration test idles for 5s
without issuing any tool calls and asserts ≥2 heartbeats arrive. Passes
in 5.166s locally. Full macOS MCP suite (7 tests) passes in 72s.

Out of scope: client-side stall detection (Phase 2), daemon-side root
cause fix for the post-reload wedge (Phase 4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-1 review of #137 flagged three nits and one FYI:

- Test window 5s was tight against the T+2s first-ping timing. Bumped
  to 6s so even with a ~500ms subprocess-boot delay the 6s idle still
  guarantees two pings.
- Added a `Timing contract` paragraph in runMCPServer's doc comment
  explicitly noting the first ping fires at T+2s, not T+0, so Phase 2's
  stall detector knows to grant grace on connect.
- Renamed `MCPTestServer.heartbeatCount` property to
  `observedHeartbeatCount()` method — the value is inherently racy, a
  method signature signals "this is a snapshot; next call may differ"
  better than a property accessor.

FYI on `.debug` level filtering by MCP clients (Phase 2 needs to call
`setLoggingLevel(.debug)` during handshake or it'll see zero heartbeats)
tracked in two places: the implementation plan under "Gotchas discovered
during Phase 1" and as a comment on issue #135 so it's visible from the
PR review chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@obj-p obj-p merged commit 2cc4ace into main Apr 22, 2026
4 checks passed
@obj-p obj-p deleted the feat/daemon-heartbeat branch April 22, 2026 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant