Kernel: mod3_speak skips afplay when dashboard subscriber exists (Wave 4.3b) #46
Merged
chazmaniandinkle merged 4 commits into cogos-dev:main on Apr 24, 2026
added 4 commits
April 23, 2026 12:10
…ave 2)
Kernel-side HTTP forwarder that makes the kernel the identity authority
for mod3 channel participants while keeping mod3 the communication-state
owner (voice, queue, device). Implements the decision from ADR-082's
layer-separation rule: "CogOS owns identity; Mod3 owns communication."
Routes (namespaced under /v1/channel-sessions/):
POST /v1/channel-sessions/register          mint + forward
POST /v1/channel-sessions/{id}/deregister   forward
GET  /v1/channel-sessions                   list (kernel + mod3)
GET  /v1/channel-sessions/{id}              detail
Namespace choice: the existing /v1/sessions/* family serves agent-session
state with 3-component hyphen-validated IDs tied to the handoff protocol.
Channel-participant registration has an incompatible shape (short UUID
IDs, participant_id/participant_type/voice/device fields). Rather than
weaken ValidateSessionID (which would cascade into handoff semantics),
the new concern takes its own namespace. The channel-provider RFC's
guidance to unify on cogos_session_register with a participant_type
discriminator remains the target at the MCP tool layer (Wave 3); at the
HTTP layer the two surfaces coexist cleanly.
Behavior:
- Kernel mints a session_id (short UUID) if the caller omits one.
- Request is forwarded to Config.Mod3URL (default http://localhost:7860)
with a 5s timeout.
- Response merges kernel identity record + mod3 channel state.
- Mod3 unreachable -> HTTP 502 with clear error body.
- Mod3 error responses are preserved and surfaced.
Config: new Config.Mod3URL field, overridable via MOD3_URL env var.
Tests: serve_sessions_channel_test.go covers ID minting, field passthrough,
response merging, mod3-down 502, and all four sibling endpoints via
httptest.Server fakes.
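The forwarder behavior described above can be sketched as follows. This is a minimal illustration, not the code in serve_sessions_channel.go: the handler name, the random-hex ID (a stand-in for the kernel's short UUID), and the merged `{"kernel": ..., "mod3": ...}` response shape are all assumptions. Only the 5s timeout, the mint-if-absent rule, the 502 on an unreachable mod3, and mod3's /v1/sessions/register path come from the commit messages.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// mintSessionID returns a short random hex ID (hypothetical stand-in for a short UUID).
func mintSessionID() string {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// registerHandler mints a session_id when the caller omits one, forwards the
// request to mod3, and merges the kernel record with mod3's reply.
func registerHandler(mod3URL string) http.HandlerFunc {
	client := &http.Client{Timeout: 5 * time.Second}
	return func(w http.ResponseWriter, r *http.Request) {
		record := map[string]any{}
		if err := json.NewDecoder(r.Body).Decode(&record); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		if id, _ := record["session_id"].(string); id == "" {
			record["session_id"] = mintSessionID()
		}
		payload, err := json.Marshal(record)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		resp, err := client.Post(mod3URL+"/v1/sessions/register", "application/json", bytes.NewReader(payload))
		if err != nil {
			// mod3 down or timed out: answer 502 with a clear error body.
			http.Error(w, "mod3 unreachable: "+err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		var mod3State any
		_ = json.NewDecoder(resp.Body).Decode(&mod3State) // non-JSON bodies become nil in this sketch
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(resp.StatusCode) // preserve mod3's own status code
		_ = json.NewEncoder(w).Encode(map[string]any{"kernel": record, "mod3": mod3State})
	}
}

// demoRegister calls the handler against a fake mod3, then against a stopped one.
// It returns the first status, the minted ID, and the status when mod3 is down.
func demoRegister() (int, string, int) {
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, `{"voice":"default"}`)
	}))
	call := func() (int, string) {
		rec := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodPost, "/v1/channel-sessions/register",
			strings.NewReader(`{"participant_id":"p1"}`))
		registerHandler(fake.URL)(rec, req)
		var out struct {
			Kernel struct {
				SessionID string `json:"session_id"`
			} `json:"kernel"`
		}
		_ = json.Unmarshal(rec.Body.Bytes(), &out)
		return rec.Code, out.Kernel.SessionID
	}
	okStatus, id := call()
	fake.Close() // simulate mod3 going away
	downStatus, _ := call()
	return okStatus, id, downStatus
}

func main() {
	ok, id, down := demoRegister()
	fmt.Println(ok, id, down)
}
```

The fake server plays the role the httptest.Server fakes play in the tests listed above: the handler never needs a live mod3.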
Wave 3 of the mod3-kernel integration (ADR-082 + channel-provider RFC).
The kernel becomes the MCP front door for mod3 voice tools via an HTTP
proxy — supersedes the installed binary's OpenClaw gateway, which read
mod3's metric headers but silently discarded the audio/wav payload.
Tools registered (mod3_* namespace on the /mcp endpoint):
* mod3_speak               synthesize text + play audio locally
* mod3_stop                cancel current/queued speech (optional job_id)
* mod3_voices              list available voices
* mod3_status              probe mod3 /health
* mod3_register_session    proxy to POST /v1/sessions/register
* mod3_deregister_session  proxy to POST /v1/sessions/{id}/deregister
* mod3_list_sessions       proxy to GET /v1/sessions
All tools accept an optional session_id and thread it through to mod3 —
in the request body for synthesize, as a query parameter for stop/voices,
in the URL for register/deregister. Absent session_id → the proxy omits
the field and mod3 routes to its default session.
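A small sketch of the threading rule for two of the transports named above: session_id in the JSON body for synthesize, and as a query parameter for stop. The helper names, the "text" field, and the /v1/stop path are hypothetical; the point is only that an empty session_id is omitted rather than sent as an empty string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// synthBody builds a synthesize request body; session_id is left out when empty
// so mod3 can route to its default session. The "text" field name is assumed.
func synthBody(text, sessionID string) string {
	body := map[string]any{"text": text}
	if sessionID != "" {
		body["session_id"] = sessionID
	}
	b, err := json.Marshal(body) // map keys are emitted in sorted order
	if err != nil {
		panic(err)
	}
	return string(b)
}

// stopURL builds a stop URL with session_id as a query parameter when present.
// The /v1/stop path is a placeholder, not mod3's documented route.
func stopURL(base, sessionID string) string {
	u := base + "/v1/stop"
	if sessionID != "" {
		q := url.Values{}
		q.Set("session_id", sessionID)
		u += "?" + q.Encode()
	}
	return u
}

func main() {
	fmt.Println(synthBody("hello", ""))
	fmt.Println(synthBody("hello", "abc123"))
	fmt.Println(stopURL("http://localhost:7860", "abc123"))
}
```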
Playback strategy: Option (A), server-side. Synthesis response bodies
are written to a tempfile, then played by afplay (macOS) / aplay (Linux)
via a fire-and-forget exec. Callers opt in to blocking via blocking=true,
or can skip playback entirely with skip_playback=true (returns the WAV
bytes base64-encoded, forward-compatible with Option B session-routed
playback once the Wave 4 dashboard WebSocket lands). The player command
is injectable (modalityProxy.player) so tests never spawn real audio.
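The playback strategy above might look roughly like this. It is a sketch under assumptions: the function and type names, the result struct, and the "skipped" status string are illustrative, and tempfile cleanup is left out for brevity. What it takes from the commit message is the tempfile hand-off, the blocking vs. fire-and-forget split, the base64 return for skip_playback, and the injectable player.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
	"os/exec"
)

// playerFunc plays the audio file at path. Injecting it keeps tests from
// spawning a real audio process.
type playerFunc func(path string, blocking bool) error

// execPlayer shells out to a platform player such as "afplay" or "aplay".
func execPlayer(command string) playerFunc {
	return func(path string, blocking bool) error {
		cmd := exec.Command(command, path)
		if blocking {
			return cmd.Run()
		}
		if err := cmd.Start(); err != nil {
			return err
		}
		go func() { _ = cmd.Wait() }() // reap the child without blocking the caller
		return nil
	}
}

// playResult is a hypothetical result shape for this sketch.
type playResult struct {
	PlaybackStatus string
	AudioBase64    string
}

// playAudio either returns the WAV bytes base64-encoded (skipPlayback) or
// writes them to a tempfile and hands the path to the player.
func playAudio(wav []byte, skipPlayback, blocking bool, player playerFunc) (playResult, error) {
	if skipPlayback {
		return playResult{PlaybackStatus: "skipped", AudioBase64: base64.StdEncoding.EncodeToString(wav)}, nil
	}
	f, err := os.CreateTemp("", "mod3-*.wav")
	if err != nil {
		return playResult{}, err
	}
	if _, err := f.Write(wav); err != nil {
		f.Close()
		return playResult{}, err
	}
	if err := f.Close(); err != nil {
		return playResult{}, err
	}
	if err := player(f.Name(), blocking); err != nil {
		return playResult{}, err
	}
	return playResult{PlaybackStatus: "played"}, nil
}

// demoPlayback uses a stub player that records the bytes it was handed.
func demoPlayback() (seenBytes, status, skippedB64 string) {
	var seen []byte
	stub := func(path string, _ bool) error {
		b, err := os.ReadFile(path)
		seen = b
		_ = os.Remove(path)
		return err
	}
	played, err := playAudio([]byte("RIFF"), false, true, stub)
	if err != nil {
		panic(err)
	}
	skipped, err := playAudio([]byte("RIFF"), true, false, stub)
	if err != nil {
		panic(err)
	}
	return string(seen), played.PlaybackStatus, skipped.AudioBase64
}

func main() {
	fmt.Println(demoPlayback())
	_ = execPlayer // production wiring would pass execPlayer("afplay") or execPlayer("aplay")
}
```

The stub proves the synthesized bytes actually reach the player, which is the property the drop-audio regression test guards.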
Metrics: mod3's X-Mod3-* response headers are parsed into a metrics
map on the tool result (job_id, duration_sec, rtf, sample_rate, etc.)
with numeric headers coerced to int64/float64 where applicable.
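One way to do the header-to-metrics extraction is sketched below. The exact header spellings (X-Mod3-Job-Id and so on) are assumed from the metric names in the text, and the generic "try int, then float" coercion is a simplification: a real implementation would more likely coerce by known header name, since an ID that happens to look numeric would be converted here.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// coerce converts a header value to int64 or float64 when it parses cleanly,
// and otherwise keeps it as a string.
func coerce(s string) any {
	if i, err := strconv.ParseInt(s, 10, 64); err == nil {
		return i
	}
	if f, err := strconv.ParseFloat(s, 64); err == nil {
		return f
	}
	return s
}

// parseMetrics collects X-Mod3-* response headers into a metrics map,
// turning e.g. "X-Mod3-Sample-Rate" into the key "sample_rate".
func parseMetrics(h http.Header) map[string]any {
	const prefix = "X-Mod3-" // matches Go's canonical header form
	out := map[string]any{}
	for name, vals := range h {
		if !strings.HasPrefix(name, prefix) || len(vals) == 0 {
			continue
		}
		key := strings.ToLower(strings.ReplaceAll(strings.TrimPrefix(name, prefix), "-", "_"))
		out[key] = coerce(vals[0])
	}
	return out
}

// demoMetrics parses a hand-built header set (values are made up for illustration).
func demoMetrics() map[string]any {
	h := http.Header{}
	h.Set("X-Mod3-Job-Id", "job-42")
	h.Set("X-Mod3-Sample-Rate", "24000")
	h.Set("X-Mod3-Rtf", "0.25")
	h.Set("Content-Type", "audio/wav") // not a metric, so it is ignored
	return parseMetrics(h)
}

func main() {
	fmt.Println(demoMetrics())
}
```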
Errors: mod3-unreachable returns IsError=true with "mod3 unreachable: …"
text so the ledger's tool.result event records the failure (same shape
as the serve_sessions_channel.go pattern). Non-2xx responses from mod3
preserve the body text in the error result — callers see mod3's own
422 / 5xx explanation intact.
Fixes: the drop-audio-bytes bug observed in the installed Apr-19 binary
(mcp__cogos__mod3_speak completes in ~1s, returns metrics, but plays
nothing). With this proxy the kernel actually hears what mod3 makes.
Timeout: 30s on the HTTP client (vs 5s on the channel-session forwarder);
accounts for cold-start model loading and multi-sentence synthesis.
Files:
internal/engine/mcp_modality_proxy.go (new, 551 lines)
internal/engine/mcp_modality_proxy_test.go (new, 686 lines)
internal/engine/mcp_server.go (modified, +11)
Tests: 20 new unit tests — synthesis success/error/session-threading,
stop/voices/status/sessions forwarding, metric extraction, server-side
playback via a stub shell-script player (proves the bytes reach the
player, guarding against the drop-audio regression), non-blocking spawn.
All pass. Full ./... suite green.
Out of scope (deferred to later waves):
* Wave 4: dashboard participant UI + session-routed playback
* session-start hook auto-registration
* consolidation with the existing OpenClaw-gateway mod3_speak
(coexist until deprecated)
…ve 3.5)
Eliminates the Wave 2 / Wave 3 divergence where the mod3_register_session,
mod3_deregister_session, and mod3_list_sessions MCP tools called mod3's
/v1/sessions/* surface directly, bypassing Wave 2's kernel-owned session_id
minting at /v1/channel-sessions/register.
Session-ID authority is now in one place (ADR-082): the kernel's shared
RegisterChannelSession / DeregisterChannelSession / ListChannelSessions
methods on *Server. Both the HTTP handlers and the MCP tool handlers call
through these methods. No self-localhost loop.
Approach 2 (refactor shared logic) over Approach 1 (self-HTTP loop): the
Wave 2 handler bodies factored cleanly into *Server methods returning typed
errors, and wiring an MCPServer field via SetChannelSessionBackend mirrors
the existing SetSessionsBackend pattern, so the public surface didn't have
to change.
Schema alignment with the channel-provider RFC's cogos_session_register
primitive: added optional `kinds` (array) and `metadata` (map) fields to
both the Wave 2 register endpoint and the MCP tool input. Both flow through
to mod3 unchanged (mod3 ignores unknown fields today) and are preserved on
the kernel identity record so downstream consumers can filter by capability.
- internal/engine/serve_sessions_channel.go: factored handleChannelSession*
bodies into RegisterChannelSession / DeregisterChannelSession /
ListChannelSessions methods returning a typed channelSessionForwardError;
added Kinds / Metadata fields to ChannelSessionRecord and the request wire
type; handlers now thin-wrap the shared methods.
- internal/engine/mcp_modality_proxy.go: the three session-family MCP tool
handlers now call the channelSessionBackend interface on MCPServer; all
direct HTTP calls to mod3's /v1/sessions/* removed from this file.
- internal/engine/mcp_server.go: added channelSessionBackend field +
interface + SetChannelSessionBackend setter.
- internal/engine/serve_mcp.go: wires the live Server as the backend.
- tests: updated newProxyMCP to wire a live Server so MCP session-family tests exercise the shared code path; added coverage for minting via the MCP tool, kinds/metadata pass-through (both HTTP and MCP), and the no-backend error path.
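The shared-backend wiring described in this commit can be illustrated with a stripped-down sketch. The type and method names follow the commit message, but the signatures, the map-based record, and the counter-based ID are simplifications invented for the example; the forward to mod3 is omitted entirely.

```go
package main

import (
	"errors"
	"fmt"
)

// channelSessionBackend is the seam between the MCP tool layer and the
// kernel's session logic (signature simplified for this sketch).
type channelSessionBackend interface {
	RegisterChannelSession(req map[string]any) (map[string]any, error)
}

// Server owns session-ID minting; HTTP handlers and MCP tools both call it.
type Server struct{ minted int }

// RegisterChannelSession copies the request into a record and mints an ID
// when none was supplied. Unknown fields such as kinds pass through untouched.
func (s *Server) RegisterChannelSession(req map[string]any) (map[string]any, error) {
	record := make(map[string]any, len(req)+1)
	for k, v := range req {
		record[k] = v
	}
	if id, _ := record["session_id"].(string); id == "" {
		s.minted++
		record["session_id"] = fmt.Sprintf("cs-%04d", s.minted) // stand-in for a short UUID
	}
	return record, nil
}

// MCPServer holds the backend as an interface so tests can swap it out.
type MCPServer struct{ channelSessions channelSessionBackend }

// SetChannelSessionBackend wires the live Server (or a fake) into the MCP layer.
func (m *MCPServer) SetChannelSessionBackend(b channelSessionBackend) { m.channelSessions = b }

var errNoBackend = errors.New("channel session backend not configured")

// toolRegisterSession is the MCP-side entry point: no HTTP hop, just a method call.
func (m *MCPServer) toolRegisterSession(args map[string]any) (map[string]any, error) {
	if m.channelSessions == nil {
		return nil, errNoBackend
	}
	return m.channelSessions.RegisterChannelSession(args)
}

func main() {
	mcp := &MCPServer{}
	_, err := mcp.toolRegisterSession(map[string]any{})
	fmt.Println("without backend:", err)

	mcp.SetChannelSessionBackend(&Server{})
	rec, _ := mcp.toolRegisterSession(map[string]any{"kinds": []string{"voice"}})
	fmt.Println("with backend:", rec["session_id"], rec["kinds"])
}
```

Because both entry points reach the same method, there is a single place where an ID can be minted, which is the property the commit is after.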
Wave 4.3 kernel side — close the double-play window between server-side
afplay and the dashboard's /ws/audio/{session_id} WebSocket (mod3 side,
committed separately). Before spawning the platform player on a
session-tagged speak, the kernel asks mod3 whether that session has a
live subscriber. If yes, skip afplay entirely — mod3 is already pushing
the WAV to the dashboard — and report playback_status=routed_ws.
modalityProxy grows an injectable subscriberCheck function field. The
default implementation is an HTTP GET against
{Mod3URL}/v1/sessions/{id}/subscribers with a 1.5s timeout. Transport
errors surface as subscriber_check_error in the tool result AND fall
through to the normal afplay path so a flaky mod3 never orphans audio —
a key safety property carried over from the Wave 3 fire-and-forget
playback design.
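A sketch of what a default subscriber check could look like. The endpoint path and the 1.5s timeout come from the text above; the JSON response shape `{"subscribed": true}` is an assumption, since the mod3 side is committed separately and its wire format is not shown here.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"time"
)

// checkSessionSubscriber asks mod3 whether sessionID has a live subscriber.
// Any transport or decoding problem is returned as an error so the caller
// can fall back to local playback.
func checkSessionSubscriber(ctx context.Context, mod3URL, sessionID string) (bool, error) {
	ctx, cancel := context.WithTimeout(ctx, 1500*time.Millisecond)
	defer cancel()

	u := mod3URL + "/v1/sessions/" + url.PathEscape(sessionID) + "/subscribers"
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
	if err != nil {
		return false, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("subscriber check: mod3 returned %s", resp.Status)
	}
	var body struct {
		Subscribed bool `json:"subscribed"` // assumed field name
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return false, err
	}
	return body.Subscribed, nil
}

// demoSubscriberCheck probes a fake mod3 for two sessions, then probes it
// again after shutdown to show the error path.
func demoSubscriberCheck() (yes, no, downErr bool) {
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		subscribed := r.URL.Path == "/v1/sessions/cs-yes/subscribers"
		_ = json.NewEncoder(w).Encode(map[string]bool{"subscribed": subscribed})
	}))
	ctx := context.Background()
	yes, err := checkSessionSubscriber(ctx, fake.URL, "cs-yes")
	if err != nil {
		panic(err)
	}
	no, err = checkSessionSubscriber(ctx, fake.URL, "cs-no")
	if err != nil {
		panic(err)
	}
	fake.Close() // mod3 goes away
	_, err = checkSessionSubscriber(ctx, fake.URL, "cs-yes")
	return yes, no, err != nil
}

func main() {
	fmt.Println(demoSubscriberCheck())
}
```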
session_id="" always bypasses the check. CLI invocations of mod3_speak
(no session_id) keep the exact afplay behavior as before, so this is a
purely additive change scoped to kernel-minted sessions.
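The gating rules in this commit reduce to a small decision function, sketched here with hypothetical names (routePlayback, speakResult) and an invented "error" status for a failed player. The routed_ws and played statuses, the empty-session bypass, and the fall-through on a check error are taken from the description above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// subscriberCheck reports whether a session has a live WebSocket subscriber.
type subscriberCheck func(ctx context.Context, sessionID string) (bool, error)

// speakResult carries the playback fields of the tool result (sketch only).
type speakResult struct {
	PlaybackStatus       string
	SubscriberCheckError string
}

// routePlayback skips the local player only when the session is known to
// have a subscriber. A failed check is recorded and then treated as
// "no subscriber", so a flaky mod3 never leaves audio unplayed.
func routePlayback(ctx context.Context, sessionID string, check subscriberCheck, play func() error) speakResult {
	var res speakResult
	if sessionID != "" && check != nil { // session_id="" always bypasses the check
		subscribed, err := check(ctx, sessionID)
		switch {
		case err != nil:
			res.SubscriberCheckError = err.Error()
		case subscribed:
			res.PlaybackStatus = "routed_ws"
			return res
		}
	}
	if err := play(); err != nil {
		res.PlaybackStatus = "error"
		return res
	}
	res.PlaybackStatus = "played"
	return res
}

// runCase exercises routePlayback with a stub check and a counting stub player.
func runCase(sessionID string, subscribed bool, checkErrMsg string) (status string, playerCalls int, checkErr string) {
	check := func(context.Context, string) (bool, error) {
		if checkErrMsg != "" {
			return false, errors.New(checkErrMsg)
		}
		return subscribed, nil
	}
	play := func() error { playerCalls++; return nil }
	res := routePlayback(context.Background(), sessionID, check, play)
	return res.PlaybackStatus, playerCalls, res.SubscriberCheckError
}

func main() {
	fmt.Println(runCase("", true, ""))
	fmt.Println(runCase("cs-yes", true, ""))
	fmt.Println(runCase("cs-no", false, ""))
	fmt.Println(runCase("cs-yes", true, "connection refused"))
}
```

The four calls in main correspond to the first four test cases listed below: no session, subscriber present, subscriber absent, and check error.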
Tests (mcp_modality_proxy_test.go, +237 lines):
- TestMod3Speak_NoSessionAlwaysSpawnsPlayer — session_id="" → stub
player invoked once, playback_status=played
- TestMod3Speak_SessionWithSubscriberSkipsPlayer — subscriberCheck
returns true → stub player NOT invoked, playback_status=routed_ws
- TestMod3Speak_SessionWithoutSubscriberSpawnsPlayer — subscriberCheck
returns false → stub player invoked once, playback_status=played
- TestMod3Speak_SubscriberCheckErrorFallsBackToPlayer — transient
probe error → stub player invoked once, subscriber_check_error
surfaced in result, playback_status=played
- TestCheckSessionSubscriber_DefaultImplementationHitsMod3 — default
HTTP path parses cs-yes/cs-no responses correctly
- TestCheckSessionSubscriber_Mod3UnreachableReturnsError — ECONNREFUSED
surfaces as a non-nil error (so toolMod3Speak falls back)
Replaced writeStubPlayer's mutex+int32 pair with a closure-over-getter
pattern. Cleaner signature, no polling goroutine, fewer moving parts.
Full engine test suite passes; golangci-lint clean.
Branch: feat/kernel-wave4, stacked on feat/kernel-session-forwarder
(the branch that currently carries Wave 3.5 session-id minting and
the mcp_modality_proxy.go baseline this commit extends).
Summary
One commit. When mod3_speak is invoked with a session_id that has a WebSocket subscriber listening on mod3 (dashboard open), the kernel skips the local afplay and lets mod3 deliver audio via WS. Without a session or subscriber, afplay stays as the fallback, so Wave 3 behavior is preserved.
Depends on:
- … (GET /v1/sessions/{id}/subscribers endpoint)
Review a31396e as the Wave 4 diff; the rest is the stacked dependency.
Change detail
- modalityProxy grows a subscriberCheck func(ctx, session_id) (subscribed bool, err error) field. The default implementation does a GET /v1/sessions/{id}/subscribers against Mod3URL with a 1.5s timeout. Injectable for tests.
- toolMod3Speak pre-gates: when session_id is present and subscriberCheck returns subscribed=true, the response is playback_status: "routed_ws" and afplay is skipped. On check error, the response includes subscriber_check_error: "..." and falls through to afplay (fail-safe: a flaky mod3 never silently drops audio). subscribed=false preserves the Wave 3 afplay path exactly.
Test plan
- TestMod3Speak* / TestPlayAudio* tests pass unchanged
- golangci-lint run ./internal/engine/ clean
Architectural effect
- mod3_speak tool now honors the "dashboard = primary audio interface" target from the final-shape design
- mod3_speak without explicit session_id will discover the caller's session from context