Kernel: mod3_speak skips afplay when dashboard subscriber exists (Wave 4.3b) #46

Merged
chazmaniandinkle merged 4 commits into cogos-dev:main from chazmaniandinkle:feat/kernel-wave4
Apr 24, 2026
Conversation

@chazmaniandinkle
Contributor

Summary

One new commit. When mod3_speak is invoked with a session_id that has a WebSocket subscriber listening on mod3 (i.e., the dashboard is open), the kernel skips the local afplay and lets mod3 deliver the audio over the WebSocket. Without a session_id or a subscriber, afplay remains the fallback, so Wave 3 behavior is preserved.

Depends on:

Review a31396e as the Wave 4 diff; the rest is the stacked dependency.

Change detail

  • modalityProxy gains a subscriberCheck func(ctx, session_id) (subscribed bool, err error) field. The default implementation issues GET /v1/sessions/{id}/subscribers against Mod3URL with a 1.5s timeout. The field is injectable for tests.
  • toolMod3Speak gates playback on it: when session_id is present and subscriberCheck returns subscribed=true, the response carries playback_status: "routed_ws" and afplay is skipped. On a check error, the response includes subscriber_check_error: "..." and execution falls through to afplay (fail-safe: a flaky mod3 never silently drops audio).
  • With no session_id, or with subscribed=false, the Wave 3 afplay path is preserved exactly.
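
The gating decision above can be sketched roughly as follows. This is a minimal illustration, not the code in the diff: `decidePlayback`, `subscribedStub`, and `flakyStub` are hypothetical names, and only the `subscriberCheck` signature, the `playback_status` values, and the `subscriber_check_error` key come from this description.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// subscriberCheck mirrors the field signature described in this PR: it
// reports whether a session has a live WebSocket subscriber on mod3.
type subscriberCheck func(ctx context.Context, sessionID string) (subscribed bool, err error)

// Stub checks for the demo (hypothetical, not from the diff).
var (
	subscribedStub subscriberCheck = func(context.Context, string) (bool, error) { return true, nil }
	flakyStub      subscriberCheck = func(context.Context, string) (bool, error) {
		return false, errors.New("probe timed out")
	}
)

// decidePlayback is a hypothetical helper. It returns the result fields and
// whether the caller should spawn the local player (afplay / aplay).
func decidePlayback(ctx context.Context, sessionID string, check subscriberCheck) (map[string]string, bool) {
	result := map[string]string{}
	if sessionID != "" && check != nil {
		subscribed, err := check(ctx, sessionID)
		switch {
		case err != nil:
			// Fail-safe: surface the error and fall through to the local player.
			result["subscriber_check_error"] = err.Error()
		case subscribed:
			// mod3 delivers the audio over the WebSocket; skip the local player.
			result["playback_status"] = "routed_ws"
			return result, false
		}
	}
	// No session, no subscriber, or a failed probe: local playback path.
	result["playback_status"] = "played"
	return result, true
}

func main() {
	ctx := context.Background()
	fmt.Println(decidePlayback(ctx, "", subscribedStub))       // no session: check bypassed
	fmt.Println(decidePlayback(ctx, "abc123", subscribedStub)) // subscriber present
	fmt.Println(decidePlayback(ctx, "abc123", flakyStub))      // probe error: fall back
}
```

The sketch sets `playback_status` to "played" before the player runs; in a real handler that status would presumably be set only after the player was spawned successfully.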

Test plan

  • All existing TestMod3Speak* / TestPlayAudio* tests pass unchanged
  • 6 new tests cover: no-session-always-spawns, subscriber-skips-player, no-subscriber-spawns, check-error-falls-back, default-HTTP-implementation-hits-mod3, mod3-unreachable-returns-error
  • golangci-lint run ./internal/engine/ clean
  • Manual smoke test with mod3 Wave 4 branch (see mod3#47 for procedure)

Architectural effect

  • Kernel's mod3_speak tool now honors the "dashboard = primary audio interface" target from the final-shape design
  • Fallback to afplay preserved for headless / no-subscriber sessions (dev scenarios)
  • Next dependent work: Wave 5 SessionStart hook → Claude Code sessions auto-register → mod3_speak without explicit session_id will discover the caller's session from context

Test User added 4 commits April 23, 2026 12:10
…ave 2)

Kernel-side HTTP forwarder that makes the kernel the identity authority
for mod3 channel participants while keeping mod3 the communication-state
owner (voice, queue, device). Implements the decision from ADR-082's
layer-separation rule: "CogOS owns identity; Mod3 owns communication."

Routes (namespaced under /v1/channel-sessions/):

  POST /v1/channel-sessions/register             mint+forward
  POST /v1/channel-sessions/{id}/deregister      forward
  GET  /v1/channel-sessions                      list (kernel+mod3)
  GET  /v1/channel-sessions/{id}                 detail

Namespace choice: the existing /v1/sessions/* family serves agent-session
state with 3-component hyphen-validated IDs tied to the handoff protocol.
Channel-participant registration has an incompatible shape (short UUID
IDs, participant_id/participant_type/voice/device fields). Rather than
weaken ValidateSessionID (which would cascade into handoff semantics),
the new concern takes its own namespace. The channel-provider RFC's
guidance to unify on cogos_session_register with a participant_type
discriminator remains the target at the MCP tool layer (Wave 3); at the
HTTP layer the two surfaces coexist cleanly.

Behavior:
- Kernel mints a session_id (short UUID) if the caller omits one.
- Request is forwarded to Config.Mod3URL (default http://localhost:7860)
  with a 5s timeout.
- Response merges kernel identity record + mod3 channel state.
- Mod3 unreachable -> HTTP 502 with clear error body.
- Mod3 error responses are preserved and surfaced.

Config: new Config.Mod3URL field, overridable via MOD3_URL env var.
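
A minimal sketch of the override, assuming the env var simply wins when it is set and non-empty. `resolveMod3URL` and the injected `getenv` parameter are hypothetical names for illustration; only the default URL and the MOD3_URL variable come from this commit message.

```go
package main

import (
	"fmt"
	"os"
)

// defaultMod3URL is the default stated in the commit message.
const defaultMod3URL = "http://localhost:7860"

// resolveMod3URL is a hypothetical helper: it returns the MOD3_URL override
// when set and non-empty, otherwise the default. getenv is injected so the
// lookup can be exercised without touching the real environment.
func resolveMod3URL(getenv func(string) string) string {
	if v := getenv("MOD3_URL"); v != "" {
		return v
	}
	return defaultMod3URL
}

func main() {
	fmt.Println(resolveMod3URL(os.Getenv))
}
```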

Tests: serve_sessions_channel_test.go covers ID minting, field passthrough,
response merging, mod3-down 502, and all four sibling endpoints via
httptest.Server fakes.

Wave 3 of the mod3-kernel integration (ADR-082 + channel-provider RFC).
The kernel becomes the MCP front door for mod3 voice tools via an HTTP
proxy — supersedes the installed binary's OpenClaw gateway, which read
mod3's metric headers but silently discarded the audio/wav payload.

Tools registered (mod3_* namespace on the /mcp endpoint):
  * mod3_speak              synthesize text + play audio locally
  * mod3_stop               cancel current/queued speech (optional job_id)
  * mod3_voices             list available voices
  * mod3_status             probe mod3 /health
  * mod3_register_session   proxy to POST /v1/sessions/register
  * mod3_deregister_session proxy to POST /v1/sessions/{id}/deregister
  * mod3_list_sessions      proxy to GET /v1/sessions

All tools accept an optional session_id and thread it through to mod3 —
in the request body for synthesize, as a query parameter for stop/voices,
in the URL for register/deregister. Absent session_id → the proxy omits
the field and mod3 routes to its default session.
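
The three threading styles might look like the sketch below. The helper names, the `text` body field, and the `/v1/stop` path used in the demo are assumptions for illustration; only the `session_id` name, the omit-when-absent rule, and the deregister path shape come from this message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// synthesizeBody builds a JSON request body, adding session_id only when set.
// The "text" field name is an assumption.
func synthesizeBody(text, sessionID string) (string, error) {
	body := map[string]string{"text": text}
	if sessionID != "" {
		body["session_id"] = sessionID
	}
	b, err := json.Marshal(body)
	return string(b), err
}

// withSessionQuery appends session_id as a query parameter only when set.
func withSessionQuery(rawURL, sessionID string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if sessionID != "" {
		q := u.Query()
		q.Set("session_id", sessionID)
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

// deregisterPath places the ID in the URL path.
func deregisterPath(sessionID string) string {
	return "/v1/sessions/" + url.PathEscape(sessionID) + "/deregister"
}

func main() {
	fmt.Println(synthesizeBody("hello", "abc123"))
	fmt.Println(withSessionQuery("http://localhost:7860/v1/stop", "abc123")) // hypothetical path
	fmt.Println(deregisterPath("abc123"))
}
```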

Playback strategy: Option (A), server-side. Synthesis response bodies
are written to a tempfile, then played by afplay (macOS) / aplay (Linux)
via a fire-and-forget exec. Callers opt in to blocking via blocking=true,
or can skip playback entirely with skip_playback=true (returns the WAV
bytes base64-encoded, forward-compatible with Option B session-routed
playback once the Wave 4 dashboard WebSocket lands). The player command
is injectable (modalityProxy.player) so tests never spawn real audio.
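
A rough sketch of the tempfile-then-player flow, under stated assumptions: `playerFor`, `playWAV`, and the `newCmd` parameter are hypothetical names, and the cleanup policy (removing the tempfile once the player exits) is a guess rather than something this message specifies.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
)

// playerFor returns the platform player named in the commit message, or ""
// for platforms it does not mention.
func playerFor(goos string) string {
	switch goos {
	case "darwin":
		return "afplay"
	case "linux":
		return "aplay"
	default:
		return ""
	}
}

// playWAV is a hypothetical helper: it writes the WAV bytes to a temp file
// and hands the path to a player command. newCmd is injectable so tests can
// substitute a stub instead of spawning real audio.
func playWAV(wav []byte, newCmd func(path string) *exec.Cmd, blocking bool) error {
	f, err := os.CreateTemp("", "mod3-*.wav")
	if err != nil {
		return err
	}
	path := f.Name()
	if _, err := f.Write(wav); err != nil {
		f.Close()
		os.Remove(path)
		return err
	}
	if err := f.Close(); err != nil {
		os.Remove(path)
		return err
	}

	cmd := newCmd(path)
	if blocking {
		defer os.Remove(path)
		return cmd.Run() // wait for playback to finish
	}
	if err := cmd.Start(); err != nil {
		os.Remove(path)
		return err
	}
	// Fire-and-forget: reap the process and clean up in the background.
	go func() {
		_ = cmd.Wait()
		os.Remove(path)
	}()
	return nil
}

func main() {
	name := playerFor(runtime.GOOS)
	if name == "" {
		fmt.Println("no player listed for", runtime.GOOS)
		return
	}
	// A caller would pass: func(p string) *exec.Cmd { return exec.Command(name, p) }
	fmt.Println("would play with", name)
}
```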

Metrics: mod3's X-Mod3-* response headers are parsed into a metrics
map on the tool result (job_id, duration_sec, rtf, sample_rate, etc.)
with numeric headers coerced to int64/float64 where applicable.
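
The header-to-metrics coercion could be sketched as below. The key normalization (lowercase, dashes to underscores), the example header names, and the int-before-float ordering are assumptions; `extractMetrics` and `demoMetrics` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// extractMetrics is a hypothetical helper: it collects X-Mod3-* headers into
// a map, coercing values to int64 or float64 where they parse as such.
func extractMetrics(h http.Header) map[string]any {
	const prefix = "X-Mod3-"
	metrics := map[string]any{}
	for name, values := range h {
		if !strings.HasPrefix(name, prefix) || len(values) == 0 {
			continue
		}
		// e.g. "X-Mod3-Sample-Rate" -> "sample_rate" (naming is an assumption).
		key := strings.ToLower(strings.ReplaceAll(strings.TrimPrefix(name, prefix), "-", "_"))
		raw := values[0]
		if i, err := strconv.ParseInt(raw, 10, 64); err == nil {
			metrics[key] = i
		} else if f, err := strconv.ParseFloat(raw, 64); err == nil {
			metrics[key] = f
		} else {
			metrics[key] = raw // non-numeric values stay as strings
		}
	}
	return metrics
}

// demoMetrics feeds illustrative (made-up) header values through the helper.
func demoMetrics() map[string]any {
	h := http.Header{}
	h.Set("X-Mod3-Job-Id", "job-7f3a")
	h.Set("X-Mod3-Sample-Rate", "24000")
	h.Set("X-Mod3-Rtf", "0.42")
	h.Set("Content-Type", "audio/wav") // ignored: not an X-Mod3-* header
	return extractMetrics(h)
}

func main() {
	fmt.Println(demoMetrics())
}
```

One caveat of blind coercion: an all-digit job_id would come back as an int64, so a real implementation may prefer an explicit list of numeric headers.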

Errors: mod3-unreachable returns IsError=true with "mod3 unreachable: …"
text so the ledger's tool.result event records the failure (same shape
as the serve_sessions_channel.go pattern). Non-2xx responses from mod3
preserve the body text in the error result — callers see mod3's own
422 / 5xx explanation intact.

Fixes: the drop-audio-bytes bug observed in the installed Apr-19 binary
(mcp__cogos__mod3_speak completes in ~1s, returns metrics, but plays
nothing). With this proxy the kernel actually plays what mod3 synthesizes.

Timeout: 30s on the HTTP client (vs 5s on the channel-session forwarder);
accounts for cold-start model loading and multi-sentence synthesis.

Files:
  internal/engine/mcp_modality_proxy.go       (new, 551 lines)
  internal/engine/mcp_modality_proxy_test.go  (new, 686 lines)
  internal/engine/mcp_server.go               (modified, +11)

Tests: 20 new unit tests — synthesis success/error/session-threading,
stop/voices/status/sessions forwarding, metric extraction, server-side
playback via a stub shell-script player (proves the bytes reach the
player, guarding against the drop-audio regression), non-blocking spawn.
All pass. Full ./... suite green.

Out of scope (deferred to later waves):
  * Wave 4: dashboard participant UI + session-routed playback
  * session-start hook auto-registration
  * consolidation with the existing OpenClaw-gateway mod3_speak
    (coexist until deprecated)
…ve 3.5)

Eliminates the Wave 2 / Wave 3 divergence where the mod3_register_session,
mod3_deregister_session, and mod3_list_sessions MCP tools called mod3's
/v1/sessions/* surface directly, bypassing Wave 2's kernel-owned session_id
minting at /v1/channel-sessions/register. Session-ID authority is now in
one place (ADR-082): the kernel's shared RegisterChannelSession /
DeregisterChannelSession / ListChannelSessions methods on *Server. Both the
HTTP handlers and the MCP tool handlers call through these methods. No
self-localhost loop.

Approach 2 (refactor shared logic) over Approach 1 (self-HTTP loop): the
Wave 2 handler bodies factored cleanly into *Server methods returning
typed errors, and wiring an MCPServer field via SetChannelSessionBackend
mirrors the existing SetSessionsBackend pattern, so the public surface
didn't have to change.
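
The wiring pattern might be sketched like this. The method signature on the interface, the `toolRegisterSession` handler, the error text, and the `fakeServer` stand-in are all hypothetical; only the `channelSessionBackend` and `SetChannelSessionBackend` names and the no-backend error path come from this message.

```go
package main

import (
	"errors"
	"fmt"
)

// channelSessionBackend is the interface the MCP tool handlers call through.
// The single method and its signature here are assumptions for illustration.
type channelSessionBackend interface {
	RegisterChannelSession(sessionID string) (string, error)
}

// MCPServer holds the backend as a field, set after construction.
type MCPServer struct {
	channelSessions channelSessionBackend
}

// SetChannelSessionBackend wires the live server in as the backend.
func (m *MCPServer) SetChannelSessionBackend(b channelSessionBackend) {
	m.channelSessions = b
}

var errNoBackend = errors.New("channel session backend not configured")

// toolRegisterSession is a hypothetical tool handler: it calls the shared
// method directly rather than looping back over HTTP to localhost.
func (m *MCPServer) toolRegisterSession(sessionID string) (string, error) {
	if m.channelSessions == nil {
		return "", errNoBackend
	}
	return m.channelSessions.RegisterChannelSession(sessionID)
}

// fakeServer stands in for *Server; it mints an ID when the caller omits one.
type fakeServer struct{ next int }

func (s *fakeServer) RegisterChannelSession(id string) (string, error) {
	if id == "" {
		s.next++
		id = fmt.Sprintf("minted-%d", s.next)
	}
	return id, nil
}

func main() {
	var m MCPServer
	_, err := m.toolRegisterSession("")
	fmt.Println("before wiring:", err)

	m.SetChannelSessionBackend(&fakeServer{})
	id, _ := m.toolRegisterSession("")
	fmt.Println("after wiring:", id)
}
```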

Schema alignment with the channel-provider RFC's cogos_session_register
primitive: added optional `kinds` (array) and `metadata` (map) fields to
both the Wave 2 register endpoint and the MCP tool input. Both flow
through to mod3 unchanged (mod3 ignores unknown fields today) and are
preserved on the kernel identity record so downstream consumers can
filter by capability.

- internal/engine/serve_sessions_channel.go: factored handleChannelSession*
  bodies into RegisterChannelSession / DeregisterChannelSession /
  ListChannelSessions methods returning a typed channelSessionForwardError;
  added Kinds / Metadata fields to ChannelSessionRecord and the request
  wire type; handlers now thin-wrap the shared methods.
- internal/engine/mcp_modality_proxy.go: the three session-family MCP tool
  handlers now call the channelSessionBackend interface on MCPServer; all
  direct HTTP calls to mod3's /v1/sessions/* removed from this file.
- internal/engine/mcp_server.go: added channelSessionBackend field + interface
  + SetChannelSessionBackend setter.
- internal/engine/serve_mcp.go: wires the live Server as the backend.
- tests: updated newProxyMCP to wire a live Server so MCP session-family
  tests exercise the shared code path; added coverage for minting via the
  MCP tool, kinds/metadata pass-through (both HTTP and MCP), and the
  no-backend error path.

Wave 4.3 kernel side: close the double-play window between server-side
afplay and the dashboard's /ws/audio/{session_id} WebSocket (mod3 side,
committed separately). Before spawning the platform player on a
session-tagged speak, the kernel asks mod3 whether that session has a
live subscriber. If yes, skip afplay entirely — mod3 is already pushing
the WAV to the dashboard — and report playback_status=routed_ws.

modalityProxy grows an injectable subscriberCheck function field. The
default implementation is an HTTP GET against
{Mod3URL}/v1/sessions/{id}/subscribers with a 1.5s timeout. Transport
errors surface as subscriber_check_error in the tool result AND fall
through to the normal afplay path so a flaky mod3 never orphans audio —
a key safety property carried over from the Wave 3 fire-and-forget
playback design.
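
A sketch of what a default check along these lines could look like. The response shape is NOT shown in this PR: the `{"subscribed": true}` JSON body is an assumption, as are the `checkSessionSubscriber` signature and the `probe` demo helper with its fake mod3 server and illustrative session IDs.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"time"
)

// checkSessionSubscriber sketches a default subscriberCheck.
// ASSUMPTION: mod3 answers 200 with JSON like {"subscribed": true}.
func checkSessionSubscriber(ctx context.Context, mod3URL, sessionID string) (bool, error) {
	ctx, cancel := context.WithTimeout(ctx, 1500*time.Millisecond)
	defer cancel()

	endpoint := mod3URL + "/v1/sessions/" + url.PathEscape(sessionID) + "/subscribers"
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return false, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err // transport error: the caller falls back to the local player
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("subscriber check: unexpected status %d", resp.StatusCode)
	}
	var body struct {
		Subscribed bool `json:"subscribed"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return false, err
	}
	return body.Subscribed, nil
}

// probe runs the check against a fake mod3 that reports a subscriber only
// for the illustrative session "cs-yes".
func probe(sessionID string) (bool, error) {
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		subscribed := r.URL.Path == "/v1/sessions/cs-yes/subscribers"
		json.NewEncoder(w).Encode(map[string]bool{"subscribed": subscribed})
	}))
	defer fake.Close()
	return checkSessionSubscriber(context.Background(), fake.URL, sessionID)
}

func main() {
	fmt.Println(probe("cs-yes"))
	fmt.Println(probe("cs-no"))
}
```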

session_id="" always bypasses the check. CLI invocations of mod3_speak
(no session_id) keep the exact afplay behavior as before, so this is a
purely additive change scoped to kernel-minted sessions.

Tests (mcp_modality_proxy_test.go, +237 lines):
  - TestMod3Speak_NoSessionAlwaysSpawnsPlayer — session_id="" → stub
    player invoked once, playback_status=played
  - TestMod3Speak_SessionWithSubscriberSkipsPlayer — subscriberCheck
    returns true → stub player NOT invoked, playback_status=routed_ws
  - TestMod3Speak_SessionWithoutSubscriberSpawnsPlayer — subscriberCheck
    returns false → stub player invoked once, playback_status=played
  - TestMod3Speak_SubscriberCheckErrorFallsBackToPlayer — transient
    probe error → stub player invoked once, subscriber_check_error
    surfaced in result, playback_status=played
  - TestCheckSessionSubscriber_DefaultImplementationHitsMod3 — default
    HTTP path parses cs-yes/cs-no responses correctly
  - TestCheckSessionSubscriber_Mod3UnreachableReturnsError — ECONNREFUSED
    surfaces as a non-nil error (so toolMod3Speak falls back)

Replaced writeStubPlayer's mutex+int32 pair with a closure-over-getter
pattern. Cleaner signature, no polling goroutine, fewer moving parts.
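
For readers unfamiliar with the pattern, a closure-over-getter stub can be sketched as below. This is an in-process illustration only: `newStubPlayer` is a hypothetical name, and the real writeStubPlayer (which this message suggests wraps a stub shell script) will differ.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// newStubPlayer returns a fake player plus a getter for its call count.
// Both closures capture the same counter, so the test needs no shared
// struct, mutex, or polling goroutine.
func newStubPlayer() (play func(path string) error, calls func() int) {
	var n atomic.Int64
	play = func(string) error {
		n.Add(1) // record the invocation instead of playing audio
		return nil
	}
	calls = func() int { return int(n.Load()) }
	return play, calls
}

func main() {
	play, calls := newStubPlayer()
	fmt.Println("calls before:", calls())
	_ = play("/tmp/example.wav")
	fmt.Println("calls after:", calls())
}
```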

Full engine test suite passes; golangci-lint clean.

Branch: feat/kernel-wave4, stacked on feat/kernel-session-forwarder
(the branch that currently carries Wave 3.5 session-id minting and
the mcp_modality_proxy.go baseline this commit extends).
chazmaniandinkle merged commit a31396e into cogos-dev:main on Apr 24, 2026
5 checks passed