kernel-mod3 integration: session forwarder + MCP proxy + tool routing (Waves 2-3.5) #45

Merged
chazmaniandinkle merged 3 commits into cogos-dev:main from chazmaniandinkle:feat/kernel-session-forwarder
Apr 24, 2026

@chazmaniandinkle

Summary

Three stacked commits that establish the kernel as the identity authority for mod3 channel sessions and expose mod3's capabilities through the kernel's MCP server with working audio playback. Part of the ADR-082 / channel-provider RFC rollout.

Depends on: cogos-dev/mod3#5 (mod3 Phase 1 session registry).

Commits

a2f7904 — feat(engine): forward channel-session registration to mod3 (Wave 2)

New HTTP surface on the kernel, namespaced under /v1/channel-sessions/ to avoid collision with the existing /v1/sessions/ family (agent-session state, 3-component hyphen-validated IDs tied to handoff):

  • POST /v1/channel-sessions/register — mints short UUID when caller omits, forwards to mod3
  • POST /v1/channel-sessions/{id}/deregister
  • GET /v1/channel-sessions
  • GET /v1/channel-sessions/{id}

When mod3 is unreachable, the kernel returns 502 with a clean error body. A new Config.Mod3URL field sets the mod3 address (default http://localhost:7860, overridable via the MOD3_URL env var).
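
A minimal sketch of how the default and the MOD3_URL override might be resolved. The function name `resolveMod3URL`, the injected `getenv` parameter, and the trailing-slash trimming are assumptions for illustration; only the default value and the variable name come from this PR.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultMod3URL is the default stated in the PR description.
const defaultMod3URL = "http://localhost:7860"

// resolveMod3URL returns the MOD3_URL override when it is set and non-blank,
// otherwise the default. getenv is injected (hypothetical design) so the
// lookup can be exercised without touching the process environment.
func resolveMod3URL(getenv func(string) string) string {
	if v := strings.TrimSpace(getenv("MOD3_URL")); v != "" {
		// Trimming trailing slashes is an assumption that keeps path joins simple.
		return strings.TrimRight(v, "/")
	}
	return defaultMod3URL
}

func main() {
	fmt.Println(resolveMod3URL(os.Getenv))
}
```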

c8e2d8d — feat(engine): MCP proxy for mod3 tools with audio playback (Wave 3)

Seven new MCP tools on the kernel's server:

  • mod3_speak — synth + afplay locally (fire-and-forget; blocking=true or skip_playback=true flags)
  • mod3_stop, mod3_voices, mod3_status
  • mod3_register_session, mod3_deregister_session, mod3_list_sessions

Fixes the audio-bytes-dropped bug in the old OpenClaw-gateway mod3_speak: the kernel now writes WAV bytes to a tempfile and spawns afplay (macOS) or aplay/paplay (Linux). Player is an injectable field → tests use stub shell scripts. skip_playback=true returns base64 audio for Wave 4's dashboard WebSocket routing.

dbe69a8 — refactor(engine): route MCP session tools through kernel endpoint (Wave 3.5)

Eliminates a divergence where Wave 3's session-family tools called mod3 directly, bypassing Wave 2's kernel-owned session-ID minting. Wave 2's handler bodies are refactored into *Server methods (RegisterChannelSession, DeregisterChannelSession, ListChannelSessions). Both the HTTP handlers and the MCP tool handlers call through these, so there is no self-localhost loop. RFC schema additions: optional kinds and metadata fields are accepted at the register endpoint and flow through to mod3.

Architectural effect

  • Session authority unified: kernel is the single minter for channel sessions (ADR-082 "CogOS owns identity")
  • MCP surface unified: mod3's tools available through kernel's MCP — mcp__cogos__mod3_* works end-to-end
  • Audio plays: the old gateway mod3_speak read metrics and discarded audio/wav bytes; the Wave 3 path runs afplay server-side and returns playback_status: spawned to the caller
  • Forward-compatible: skip_playback=true + the session_id threading set up Wave 4 (dashboard participant panel with per-session WebSocket audio routing)

Test plan

  • All new unit tests green (20+ across the three files; fake mod3 via httptest.NewServer)
  • Full go test ./... green in 9.3s
  • gofmt -l clean on all changed files
  • Binary rebuilt and installed, kernel restarted clean
  • Manual: mcp__cogos__mod3_speak plays audio end-to-end via afplay (verified with user)
  • Downstream: merge alongside or after mod3#5 so the 3 session-registry MCP tools stop 502-ing

Test User added 3 commits April 23, 2026 12:10
…ave 2)

Kernel-side HTTP forwarder that makes the kernel the identity authority
for mod3 channel participants while keeping mod3 the communication-state
owner (voice, queue, device). Implements the decision from ADR-082's
layer-separation rule: "CogOS owns identity; Mod3 owns communication."

Routes (namespaced under /v1/channel-sessions/):

  POST /v1/channel-sessions/register             mint+forward
  POST /v1/channel-sessions/{id}/deregister      forward
  GET  /v1/channel-sessions                      list (kernel+mod3)
  GET  /v1/channel-sessions/{id}                 detail

Namespace choice: the existing /v1/sessions/* family serves agent-session
state with 3-component hyphen-validated IDs tied to the handoff protocol.
Channel-participant registration has an incompatible shape (short UUID
IDs, participant_id/participant_type/voice/device fields). Rather than
weaken ValidateSessionID (which would cascade into handoff semantics),
the new concern takes its own namespace. The channel-provider RFC's
guidance to unify on cogos_session_register with a participant_type
discriminator remains the target at the MCP tool layer (Wave 3); at the
HTTP layer the two surfaces coexist cleanly.

Behavior:
- Kernel mints a session_id (short UUID) if the caller omits one.
- Request is forwarded to Config.Mod3URL (default http://localhost:7860)
  with a 5s timeout.
- Response merges kernel identity record + mod3 channel state.
- Mod3 unreachable -> HTTP 502 with clear error body.
- Mod3 error responses are preserved and surfaced.
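
The behavior above can be sketched roughly as follows. This is not the code in serve_sessions_channel.go: `registerHandler`, `mintSessionID`, `writeJSON`, the 8-hex-character ID format, and the merged response shape are all assumptions. The upstream path /v1/sessions/register and the 5s timeout are taken from the commit messages in this PR.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// mintSessionID returns 8 hex characters. The kernel's real "short UUID"
// format is not shown in the PR, so this length is an assumption.
func mintSessionID() string {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		panic(err) // no sensible fallback if the system RNG fails
	}
	return hex.EncodeToString(b)
}

func writeJSON(w http.ResponseWriter, status int, v any) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(v)
}

// registerHandler mints an ID when the caller omits one, forwards the request
// to mod3, and maps transport failures to 502.
func registerHandler(client *http.Client, mod3URL string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req map[string]any
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req == nil {
			writeJSON(w, http.StatusBadRequest, map[string]any{"error": "invalid JSON body"})
			return
		}
		if id, _ := req["session_id"].(string); id == "" {
			req["session_id"] = mintSessionID()
		}
		payload, _ := json.Marshal(req)
		resp, err := client.Post(mod3URL+"/v1/sessions/register", "application/json", bytes.NewReader(payload))
		if err != nil {
			writeJSON(w, http.StatusBadGateway, map[string]any{"error": "mod3 unreachable: " + err.Error()})
			return
		}
		defer resp.Body.Close()
		raw, _ := io.ReadAll(resp.Body)
		var state any
		if json.Unmarshal(raw, &state) != nil {
			state = string(raw) // keep non-JSON error bodies verbatim
		}
		// Pass mod3's status through; the merged shape here is hypothetical.
		writeJSON(w, resp.StatusCode, map[string]any{"session_id": req["session_id"], "mod3": state})
	}
}

// failingTransport simulates mod3 being down without touching the network.
type failingTransport struct{}

func (failingTransport) RoundTrip(*http.Request) (*http.Response, error) {
	return nil, errors.New("connection refused")
}

// demoUnreachable drives the handler once and returns the HTTP status code.
func demoUnreachable() int {
	client := &http.Client{Timeout: 5 * time.Second, Transport: failingTransport{}}
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v1/channel-sessions/register",
		strings.NewReader(`{"participant_id":"p1"}`))
	registerHandler(client, "http://localhost:7860").ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(demoUnreachable())
}
```

Injecting the *http.Client keeps the mod3-down path testable without opening a socket, in the same spirit as the httptest fakes mentioned in the Tests note.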

Config: new Config.Mod3URL field, overridable via MOD3_URL env var.

Tests: serve_sessions_channel_test.go covers ID minting, field passthrough,
response merging, mod3-down 502, and all four sibling endpoints via
httptest.Server fakes.

Wave 3 of the mod3-kernel integration (ADR-082 + channel-provider RFC).
The kernel becomes the MCP front door for mod3 voice tools via an HTTP
proxy — supersedes the installed binary's OpenClaw gateway, which read
mod3's metric headers but silently discarded the audio/wav payload.

Tools registered (mod3_* namespace on the /mcp endpoint):
  * mod3_speak              synthesize text + play audio locally
  * mod3_stop               cancel current/queued speech (optional job_id)
  * mod3_voices             list available voices
  * mod3_status             probe mod3 /health
  * mod3_register_session   proxy to POST /v1/sessions/register
  * mod3_deregister_session proxy to POST /v1/sessions/{id}/deregister
  * mod3_list_sessions      proxy to GET /v1/sessions

All tools accept an optional session_id and thread it through to mod3 —
in the request body for synthesize, as a query parameter for stop/voices,
in the URL for register/deregister. Absent session_id → the proxy omits
the field and mod3 routes to its default session.
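
The three threading styles described above might look like this in isolation. The helper names, the `synthRequest` type, and its `text` field are hypothetical; only the `session_id` name and the /v1/sessions/{id}/deregister path come from this commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// synthRequest is a hypothetical synthesize body; only session_id is from the PR.
type synthRequest struct {
	Text      string `json:"text"`
	SessionID string `json:"session_id,omitempty"` // omitted from JSON when empty
}

// synthBody threads the session ID through the request body.
func synthBody(text, sessionID string) string {
	b, _ := json.Marshal(synthRequest{Text: text, SessionID: sessionID})
	return string(b)
}

// withSessionQuery appends ?session_id=... only when an ID was supplied.
func withSessionQuery(endpoint, sessionID string) string {
	if sessionID == "" {
		return endpoint
	}
	q := url.Values{}
	q.Set("session_id", sessionID)
	return endpoint + "?" + q.Encode()
}

// deregisterPath places the session ID in the URL path.
func deregisterPath(sessionID string) string {
	return "/v1/sessions/" + url.PathEscape(sessionID) + "/deregister"
}

func main() {
	fmt.Println(synthBody("hello", ""))
	// "/example" is a placeholder, not a real mod3 route.
	fmt.Println(withSessionQuery("http://localhost:7860/example", "a1b2c3d4"))
	fmt.Println(deregisterPath("a1b2c3d4"))
}
```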

Playback strategy: Option (A), server-side. Synthesis response bodies
are written to a tempfile, then played by afplay (macOS) / aplay (Linux)
via a fire-and-forget exec. Callers opt in to blocking via blocking=true,
or can skip playback entirely with skip_playback=true (returns the WAV
bytes base64-encoded, forward-compatible with Option B session-routed
playback once the Wave 4 dashboard WebSocket lands). The player command
is injectable (modalityProxy.player) so tests never spawn real audio.
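
A rough sketch of the playback strategy, assuming a function-typed player rather than whatever shape modalityProxy.player actually has. `handleAudio`, `playerFunc`, `recordingPlayer`, and the "skipped" and "played" status strings are hypothetical; "spawned" is the playback_status named in the PR summary.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
	"os/exec"
	"runtime"
)

// playerFunc plays the audio file at path. Injecting it keeps tests silent.
type playerFunc func(path string, blocking bool) error

// playerCommand picks a CLI player by OS: afplay on macOS, aplay elsewhere.
func playerCommand(goos string) string {
	if goos == "darwin" {
		return "afplay"
	}
	return "aplay"
}

// execPlayer runs the platform player. Start without waiting is the
// fire-and-forget path; Run blocks until playback ends.
func execPlayer(path string, blocking bool) error {
	cmd := exec.Command(playerCommand(runtime.GOOS), path)
	if blocking {
		return cmd.Run()
	}
	if err := cmd.Start(); err != nil {
		return err
	}
	go func() { _ = cmd.Wait() }() // reap the child so it does not linger
	return nil
}

// handleAudio returns (status, base64Audio, error). With skipPlayback the
// bytes are returned to the caller instead of being played. The tempfile is
// left in place because a non-blocking player may still be reading it.
func handleAudio(wav []byte, blocking, skipPlayback bool, play playerFunc) (string, string, error) {
	if skipPlayback {
		return "skipped", base64.StdEncoding.EncodeToString(wav), nil
	}
	f, err := os.CreateTemp("", "mod3-*.wav")
	if err != nil {
		return "", "", err
	}
	if _, err := f.Write(wav); err != nil {
		f.Close()
		return "", "", err
	}
	if err := f.Close(); err != nil {
		return "", "", err
	}
	if err := play(f.Name(), blocking); err != nil {
		return "", "", err
	}
	if blocking {
		return "played", "", nil
	}
	return "spawned", "", nil
}

// recordingPlayer is a stub that captures the bytes it was handed, which is
// one way to guard against the audio being dropped before the player.
type recordingPlayer struct{ got []byte }

func (r *recordingPlayer) play(path string, _ bool) error {
	b, err := os.ReadFile(path)
	r.got = b
	return err
}

func main() {
	rec := &recordingPlayer{}
	status, _, err := handleAudio([]byte("RIFF-demo"), false, false, rec.play)
	fmt.Println(status, len(rec.got), err)
	_ = execPlayer // the real player is deliberately not invoked in this demo
}
```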

Metrics: mod3's X-Mod3-* response headers are parsed into a metrics
map on the tool result (job_id, duration_sec, rtf, sample_rate, etc.)
with numeric headers coerced to int64/float64 where applicable.
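
One way the header-to-metrics coercion could work, as a sketch only. `extractMetrics` and its key normalization (strip the prefix, lowercase, dashes to underscores) are assumptions, not the proxy's actual code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// extractMetrics collects X-Mod3-* headers into a map, trying int64 first,
// then float64, and falling back to the raw string. It takes a plain
// map[string][]string, so an http.Header can be passed directly.
func extractMetrics(h map[string][]string) map[string]any {
	const prefix = "x-mod3-"
	out := map[string]any{}
	for name, vals := range h {
		lower := strings.ToLower(name)
		if !strings.HasPrefix(lower, prefix) || len(vals) == 0 {
			continue
		}
		key := strings.ReplaceAll(strings.TrimPrefix(lower, prefix), "-", "_")
		raw := vals[0]
		if i, err := strconv.ParseInt(raw, 10, 64); err == nil {
			out[key] = i
		} else if f, err := strconv.ParseFloat(raw, 64); err == nil {
			out[key] = f
		} else {
			out[key] = raw
		}
	}
	return out
}

func main() {
	m := extractMetrics(map[string][]string{
		"X-Mod3-Sample-Rate": {"24000"},
		"X-Mod3-Rtf":         {"0.42"},
		"X-Mod3-Job-Id":      {"job-1"},
	})
	fmt.Println(m["sample_rate"], m["rtf"], m["job_id"])
}
```

A blanket coercion like this would also turn an all-digit job_id into an int64, so a real implementation may prefer an explicit list of numeric header names.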

Errors: mod3-unreachable returns IsError=true with "mod3 unreachable: …"
text so the ledger's tool.result event records the failure (same shape
as the serve_sessions_channel.go pattern). Non-2xx responses from mod3
preserve the body text in the error result — callers see mod3's own
422 / 5xx explanation intact.
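
The error mapping can be pictured with a small stand-in. `toolResult` below is a local illustrative type, not the MCP SDK's result type, and the "mod3 returned N:" prefix is an assumption; the "mod3 unreachable: " prefix is the one quoted above.

```go
package main

import (
	"errors"
	"fmt"
)

// toolResult is a hypothetical stand-in for an MCP tool result.
type toolResult struct {
	IsError bool
	Text    string
}

// errDial is a sample transport failure used by the demo below.
var errDial = errors.New("dial tcp: connection refused")

// resultFor maps the outcome of a mod3 call to a tool result: transport
// failures and non-2xx statuses are errors, and mod3's body text is kept.
func resultFor(status int, body string, transportErr error) toolResult {
	switch {
	case transportErr != nil:
		return toolResult{IsError: true, Text: "mod3 unreachable: " + transportErr.Error()}
	case status < 200 || status > 299:
		return toolResult{IsError: true, Text: fmt.Sprintf("mod3 returned %d: %s", status, body)}
	default:
		return toolResult{Text: body}
	}
}

func main() {
	fmt.Printf("%+v\n", resultFor(0, "", errDial))
	fmt.Printf("%+v\n", resultFor(422, `{"detail":"bad voice"}`, nil))
}
```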

Fixes: the drop-audio-bytes bug observed in the installed Apr-19 binary
(mcp__cogos__mod3_speak completes in ~1s, returns metrics, but plays
nothing). With this proxy the kernel actually hears what mod3 makes.

Timeout: 30s on the HTTP client (vs 5s on the channel-session forwarder);
accounts for cold-start model loading and multi-sentence synthesis.

Files:
  internal/engine/mcp_modality_proxy.go       (new, 551 lines)
  internal/engine/mcp_modality_proxy_test.go  (new, 686 lines)
  internal/engine/mcp_server.go               (modified, +11)

Tests: 20 new unit tests — synthesis success/error/session-threading,
stop/voices/status/sessions forwarding, metric extraction, server-side
playback via a stub shell-script player (proves the bytes reach the
player, guarding against the drop-audio regression), non-blocking spawn.
All pass. Full ./... suite green.

Out of scope (deferred to later waves):
  * Wave 4: dashboard participant UI + session-routed playback
  * session-start hook auto-registration
  * consolidation with the existing OpenClaw-gateway mod3_speak
    (coexist until deprecated)
…ve 3.5)

Eliminates the Wave 2 / Wave 3 divergence where the mod3_register_session,
mod3_deregister_session, and mod3_list_sessions MCP tools called mod3's
/v1/sessions/* surface directly, bypassing Wave 2's kernel-owned session_id
minting at /v1/channel-sessions/register. Session-ID authority is now in
one place (ADR-082): the kernel's shared RegisterChannelSession /
DeregisterChannelSession / ListChannelSessions methods on *Server. Both the
HTTP handlers and the MCP tool handlers call through these methods. No
self-localhost loop.

Approach 2 (refactor shared logic) over Approach 1 (self-HTTP loop): the
Wave 2 handler bodies factored cleanly into *Server methods returning
typed errors, and wiring an MCPServer field via SetChannelSessionBackend
mirrors the existing SetSessionsBackend pattern, so the public surface
didn't have to change.
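
A stripped-down sketch of Approach 2, assuming simplified signatures. The names channelSessionBackend, SetChannelSessionBackend, RegisterChannelSession, Server, MCPServer, and ChannelSessionRecord appear in this PR; the method signature, the `registerTool` helper, the counter-based ID, and `errNoBackend` are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// ChannelSessionRecord is reduced to two fields for the sketch.
type ChannelSessionRecord struct {
	SessionID     string
	ParticipantID string
}

// channelSessionBackend is the seam the MCP server depends on.
type channelSessionBackend interface {
	RegisterChannelSession(rec ChannelSessionRecord) (ChannelSessionRecord, error)
}

// Server owns minting. Both the HTTP handler and the MCP tool call this
// method, so no request loops back through localhost.
type Server struct{ next int }

func (s *Server) RegisterChannelSession(rec ChannelSessionRecord) (ChannelSessionRecord, error) {
	if rec.SessionID == "" {
		s.next++ // placeholder for real ID minting; not goroutine-safe
		rec.SessionID = fmt.Sprintf("sess-%04d", s.next)
	}
	// Forwarding to mod3 is omitted from this sketch.
	return rec, nil
}

// MCPServer holds the backend behind the interface.
type MCPServer struct{ channelSessions channelSessionBackend }

func (m *MCPServer) SetChannelSessionBackend(b channelSessionBackend) { m.channelSessions = b }

var errNoBackend = errors.New("channel session backend not configured")

// registerTool is roughly what an MCP tool handler would do.
func (m *MCPServer) registerTool(participantID string) (ChannelSessionRecord, error) {
	if m.channelSessions == nil {
		return ChannelSessionRecord{}, errNoBackend
	}
	return m.channelSessions.RegisterChannelSession(ChannelSessionRecord{ParticipantID: participantID})
}

func main() {
	mcp := &MCPServer{}
	mcp.SetChannelSessionBackend(&Server{})
	rec, err := mcp.registerTool("p1")
	fmt.Println(rec.SessionID, err)
}
```

Depending on an interface rather than on *Server directly is what lets the tests cover the no-backend error path without standing up a kernel.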

Schema alignment with the channel-provider RFC's cogos_session_register
primitive: added optional `kinds` (array) and `metadata` (map) fields to
both the Wave 2 register endpoint and the MCP tool input. Both flow
through to mod3 unchanged (mod3 ignores unknown fields today) and are
preserved on the kernel identity record so downstream consumers can
filter by capability.
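
The optional fields might be declared along these lines. The struct name, the Go types chosen for `kinds` and `metadata`, the `hasKind` helper, and the sample values are assumptions; the JSON field names come from this commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// registerRequest is a hypothetical wire type for the register endpoint.
type registerRequest struct {
	SessionID     string         `json:"session_id,omitempty"`
	ParticipantID string         `json:"participant_id"`
	Kinds         []string       `json:"kinds,omitempty"`    // optional
	Metadata      map[string]any `json:"metadata,omitempty"` // optional
}

// encodeRegister marshals the request; omitempty keeps absent optional
// fields off the wire so older receivers see the original shape.
func encodeRegister(r registerRequest) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

// hasKind shows the kind of capability filter a downstream consumer might run.
func hasKind(r registerRequest, kind string) bool {
	for _, k := range r.Kinds {
		if k == kind {
			return true
		}
	}
	return false
}

func main() {
	req := registerRequest{
		ParticipantID: "p1",
		Kinds:         []string{"voice"}, // sample value, not from the RFC
		Metadata:      map[string]any{"room": "lab"},
	}
	fmt.Println(encodeRegister(req), hasKind(req, "voice"))
}
```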

- internal/engine/serve_sessions_channel.go: factored handleChannelSession*
  bodies into RegisterChannelSession / DeregisterChannelSession /
  ListChannelSessions methods returning a typed channelSessionForwardError;
  added Kinds / Metadata fields to ChannelSessionRecord and the request
  wire type; handlers now thin-wrap the shared methods.
- internal/engine/mcp_modality_proxy.go: the three session-family MCP tool
  handlers now call the channelSessionBackend interface on MCPServer; all
  direct HTTP calls to mod3's /v1/sessions/* removed from this file.
- internal/engine/mcp_server.go: added channelSessionBackend field + interface
  + SetChannelSessionBackend setter.
- internal/engine/serve_mcp.go: wires the live Server as the backend.
- tests: updated newProxyMCP to wire a live Server so MCP session-family
  tests exercise the shared code path; added coverage for minting via the
  MCP tool, kinds/metadata pass-through (both HTTP and MCP), and the
  no-backend error path.
@chazmaniandinkle chazmaniandinkle merged commit dbe69a8 into cogos-dev:main Apr 24, 2026
5 checks passed