kernel-mod3 integration: session forwarder + MCP proxy + tool routing (Waves 2-3.5) #45
Merged
chazmaniandinkle merged 3 commits into cogos-dev:main on Apr 24, 2026
added 3 commits
April 23, 2026 12:10
…ave 2)
Kernel-side HTTP forwarder that makes the kernel the identity authority
for mod3 channel participants while keeping mod3 the communication-state
owner (voice, queue, device). Implements the decision from ADR-082's
layer-separation rule: "CogOS owns identity; Mod3 owns communication."
Routes (namespaced under /v1/channel-sessions/):
POST /v1/channel-sessions/register mint+forward
POST /v1/channel-sessions/{id}/deregister forward
GET /v1/channel-sessions list (kernel+mod3)
GET /v1/channel-sessions/{id} detail
Namespace choice: the existing /v1/sessions/* family serves agent-session
state with 3-component hyphen-validated IDs tied to the handoff protocol.
Channel-participant registration has an incompatible shape (short UUID
IDs, participant_id/participant_type/voice/device fields). Rather than
weaken ValidateSessionID (which would cascade into handoff semantics),
the new concern takes its own namespace. The channel-provider RFC's
guidance to unify on cogos_session_register with a participant_type
discriminator remains the target at the MCP tool layer (Wave 3); at the
HTTP layer the two surfaces coexist cleanly.
Behavior:
- Kernel mints a session_id (short UUID) if the caller omits one.
- Request is forwarded to Config.Mod3URL (default http://localhost:7860)
with a 5s timeout.
- Response merges kernel identity record + mod3 channel state.
- Mod3 unreachable -> HTTP 502 with clear error body.
- Mod3 error responses are preserved and surfaced.
Config: new Config.Mod3URL field, overridable via MOD3_URL env var.
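The behavior above can be sketched roughly as follows. This is a minimal illustration, not the code in serve_sessions_channel.go: registerRequest, newClient, newShortID, errorBody and forwardRegister are hypothetical names, the 8-hex-character ID format is an assumption (the source only says "short UUID"), and the merge of the kernel identity record with mod3's channel state is left out.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// registerRequest is a hypothetical wire type; only the field names are
// taken from the commit message.
type registerRequest struct {
	SessionID       string `json:"session_id,omitempty"`
	ParticipantID   string `json:"participant_id,omitempty"`
	ParticipantType string `json:"participant_type,omitempty"`
}

// mod3URL returns MOD3_URL when set, otherwise the documented default.
func mod3URL() string {
	if v := os.Getenv("MOD3_URL"); v != "" {
		return v
	}
	return "http://localhost:7860"
}

// newClient applies the 5s forwarder timeout described above.
func newClient() *http.Client { return &http.Client{Timeout: 5 * time.Second} }

// newShortID mints 8 hex characters. The real ID format is an assumption.
func newShortID() (string, error) {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// errorBody wraps a message in a small JSON error object.
func errorBody(msg string) []byte {
	b, _ := json.Marshal(map[string]string{"error": msg})
	return b
}

// forwardRegister mints an ID when the caller omitted one, forwards the
// request to mod3, and returns the status and body the kernel would answer
// with, plus the session ID that was used.
func forwardRegister(client *http.Client, base string, req registerRequest) (int, []byte, string) {
	if req.SessionID == "" {
		id, err := newShortID()
		if err != nil {
			return http.StatusInternalServerError, errorBody(err.Error()), ""
		}
		req.SessionID = id
	}
	payload, err := json.Marshal(req)
	if err != nil {
		return http.StatusInternalServerError, errorBody(err.Error()), req.SessionID
	}
	resp, err := client.Post(base+"/v1/sessions/register", "application/json", bytes.NewReader(payload))
	if err != nil {
		// mod3 could not be reached: surface a 502 with a clear body.
		return http.StatusBadGateway, errorBody("mod3 unreachable: " + err.Error()), req.SessionID
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return http.StatusBadGateway, errorBody("mod3 read failed: " + err.Error()), req.SessionID
	}
	// mod3's own status and body are preserved, including error responses.
	return resp.StatusCode, body, req.SessionID
}

func main() {
	req := registerRequest{ParticipantID: "alice", ParticipantType: "human"}
	status, body, id := forwardRegister(newClient(), mod3URL(), req)
	fmt.Println(status, id, string(body))
}
```

With no mod3 listening, running this prints a 502 together with the freshly minted ID, which is the failure mode the tests exercise with httptest.Server fakes.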
Tests: serve_sessions_channel_test.go covers ID minting, field passthrough,
response merging, mod3-down 502, and all four sibling endpoints via
httptest.Server fakes.
Wave 3 of the mod3-kernel integration (ADR-082 + channel-provider RFC).
The kernel becomes the MCP front door for mod3 voice tools via an HTTP
proxy — supersedes the installed binary's OpenClaw gateway, which read
mod3's metric headers but silently discarded the audio/wav payload.
Tools registered (mod3_* namespace on the /mcp endpoint):
* mod3_speak synthesize text + play audio locally
* mod3_stop cancel current/queued speech (optional job_id)
* mod3_voices list available voices
* mod3_status probe mod3 /health
* mod3_register_session proxy to POST /v1/sessions/register
* mod3_deregister_session proxy to POST /v1/sessions/{id}/deregister
* mod3_list_sessions proxy to GET /v1/sessions
All tools accept an optional session_id and thread it through to mod3 —
in the request body for synthesize, as a query parameter for stop/voices,
in the URL for register/deregister. Absent session_id → the proxy omits
the field and mod3 routes to its default session.
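A small sketch of the three threading styles described above, under stated assumptions: the /v1/stop path, the synthRequest type and the helper names are hypothetical, while the deregister path is the one listed for mod3_deregister_session. The point is only that an empty session_id leaves no trace in the outgoing request.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// synthRequest is a hypothetical body for the synthesize call. omitempty
// drops session_id entirely when the caller did not supply one.
type synthRequest struct {
	Text      string `json:"text"`
	SessionID string `json:"session_id,omitempty"`
}

// synthBody threads session_id through the request body.
func synthBody(text, sessionID string) (string, error) {
	b, err := json.Marshal(synthRequest{Text: text, SessionID: sessionID})
	return string(b), err
}

// stopURL threads session_id through a query parameter. The /v1/stop path
// is an assumption made for illustration.
func stopURL(base, sessionID string) string {
	u := base + "/v1/stop"
	if sessionID != "" {
		q := url.Values{}
		q.Set("session_id", sessionID)
		u += "?" + q.Encode()
	}
	return u
}

// deregisterURL threads session_id through the URL path.
func deregisterURL(base, sessionID string) string {
	return base + "/v1/sessions/" + url.PathEscape(sessionID) + "/deregister"
}

func main() {
	body, _ := synthBody("hello", "ab12")
	fmt.Println(body)
	fmt.Println(stopURL("http://localhost:7860", "ab12"))
	fmt.Println(stopURL("http://localhost:7860", ""))
	fmt.Println(deregisterURL("http://localhost:7860", "ab12"))
}
```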
Playback strategy: Option (A), server-side. Synthesis response bodies
are written to a tempfile, then played by afplay (macOS) / aplay (Linux)
via a fire-and-forget exec. Callers opt in to blocking via blocking=true,
or can skip playback entirely with skip_playback=true (returns the WAV
bytes base64-encoded, forward-compatible with Option B session-routed
playback once the Wave 4 dashboard WebSocket lands). The player command
is injectable (modalityProxy.player) so tests never spawn real audio.
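The playback strategy might look roughly like the sketch below. It is not the modalityProxy implementation: the player type, commandPlayer, defaultPlayer, playResult and the "played" and "skipped" status strings are assumptions for illustration. It shows the three paths (skip and return base64, blocking run, fire-and-forget spawn) and how an injected player keeps tests away from real audio devices.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
	"os/exec"
	"runtime"
)

// player builds the command that plays a WAV file. Injecting it lets tests
// substitute a harmless stub.
type player func(path string) *exec.Cmd

// commandPlayer returns a player that runs `name <path>`.
func commandPlayer(name string) player {
	return func(path string) *exec.Cmd { return exec.Command(name, path) }
}

// defaultPlayer picks afplay on macOS and aplay elsewhere.
func defaultPlayer() player {
	if runtime.GOOS == "darwin" {
		return commandPlayer("afplay")
	}
	return commandPlayer("aplay")
}

// playResult is a hypothetical result shape.
type playResult struct {
	Status   string // "spawned", "played" or "skipped" (last two are assumed names)
	AudioB64 string // populated only when playback is skipped
}

// play writes wav to a tempfile and hands it to the player, or returns the
// bytes base64-encoded when skip is true.
func play(wav []byte, p player, blocking, skip bool) (playResult, error) {
	if skip {
		return playResult{Status: "skipped", AudioB64: base64.StdEncoding.EncodeToString(wav)}, nil
	}
	f, err := os.CreateTemp("", "mod3-*.wav")
	if err != nil {
		return playResult{}, err
	}
	path := f.Name()
	if _, err := f.Write(wav); err != nil {
		f.Close()
		os.Remove(path)
		return playResult{}, err
	}
	if err := f.Close(); err != nil {
		os.Remove(path)
		return playResult{}, err
	}
	cmd := p(path)
	if blocking {
		defer os.Remove(path)
		if err := cmd.Run(); err != nil {
			return playResult{}, err
		}
		return playResult{Status: "played"}, nil
	}
	if err := cmd.Start(); err != nil {
		os.Remove(path)
		return playResult{}, err
	}
	// Fire-and-forget: reap the process and clean up in the background.
	go func() {
		cmd.Wait()
		os.Remove(path)
	}()
	return playResult{Status: "spawned"}, nil
}

func main() {
	r, _ := play([]byte("RIFF"), nil, false, true)
	fmt.Println(r.Status, r.AudioB64)
	// The Unix `true` command stands in for a real audio player here.
	r, err := play([]byte("RIFF"), commandPlayer("true"), true, false)
	fmt.Println(r.Status, err)
}
```

The stub in main assumes a Unix-like system where `true` is on the PATH; a real caller would pass defaultPlayer() instead.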
Metrics: mod3's X-Mod3-* response headers are parsed into a metrics
map on the tool result (job_id, duration_sec, rtf, sample_rate, etc.)
with numeric headers coerced to int64/float64 where applicable.
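One way the header-to-metrics step could work is sketched below. The extractMetrics name and the exact header spellings (X-Mod3-Job-Id and so on) are assumptions; the source only names the resulting map keys. The coercion here is deliberately naive: any value that parses as an integer becomes int64, anything else that parses as a float becomes float64, and the rest stays a string.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// extractMetrics collects X-Mod3-* headers into a map keyed by the
// lower-cased, underscore-separated header suffix.
func extractMetrics(h http.Header) map[string]any {
	const prefix = "X-Mod3-"
	out := map[string]any{}
	for name, vals := range h {
		if !strings.HasPrefix(name, prefix) || len(vals) == 0 {
			continue
		}
		key := strings.ToLower(strings.ReplaceAll(strings.TrimPrefix(name, prefix), "-", "_"))
		v := vals[0]
		// Naive coercion: an all-digit job_id would also become int64.
		if i, err := strconv.ParseInt(v, 10, 64); err == nil {
			out[key] = i
		} else if f, err := strconv.ParseFloat(v, 64); err == nil {
			out[key] = f
		} else {
			out[key] = v
		}
	}
	return out
}

func main() {
	h := http.Header{}
	h.Set("X-Mod3-Job-Id", "abc123")
	h.Set("X-Mod3-Sample-Rate", "24000")
	h.Set("X-Mod3-Rtf", "0.25")
	h.Set("Content-Type", "audio/wav") // ignored: not an X-Mod3-* header
	fmt.Println(extractMetrics(h))
}
```

A production version would likely coerce only a known set of numeric keys so that identifier-like values are never reinterpreted as numbers.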
Errors: mod3-unreachable returns IsError=true with "mod3 unreachable: …"
text so the ledger's tool.result event records the failure (same shape
as the serve_sessions_channel.go pattern). Non-2xx responses from mod3
preserve the body text in the error result — callers see mod3's own
422 / 5xx explanation intact.
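The error mapping can be illustrated with a tiny sketch. The toolResult type, toToolResult helper and the "mod3 returned <status>: <body>" wording are hypothetical; only the IsError flag and the "mod3 unreachable: " prefix come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// toolResult is a hypothetical stand-in for the MCP tool result.
type toolResult struct {
	IsError bool
	Text    string
}

// errConnRefused is a sample transport error used for demonstration.
var errConnRefused = errors.New("connection refused")

// toToolResult maps a mod3 HTTP outcome to a tool result. Transport errors
// and non-2xx statuses both set IsError; mod3's body text is kept intact.
func toToolResult(status int, body string, err error) toolResult {
	if err != nil {
		return toolResult{IsError: true, Text: "mod3 unreachable: " + err.Error()}
	}
	if status < 200 || status > 299 {
		// The message format here is an assumption.
		return toolResult{IsError: true, Text: fmt.Sprintf("mod3 returned %d: %s", status, body)}
	}
	return toolResult{Text: body}
}

func main() {
	fmt.Printf("%+v\n", toToolResult(0, "", errConnRefused))
	fmt.Printf("%+v\n", toToolResult(422, "unknown voice", nil))
	fmt.Printf("%+v\n", toToolResult(200, "ok", nil))
}
```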
Fixes: the drop-audio-bytes bug observed in the installed Apr-19 binary
(mcp__cogos__mod3_speak completes in ~1s, returns metrics, but plays
nothing). With this proxy the kernel actually hears what mod3 makes.
Timeout: 30s on the HTTP client (vs 5s on the channel-session forwarder);
accounts for cold-start model loading and multi-sentence synthesis.
Files:
internal/engine/mcp_modality_proxy.go (new, 551 lines)
internal/engine/mcp_modality_proxy_test.go (new, 686 lines)
internal/engine/mcp_server.go (modified, +11)
Tests: 20 new unit tests — synthesis success/error/session-threading,
stop/voices/status/sessions forwarding, metric extraction, server-side
playback via a stub shell-script player (proves the bytes reach the
player, guarding against the drop-audio regression), non-blocking spawn.
All pass. Full ./... suite green.
Out of scope (deferred to later waves):
* Wave 4: dashboard participant UI + session-routed playback
* session-start hook auto-registration
* consolidation with the existing OpenClaw-gateway mod3_speak
(coexist until deprecated)
…ve 3.5)
Eliminates the Wave 2 / Wave 3 divergence where the mod3_register_session,
mod3_deregister_session, and mod3_list_sessions MCP tools called mod3's
/v1/sessions/* surface directly, bypassing Wave 2's kernel-owned
session_id minting at /v1/channel-sessions/register.
Session-ID authority is now in one place (ADR-082): the kernel's shared
RegisterChannelSession / DeregisterChannelSession / ListChannelSessions
methods on *Server. Both the HTTP handlers and the MCP tool handlers call
through these methods. No self-localhost loop.
Approach 2 (refactor shared logic) over Approach 1 (self-HTTP loop): the
Wave 2 handler bodies factored cleanly into *Server methods returning
typed errors, and wiring an MCPServer field via SetChannelSessionBackend
mirrors the existing SetSessionsBackend pattern, so the public surface
didn't have to change.
Schema alignment with the channel-provider RFC's cogos_session_register
primitive: added optional `kinds` (array) and `metadata` (map) fields to
both the Wave 2 register endpoint and the MCP tool input. Both flow
through to mod3 unchanged (mod3 ignores unknown fields today) and are
preserved on the kernel identity record so downstream consumers can
filter by capability.
- internal/engine/serve_sessions_channel.go: factored
  handleChannelSession* bodies into RegisterChannelSession /
  DeregisterChannelSession / ListChannelSessions methods returning a
  typed channelSessionForwardError; added Kinds / Metadata fields to
  ChannelSessionRecord and the request wire type; handlers now thin-wrap
  the shared methods.
- internal/engine/mcp_modality_proxy.go: the three session-family MCP
  tool handlers now call the channelSessionBackend interface on
  MCPServer; all direct HTTP calls to mod3's /v1/sessions/* removed from
  this file.
- internal/engine/mcp_server.go: added channelSessionBackend field +
  interface + SetChannelSessionBackend setter.
- internal/engine/serve_mcp.go: wires the live Server as the backend.
- tests: updated newProxyMCP to wire a live Server so MCP session-family tests exercise the shared code path; added coverage for minting via the MCP tool, kinds/metadata pass-through (both HTTP and MCP), and the no-backend error path.
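The wiring pattern described in this commit can be sketched as below. The type and method names follow the commit message, but the signatures are simplified assumptions (the real methods take richer request types and return a typed channelSessionForwardError), and handleRegisterTool and errNoBackend are hypothetical names. The sketch shows the MCP side depending on an interface, the setter that injects the live Server, and the no-backend error path.

```go
package main

import (
	"errors"
	"fmt"
)

// channelSessionBackend is what the MCP tool handlers call. The single
// simplified method here stands in for the register/deregister/list trio.
type channelSessionBackend interface {
	RegisterChannelSession(participantID string) (string, error)
}

// Server stands in for the kernel's HTTP server, which owns ID minting.
type Server struct{ next int }

// RegisterChannelSession mints a session ID. A counter replaces the real
// minting and mod3 forwarding for the sake of a deterministic example.
func (s *Server) RegisterChannelSession(participantID string) (string, error) {
	s.next++
	return fmt.Sprintf("sess-%d", s.next), nil
}

// MCPServer holds the backend behind an interface, so the tool handlers
// never make an HTTP call back into the same process.
type MCPServer struct{ backend channelSessionBackend }

// SetChannelSessionBackend wires the shared implementation in.
func (m *MCPServer) SetChannelSessionBackend(b channelSessionBackend) { m.backend = b }

var errNoBackend = errors.New("channel session backend not configured")

// handleRegisterTool is a hypothetical MCP tool handler.
func (m *MCPServer) handleRegisterTool(participantID string) (string, error) {
	if m.backend == nil {
		return "", errNoBackend
	}
	return m.backend.RegisterChannelSession(participantID)
}

func main() {
	mcp := &MCPServer{}
	_, err := mcp.handleRegisterTool("alice")
	fmt.Println("before wiring:", err)

	mcp.SetChannelSessionBackend(&Server{})
	id, err := mcp.handleRegisterTool("alice")
	fmt.Println("after wiring:", id, err)
}
```

Because the HTTP handlers and the MCP handlers would both call the same *Server methods, there is a single place where IDs are minted, which is the property the refactor is after.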
Summary
Three stacked commits that establish the kernel as the identity authority for mod3 channel sessions and expose mod3's capabilities through the kernel's MCP server with working audio playback. Part of the ADR-082 / channel-provider RFC rollout.
Depends on: cogos-dev/mod3#5 (mod3 Phase 1 session registry).
Commits
a2f7904: feat(engine): forward channel-session registration to mod3 (Wave 2)
New HTTP surface on the kernel, namespaced under /v1/channel-sessions/ to avoid collision with the existing /v1/sessions/ family (agent-session state, 3-component hyphen-validated IDs tied to handoff):
- POST /v1/channel-sessions/register: mints a short UUID when the caller omits one, forwards to mod3
- POST /v1/channel-sessions/{id}/deregister
- GET /v1/channel-sessions
- GET /v1/channel-sessions/{id}
Mod3 unreachable returns 502 with a clean error body. New Config.Mod3URL (default http://localhost:7860, MOD3_URL env override).

c8e2d8d: feat(engine): MCP proxy for mod3 tools with audio playback (Wave 3)
Seven new MCP tools on the kernel's server:
- mod3_speak: synth + afplay locally (fire-and-forget; blocking=true or skip_playback=true flags)
- mod3_stop, mod3_voices, mod3_status
- mod3_register_session, mod3_deregister_session, mod3_list_sessions
Fixes the audio-bytes-dropped bug in the old OpenClaw-gateway mod3_speak: the kernel now writes WAV bytes to a tempfile and spawns afplay (macOS) or aplay/paplay (Linux). Player is an injectable field → tests use stub shell scripts. skip_playback=true returns base64 audio for Wave 4's dashboard WebSocket routing.

dbe69a8: refactor(engine): route MCP session tools through kernel endpoint (Wave 3.5)
Eliminates a divergence where Wave 3's session-family tools called mod3 directly, bypassing Wave 2's kernel-owned session-ID minting. Wave 2's handler bodies refactored into *Server methods (RegisterChannelSession, DeregisterChannelSession, ListChannelSessions). Both HTTP handlers AND MCP tool handlers call through these; no self-localhost loop. RFC schema additions: optional kinds and metadata fields accepted at the register endpoint, flow through to mod3.

Architectural effect
- mcp__cogos__mod3_* works end-to-end
- mod3_speak read metrics and discarded audio/wav bytes; the Wave 3 path runs afplay server-side and returns playback_status: spawned to the caller
- skip_playback=true + the session_id threading set up Wave 4 (dashboard participant panel with per-session WebSocket audio routing)

Test plan
- …httptest.NewServer)
- go test ./... green in 9.3s
- gofmt -l clean on all changed files
- mcp__cogos__mod3_speak plays audio end-to-end via afplay (verified with user)