feat: serve MCP over HTTP transport from daemon by CalebisGross · Pull Request #388 · AppSprout-dev/mnemonic

CalebisGross · 2026-04-09T19:59:03Z

Summary

- Adds a POST /mcp endpoint to the daemon API, so Claude Code connects over HTTP transport instead of spawning stdio subprocesses
- New SessionManager creates and caches MCPServer instances per session ID, with a 30-minute idle expiry and a background reaper
- The HTTP handler generates a session ID on the first request (returned in the Mcp-Session-Id response header) and routes subsequent requests to the existing session
- Exports the JSONRPCRequest/JSONRPCResponse types and HandleSingleRequest for the HTTP transport layer
- Claude Code config updated to: {"type": "http", "url": "http://127.0.0.1:9999/mcp"}

Before: each Claude Code session spawned its own mnemonic mcp subprocess, which loaded a separate copy of the Qwen model (~3GB VRAM per session). Four sessions plus the daemon consumed 15GB on a 16GB GPU.

After: all sessions share the daemon's single model load, so additional sessions use no extra VRAM, and stale subprocesses no longer accumulate.

Test plan

- go vet and go test pass for all changed packages
- ROCM=1 make build-embedded compiles
- Daemon starts, health OK, LLM loaded
- POST /mcp initialize without a session ID → returns a session ID in the response header
- Subsequent requests with the session ID route correctly
- DELETE /mcp with the session ID cleans up the session
- Missing session header on initialize creates a new session (not an error)
- rocm-smi --showpids shows a single daemon process on the GPU
- Session lifecycle logged (created, ended)

Closes #384

🤖 Generated with Claude Code

…uning

Extract mnemonic's own 1.5-2B model from Gemma 4 31B (30.7B dense, 60 layers) via Sheared-LLaMA-style targeted structural pruning.

Phases: full fine-tune baseline → learned pruning masks → continued pretraining → standalone GGUF export.
Progressive targets 8B→4B→2B→1.5B to find the quality cliff.
Target: >200 tok/s, <1.5GB VRAM, match EXP-26 faithfulness metrics.
Hardware: MI300X for pruning, local 7800 XT for deployment.
Tracking: #386

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add POST /mcp endpoint to the daemon API, eliminating the need for per-session stdio subprocesses. Claude Code connects via HTTP transport to the already-running daemon, sharing its LLM, store, and agents.

- New SessionManager (internal/mcp/session.go) creates and caches MCPServer instances per session ID with 30-minute idle expiry
- HTTP handler (internal/api/routes/mcp.go) accepts JSON-RPC requests, generates session IDs on first request (returned via Mcp-Session-Id header), routes subsequent requests to existing sessions
- Export JSONRPCRequest/Response types and HandleSingleRequest for the HTTP transport layer
- Wire session manager into daemon serve pipeline

Claude Code config changes from stdio to HTTP transport:
{"type": "http", "url": "http://127.0.0.1:9999/mcp"}

Result: N sessions × ~3GB VRAM each → one daemon, one model, ~3GB total.

The mcp subcommand remains as fallback for offline/no-daemon usage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CalebisGross and others added 2 commits April 9, 2026 15:44

CalebisGross mentioned this pull request Apr 9, 2026

feat: serve MCP over HTTP transport from daemon to eliminate per-session subprocess spawning #384

Closed

CalebisGross merged commit 65fe6cf into feat/exp25-faithfulness-probe Apr 9, 2026

CalebisGross deleted the feat/384-mcp-http-transport branch April 9, 2026 20:00
