feat: serve MCP over HTTP transport from daemon#388

Merged: CalebisGross merged 2 commits into feat/exp25-faithfulness-probe from feat/384-mcp-http-transport on Apr 9, 2026
@CalebisGross (Collaborator)

Summary

  • Adds a POST /mcp endpoint to the daemon API, so Claude Code connects over HTTP transport instead of spawning stdio subprocesses
  • New SessionManager creates and caches one MCPServer instance per session ID, with a 30-minute idle expiry enforced by a background reaper
  • HTTP handler generates a session ID on the first request, returns it in the Mcp-Session-Id response header, and routes subsequent requests carrying that ID to the existing session
  • Exports the JSONRPCRequest/JSONRPCResponse types and HandleSingleRequest for use by the HTTP transport layer
  • Claude Code config updated to: {"type": "http", "url": "http://127.0.0.1:9999/mcp"}

Before: Each Claude Code session spawned a mnemonic mcp subprocess, which loaded its own copy of the Qwen model at ~3GB VRAM per session. 4 sessions + daemon = 15GB on a 16GB GPU.

After: All sessions share the daemon's single model load. Zero additional VRAM per session. Stale process accumulation eliminated.

Test plan

  • go vet and go test pass for all changed packages
  • ROCM=1 make build-embedded compiles
  • Daemon starts, health OK, LLM loaded
  • POST /mcp initialize without session ID → returns session ID in header
  • Subsequent requests with session ID route correctly
  • DELETE /mcp with session ID cleans up session
  • Missing session header on initialize creates new session (not error)
  • rocm-smi --showpids shows single daemon process on GPU
  • Session lifecycle logged (created, ended)

Closes #384

🤖 Generated with Claude Code

CalebisGross and others added 2 commits April 9, 2026 15:44
…uning

Extract mnemonic's own 1.5-2B model from Gemma 4 31B (30.7B dense,
60 layers) via Sheared-LLaMA-style targeted structural pruning.

Phases: full fine-tune baseline → learned pruning masks → continued
pretraining → standalone GGUF export. Progressive targets 8B→4B→2B→1.5B
to find the quality cliff.

Target: >200 tok/s, <1.5GB VRAM, match EXP-26 faithfulness metrics.
Hardware: MI300X for pruning, local 7800 XT for deployment.

Tracking: #386

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add POST /mcp endpoint to the daemon API, eliminating the need for
per-session stdio subprocesses. Claude Code connects via HTTP transport
to the already-running daemon, sharing its LLM, store, and agents.

- New SessionManager (internal/mcp/session.go) creates and caches
  MCPServer instances per session ID with 30-minute idle expiry
- HTTP handler (internal/api/routes/mcp.go) accepts JSON-RPC requests,
  generates session IDs on first request (returned via Mcp-Session-Id
  header), routes subsequent requests to existing sessions
- Export JSONRPCRequest/Response types and HandleSingleRequest for
  the HTTP transport layer
- Wire session manager into daemon serve pipeline

Claude Code config changes from stdio to HTTP transport:
  {"type": "http", "url": "http://127.0.0.1:9999/mcp"}

Result: N sessions x ~3GB VRAM each → one daemon, one model, ~3GB total.
The mcp subcommand remains as fallback for offline/no-daemon usage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CalebisGross merged commit 65fe6cf into feat/exp25-faithfulness-probe on Apr 9, 2026
CalebisGross deleted the feat/384-mcp-http-transport branch on April 9, 2026 20:00