Problem
Every Claude Code chat session spawns a separate mnemonic mcp subprocess via stdio transport. Each subprocess independently loads the full Qwen model into GPU VRAM (~3GB), so multiple concurrent sessions exhaust the RX 7800 XT's 16GB:
Session 1 → mnemonic mcp → loads Qwen → 3GB VRAM
Session 2 → mnemonic mcp → loads Qwen → 3GB VRAM
Session 3 → mnemonic mcp → loads Qwen → 3GB VRAM
Daemon → mnemonic serve → loads Qwen → 3GB VRAM
Total: ~12-15GB
Stale MCP processes from closed sessions often linger, compounding the issue. This is a recurring problem — not a one-time cleanup.
Root Cause
The MCP server config in ~/.claude/settings.local.json uses stdio transport:
"mnemonic": {
"command": "/home/hubcaps/Projects/mem/bin/mnemonic",
"args": ["mcp"]
}
Each session spawns a full process that initializes its own LLM provider, store connection, encoding agent, and retrieval agent — duplicating what the already-running daemon has.
Solution
Serve MCP protocol over HTTP transport from the daemon. The daemon is already running with the model loaded, store open, and all agents active.
Key findings driving the approach:
- The MCP server is pure request/response — no server-initiated notifications. SSE is unnecessary.
- Claude Code supports "type": "http" transport — SSE is deprecated, and HTTP is recommended for already-running servers.
- handleRequest() is already transport-agnostic — it takes a jsonRPCRequest and returns a jsonRPCResponse. Only the transport layer needs to change.
Implementation:
- Add POST /mcp endpoint to the daemon API — accepts a JSON-RPC request body, calls the existing handleRequest(), returns the JSON-RPC response
- Per-session state via headers — track the session ID (e.g. the Mcp-Session-Id header) so per-session memory tracking and onSessionEnd() still work
- Session lifecycle management — detect when a session disconnects (no requests for N minutes) and call onSessionEnd() for cleanup
- Update Claude Code config to:
"mnemonic": {
"type": "http",
"url": "http://127.0.0.1:9999/mcp"
}
- Keep mnemonic mcp (stdio) as a fallback for when the daemon isn't running
Result:
Session 1 ──┐
Session 2 ──┼── POST /mcp ──→ daemon :9999 (one process, one model, ~3GB VRAM)
Session 3 ──┘
What changes:
- New HTTP handler in internal/api/routes/ (~200 lines)
- Session multiplexing in the MCP server (~100 lines)
- Session timeout/cleanup logic
- Claude Code MCP config
What doesn't change:
- All 24 MCP tool handlers (zero changes)
- Store, LLM provider, event bus (shared from daemon)
- mnemonic mcp subcommand (kept as offline fallback)
Impact
- Eliminates GPU VRAM exhaustion from concurrent sessions
- Eliminates stale subprocess accumulation
- Reduces per-session overhead from ~1.4GB RAM + 3GB VRAM to zero (shared daemon)
- Faster session startup (no model load, no DB open — daemon already has everything)