AI agents forget everything when a session ends. Every new session starts blank — you re-explain your preferences, the agent repeats mistakes you've already corrected, and hard-won project context evaporates.
MemCan fixes this. It gives agents a persistent, searchable memory store that survives across sessions. Agents automatically save learnings, decisions, and preferences as they work, and recall them at the start of the next session. Over time your agents get smarter: they remember your coding style, know which approaches failed before, and understand the quirks of your project without being told again.
Works with any MCP-compatible agent. Tested and optimized for Claude Code.
Built on embedded LanceDB + fastembed (in-process ONNX embeddings) + Ollama (local LLM for fact extraction and deduplication). No cloud, no external database — by default everything runs locally on your machine.
```bash
# 1. Install the plugin (run inside a Claude Code session)
/plugin marketplace add lklimek/agents
/plugin install memcan@lklimek

# 2. Run setup — installs CLI, downloads server config, generates API keys
/setup-memcan

# 3. Start the server (command printed by setup, typically:)
cd ~/.config/memcan/server && docker compose up -d
```

`/setup-memcan` guides you through everything: CLI install, Docker Compose server config, `.env` generation, and user rule creation. Restart Claude Code after setup. For all configuration options, see the Setup Guide.
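Not required, but a quick way to verify the stack came up is to check the compose project from the same directory (standard Docker Compose commands, assuming the default config path from step 3):

```bash
cd ~/.config/memcan/server
docker compose ps     # the MemCan server (and Ollama, if enabled) should be listed as running
docker compose logs   # inspect startup output if something looks off
```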
MemCan uses a two-component architecture:
- Server (`memcan-server`) — long-lived HTTP MCP server handling embeddings, LLM, and storage. Runs as a Docker container or system service on port 8191 (internal), fronted by Traefik on port 8190.
- CLI (`memcan`) — thin HTTP client, installed by `/setup-memcan`. No fastembed/LanceDB dependencies.
The Claude Code plugin connects to the server via HTTP MCP transport (Streamable HTTP).
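The plugin configures this connection for you. For other MCP-compatible agents, a Streamable HTTP entry in an `.mcp.json` might look roughly like the sketch below; the `/mcp` path and the `Authorization` header format are assumptions here, so check the config that `/setup-memcan` generates for the actual values:

```json
{
  "mcpServers": {
    "memcan": {
      "type": "http",
      "url": "http://localhost:8190/mcp",
      "headers": {
        "Authorization": "Bearer <your-memcan-api-key>"
      }
    }
  }
}
```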
- LanceDB — embedded vector database (no server needed, data stored locally)
- fastembed — in-process ONNX embeddings (`MultilingualE5Large`, 1024 dimensions, ~1.3 GB model downloaded on first use)
- Ollama — LLM inference (`qwen3.5:9b` by default, via ollama-rs); MemCan reads `OLLAMA_HOST` and `OLLAMA_API_KEY` from settings and passes them to the Ollama client. A GPU is recommended for best performance.
- rmcp 1.1 — Rust MCP SDK with Streamable HTTP transport
- axum — HTTP framework mounting the MCP service, health endpoint, and auth middleware
- `DISTILL_MEMORIES` — when enabled (default: `true`), the LLM extracts structured facts from raw text before storing
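A minimal sketch of turning distillation off, assuming the flag is read from the server `.env` (`~/.config/memcan/server/.env`) and accepts `false`; the exact value format may differ:

```
# ~/.config/memcan/server/.env — restart the stack after editing
DISTILL_MEMORIES=false
```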
| Tool | Description |
|---|---|
| `add_memory` | Store a memory with optional project scope and metadata (async, returns queued) |
| `search_memories` | Semantic search across memories |
| `get_memories` | List all memories for a scope |
| `delete_memory` | Remove a memory by ID |
| `update_memory` | Modify existing memory content (async, returns queued) |
| `count_memories` | Count memories for a scope (without fetching content) |
| `list_collections` | Discover available collections, point counts, and valid filter values |
| `search_standards` | Search indexed standards (CWE, OWASP, etc.) by semantic similarity |
| `search_code` | Search indexed code snippets by semantic similarity |
| `get_queue_status` | Check status of async add/update operations |
project="penny"→ scoped to project (stored asuser_id=project:penny)- No project → global scope (stored as
user_id=global)
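As a sketch of how an agent exercises this over MCP, a project-scoped `add_memory` call as a JSON-RPC `tools/call` request might look like the following. Only the tool name, the `project` argument, and the async/queued behavior come from the table above; the `content` argument name is an illustrative guess:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "add_memory",
    "arguments": {
      "content": "User prefers rebase over merge for feature branches",
      "project": "penny"
    }
  }
}
```

Because `add_memory` is asynchronous and returns queued, the agent can follow up with `get_queue_status` to confirm the write completed.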
Claude Code loads context into the attention window via several mechanisms. MemCan leverages them to ensure agents always know to use memory:
| Mechanism | Location | When Loaded | Shared? |
|---|---|---|---|
| User CLAUDE.md | `~/.claude/CLAUDE.md` | Every session, all projects | Just you |
| User rules | `~/.claude/rules/*.md` | Every session, all projects | Just you |
| Project CLAUDE.md | `./CLAUDE.md` or `./.claude/CLAUDE.md` | When in that project | Team (via git) |
| Project rules | `./.claude/rules/*.md` | When in that project | Team (via git) |
| Local CLAUDE.md | `./CLAUDE.local.md` | When in that project | Just you (gitignored) |
| Path-scoped rules | `.claude/rules/*.md` with `paths:` frontmatter | On demand, when matching files are touched | Team (via git) |
| Auto memory | `~/.claude/projects/<project>/memory/` | First 200 lines at session start | Just you |
The user rule created by /setup-memcan lives in ~/.claude/rules/memcan.md — loaded into every session so agents always know to search and save memories.
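The actual rule text is generated by `/setup-memcan` and will differ, but an illustrative sketch gives a feel for what the agent sees at the start of every session:

```markdown
<!-- ~/.claude/rules/memcan.md — illustrative only, not the generated file -->
Before answering questions about past decisions or preferences, call
search_memories for the current project and topic.
When you learn a durable fact, preference, or failed approach, store it with
add_memory, scoped to the current project where applicable.
```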
MemCan uses Ollama for local LLM inference (fact extraction and deduplication). A GPU is strongly recommended — the default model (qwen3.5:9b) runs too slowly on CPU for interactive use.
The setup skill writes COMPOSE_PROFILES=ollama to the server .env, which enables the bundled Ollama container. After docker compose up -d, pull the model into it:
```bash
docker compose exec ollama ollama pull qwen3.5:9b
```

Disable bundled Ollama: In the server .env (`~/.config/memcan/server/.env`), set `COMPOSE_PROFILES=` (empty) or remove the line entirely, then restart with `docker compose up -d`. Point MemCan at an external Ollama via `OLLAMA_HOST` if needed.
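For example, disabling the bundled container and pointing at an Ollama host elsewhere on the network would look like this in that file (then re-run `docker compose up -d`):

```
# ~/.config/memcan/server/.env
COMPOSE_PROFILES=
OLLAMA_HOST=http://192.168.1.10:11434
```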
GPU acceleration: The bundled Ollama runs in CPU mode by default. To enable GPU, uncomment the runtime: nvidia and deploy.resources blocks in docker-compose.yml (requires NVIDIA drivers and nvidia-container-runtime):
```yaml
ollama:
  runtime: nvidia
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
```

Open WebUI: Add `webui` to the profiles (`COMPOSE_PROFILES=ollama,webui`) to also start Open WebUI.
```bash
# Install Ollama, then pull the default model
ollama pull qwen3.5:9b
```

If Ollama runs on a different machine, point MemCan at it:

```
OLLAMA_HOST=http://192.168.1.10:11434
# If the endpoint requires auth:
OLLAMA_API_KEY=your-token-here
```

Cloud LLM: Only Ollama is currently supported. If you need a different LLM provider, open an issue.
Status: Removed in v0.35
Alternative: Use the lessons-learned skill (now in the claudius plugin) for deliberate memory extraction.
The automatic extraction hooks (SubagentStop and PreCompact events calling memcan extract) have been removed due to severe quality issues:
- Raw output storage: The hooks captured entire agent outputs — conversation transcripts, research reports, TODO list renders — as "memories" instead of distilling actionable facts
- Massive bloat: In one project, 437 auto-hook memories consumed 760KB (95% of total storage). Three individual memories exceeded 50KB each, with the largest at 220KB (an entire TODO list dump stored verbatim)
- Context overflow: When `search` or `recall` returned these bloated memories, they consumed the entire context window, making Claude Code unusable
- Low signal-to-noise: The vast majority of auto-hook memories were ephemeral junk — commit hashes, temp file paths, test pass counts, file rename notifications
The `memcan extract` CLI binary remains available only for the legacy auto-hook pipeline and manual use. The current lessons-learned flow in the claudius plugin talks to MemCan via the MCP `add_memory` / `remember` tools and does not call `memcan extract`. Memories created by `memcan extract` continue to be tagged with `metadata.source="auto-hook"` and `type="lesson"`.
To clean up existing auto-hook memories, use memcan-server purge-memories --source auto-hook (planned capability — not yet implemented) or delete them individually via memcan delete.
MIT
Co-authored by Claudius the Magnificent AI Agent