AI agents forget everything when a session ends. Every new session starts blank — you re-explain your preferences, the agent repeats mistakes you've already corrected, and hard-won project context evaporates.
MemCan fixes this. It gives agents a persistent, searchable memory store that survives across sessions. Agents automatically save learnings, decisions, and preferences as they work, and recall them at the start of the next session. Over time your agents get smarter: they remember your coding style, know which approaches failed before, and understand the quirks of your project without being told again.
Works with any MCP-compatible agent. Tested and optimized for Claude Code.
Built on embedded LanceDB + fastembed (in-process ONNX embeddings) + Ollama (local LLM for fact extraction and deduplication). No cloud, no external database — by default everything runs locally on your machine.
```bash
# 1. Install the plugin (run inside a Claude Code session)
/plugin marketplace add lklimek/agents
/plugin install memcan@lklimek

# 2. Run setup — installs CLI, downloads server config, generates API keys
/setup-memcan

# 3. Start the server (command printed by setup, typically:)
cd ~/.config/memcan/server && docker compose up -d
```

`/setup-memcan` guides you through everything: CLI install, Docker Compose server config, `.env` generation, and user rule creation. Restart Claude Code after setup. For all configuration options, see the Setup Guide.
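Not required, but a quick way to verify the stack came up is to check the compose project from the same directory (standard Docker Compose commands, assuming the default config path from step 3):

```bash
cd ~/.config/memcan/server
docker compose ps     # the MemCan server (and Ollama, if enabled) should be listed as running
docker compose logs   # inspect startup output if something looks off
```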
MemCan uses a two-component architecture:
- Server (`memcan-server`) — long-lived HTTP MCP server handling embeddings, LLM, and storage. Runs as a Docker container or system service on port 8191 (internal), fronted by Traefik on port 8190.
- CLI (`memcan`) — thin HTTP client, installed by `/setup-memcan`. No fastembed/LanceDB dependencies.
The Claude Code plugin connects to the server via HTTP MCP transport (Streamable HTTP).
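The plugin configures this connection for you. For other MCP-compatible agents, a Streamable HTTP entry in an `.mcp.json` might look roughly like the sketch below; the `/mcp` path and the `Authorization` header format are assumptions here, so check the config that `/setup-memcan` generates for the actual values:

```json
{
  "mcpServers": {
    "memcan": {
      "type": "http",
      "url": "http://localhost:8190/mcp",
      "headers": {
        "Authorization": "Bearer <your-memcan-api-key>"
      }
    }
  }
}
```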
- LanceDB — embedded vector database (no server needed, data stored locally)
- fastembed — in-process ONNX embeddings (`MultilingualE5Large`, 1024 dimensions, ~1.3 GB model downloaded on first use)
- Ollama — LLM inference (`qwen3.5:9b` by default, via ollama-rs); MemCan reads `OLLAMA_HOST` and `OLLAMA_API_KEY` from settings and passes them to the Ollama client. A GPU is recommended for best performance.
- rmcp 1.1 — Rust MCP SDK with Streamable HTTP transport
- axum — HTTP framework mounting the MCP service, health endpoint, and auth middleware
- `DISTILL_MEMORIES` — when enabled (default: `true`), the LLM extracts structured facts from raw text before storing
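A minimal sketch of turning distillation off, assuming the flag is read from the server `.env` (`~/.config/memcan/server/.env`) and accepts `false`; the exact value format may differ:

```
# ~/.config/memcan/server/.env — restart the stack after editing
DISTILL_MEMORIES=false
```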
| Tool | Description |
|---|---|
| `add_memory` | Store a memory with optional project scope and metadata (async, returns queued) |
| `search_memories` | Semantic search across memories |
| `get_memories` | List all memories for a scope |
| `delete_memory` | Remove a memory by ID |
| `update_memory` | Modify existing memory content (async, returns queued) |
| `count_memories` | Count memories for a scope (without fetching content) |
| `list_collections` | Discover available collections, point counts, and valid filter values |
| `search_standards` | Search indexed standards (CWE, OWASP, etc.) by semantic similarity |
| `search_code` | Search indexed code snippets by semantic similarity |
| `get_queue_status` | Check status of async add/update operations |
project="penny"→ scoped to project (stored asuser_id=project:penny)- No project → global scope (stored as
user_id=global)
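As a sketch of how an agent exercises this over MCP, a project-scoped `add_memory` call as a JSON-RPC `tools/call` request might look like the following. Only the tool name, the `project` argument, and the async/queued behavior come from the table above; the `content` argument name is an illustrative guess:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "add_memory",
    "arguments": {
      "content": "User prefers rebase over merge for feature branches",
      "project": "penny"
    }
  }
}
```

Because `add_memory` is asynchronous and returns queued, the agent can follow up with `get_queue_status` to confirm the write completed.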
Claude Code loads context into the attention window via several mechanisms. MemCan leverages them to ensure agents always know to use memory:
| Mechanism | Location | When Loaded | Shared? |
|---|---|---|---|
| User CLAUDE.md | `~/.claude/CLAUDE.md` | Every session, all projects | Just you |
| User rules | `~/.claude/rules/*.md` | Every session, all projects | Just you |
| Project CLAUDE.md | `./CLAUDE.md` or `./.claude/CLAUDE.md` | When in that project | Team (via git) |
| Project rules | `./.claude/rules/*.md` | When in that project | Team (via git) |
| Local CLAUDE.md | `./CLAUDE.local.md` | When in that project | Just you (gitignored) |
| Path-scoped rules | `.claude/rules/*.md` with `paths:` frontmatter | On demand, when matching files are touched | Team (via git) |
| Auto memory | `~/.claude/projects/<project>/memory/` | First 200 lines at session start | Just you |
The user rule created by /setup-memcan lives in ~/.claude/rules/memcan.md — loaded into every session so agents always know to search and save memories.
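The actual rule text is generated by `/setup-memcan` and will differ, but an illustrative sketch gives a feel for what the agent sees at the start of every session:

```markdown
<!-- ~/.claude/rules/memcan.md — illustrative only, not the generated file -->
Before answering questions about past decisions or preferences, call
search_memories for the current project and topic.
When you learn a durable fact, preference, or failed approach, store it with
add_memory, scoped to the current project where applicable.
```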
MemCan uses Ollama for local LLM inference (fact extraction and deduplication). A GPU is strongly recommended — the default model (qwen3.5:9b) runs too slowly on CPU for interactive use.
The setup skill writes COMPOSE_PROFILES=ollama to the server .env, which enables the bundled Ollama container. After docker compose up -d, pull the model into it:
```bash
docker compose exec ollama ollama pull qwen3.5:9b
```

Disable bundled Ollama: In the server .env (`~/.config/memcan/server/.env`), set `COMPOSE_PROFILES=` (empty) or remove the line entirely, then restart with `docker compose up -d`. Point MemCan at an external Ollama via `OLLAMA_HOST` if needed.
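For example, disabling the bundled container and pointing at an Ollama host elsewhere on the network would look like this in that file (then re-run `docker compose up -d`):

```
# ~/.config/memcan/server/.env
COMPOSE_PROFILES=
OLLAMA_HOST=http://192.168.1.10:11434
```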
GPU acceleration: The bundled Ollama runs in CPU mode by default. To enable GPU, uncomment the runtime: nvidia and deploy.resources blocks in docker-compose.yml (requires NVIDIA drivers and nvidia-container-runtime):
```yaml
ollama:
  runtime: nvidia
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
```

Open WebUI: Add `webui` to the profiles (`COMPOSE_PROFILES=ollama,webui`) to also start Open WebUI.
```bash
# Install Ollama, then pull the default model
ollama pull qwen3.5:9b
```

If Ollama runs on a different machine, point MemCan at it:

```
OLLAMA_HOST=http://192.168.1.10:11434
# If the endpoint requires auth:
OLLAMA_API_KEY=your-token-here
```

Cloud LLM: Only Ollama is currently supported. If you need a different LLM provider, open an issue.
Status: Removed in v0.35
Alternative: Use the lessons-learned skill (now in the claudius plugin) for deliberate memory extraction.
The automatic extraction hooks (SubagentStop and PreCompact events calling memcan extract) have been removed due to severe quality issues:
- Raw output storage: The hooks captured entire agent outputs — conversation transcripts, research reports, TODO list renders — as "memories" instead of distilling actionable facts
- Massive bloat: In one project, 437 auto-hook memories consumed 760KB (95% of total storage). Three individual memories exceeded 50KB each, with the largest at 220KB (an entire TODO list dump stored verbatim)
- Context overflow: When `search` or `recall` returned these bloated memories, they consumed the entire context window, making Claude Code unusable
- Low signal-to-noise: The vast majority of auto-hook memories were ephemeral junk — commit hashes, temp file paths, test pass counts, file rename notifications
The `memcan extract` CLI binary remains available only for the legacy auto-hook pipeline and manual use. The current lessons-learned flow in the claudius plugin talks to MemCan via the MCP `add_memory` / `remember` tools and does not call `memcan extract`. Memories created by `memcan extract` continue to be tagged with `metadata.source="auto-hook"` and `type="lesson"`.
To clean up existing auto-hook memories, use memcan-server purge-memories --source auto-hook (planned capability — not yet implemented) or delete them individually via memcan delete.
MIT
Co-authored by Claudius the Magnificent AI Agent