18 changes: 16 additions & 2 deletions README.md

> Embedding model: `all-MiniLM-L6-v2` (local, free, no API key). Full reports: [`benchmark/LONGMEMEVAL.md`](benchmark/LONGMEMEVAL.md), [`benchmark/QUALITY.md`](benchmark/QUALITY.md), [`benchmark/SCALE.md`](benchmark/SCALE.md). Competitor comparison: [`benchmark/COMPARISON.md`](benchmark/COMPARISON.md) — agentmemory vs mem0, Letta, Khoj, claude-mem, Hippo.

---


## Quick Start

### Try it in 30 seconds

```bash
# Terminal 1: start the server
npx @agentmemory/agentmemory

# Terminal 2: seed sample data and see recall in action
npx @agentmemory/agentmemory demo
```

`demo` seeds 3 realistic sessions (JWT auth, N+1 query fix, rate limiting) and runs semantic searches against them. You'll see it find "N+1 query fix" when you search "database performance optimization" — keyword matching can't do that.

Open `http://localhost:3113` to watch the memory build live.

### Claude Code (one block, paste it)

Then add the MCP config for your agent:

| Agent | Setup |
|---|---|
| **Cursor** | Add to `~/.cursor/mcp.json`: `{"mcpServers": {"agentmemory": {"command": "npx", "args": ["agentmemory-mcp"]}}}` |
| **OpenClaw** | Add to MCP config: `{"mcpServers": {"agentmemory": {"command": "npx", "args": ["agentmemory-mcp"]}}}` or use the [gateway plugin](integrations/openclaw/) |
| **Gemini CLI** | `gemini mcp add agentmemory -- npx agentmemory-mcp` |
| **Codex CLI** | Add to `.codex/config.yaml`: `mcp_servers: {agentmemory: {command: npx, args: ["agentmemory-mcp"]}}` |
| **OpenCode** | Add to `.opencode/config.json`: `{"mcpServers": {"agentmemory": {"command": "npx", "args": ["agentmemory-mcp"]}}}` |
151 changes: 151 additions & 0 deletions benchmark/COMPARISON.md
# AI Agent Memory: Benchmark Comparison

How agentmemory compares against other persistent memory solutions for AI coding agents.

All numbers here come from published benchmarks or public repositories. We link to primary sources wherever possible so you can reproduce.

---

## Retrieval Accuracy (LongMemEval)

[LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) measures long-term memory retrieval. The S variant has 500 questions, each drawing on ~48 chat sessions (~115K tokens of history per question).

| System | Benchmark | R@5 | Notes |
|---|---|---|---|
| **agentmemory** (BM25 + Vector) | LongMemEval-S | **95.2%** | `all-MiniLM-L6-v2` embeddings, no API key |
| agentmemory (BM25-only) | LongMemEval-S | 86.2% | Fallback when no embedding provider available |
| MemPalace | LongMemEval-S | ~96.6% | Vector-only, bigger embedding model |
| Letta / MemGPT | LoCoMo | 83.2% | Different benchmark (LoCoMo, not LongMemEval) |
| Mem0 | LoCoMo | 68.5% | Different benchmark (LoCoMo, not LongMemEval) |

**⚠️ Apples vs oranges caveat:** agentmemory and MemPalace are measured on LongMemEval-S. Letta and Mem0 publish on [LoCoMo](https://snap-stanford.github.io/LoCoMo/), a different benchmark. We're showing both so you can see the ballpark. We'd love to run all four on the same dataset — if any maintainer wants to collaborate, open an issue.

Full agentmemory methodology: [`LONGMEMEVAL.md`](LONGMEMEVAL.md)

---

## Feature Matrix

| Feature | agentmemory | mem0 | Letta/MemGPT | Khoj | claude-mem | Hippo |
|---|---|---|---|---|---|---|
| **GitHub stars** | Growing | 53K+ | 22K+ | 34K+ | 46K+ | Trending |
| **Type** | Memory engine + MCP server | Memory layer API | Full agent runtime | Personal AI | MCP server | Memory system |
| **Auto-capture via hooks** | ✅ 12 lifecycle hooks | ❌ Manual `add()` | ❌ Agent self-edits | ❌ Manual | ✅ Limited | ❌ Manual |
| **Search strategy** | BM25 + Vector + Graph | Vector + Graph | Vector (archival) | Semantic | FTS5 | Decay-weighted |
| **Multi-agent coordination** | ✅ Leases + signals + mesh | ❌ | Runtime-internal only | ❌ | ❌ | Multi-agent shared |
| **Framework lock-in** | None | None | High | Standalone | Claude Code | None |
| **External deps** | None | Qdrant/pgvector | Postgres + vector | Multiple | None (SQLite) | None |
| **Self-hostable** | ✅ default | Optional | Optional | ✅ | ✅ | ✅ |
| **Knowledge graph** | ✅ Entity extraction + BFS | ✅ Mem0g variant | ❌ | Doc links | ❌ | ❌ |
| **Memory decay** | ✅ Ebbinghaus + tiered | ❌ | ❌ | ❌ | ❌ | ✅ Half-lives |
| **4-tier consolidation** | ✅ Working → episodic → semantic → procedural | ❌ | OS-inspired tiers | ❌ | ❌ | Episodic + semantic |
| **Version / supersession** | ✅ Jaccard-based | Passive | ❌ | ❌ | ❌ | ❌ |
| **Real-time viewer** | ✅ Port 3113 | Cloud dashboard | Cloud dashboard | Web UI | ❌ | ❌ |
| **Privacy filtering** | ✅ Strips secrets pre-store | ❌ | ❌ | ❌ | ❌ | ❌ |
| **Obsidian export** | ✅ Built-in | ❌ | ❌ | Native format | ❌ | ❌ |
| **Cross-agent** | ✅ MCP + REST | API calls | Within runtime | Standalone | Claude-only | Multi-agent shared |
| **Audit trail** | ✅ All mutations logged | ❌ | Limited | ❌ | ❌ | ❌ |
| **Language SDKs** | Any (REST + MCP) | Python + TS | Python only | API | Any (MCP) | Node |

---

## Token Efficiency

The main reason to use persistent memory at all: token cost. Here's what one year of heavy agent use looks like across approaches.

| Approach | Tokens / year | Cost / year | Notes |
|---|---|---|---|
| Paste full history into context | 19.5M+ | Impossible | Exceeds context window after ~200 observations |
| LLM-summarized memory (extraction-based) | ~650K | ~$500 | Lossy — summarization drops detail |
| **agentmemory (API embeddings)** | **~170K** | **~$10** | Token-budgeted, only relevant memories injected |
| **agentmemory (local embeddings)** | **~170K** | **$0** | `all-MiniLM-L6-v2` runs in-process |
| claude-mem | Reports ~10x savings | — | SQLite + FTS5 + 3-layer filter |
| Mem0 | Varies by integration | — | Extraction-based, no token budget |

**agentmemory ships with a built-in token savings calculator.** Run `npx @agentmemory/agentmemory status` after a few sessions and you'll see exactly how many tokens you've saved vs. pasting the full history.
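The "token-budgeted" rows above hinge on one idea: inject only the highest-value memories that fit a fixed budget, instead of the whole history. A minimal sketch of that selection step is below — agentmemory's actual logic is not shown here, and the names and the rough 4-characters-per-token estimate are assumptions:

```typescript
// Illustrative sketch of token-budgeted memory injection, not
// agentmemory's actual implementation. Rank memories by relevance,
// then greedily pack them until the token budget is spent.
interface Memory {
  text: string;
  score: number; // relevance score from search
}

// Rough token estimate: ~4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily take the highest-scoring memories that fit the budget.
function selectWithinBudget(memories: Memory[], budget: number): Memory[] {
  const ranked = [...memories].sort((a, b) => b.score - a.score);
  const chosen: Memory[] = [];
  let used = 0;
  for (const m of ranked) {
    const cost = estimateTokens(m.text);
    if (used + cost <= budget) {
      chosen.push(m);
      used += cost;
    }
  }
  return chosen;
}
```

With a 2,000-token budget this is why a year of use stays around ~170K injected tokens: each session injects a bounded slice, no matter how large the store grows.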

---

## What Each Tool Is Best At

This isn't an "agentmemory wins everything" page. Different tools solve different problems.

**Choose agentmemory if you want:**
- Automatic capture with zero manual `add()` calls
- MCP server that works across Claude Code, Cursor, Codex, Gemini CLI, etc.
- Hybrid BM25 + vector + graph search
- Real-time viewer to see what your agent is learning
- Self-hostable with zero external databases
- Privacy filtering on API keys and secrets
- Multi-agent coordination (leases, signals, routines)

**Choose Mem0 if you want:**
- Framework-agnostic API to bolt onto an existing agent
- Managed cloud option with a dashboard
- Python + TypeScript SDKs for direct integration
- Entity/relationship extraction as the primary abstraction

**Choose Letta/MemGPT if you want:**
- A full agent runtime, not just memory
- OS-inspired memory tiers (core/archival/recall)
- Agents that self-edit their memory via function calls
- Long-running conversational agents (weeks/months)

**Choose Khoj if you want:**
- A personal AI second brain, not agent infrastructure
- Document-first search over your files and the web
- Obsidian/Notion/Emacs integrations
- Scheduled automations and research tasks

**Choose claude-mem if you want:**
- Claude Code-specific tooling with SQLite + FTS5
- Minimal install footprint
- Token compression via LLM

**Choose Hippo if you want:**
- Biologically-inspired memory model (decay, consolidation, sleep)
- Multi-agent shared memory as a primary feature
- "Forget by default, earn persistence through use" philosophy

---

## Running Your Own Benchmarks

We encourage you to measure this yourself rather than trust any README. Here's how:

```bash
# Clone the repo
git clone https://github.com/rohitg00/agentmemory.git
cd agentmemory && npm install

# Run LongMemEval-S
npm run bench:longmemeval

# Run quality benchmark (240 observations, 20 queries)
npm run bench:quality

# Run scale benchmark
npm run bench:scale

# Run real embeddings benchmark
npm run bench:real-embeddings
```

Results land in `benchmark/results/`. All scripts, datasets, and results are committed for reproducibility.

---

## Corrections Welcome

If you maintain one of these tools and we got a number wrong, please open an issue or PR. We'd rather have accurate numbers than convenient ones.

If you want to add your tool to this comparison, open a PR with:
1. A link to your benchmark methodology
2. The metric and dataset you're measuring on
3. A commit hash / version so we can reproduce

**Sources:**
- Mem0 LoCoMo benchmark: [mem0.ai blog](https://mem0.ai)
- Letta LoCoMo benchmark: [letta.com/blog/benchmarking-ai-agent-memory](https://letta.com/blog/benchmarking-ai-agent-memory)
- LongMemEval paper: [arxiv.org/abs/2410.10813](https://arxiv.org/abs/2410.10813)
- LoCoMo paper: [snap-stanford.github.io/LoCoMo](https://snap-stanford.github.io/LoCoMo/)
122 changes: 122 additions & 0 deletions integrations/openclaw/README.md
# agentmemory for OpenClaw

Persistent cross-session memory for [OpenClaw](https://github.com/openclaw/openclaw) via agentmemory. Gives every OpenClaw agent a searchable long-term memory with 95.2% retrieval accuracy on [LongMemEval-S](https://arxiv.org/abs/2410.10813).

## Why you want this

OpenClaw agents restart fresh every session. You waste tokens re-explaining architecture, re-discovering bugs, re-teaching preferences. agentmemory captures every tool use automatically and injects relevant context when the next session starts.

- **92% fewer tokens** per session vs full-context pasting
- **12 auto-capture hooks** — zero manual `memory.add()` calls
- **MCP-native** — same server works for Claude Code, Cursor, Gemini CLI, Hermes, and OpenClaw at the same time
- **Self-hosted** — no external database, no cloud, no API key needed for embeddings

## Quick setup

### Option 1: MCP server (zero code)

Start the agentmemory server in a separate terminal:

```bash
npx @agentmemory/agentmemory
```

Then add to your OpenClaw MCP config:

```json
{
"mcpServers": {
"agentmemory": {
"command": "npx",
"args": ["agentmemory-mcp"]
}
}
}
```

OpenClaw now has access to all 43 MCP tools including `memory_recall`, `memory_save`, `memory_smart_search`, `memory_timeline`, `memory_profile`, and more.
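Under the hood these tools are invoked over MCP's standard JSON-RPC `tools/call` method. A sketch of what a `memory_recall` call could look like on the wire — the `query` argument name is an assumption for illustration, not documented API:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "memory_recall",
    "arguments": { "query": "how did we fix the N+1 query?" }
  }
}
```

Your agent framework builds these requests for you; you only ever see the tool names.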

### Option 2: Gateway plugin (deeper integration)

If you're running an OpenClaw gateway, drop this folder into your gateway's plugins directory:

```bash
cp -r integrations/openclaw ~/.openclaw/plugins/memory/agentmemory
```

Start the agentmemory server:

```bash
npx @agentmemory/agentmemory
```

The plugin auto-detects the running server and hooks into the OpenClaw agent loop:

- `onSessionStart` starts a new session on the agentmemory server and injects any returned context
- `onPreLlmCall` injects token-budgeted memories before each LLM call (BM25 + vector + graph fusion)
- `onPostToolUse` records every tool use, error, and decision after execution
- `onSessionEnd` marks the session complete so raw observations can be compressed into structured memory
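The hook lifecycle above can be sketched as a small Node-style plugin. This is a hypothetical shape for orientation only — the plugin in this folder is authoritative, and the endpoint paths below are illustrative, not a documented API:

```typescript
// Hypothetical sketch of an OpenClaw gateway plugin wired to a local
// agentmemory server. Endpoint paths ("/sessions", "/observations")
// are assumptions for illustration.
const BASE_URL = "http://localhost:3111";

async function post(path: string, body: unknown): Promise<unknown> {
  const res = await fetch(`${BASE_URL}${path}`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  return res.json();
}

const plugin = {
  async onSessionStart(session: { id: string }) {
    // Server opens a session and returns relevant past context to inject.
    return post("/sessions", { id: session.id });
  },
  async onPostToolUse(event: { tool: string; result: unknown }) {
    // Record every tool use so it can be consolidated later.
    await post("/observations", event);
  },
};
```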

Configure via `~/.openclaw/plugins/memory/agentmemory/config.yaml`:

```yaml
enabled: true
base_url: http://localhost:3111
token_budget: 2000
min_confidence: 0.5
```

## What your agent gets

### Automatic context injection

When a session starts, agentmemory injects ~1,900 tokens of the most relevant past context:

```text
Project profile:
- Auth uses JWT middleware in src/middleware/auth.ts (jose, not jsonwebtoken)
- Tests in test/auth.test.ts cover token validation
- Database uses Prisma with include{} to avoid N+1 queries
- Rate limiting: 100 req/min default, Redis for prod

Recent decisions:
- Chose jose over jsonwebtoken for Edge compatibility (2026-03-15)
- N+1 fix dropped query time 450ms → 28ms (2026-03-20)
```

### Semantic search across sessions

Ask "what was that fix for slow user queries?" and the agent finds the Prisma include{} decision from three weeks ago. BM25 + vector + knowledge graph fusion.
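agentmemory's exact fusion formula isn't spelled out here; a common way to merge ranked lists from BM25 and vector search without normalizing their incompatible raw scores is reciprocal rank fusion (RRF), sketched below as an assumed illustration:

```typescript
// Reciprocal rank fusion: a standard hybrid-search merge, shown as an
// illustration of how BM25 and vector rankings can be combined. Not
// necessarily agentmemory's exact formula.
function reciprocalRankFusion(
  rankings: string[][], // each inner array: doc ids, best first
  k = 60                // damping constant from the original RRF paper
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return scores;
}
```

A document ranked near the top by either retriever ends up near the top of the fused list, which is how a vector hit like "N+1 fix" surfaces even when the keywords don't match.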

### Privacy filtering

Every captured observation is scanned for API keys, secrets, bearer tokens, and `<private>` tags. These are stripped before storage. Modern token formats supported: `sk-`, `sk-proj-`, `ghp_/ghs_/ghu_`, AWS keys, and more.
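A minimal sketch of what that scan can look like — agentmemory's real filter list is broader than this, and the exact patterns below are assumptions covering only the formats named above:

```typescript
// Illustrative secret scrubber, not agentmemory's actual filter.
// Covers the formats mentioned above: OpenAI-style keys, GitHub
// tokens, AWS access key ids, and <private> blocks.
const SECRET_PATTERNS: RegExp[] = [
  /sk-(proj-)?[A-Za-z0-9_-]{16,}/g, // OpenAI-style API keys
  /gh[psu]_[A-Za-z0-9]{16,}/g,      // GitHub tokens (ghp_/ghs_/ghu_)
  /AKIA[0-9A-Z]{16}/g,              // AWS access key ids
];

function scrubSecrets(text: string): string {
  // Drop <private>…</private> blocks entirely, then mask key-shaped tokens.
  let out = text.replace(/<private>[\s\S]*?<\/private>/g, "[PRIVATE]");
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  return out;
}
```

Filtering happens before storage, so a leaked key never reaches the database or the viewer.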

### Multi-agent coordination

If you're running multiple OpenClaw agents on the same codebase:

- **Leases** give one agent exclusive claim on an action so they don't stomp each other
- **Signals** let agents send threaded messages to each other with read receipts
- **Mesh sync** shares memory between agentmemory instances (requires `AGENTMEMORY_SECRET`)

## Troubleshooting

**"Connection refused on port 3111"** — The agentmemory server isn't running. Start it with `npx @agentmemory/agentmemory` in a separate terminal.

**"No memories returned"** — Check `http://localhost:3113` (the real-time viewer). If there are no observations, the hooks aren't firing. Make sure your OpenClaw plugin is loaded and enabled.

**"Search returns irrelevant results"** — Install local embeddings: `npm install @xenova/transformers`. This enables vector search for +8pp recall over BM25-only.

**"I want to see what agentmemory is learning"** — Open `http://localhost:3113` in a browser. Live observation stream, session explorer, memory graph, and health dashboard.

## See also

- [agentmemory main README](../../README.md)
- [Benchmark results](../../benchmark/LONGMEMEVAL.md) — 95.2% R@5 on LongMemEval-S
- [Competitor comparison](../../benchmark/COMPARISON.md) — vs mem0, Letta, Khoj, claude-mem, Hippo
- [Hermes integration](../hermes/README.md) — same server also works with Hermes Agent

## License

Apache-2.0 (same as agentmemory)