18 changes: 16 additions & 2 deletions README.md

> Embedding model: `all-MiniLM-L6-v2` (local, free, no API key). Full reports: [`benchmark/LONGMEMEVAL.md`](benchmark/LONGMEMEVAL.md), [`benchmark/QUALITY.md`](benchmark/QUALITY.md), [`benchmark/SCALE.md`](benchmark/SCALE.md). Competitor comparison: [`benchmark/COMPARISON.md`](benchmark/COMPARISON.md) — agentmemory vs mem0, Letta, Khoj, claude-mem, Hippo.

---


## Quick Start

### Try it in 30 seconds

```bash
# Terminal 1: start the server
npx @agentmemory/agentmemory

# Terminal 2: seed sample data and see recall in action
npx @agentmemory/agentmemory demo
```

`demo` seeds 3 realistic sessions (JWT auth, N+1 query fix, rate limiting) and runs semantic searches against them. You'll see it find "N+1 query fix" when you search "database performance optimization" — keyword matching can't do that.

Open `http://localhost:3113` to watch the memory build live.

### Claude Code (one block, paste it)

Then add the MCP config for your agent:

| Agent | Setup |
|---|---|
| **Cursor** | Add to `~/.cursor/mcp.json`: `{"mcpServers": {"agentmemory": {"command": "npx", "args": ["agentmemory-mcp"]}}}` |
| **OpenClaw** | Add to MCP config: `{"mcpServers": {"agentmemory": {"command": "npx", "args": ["agentmemory-mcp"]}}}` or use the [gateway plugin](integrations/openclaw/) |
| **Gemini CLI** | `gemini mcp add agentmemory -- npx agentmemory-mcp` |
| **Codex CLI** | Add to `.codex/config.yaml`: `mcp_servers: {agentmemory: {command: npx, args: ["agentmemory-mcp"]}}` |
| **OpenCode** | Add to `.opencode/config.json`: `{"mcpServers": {"agentmemory": {"command": "npx", "args": ["agentmemory-mcp"]}}}` |
151 changes: 151 additions & 0 deletions benchmark/COMPARISON.md
# AI Agent Memory: Benchmark Comparison

How agentmemory compares against other persistent memory solutions for AI coding agents.

All numbers here come from published benchmarks or public repositories. We link to primary sources wherever possible so you can reproduce.

---

## Retrieval Accuracy (LongMemEval)

[LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) measures long-term memory retrieval. The S variant has 500 questions, each drawing on ~48 chat sessions (~115K tokens of history per question).

| System | Benchmark | R@5 | Notes |
|---|---|---|---|
| **agentmemory** (BM25 + Vector) | LongMemEval-S | **95.2%** | `all-MiniLM-L6-v2` embeddings, no API key |
| agentmemory (BM25-only) | LongMemEval-S | 86.2% | Fallback when no embedding provider available |
| MemPalace | LongMemEval-S | ~96.6% | Vector-only, bigger embedding model |
| Letta / MemGPT | LoCoMo | 83.2% | Different benchmark (LoCoMo, not LongMemEval) |
| Mem0 | LoCoMo | 68.5% | Different benchmark (LoCoMo, not LongMemEval) |

**⚠️ Apples vs oranges caveat:** agentmemory and MemPalace are measured on LongMemEval-S. Letta and Mem0 publish on [LoCoMo](https://snap-stanford.github.io/LoCoMo/), a different benchmark. We're showing both so you can see the ballpark. We'd love to run all four on the same dataset — if any maintainer wants to collaborate, open an issue.

Full agentmemory methodology: [`LONGMEMEVAL.md`](LONGMEMEVAL.md)

---

## Feature Matrix

| Feature | agentmemory | mem0 | Letta/MemGPT | Khoj | claude-mem | Hippo |
|---|---|---|---|---|---|---|
| **GitHub stars** | Growing | 53K+ | 22K+ | 34K+ | 46K+ | Trending |
| **Type** | Memory engine + MCP server | Memory layer API | Full agent runtime | Personal AI | MCP server | Memory system |
| **Auto-capture via hooks** | ✅ 12 lifecycle hooks | ❌ Manual `add()` | ❌ Agent self-edits | ❌ Manual | ✅ Limited | ❌ Manual |
| **Search strategy** | BM25 + Vector + Graph | Vector + Graph | Vector (archival) | Semantic | FTS5 | Decay-weighted |
| **Multi-agent coordination** | ✅ Leases + signals + mesh | ❌ | Runtime-internal only | ❌ | ❌ | Multi-agent shared |
| **Framework lock-in** | None | None | High | Standalone | Claude Code | None |
| **External deps** | None | Qdrant/pgvector | Postgres + vector | Multiple | None (SQLite) | None |
| **Self-hostable** | ✅ default | Optional | Optional | ✅ | ✅ | ✅ |
| **Knowledge graph** | ✅ Entity extraction + BFS | ✅ Mem0g variant | ❌ | Doc links | ❌ | ❌ |
| **Memory decay** | ✅ Ebbinghaus + tiered | ❌ | ❌ | ❌ | ❌ | ✅ Half-lives |
| **4-tier consolidation** | ✅ Working → episodic → semantic → procedural | ❌ | OS-inspired tiers | ❌ | ❌ | Episodic + semantic |
| **Version / supersession** | ✅ Jaccard-based | Passive | ❌ | ❌ | ❌ | ❌ |
| **Real-time viewer** | ✅ Port 3113 | Cloud dashboard | Cloud dashboard | Web UI | ❌ | ❌ |
| **Privacy filtering** | ✅ Strips secrets pre-store | ❌ | ❌ | ❌ | ❌ | ❌ |
| **Obsidian export** | ✅ Built-in | ❌ | ❌ | Native format | ❌ | ❌ |
| **Cross-agent** | ✅ MCP + REST | API calls | Within runtime | Standalone | Claude-only | Multi-agent shared |
| **Audit trail** | ✅ All mutations logged | ❌ | Limited | ❌ | ❌ | ❌ |
| **Language SDKs** | Any (REST + MCP) | Python + TS | Python only | API | Any (MCP) | Node |

---

## Token Efficiency

The main reason to use persistent memory at all: token cost. Here's what one year of heavy agent use looks like across approaches.

| Approach | Tokens / year | Cost / year | Notes |
|---|---|---|---|
| Paste full history into context | 19.5M+ | Impossible | Exceeds context window after ~200 observations |
| LLM-summarized memory (extraction-based) | ~650K | ~$500 | Lossy — summarization drops detail |
| **agentmemory (API embeddings)** | **~170K** | **~$10** | Token-budgeted, only relevant memories injected |
| **agentmemory (local embeddings)** | **~170K** | **$0** | `all-MiniLM-L6-v2` runs in-process |
| claude-mem | Reports ~10x savings | — | SQLite + FTS5 + 3-layer filter |
| Mem0 | Varies by integration | — | Extraction-based, no token budget |

**agentmemory ships with a built-in token savings calculator.** Run `npx @agentmemory/agentmemory status` after a few sessions and you'll see exactly how many tokens you've saved vs. pasting the full history.
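The "token-budgeted" rows above hinge on one idea: inject only the highest-value memories that fit a fixed budget, instead of the whole history. A minimal sketch of that selection step is below — agentmemory's actual logic is not shown here, and the names and the rough 4-characters-per-token estimate are assumptions:

```typescript
// Illustrative sketch of token-budgeted memory injection, not
// agentmemory's actual implementation. Rank memories by relevance,
// then greedily pack them until the token budget is spent.
interface Memory {
  text: string;
  score: number; // relevance score from search
}

// Rough token estimate: ~4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily take the highest-scoring memories that fit the budget.
function selectWithinBudget(memories: Memory[], budget: number): Memory[] {
  const ranked = [...memories].sort((a, b) => b.score - a.score);
  const chosen: Memory[] = [];
  let used = 0;
  for (const m of ranked) {
    const cost = estimateTokens(m.text);
    if (used + cost <= budget) {
      chosen.push(m);
      used += cost;
    }
  }
  return chosen;
}
```

With a 2,000-token budget this is why a year of use stays around ~170K injected tokens: each session injects a bounded slice, no matter how large the store grows.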

---

## What Each Tool Is Best At

This isn't an "agentmemory wins everything" page. Different tools solve different problems.

**Choose agentmemory if you want:**
- Automatic capture with zero manual `add()` calls
- MCP server that works across Claude Code, Cursor, Codex, Gemini CLI, etc.
- Hybrid BM25 + vector + graph search
- Real-time viewer to see what your agent is learning
- Self-hostable with zero external databases
- Privacy filtering on API keys and secrets
- Multi-agent coordination (leases, signals, routines)

**Choose Mem0 if you want:**
- Framework-agnostic API to bolt onto an existing agent
- Managed cloud option with a dashboard
- Python + TypeScript SDKs for direct integration
- Entity/relationship extraction as the primary abstraction

**Choose Letta/MemGPT if you want:**
- A full agent runtime, not just memory
- OS-inspired memory tiers (core/archival/recall)
- Agents that self-edit their memory via function calls
- Long-running conversational agents (weeks/months)

**Choose Khoj if you want:**
- A personal AI second brain, not agent infrastructure
- Document-first search over your files and the web
- Obsidian/Notion/Emacs integrations
- Scheduled automations and research tasks

**Choose claude-mem if you want:**
- Claude Code-specific tooling with SQLite + FTS5
- Minimal install footprint
- Token compression via LLM

**Choose Hippo if you want:**
- Biologically-inspired memory model (decay, consolidation, sleep)
- Multi-agent shared memory as a primary feature
- "Forget by default, earn persistence through use" philosophy

---

## Running Your Own Benchmarks

We encourage you to measure this yourself rather than trust any README. Here's how:

```bash
# Clone the repo
git clone https://github.com/rohitg00/agentmemory.git
cd agentmemory && npm install

# Run LongMemEval-S
npm run bench:longmemeval

# Run quality benchmark (240 observations, 20 queries)
npm run bench:quality

# Run scale benchmark
npm run bench:scale

# Run real embeddings benchmark
npm run bench:real-embeddings
```

Results land in `benchmark/results/`. All scripts, datasets, and results are committed for reproducibility.

---

## Corrections Welcome

If you maintain one of these tools and we got a number wrong, please open an issue or PR. We'd rather have accurate numbers than convenient ones.

If you want to add your tool to this comparison, open a PR with:
1. A link to your benchmark methodology
2. The metric and dataset you're measuring on
3. A commit hash / version so we can reproduce

**Sources:**
- Mem0 LoCoMo benchmark: [mem0.ai blog](https://mem0.ai)
- Letta LoCoMo benchmark: [letta.com/blog/benchmarking-ai-agent-memory](https://letta.com/blog/benchmarking-ai-agent-memory)
- LongMemEval paper: [arxiv.org/abs/2410.10813](https://arxiv.org/abs/2410.10813)
- LoCoMo paper: [snap-stanford.github.io/LoCoMo](https://snap-stanford.github.io/LoCoMo/)
122 changes: 122 additions & 0 deletions integrations/openclaw/README.md
# agentmemory for OpenClaw

Persistent cross-session memory for [OpenClaw](https://github.com/openclaw/openclaw) via agentmemory. Gives every OpenClaw agent a searchable long-term memory with 95.2% retrieval accuracy on [LongMemEval-S](https://arxiv.org/abs/2410.10813).

## Why you want this

OpenClaw agents restart fresh every session. You waste tokens re-explaining architecture, re-discovering bugs, re-teaching preferences. agentmemory captures every tool use automatically and injects relevant context when the next session starts.

- **92% fewer tokens** per session vs full-context pasting
- **12 auto-capture hooks** — zero manual `memory.add()` calls
- **MCP-native** — same server works for Claude Code, Cursor, Gemini CLI, Hermes, and OpenClaw at the same time
- **Self-hosted** — no external database, no cloud, no API key needed for embeddings

## Quick setup

### Option 1: MCP server (zero code)

Start the agentmemory server in a separate terminal:

```bash
npx @agentmemory/agentmemory
```

Then add to your OpenClaw MCP config:

```json
{
"mcpServers": {
"agentmemory": {
"command": "npx",
"args": ["agentmemory-mcp"]
}
}
}
```

OpenClaw now has access to all 43 MCP tools including `memory_recall`, `memory_save`, `memory_smart_search`, `memory_timeline`, `memory_profile`, and more.
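Under the hood these tools are invoked over MCP's standard JSON-RPC `tools/call` method. A sketch of what a `memory_recall` call could look like on the wire — the `query` argument name is an assumption for illustration, not documented API:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "memory_recall",
    "arguments": { "query": "how did we fix the N+1 query?" }
  }
}
```

Your agent framework builds these requests for you; you only ever see the tool names.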

### Option 2: Gateway plugin (deeper integration)

If you're running an OpenClaw gateway, drop this folder into your gateway's plugins directory:

```bash
cp -r integrations/openclaw ~/.openclaw/plugins/memory/agentmemory
```

Start the agentmemory server:

```bash
npx @agentmemory/agentmemory
```

The plugin auto-detects the running server and hooks into the OpenClaw agent loop:

- `onSessionStart` starts a new session on the agentmemory server and injects any returned context
- `onPreLlmCall` injects token-budgeted memories before each LLM call (BM25 + vector + graph fusion)
- `onPostToolUse` records every tool use, error, and decision after execution
- `onSessionEnd` marks the session complete so raw observations can be compressed into structured memory
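The hook lifecycle above can be sketched as a small Node-style plugin. This is a hypothetical shape for orientation only — the plugin in this folder is authoritative, and the endpoint paths below are illustrative, not a documented API:

```typescript
// Hypothetical sketch of an OpenClaw gateway plugin wired to a local
// agentmemory server. Endpoint paths ("/sessions", "/observations")
// are assumptions for illustration.
const BASE_URL = "http://localhost:3111";

async function post(path: string, body: unknown): Promise<unknown> {
  const res = await fetch(`${BASE_URL}${path}`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  return res.json();
}

const plugin = {
  async onSessionStart(session: { id: string }) {
    // Server opens a session and returns relevant past context to inject.
    return post("/sessions", { id: session.id });
  },
  async onPostToolUse(event: { tool: string; result: unknown }) {
    // Record every tool use so it can be consolidated later.
    await post("/observations", event);
  },
};
```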

Configure via `~/.openclaw/plugins/memory/agentmemory/config.yaml`:

```yaml
enabled: true
base_url: http://localhost:3111
token_budget: 2000
min_confidence: 0.5
```

## What your agent gets

### Automatic context injection

When a session starts, agentmemory injects ~1,900 tokens of the most relevant past context:

```text
Project profile:
- Auth uses JWT middleware in src/middleware/auth.ts (jose, not jsonwebtoken)
- Tests in test/auth.test.ts cover token validation
- Database uses Prisma with include{} to avoid N+1 queries
- Rate limiting: 100 req/min default, Redis for prod

Recent decisions:
- Chose jose over jsonwebtoken for Edge compatibility (2026-03-15)
- N+1 fix dropped query time 450ms → 28ms (2026-03-20)
```

### Semantic search across sessions

Ask "what was that fix for slow user queries?" and the agent finds the Prisma include{} decision from three weeks ago. BM25 + vector + knowledge graph fusion.
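agentmemory's exact fusion formula isn't spelled out here; a common way to merge ranked lists from BM25 and vector search without normalizing their incompatible raw scores is reciprocal rank fusion (RRF), sketched below as an assumed illustration:

```typescript
// Reciprocal rank fusion: a standard hybrid-search merge, shown as an
// illustration of how BM25 and vector rankings can be combined. Not
// necessarily agentmemory's exact formula.
function reciprocalRankFusion(
  rankings: string[][], // each inner array: doc ids, best first
  k = 60                // damping constant from the original RRF paper
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return scores;
}
```

A document ranked near the top by either retriever ends up near the top of the fused list, which is how a vector hit like "N+1 fix" surfaces even when the keywords don't match.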

### Privacy filtering

Every captured observation is scanned for API keys, secrets, bearer tokens, and `<private>` tags. These are stripped before storage. Modern token formats supported: `sk-`, `sk-proj-`, `ghp_/ghs_/ghu_`, AWS keys, and more.
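A minimal sketch of what that scan can look like — agentmemory's real filter list is broader than this, and the exact patterns below are assumptions covering only the formats named above:

```typescript
// Illustrative secret scrubber, not agentmemory's actual filter.
// Covers the formats mentioned above: OpenAI-style keys, GitHub
// tokens, AWS access key ids, and <private> blocks.
const SECRET_PATTERNS: RegExp[] = [
  /sk-(proj-)?[A-Za-z0-9_-]{16,}/g, // OpenAI-style API keys
  /gh[psu]_[A-Za-z0-9]{16,}/g,      // GitHub tokens (ghp_/ghs_/ghu_)
  /AKIA[0-9A-Z]{16}/g,              // AWS access key ids
];

function scrubSecrets(text: string): string {
  // Drop <private>…</private> blocks entirely, then mask key-shaped tokens.
  let out = text.replace(/<private>[\s\S]*?<\/private>/g, "[PRIVATE]");
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  return out;
}
```

Filtering happens before storage, so a leaked key never reaches the database or the viewer.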

### Multi-agent coordination

If you're running multiple OpenClaw agents on the same codebase:

- **Leases** give one agent exclusive claim on an action so they don't stomp each other
- **Signals** let agents send threaded messages to each other with read receipts
- **Mesh sync** shares memory between agentmemory instances (requires `AGENTMEMORY_SECRET`)

## Troubleshooting

**"Connection refused on port 3111"** — The agentmemory server isn't running. Start it with `npx @agentmemory/agentmemory` in a separate terminal.

**"No memories returned"** — Check `http://localhost:3113` (the real-time viewer). If there are no observations, the hooks aren't firing. Make sure your OpenClaw plugin is loaded and enabled.

**"Search returns irrelevant results"** — Install local embeddings: `npm install @xenova/transformers`. This enables vector search for +8pp recall over BM25-only.

**"I want to see what agentmemory is learning"** — Open `http://localhost:3113` in a browser. Live observation stream, session explorer, memory graph, and health dashboard.

## See also

- [agentmemory main README](../../README.md)
- [Benchmark results](../../benchmark/LONGMEMEVAL.md) — 95.2% R@5 on LongMemEval-S
- [Competitor comparison](../../benchmark/COMPARISON.md) — vs mem0, Letta, Khoj, claude-mem, Hippo
- [Hermes integration](../hermes/README.md) — same server also works with Hermes Agent

## License

Apache-2.0 (same as agentmemory)