From e602f6e58aa93548aa4a1511e033bc5b2f56ea6e Mon Sep 17 00:00:00 2001
From: Rohit Ghumare
Date: Thu, 9 Apr 2026 13:05:55 +0100
Subject: [PATCH 1/2] docs: README polish — badges, cost table, flow diagram,
 methodology note
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
- Add npm version, CI status, license, and stars badges
- Add stats table below intro (95.2% R@5, 92% fewer tokens, 43 tools, 12 hooks, 0 deps)
- Add cost comparison table ($500/yr extraction-based vs $10/yr agentmemory vs $0 local)
- Add memory flow diagram (observe → compress → index → inject)
- Add methodology transparency note on benchmark section
- Update nav links (add Benchmarks, shorten labels)
---
README.md | 49 +++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 45 insertions(+), 4 deletions(-)
diff --git a/README.md b/README.md
index 0a14e0b..28136e9 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,13 @@
Persistent memory for Claude Code, Cursor, Gemini CLI, OpenCode, and any MCP client.
+
+<!-- badges: npm version, CI status, license, GitHub stars -->
+
@@ -14,13 +21,12 @@
Quick Start •
Why •
- Agents •
+ Benchmarks •
How It Works •
Search •
- Memory Evolution •
MCP •
Viewer •
- Configuration •
+ Config •
API
@@ -30,7 +36,13 @@ You explain the same architecture every session. You re-discover the same bugs.
**What changes:** Session 1 you set up JWT auth. Session 2 you ask for rate limiting — the agent already knows your auth uses jose middleware in `src/middleware/auth.ts`, your tests cover token validation, and you chose jose over jsonwebtoken for Edge compatibility. No re-explaining. No copy-pasting. The agent just *knows*.
-**95.2% retrieval accuracy** on [LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025). 43 MCP tools. 12 hooks. Real-time viewer. Works with Claude Code, Cursor, Gemini CLI, OpenCode, and any MCP client. 646 tests. Zero external DB dependencies.
+| | |
+|---|---|
+| **95.2% R@5** | [LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) retrieval accuracy |
+| **92% fewer tokens** | ~1,900 injected vs ~19,000 full context ($10/yr vs $500+/yr) |
+| **43 MCP tools** | Search, remember, forget, actions, leases, signals, mesh sync |
+| **12 hooks** | Captures every tool use automatically — zero manual effort |
+| **0 external deps** | No Postgres, no Redis, no vector DB. Just iii-engine (auto-installed) |
```bash
npx @agentmemory/agentmemory # installs iii-engine if missing, starts everything
@@ -128,6 +140,33 @@ agentmemory is the searchable database behind the sticky notes.
| Knowledge graph | No | Entity extraction + temporal versioning |
| Observability | Read files manually | Real-time viewer on :3113 |
+### What it costs (spoiler: almost nothing)
+
+| Approach | Tokens/year | Annual cost | Notes |
+|---|---|---|---|
+| Paste full history into context | 19.5M+ | Impossible | Exceeds context window after ~200 observations |
+| LLM-summarized memory (extraction-based) | ~650K | ~$500/yr | Loses context, summarization is lossy |
+| **agentmemory context injection** | **~170K** | **~$10/yr** | Token-budgeted, only relevant memories injected |
+| agentmemory with local embeddings | ~170K | **$0** | all-MiniLM-L6-v2 runs locally, no API calls |
+
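As a rough sanity check on the table above, the arithmetic behind the ~170K tokens/year row can be sketched as follows. The per-session token counts (~1,900 injected vs ~19,000 full context) are the README's figures; the sessions-per-year count is an assumption, and with these rounded inputs the savings come out to 90% rather than the published 92%, which presumably uses unrounded counts.

```typescript
// Back-of-envelope check of the cost table. Per-session token counts are the
// README's rounded figures; sessionsPerYear is an assumption for illustration.
const sessionsPerYear = 90;
const injectedPerSession = 1_900;      // ~tokens injected by agentmemory
const fullContextPerSession = 19_000;  // ~tokens to paste full history

const injectedYearly = sessionsPerYear * injectedPerSession;     // 171,000
const fullYearly = sessionsPerYear * fullContextPerSession;      // 1,710,000
const savings = 1 - injectedPerSession / fullContextPerSession;  // ~0.9

console.log(`${injectedYearly} vs ${fullYearly} tokens/yr, ~${Math.round(savings * 100)}% saved`);
```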
+### How memory flows
+
+```text
+PostToolUse hook fires
+ -> SHA-256 dedup (5min window)
+ -> Privacy filter (strip secrets, API keys)
+ -> Store raw observation
+ -> LLM compress -> structured facts + concepts + narrative
+ -> Generate vector embedding
+ -> Index in BM25 + vector + knowledge graph
+
+SessionStart hook fires
+ -> Load project profile (top concepts, files, patterns)
+ -> Hybrid search (BM25 + vector + graph) for recent context
+ -> Apply token budget (default: 2000 tokens)
+ -> Inject into conversation via stdout
+```
+
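The "SHA-256 dedup (5min window)" step in the flow above can be sketched like this. This is an illustrative guess at the mechanism, not the actual implementation: the function name, the in-memory map, and the sliding-window bookkeeping are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of the SHA-256 dedup window: identical observations
// seen again within 5 minutes are dropped; older repeats are re-admitted.
const WINDOW_MS = 5 * 60 * 1000;
const seen = new Map<string, number>(); // digest -> last-seen timestamp (ms)

function isDuplicate(observation: string, now: number = Date.now()): boolean {
  const digest = createHash("sha256").update(observation).digest("hex");
  const last = seen.get(digest);
  seen.set(digest, now); // refresh the window on every sighting
  return last !== undefined && now - last < WINDOW_MS;
}
```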
### Benchmarks (measured, not projected)
#### LongMemEval-S (ICLR 2025, 500 questions)
@@ -151,6 +190,8 @@ These are retrieval recall scores (not end-to-end QA accuracy). Embedding model:
agentmemory finds "N+1 query fix" when you search "database performance optimization" — something keyword matching literally cannot do.
+> **Methodology note:** All LongMemEval numbers are retrieval recall (`recall_any@K`), not end-to-end QA accuracy. We clearly distinguish these because the LongMemEval leaderboard measures QA accuracy (retrieve + generate + judge). No hyperparameters were tuned on the test set. Full scripts and results are committed and reproducible.
+
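To make the methodology note concrete, here is one plausible reading of `recall_any@K`: a question counts as a hit if any of its gold evidence sessions appears in the top-K retrieved results. The function name and input shapes below are assumptions for illustration, not the benchmark harness itself.

```typescript
// Assumed definition of recall_any@K: fraction of questions where at least
// one gold session id appears among the top-K retrieved session ids.
function recallAnyAtK(
  retrieved: string[][], // per-question ranked session ids
  gold: string[][],      // per-question gold session ids
  k: number,
): number {
  let hits = 0;
  for (let i = 0; i < retrieved.length; i++) {
    const topK = new Set(retrieved[i].slice(0, k));
    if (gold[i].some((id) => topK.has(id))) hits++;
  }
  return hits / retrieved.length;
}
```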
Full benchmark reports: [`benchmark/LONGMEMEVAL.md`](benchmark/LONGMEMEVAL.md), [`benchmark/QUALITY.md`](benchmark/QUALITY.md), [`benchmark/SCALE.md`](benchmark/SCALE.md), [`benchmark/REAL-EMBEDDINGS.md`](benchmark/REAL-EMBEDDINGS.md)
## Supported Agents
From 6fd22d3b1d313839f38a3425c5429a09b4aa04ce Mon Sep 17 00:00:00 2001
From: Rohit Ghumare
Date: Thu, 9 Apr 2026 13:08:05 +0100
Subject: [PATCH 2/2] fix: remove em dashes and AI-sloppy phrasing from README
---
README.md | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/README.md b/README.md
index 28136e9..1aa7c4c 100644
--- a/README.md
+++ b/README.md
@@ -32,16 +32,16 @@
---
-You explain the same architecture every session. You re-discover the same bugs. You re-teach the same preferences. Built-in memory (CLAUDE.md, .cursorrules) caps out at 200 lines and goes stale. agentmemory fixes this — it silently captures what your agent does, compresses it into searchable memory, and injects the right context when the next session starts. One command. Works across agents.
+You explain the same architecture every session. You re-discover the same bugs. You re-teach the same preferences. Built-in memory (CLAUDE.md, .cursorrules) caps out at 200 lines and goes stale. agentmemory fixes this. It silently captures what your agent does, compresses it into searchable memory, and injects the right context when the next session starts. One command. Works across agents.
-**What changes:** Session 1 you set up JWT auth. Session 2 you ask for rate limiting — the agent already knows your auth uses jose middleware in `src/middleware/auth.ts`, your tests cover token validation, and you chose jose over jsonwebtoken for Edge compatibility. No re-explaining. No copy-pasting. The agent just *knows*.
+**What changes:** Session 1 you set up JWT auth. Session 2 you ask for rate limiting. The agent already knows your auth uses jose middleware in `src/middleware/auth.ts`, your tests cover token validation, and you chose jose over jsonwebtoken for Edge compatibility. No re-explaining. No copy-pasting. The agent just *knows*.
| | |
|---|---|
| **95.2% R@5** | [LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) retrieval accuracy |
| **92% fewer tokens** | ~1,900 injected vs ~19,000 full context ($10/yr vs $500+/yr) |
| **43 MCP tools** | Search, remember, forget, actions, leases, signals, mesh sync |
-| **12 hooks** | Captures every tool use automatically — zero manual effort |
+| **12 hooks** | Captures every tool use automatically, zero manual effort |
| **0 external deps** | No Postgres, no Redis, no vector DB. Just iii-engine (auto-installed) |
```bash
@@ -52,7 +52,7 @@ npx @agentmemory/agentmemory # installs iii-engine if missing, starts everythi
## Quick Start
-### Claude Code (paste this — the agent does the rest)
+### Claude Code (paste this, the agent does the rest)
```
Install agentmemory: run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server. Then run `/plugin marketplace add rohitg00/agentmemory` and `/plugin install agentmemory` to register all 12 hooks, 4 skills, and 43 MCP tools. Verify with `curl http://localhost:3111/agentmemory/health`. The real-time viewer is at http://localhost:3113.
@@ -119,7 +119,7 @@ Session 2: "Now add rate limiting"
### How it compares to built-in agent memory
-Every AI coding agent now ships with built-in memory — Claude Code has `MEMORY.md`, Cursor has notepads, Windsurf has Cascade memories, Cline has memory bank. These work like sticky notes: fast, always-on, but fundamentally limited.
+Every AI coding agent now ships with built-in memory. Claude Code has `MEMORY.md`, Cursor has notepads, Cline has memory bank. These work like sticky notes: fast, always-on, but fundamentally limited.
agentmemory is the searchable database behind the sticky notes.
@@ -188,7 +188,7 @@ These are retrieval recall scores (not end-to-end QA accuracy). Embedding model:
| agentmemory BM25 (stemmed + synonyms) | 55.9% | 82.7% | 95.5% | 1,571 |
| agentmemory + Xenova embeddings | **64.1%** | **94.9%** | **100.0%** | **1,571** |
-agentmemory finds "N+1 query fix" when you search "database performance optimization" — something keyword matching literally cannot do.
+agentmemory finds "N+1 query fix" when you search "database performance optimization". Keyword matching can't do this.
> **Methodology note:** All LongMemEval numbers are retrieval recall (`recall_any@K`), not end-to-end QA accuracy. We clearly distinguish these because the LongMemEval leaderboard measures QA accuracy (retrieve + generate + judge). No hyperparameters were tuned on the test set. Full scripts and results are committed and reproducible.
@@ -252,7 +252,7 @@ npm install && npm run build && npm start
## First Steps After Install
-Once hooks are installed, memory builds silently. No action needed — just use your agent normally.
+Once hooks are installed, memory builds silently. No action needed. Just use your agent normally.
### Session 1: Your agent works as usual
@@ -416,7 +416,7 @@ agentmemory automatically cleans itself:
| Mechanism | What it does |
|---|---|
| **TTL expiry** | Memories with `forgetAfter` date are deleted when expired |
-| **Contradiction detection** | Near-duplicate memories (Jaccard > 0.9) — older one is demoted |
+| **Contradiction detection** | Near-duplicate memories (Jaccard > 0.9), older one is demoted |
| **Low-value eviction** | Observations older than 90 days with importance < 3 are removed |
| **Per-project cap** | Projects are capped at 10,000 observations (lowest importance evicted first) |
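
The Jaccard > 0.9 check in the contradiction-detection row can be sketched as below. Whitespace tokenization and the lowercasing step are assumptions; the real implementation may tokenize differently.

```typescript
// Illustrative Jaccard overlap between two memories: |A ∩ B| / |A ∪ B|
// over whitespace-split, lowercased tokens (tokenization is an assumption).
function jaccard(a: string, b: string): number {
  const A = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const B = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  let inter = 0;
  for (const t of A) if (B.has(t)) inter++;
  const union = A.size + B.size - inter;
  return union === 0 ? 1 : inter / union;
}

// Per the table: above 0.9 the memories are near-duplicates and the
// older one is demoted.
const nearDuplicate = (a: string, b: string) => jaccard(a, b) > 0.9;
```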