From 360bdcc9981afe526ad0009b76899cd05218e8c3 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Wed, 25 Feb 2026 23:19:55 -0700 Subject: [PATCH 1/8] =?UTF-8?q?feat:=20add=20codegraph=20path=20for=20A?= =?UTF-8?q?=E2=86=92B=20symbol=20pathfinding?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add `codegraph path <from> <to>` — BFS shortest-path search on the call graph. Given two symbol names, finds the shortest call chain with hop count, intermediate nodes, edge kinds, and alternate path count. Supports --reverse, --max-depth, --kinds, --from-file/--to-file, -T, -j, -k flags. Exposed as symbol_path MCP tool. Impact: 4 functions changed, 3 affected --- .claude/skills/dogfood/SKILL.md | 2 +- README.md | 8 +- docs/examples/CLI.md | 54 ++++++ docs/examples/MCP.md | 68 ++++++++ docs/guides/ai-agent-guide.md | 18 ++ docs/guides/recommended-practices.md | 2 +- src/cli.js | 31 ++++ src/index.js | 1 + src/mcp.js | 39 +++++ src/queries.js | 249 +++++++++++++++++++++++++++ tests/integration/cli.test.js | 9 + tests/integration/queries.test.js | 91 ++++++++++ tests/unit/mcp.test.js | 30 ++++ 13 files changed, 598 insertions(+), 4 deletions(-) diff --git a/.claude/skills/dogfood/SKILL.md b/.claude/skills/dogfood/SKILL.md index 941cc797..c740849c 100644 --- a/.claude/skills/dogfood/SKILL.md +++ b/.claude/skills/dogfood/SKILL.md @@ -203,7 +203,7 @@ Before writing the report, **stop and think** about: - What testing approaches am I missing? - **Cross-command pipelines:** Have I tested `build` → `embed` → `search` → modify → `build` → `search`? Have I tested `watch` detecting changes then `diff-impact`? -- **MCP server:** Have I tested the `mcp` command? Initialize via JSON-RPC on stdin, send `tools/list`, verify all 17 tools are present. Test single-repo mode (default — `list_repos` should be absent, no `repo` parameter on tools) vs `--multi-repo` mode. +- **MCP server:** Have I tested the `mcp` command?
Initialize via JSON-RPC on stdin, send `tools/list`, verify all 18 tools are present. Test single-repo mode (default — `list_repos` should be absent, no `repo` parameter on tools) vs `--multi-repo` mode. - **Programmatic API:** Have I tested `require('@optave/codegraph')` or `import` from `index.js`? Key exports to verify: `buildGraph`, `loadConfig`, `openDb`, `findDbPath`, `contextData`, `explainData`, `whereData`, `fnDepsData`, `diffImpactData`, `statsData`, `isNativeAvailable`, `EXTENSIONS`, `IGNORE_DIRS`, `ALL_SYMBOL_KINDS`, `MODELS`. - **Config options:** Have I tested `.codegraphrc.json`? Create one with `include`/`exclude` patterns, custom `aliases`, `build.incremental: false`, `query.defaultDepth`, `search.defaultMinScore`. Verify overrides work. - **Env var overrides:** `CODEGRAPH_LLM_PROVIDER`, `CODEGRAPH_LLM_API_KEY`, `CODEGRAPH_LLM_MODEL`, `CODEGRAPH_REGISTRY_PATH`. diff --git a/README.md b/README.md index 7eb524a8..8e56c2dc 100644 --- a/README.md +++ b/README.md @@ -158,7 +158,7 @@ Full agent setup: [AI Agent Guide](docs/guides/ai-agent-guide.md) · [CLAU | 🔍 | **Symbol search** | Find any function, class, or method by name — exact match priority, relevance scoring, `--file` and `--kind` filters | | 📁 | **File dependencies** | See what a file imports and what imports it | | 💥 | **Impact analysis** | Trace every file affected by a change (transitive) | -| 🧬 | **Function-level tracing** | Call chains, caller trees, and function-level impact with qualified call resolution | +| 🧬 | **Function-level tracing** | Call chains, caller trees, function-level impact, and A→B pathfinding with qualified call resolution | | 🎯 | **Deep context** | `context` gives AI agents source, deps, callers, signature, and tests for a function in one call; `explain` gives structural summaries of files or functions | | 📍 | **Fast lookup** | `where` shows exactly where a symbol is defined and used — minimal, fast | | 📊 | **Diff impact** | Parse `git diff`, find overlapping functions, 
trace their callers | @@ -217,6 +217,9 @@ codegraph impact <file> # Transitive reverse dependency trace codegraph fn <name> # Function-level: callers, callees, call chain codegraph fn <name> --no-tests --depth 5 codegraph fn-impact <name> # What functions break if this one changes +codegraph path <from> <to> # Shortest path between two symbols (A calls...calls B) +codegraph path <from> <to> --reverse # Follow edges backward +codegraph path <from> <to> --max-depth 5 --kinds calls,imports codegraph diff-impact # Impact of unstaged git changes codegraph diff-impact --staged # Impact of staged changes codegraph diff-impact HEAD~3 # Impact vs a specific ref @@ -316,7 +319,7 @@ codegraph registry remove <name> # Unregister | Flag | Description | |---|---| | `-d, --db <path>` | Custom path to `graph.db` | -| `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files (available on `fn`, `fn-impact`, `context`, `explain`, `where`, `diff-impact`, `search`, `map`, `hotspots`, `deps`, `impact`) | +| `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files (available on `fn`, `fn-impact`, `path`, `context`, `explain`, `where`, `diff-impact`, `search`, `map`, `hotspots`, `deps`, `impact`) | | `--depth <n>` | Transitive trace depth (default varies by command) | | `-j, --json` | Output as JSON | | `-v, --verbose` | Enable debug output | @@ -462,6 +465,7 @@ This project uses codegraph. The database is at `.codegraph/graph.db`.
- `codegraph build .` — rebuild the graph (incremental by default) - `codegraph map` — module overview - `codegraph fn <name> -T` — function call chain +- `codegraph path <from> <to> -T` — shortest call path between two symbols - `codegraph deps <file>` — file-level dependencies - `codegraph search "<query>"` — semantic search (requires `codegraph embed`) - `codegraph cycles` — check for circular dependencies diff --git a/docs/examples/CLI.md b/docs/examples/CLI.md index 7f4ec6c0..6c3a0f8a 100644 --- a/docs/examples/CLI.md +++ b/docs/examples/CLI.md @@ -286,6 +286,60 @@ Function impact: f buildGraph -- src/builder.js:335 --- +## path — Shortest path between two symbols + +Find how symbol A reaches symbol B through the call graph: + +```bash +codegraph path buildGraph openDb -T +``` + +``` +Path from buildGraph to openDb (1 hop): + + f buildGraph (function) -- src/builder.js:335 + --[calls]--> f openDb (function) -- src/db.js:76 +``` + +Multi-hop paths show each intermediate step: + +```bash +codegraph path resolveNoTests openDb -T +``` + +``` +Path from resolveNoTests to openDb (2 hops): + + f resolveNoTests (function) -- src/cli.js:59 + --[calls]--> f buildGraph (function) -- src/builder.js:335 + --[calls]--> f openDb (function) -- src/db.js:76 +``` + +Reverse direction — follow edges backward (B is called by... called by A): + +```bash +codegraph path openDb buildGraph -T --reverse +``` + +``` +Path from openDb to buildGraph (1 hop) (reverse): + + f openDb (function) -- src/db.js:76 + --[calls]--> f buildGraph (function) -- src/builder.js:335 +``` + +When no path exists: + +```bash +codegraph path openDb buildGraph -T +``` + +``` +No path from "openDb" to "buildGraph" within 10 hops.
+``` + +--- + ## impact — File-level transitive dependents ```bash diff --git a/docs/examples/MCP.md b/docs/examples/MCP.md index 5d9e0f54..8941e30b 100644 --- a/docs/examples/MCP.md +++ b/docs/examples/MCP.md @@ -247,6 +247,74 @@ Function impact: f buildGraph -- src/builder.js:335 --- +## symbol_path — Shortest path between two symbols + +Find how one function reaches another through the call graph. + +```json +{ + "tool": "symbol_path", + "arguments": { "from": "resolveNoTests", "to": "openDb", "no_tests": true } +} +``` + +```json +{ + "from": "resolveNoTests", + "to": "openDb", + "found": true, + "hops": 2, + "path": [ + { "name": "resolveNoTests", "kind": "function", "file": "src/cli.js", "line": 59, "edgeKind": null }, + { "name": "buildGraph", "kind": "function", "file": "src/builder.js", "line": 335, "edgeKind": "calls" }, + { "name": "openDb", "kind": "function", "file": "src/db.js", "line": 76, "edgeKind": "calls" } + ], + "alternateCount": 0, + "edgeKinds": ["calls"], + "reverse": false, + "maxDepth": 10 +} +``` + +Reverse direction — follow edges backward: + +```json +{ + "tool": "symbol_path", + "arguments": { "from": "openDb", "to": "buildGraph", "reverse": true, "no_tests": true } +} +``` + +```json +{ + "from": "openDb", + "to": "buildGraph", + "found": true, + "hops": 1, + "path": [ + { "name": "openDb", "kind": "function", "file": "src/db.js", "line": 76, "edgeKind": null }, + { "name": "buildGraph", "kind": "function", "file": "src/builder.js", "line": 335, "edgeKind": "calls" } + ], + "alternateCount": 0, + "reverse": true +} +``` + +When no path exists, `found` is `false` and the path is empty: + +```json +{ + "from": "openDb", + "to": "buildGraph", + "found": false, + "hops": null, + "path": [], + "alternateCount": 0 +} +``` + +--- + ## impact_analysis — File-level transitive dependents ```json diff --git a/docs/guides/ai-agent-guide.md b/docs/guides/ai-agent-guide.md index 4ea79c59..ddb991de 100644 --- a/docs/guides/ai-agent-guide.md +++ 
b/docs/guides/ai-agent-guide.md @@ -230,6 +230,23 @@ codegraph fn-impact resolve --file resolve.js --depth 3 | **When to use** | Before modifying a function — know who depends on it | | **Output** | Affected functions at each depth level, total count | +#### `path` — Shortest path between two symbols + +Find how symbol A reaches symbol B through the call graph. + +```bash +codegraph path buildGraph openDb -T # Forward: A calls...calls B +codegraph path validateToken handleRoute --reverse # Backward: B is called by...A +codegraph path parseConfig loadFile --max-depth 5 +``` + +| | | +|---|---| +| **MCP tool** | `symbol_path` | +| **Key flags** | `--max-depth <n>` (default: 10), `--kinds <kinds>` (default: calls), `--reverse`, `--from-file`, `--to-file`, `-k, --kind`, `-T` (no tests), `-j` (JSON) | +| **When to use** | Understanding how two functions are connected through the call chain | +| **Output** | Ordered path with edge kinds, hop count, alternate path count | + #### `impact` — File-level transitive impact Show all files that transitively depend on a given file. @@ -475,6 +492,7 @@ codegraph mcp --repos "myapp,lib" # Restricted repo list | `module_map` | `map` | Most-connected files overview | | `fn_deps` | `fn <name>` | Function-level call chain | | `fn_impact` | `fn-impact <name>` | Function-level blast radius | +| `symbol_path` | `path <from> <to>` | Shortest path between two symbols | | `context` | `context <name>` | Full function context | | `explain` | `explain <target>` | Structural summary | | `where` | `where <name>` | Symbol definition and usage | diff --git a/docs/guides/recommended-practices.md b/docs/guides/recommended-practices.md index f261df20..8ae3cb51 100644 --- a/docs/guides/recommended-practices.md +++ b/docs/guides/recommended-practices.md @@ -143,7 +143,7 @@ By default, the MCP server runs in **single-repo mode** — the AI agent can onl Enable `--multi-repo` to let the agent query any registered repository, or use `--repos` to restrict access to a specific set of repos.
-The server exposes 17 tools: `query_function`, `file_deps`, `impact_analysis`, `find_cycles`, `module_map`, `fn_deps`, `fn_impact`, `context`, `explain`, `where`, `diff_impact`, `semantic_search`, `export_graph`, `list_functions`, `structure`, `hotspots`, and `list_repos` (multi-repo only). See the [AI Agent Guide MCP reference](./ai-agent-guide.md#mcp-server-reference) for the full tool-to-CLI mapping table. +The server exposes 18 tools: `query_function`, `file_deps`, `impact_analysis`, `find_cycles`, `module_map`, `fn_deps`, `fn_impact`, `symbol_path`, `context`, `explain`, `where`, `diff_impact`, `semantic_search`, `export_graph`, `list_functions`, `structure`, `hotspots`, and `list_repos` (multi-repo only). See the [AI Agent Guide MCP reference](./ai-agent-guide.md#mcp-server-reference) for the full tool-to-CLI mapping table. ### CLAUDE.md for your project diff --git a/src/cli.js b/src/cli.js index 11e97347..b76a1a91 100644 --- a/src/cli.js +++ b/src/cli.js @@ -29,6 +29,7 @@ import { queryName, roles, stats, + symbolPath, VALID_ROLES, where, } from './queries.js'; @@ -199,6 +200,36 @@ program }); }); +program + .command('path <from> <to>') + .description('Find shortest path between two symbols (A calls...calls B)') + .option('-d, --db <path>', 'Path to graph.db') + .option('--max-depth <n>', 'Maximum BFS depth', '10') + .option('--kinds <kinds>', 'Comma-separated edge kinds to follow (default: calls)') + .option('--reverse', 'Follow edges backward (B is called by...called by A)') + .option('--from-file <file>', 'Disambiguate source symbol by file (partial match)') + .option('--to-file <file>', 'Disambiguate target symbol by file (partial match)') + .option('-k, --kind <kind>', 'Filter both symbols by kind') + .option('-T, --no-tests', 'Exclude test/spec files from results') + .option('--include-tests', 'Include test/spec files (overrides excludeTests config)') + .option('-j, --json', 'Output as JSON') + .action((from, to, opts) => { + if (opts.kind && !ALL_SYMBOL_KINDS.includes(opts.kind)) {
console.error(`Invalid kind "${opts.kind}". Valid: ${ALL_SYMBOL_KINDS.join(', ')}`); + process.exit(1); + } + symbolPath(from, to, opts.db, { + maxDepth: parseInt(opts.maxDepth, 10), + edgeKinds: opts.kinds ? opts.kinds.split(',').map((s) => s.trim()) : undefined, + reverse: opts.reverse, + fromFile: opts.fromFile, + toFile: opts.toFile, + kind: opts.kind, + noTests: resolveNoTests(opts), + json: opts.json, + }); + }); + program .command('context <name>') .description('Full context for a function: source, deps, callers, tests, signature') diff --git a/src/index.js b/src/index.js index b77c99be..3b4b4d92 100644 --- a/src/index.js +++ b/src/index.js @@ -71,6 +71,7 @@ export { impactAnalysisData, kindIcon, moduleMapData, + pathData, queryNameData, rolesData, statsData, diff --git a/src/mcp.js b/src/mcp.js index 31efc8e0..e5e3f1fc 100644 --- a/src/mcp.js +++ b/src/mcp.js @@ -123,6 +123,33 @@ const BASE_TOOLS = [ + { + name: 'symbol_path', + description: 'Find the shortest path between two symbols in the call graph (A calls...calls B)', + inputSchema: { + type: 'object', + properties: { + from: { type: 'string', description: 'Source symbol name (partial match)' }, + to: { type: 'string', description: 'Target symbol name (partial match)' }, + max_depth: { type: 'number', description: 'Maximum BFS depth', default: 10 }, + edge_kinds: { + type: 'array', + items: { type: 'string' }, + description: 'Edge kinds to follow (default: ["calls"])', + }, + reverse: { type: 'boolean', description: 'Follow edges backward', default: false }, + from_file: { type: 'string', description: 'Disambiguate source by file (partial match)' }, + to_file: { type: 'string', description: 'Disambiguate target by file (partial match)' }, + kind: { + type: 'string', + enum: ALL_SYMBOL_KINDS, + description: 'Filter both symbols by kind', + }, + no_tests: { type: 'boolean', description: 'Exclude test files', default: false }, + }, + required: ['from', 'to'], + }, + }, { name:
'context', description: @@ -448,6 +475,7 @@ export async function startMCPServer(customDbPath, options = {}) { fileDepsData, fnDepsData, fnImpactData, + pathData, contextData, explainData, whereData, @@ -534,6 +562,17 @@ export async function startMCPServer(customDbPath, options = {}) { noTests: args.no_tests, }); break; + case 'symbol_path': + result = pathData(args.from, args.to, dbPath, { + maxDepth: args.max_depth, + edgeKinds: args.edge_kinds, + reverse: args.reverse, + fromFile: args.from_file, + toFile: args.to_file, + kind: args.kind, + noTests: args.no_tests, + }); + break; case 'context': result = contextData(args.name, dbPath, { depth: args.depth, diff --git a/src/queries.js b/src/queries.js index fbf346f2..dea8dc5d 100644 --- a/src/queries.js +++ b/src/queries.js @@ -565,6 +565,255 @@ export function fnImpactData(name, customDbPath, opts = {}) { return { name, results }; } +export function pathData(from, to, customDbPath, opts = {}) { + const db = openReadonlyOrFail(customDbPath); + const noTests = opts.noTests || false; + const maxDepth = opts.maxDepth || 10; + const edgeKinds = opts.edgeKinds || ['calls']; + const reverse = opts.reverse || false; + + const fromNodes = findMatchingNodes(db, from, { + noTests, + file: opts.fromFile, + kind: opts.kind, + }); + if (fromNodes.length === 0) { + db.close(); + return { + from, + to, + found: false, + error: `No symbol matching "${from}"`, + fromCandidates: [], + toCandidates: [], + }; + } + + const toNodes = findMatchingNodes(db, to, { + noTests, + file: opts.toFile, + kind: opts.kind, + }); + if (toNodes.length === 0) { + db.close(); + return { + from, + to, + found: false, + error: `No symbol matching "${to}"`, + fromCandidates: fromNodes + .slice(0, 5) + .map((n) => ({ name: n.name, kind: n.kind, file: n.file, line: n.line })), + toCandidates: [], + }; + } + + const sourceNode = fromNodes[0]; + const targetNode = toNodes[0]; + + const fromCandidates = fromNodes + .slice(0, 5) + .map((n) => ({ name: n.name, 
kind: n.kind, file: n.file, line: n.line })); + const toCandidates = toNodes + .slice(0, 5) + .map((n) => ({ name: n.name, kind: n.kind, file: n.file, line: n.line })); + + // Self-path + if (sourceNode.id === targetNode.id) { + db.close(); + return { + from, + to, + fromCandidates, + toCandidates, + found: true, + hops: 0, + path: [ + { + name: sourceNode.name, + kind: sourceNode.kind, + file: sourceNode.file, + line: sourceNode.line, + edgeKind: null, + }, + ], + alternateCount: 0, + edgeKinds, + reverse, + maxDepth, + }; + } + + // Build edge kind filter + const kindPlaceholders = edgeKinds.map(() => '?').join(', '); + + // BFS — direction depends on `reverse` flag + // Forward: source_id → target_id (A calls... calls B) + // Reverse: target_id → source_id (B is called by... called by A) + const neighborQuery = reverse + ? `SELECT n.id, n.name, n.kind, n.file, n.line, e.kind AS edge_kind + FROM edges e JOIN nodes n ON e.source_id = n.id + WHERE e.target_id = ? AND e.kind IN (${kindPlaceholders})` + : `SELECT n.id, n.name, n.kind, n.file, n.line, e.kind AS edge_kind + FROM edges e JOIN nodes n ON e.target_id = n.id + WHERE e.source_id = ? 
AND e.kind IN (${kindPlaceholders})`; + const neighborStmt = db.prepare(neighborQuery); + + const visited = new Set([sourceNode.id]); + // parent map: nodeId → { parentId, edgeKind } + const parent = new Map(); + let queue = [sourceNode.id]; + let found = false; + let alternateCount = 0; + let foundDepth = -1; + + for (let depth = 1; depth <= maxDepth; depth++) { + const nextQueue = []; + for (const currentId of queue) { + const neighbors = neighborStmt.all(currentId, ...edgeKinds); + for (const n of neighbors) { + if (noTests && isTestFile(n.file)) continue; + if (n.id === targetNode.id) { + if (!found) { + found = true; + foundDepth = depth; + parent.set(n.id, { parentId: currentId, edgeKind: n.edge_kind }); + } + alternateCount++; + continue; + } + if (!visited.has(n.id)) { + visited.add(n.id); + parent.set(n.id, { parentId: currentId, edgeKind: n.edge_kind }); + nextQueue.push(n.id); + } + } + } + if (found) break; + queue = nextQueue; + if (queue.length === 0) break; + } + + if (!found) { + db.close(); + return { + from, + to, + fromCandidates, + toCandidates, + found: false, + hops: null, + path: [], + alternateCount: 0, + edgeKinds, + reverse, + maxDepth, + }; + } + + // alternateCount includes the one we kept; subtract 1 for "alternates" + alternateCount = Math.max(0, alternateCount - 1); + + // Reconstruct path from target back to source + const pathIds = [targetNode.id]; + let cur = targetNode.id; + while (cur !== sourceNode.id) { + const p = parent.get(cur); + pathIds.push(p.parentId); + cur = p.parentId; + } + pathIds.reverse(); + + // Build path with node info + const nodeCache = new Map(); + const getNode = (id) => { + if (nodeCache.has(id)) return nodeCache.get(id); + const row = db.prepare('SELECT name, kind, file, line FROM nodes WHERE id = ?').get(id); + nodeCache.set(id, row); + return row; + }; + + const resultPath = pathIds.map((id, idx) => { + const node = getNode(id); + const edgeKind = idx === 0 ? 
null : parent.get(id).edgeKind; + return { name: node.name, kind: node.kind, file: node.file, line: node.line, edgeKind }; + }); + + db.close(); + return { + from, + to, + fromCandidates, + toCandidates, + found: true, + hops: foundDepth, + path: resultPath, + alternateCount, + edgeKinds, + reverse, + maxDepth, + }; +} + +export function symbolPath(from, to, customDbPath, opts = {}) { + const data = pathData(from, to, customDbPath, opts); + if (opts.json) { + console.log(JSON.stringify(data, null, 2)); + return; + } + + if (data.error) { + console.log(data.error); + return; + } + + if (!data.found) { + const dir = data.reverse ? 'reverse ' : ''; + console.log(`No ${dir}path from "${from}" to "${to}" within ${data.maxDepth} hops.`); + if (data.fromCandidates.length > 1) { + console.log( + `\n "${from}" matched ${data.fromCandidates.length} symbols — using top match: ${data.fromCandidates[0].name} (${data.fromCandidates[0].file}:${data.fromCandidates[0].line})`, + ); + } + if (data.toCandidates.length > 1) { + console.log( + ` "${to}" matched ${data.toCandidates.length} symbols — using top match: ${data.toCandidates[0].name} (${data.toCandidates[0].file}:${data.toCandidates[0].line})`, + ); + } + return; + } + + if (data.hops === 0) { + console.log(`\n"${from}" and "${to}" resolve to the same symbol (0 hops):`); + const n = data.path[0]; + console.log(` ${kindIcon(n.kind)} ${n.name} (${n.kind}) -- ${n.file}:${n.line}\n`); + return; + } + + const dir = data.reverse ? ' (reverse)' : ''; + console.log( + `\nPath from ${from} to ${to} (${data.hops} ${data.hops === 1 ? 
'hop' : 'hops'})${dir}:\n`, + ); + for (let i = 0; i < data.path.length; i++) { + const n = data.path[i]; + const indent = ' '.repeat(i + 1); + if (i === 0) { + console.log(`${indent}${kindIcon(n.kind)} ${n.name} (${n.kind}) -- ${n.file}:${n.line}`); + } else { + console.log( + `${indent}--[${n.edgeKind}]--> ${kindIcon(n.kind)} ${n.name} (${n.kind}) -- ${n.file}:${n.line}`, + ); + } + } + + if (data.alternateCount > 0) { + console.log( + `\n (${data.alternateCount} alternate shortest ${data.alternateCount === 1 ? 'path' : 'paths'} at same depth)`, + ); + } + console.log(); +} + /** * Fix #2: Shell injection vulnerability. * Uses execFileSync instead of execSync to prevent shell interpretation of user input. diff --git a/tests/integration/cli.test.js b/tests/integration/cli.test.js index 750200c5..10eac6d2 100644 --- a/tests/integration/cli.test.js +++ b/tests/integration/cli.test.js @@ -115,6 +115,15 @@ describe('CLI smoke tests', () => { expect(data).toHaveProperty('results'); }); + // ─── Path ─────────────────────────────────────────────────────────── + test('path --json returns valid JSON with path info', () => { + const out = run('path', 'sumOfSquares', 'add', '--db', dbPath, '--json'); + const data = JSON.parse(out); + expect(data).toHaveProperty('found'); + expect(data).toHaveProperty('path'); + expect(data).toHaveProperty('hops'); + }); + // ─── Cycles ────────────────────────────────────────────────────────── test('cycles --json returns valid JSON', () => { const out = run('cycles', '--db', dbPath, '--json'); diff --git a/tests/integration/queries.test.js b/tests/integration/queries.test.js index 6c982bca..69cf916b 100644 --- a/tests/integration/queries.test.js +++ b/tests/integration/queries.test.js @@ -33,6 +33,7 @@ import { fnImpactData, impactAnalysisData, moduleMapData, + pathData, queryNameData, statsData, whereData, @@ -324,6 +325,96 @@ describe('fnImpactData', () => { }); }); +// ─── pathData 
───────────────────────────────────────────────────────── + +describe('pathData', () => { + test('finds direct 1-hop path', () => { + const data = pathData('authMiddleware', 'authenticate', dbPath); + expect(data.found).toBe(true); + expect(data.hops).toBe(1); + expect(data.path).toHaveLength(2); + expect(data.path[0].name).toBe('authMiddleware'); + expect(data.path[0].edgeKind).toBeNull(); + expect(data.path[1].name).toBe('authenticate'); + expect(data.path[1].edgeKind).toBe('calls'); + }); + + test('finds multi-hop path', () => { + const data = pathData('handleRoute', 'validateToken', dbPath); + expect(data.found).toBe(true); + expect(data.hops).toBe(2); + expect(data.path).toHaveLength(3); + expect(data.path[0].name).toBe('handleRoute'); + expect(data.path[data.path.length - 1].name).toBe('validateToken'); + }); + + test('returns not found when no forward path exists', () => { + const data = pathData('validateToken', 'handleRoute', dbPath); + expect(data.found).toBe(false); + expect(data.path).toHaveLength(0); + }); + + test('reverse direction finds upstream path', () => { + const data = pathData('validateToken', 'handleRoute', dbPath, { reverse: true }); + expect(data.found).toBe(true); + expect(data.hops).toBeGreaterThanOrEqual(1); + expect(data.path[0].name).toBe('validateToken'); + expect(data.path[data.path.length - 1].name).toBe('handleRoute'); + expect(data.reverse).toBe(true); + }); + + test('self-path returns 0 hops', () => { + const data = pathData('authenticate', 'authenticate', dbPath); + expect(data.found).toBe(true); + expect(data.hops).toBe(0); + expect(data.path).toHaveLength(1); + expect(data.path[0].name).toBe('authenticate'); + }); + + test('maxDepth limits search', () => { + // handleRoute → validateToken is 2 hops; maxDepth=1 should miss it + const data = pathData('handleRoute', 'validateToken', dbPath, { maxDepth: 1 }); + expect(data.found).toBe(false); + }); + + test('nonexistent from symbol returns error', () => { + const data = 
pathData('nonexistent', 'authenticate', dbPath); + expect(data.found).toBe(false); + expect(data.error).toContain('nonexistent'); + }); + + test('nonexistent to symbol returns error', () => { + const data = pathData('authenticate', 'nonexistent', dbPath); + expect(data.found).toBe(false); + expect(data.error).toContain('nonexistent'); + }); + + test('noTests filters test file nodes', () => { + // testAuthenticate → authenticate exists, but with noTests testAuthenticate is excluded + const data = pathData('testAuthenticate', 'validateToken', dbPath, { noTests: true }); + expect(data.found).toBe(false); + expect(data.fromCandidates).toHaveLength(0); + }); + + test('alternateCount reports alternate shortest paths', () => { + // handleRoute → validateToken: two 2-hop paths + // handleRoute → authMiddleware → validateToken + // handleRoute → authenticate → validateToken + // (also handleRoute → formatResponse → validateToken at 0.3 confidence) + const data = pathData('handleRoute', 'validateToken', dbPath); + expect(data.found).toBe(true); + expect(data.alternateCount).toBeGreaterThanOrEqual(1); + }); + + test('populates fromCandidates and toCandidates', () => { + const data = pathData('authMiddleware', 'authenticate', dbPath); + expect(data.fromCandidates.length).toBeGreaterThanOrEqual(1); + expect(data.toCandidates.length).toBeGreaterThanOrEqual(1); + expect(data.fromCandidates[0]).toHaveProperty('name'); + expect(data.fromCandidates[0]).toHaveProperty('file'); + }); +}); + // ─── diffImpactData ─────────────────────────────────────────────────── describe('diffImpactData', () => { diff --git a/tests/unit/mcp.test.js b/tests/unit/mcp.test.js index 6b603367..e7f958de 100644 --- a/tests/unit/mcp.test.js +++ b/tests/unit/mcp.test.js @@ -16,6 +16,7 @@ const ALL_TOOL_NAMES = [ 'module_map', 'fn_deps', 'fn_impact', + 'symbol_path', 'context', 'explain', 'where', @@ -98,6 +99,21 @@ describe('TOOLS', () => { expect(fi.inputSchema.properties.kind.enum).toBeDefined(); }); + 
it('symbol_path requires from and to parameters', () => { + const sp = TOOLS.find((t) => t.name === 'symbol_path'); + expect(sp).toBeDefined(); + expect(sp.inputSchema.required).toContain('from'); + expect(sp.inputSchema.required).toContain('to'); + expect(sp.inputSchema.properties).toHaveProperty('max_depth'); + expect(sp.inputSchema.properties).toHaveProperty('edge_kinds'); + expect(sp.inputSchema.properties).toHaveProperty('reverse'); + expect(sp.inputSchema.properties).toHaveProperty('from_file'); + expect(sp.inputSchema.properties).toHaveProperty('to_file'); + expect(sp.inputSchema.properties).toHaveProperty('kind'); + expect(sp.inputSchema.properties.kind.enum).toBeDefined(); + expect(sp.inputSchema.properties).toHaveProperty('no_tests'); + }); + it('where requires target parameter', () => { const w = TOOLS.find((t) => t.name === 'where'); expect(w).toBeDefined(); @@ -237,6 +253,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(() => ({ changedFiles: 0, affectedFunctions: [] })), listFunctionsData: vi.fn(() => ({ count: 0, functions: [] })), rolesData: vi.fn(() => ({ count: 0, summary: {}, symbols: [] })), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); // Clear module cache and reimport @@ -300,6 +317,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(), listFunctionsData: vi.fn(), rolesData: vi.fn(), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); const { startMCPServer } = await import('../../src/mcp.js'); @@ -357,6 +375,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(), listFunctionsData: vi.fn(), rolesData: vi.fn(), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); const { startMCPServer } = await import('../../src/mcp.js'); @@ -409,6 +428,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: diffImpactMock, listFunctionsData: vi.fn(), rolesData: vi.fn(), + pathData: vi.fn(() => ({ 
from: 'a', to: 'b', found: false })), })); const { startMCPServer } = await import('../../src/mcp.js'); @@ -466,6 +486,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(), listFunctionsData: listFnMock, rolesData: vi.fn(), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); const { startMCPServer } = await import('../../src/mcp.js'); @@ -524,6 +545,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(), listFunctionsData: vi.fn(), rolesData: vi.fn(), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); const { startMCPServer } = await import('../../src/mcp.js'); @@ -577,6 +599,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(), listFunctionsData: vi.fn(), rolesData: vi.fn(), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); const { startMCPServer } = await import('../../src/mcp.js'); @@ -629,6 +652,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(), listFunctionsData: vi.fn(), rolesData: vi.fn(), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); const { startMCPServer } = await import('../../src/mcp.js'); @@ -683,6 +707,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(), listFunctionsData: vi.fn(), rolesData: vi.fn(), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); const { startMCPServer } = await import('../../src/mcp.js'); @@ -740,6 +765,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(), listFunctionsData: vi.fn(), rolesData: vi.fn(), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); const { startMCPServer } = await import('../../src/mcp.js'); @@ -797,6 +823,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(), listFunctionsData: vi.fn(), rolesData: vi.fn(), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); const { 
startMCPServer } = await import('../../src/mcp.js'); @@ -845,6 +872,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(), listFunctionsData: vi.fn(), rolesData: vi.fn(), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); const { startMCPServer } = await import('../../src/mcp.js'); @@ -893,6 +921,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(), listFunctionsData: vi.fn(), rolesData: vi.fn(), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); const { startMCPServer } = await import('../../src/mcp.js'); @@ -941,6 +970,7 @@ describe('startMCPServer handler dispatch', () => { diffImpactData: vi.fn(), listFunctionsData: vi.fn(), rolesData: vi.fn(), + pathData: vi.fn(() => ({ from: 'a', to: 'b', found: false })), })); const { startMCPServer } = await import('../../src/mcp.js'); From bad02f692e0b1be9673dc61fad2c3d4f5117f0de Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Thu, 26 Feb 2026 00:37:50 -0700 Subject: [PATCH 2/8] docs: add Titan Paradigm use case, update docs with roles/co-change/path MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Create docs/use-cases/titan-paradigm.md — maps Johannes R.'s multi-agent codebase cleanup architecture (RECON, GAUNTLET, GLOBAL SYNC, STATE MACHINE) to codegraph commands, roadmap items, and post-LLM-integration recommendations - Update roadmap/BACKLOG.md: mark #4 (node classification), #9 (git change coupling), #1 (dead code), #2 (shortest path), #12 (execution flow) as DONE; add 6 new Titan Paradigm-inspired items (#21-#26): composite audit, batch querying, triage priority queue, change validation predicates, graph snapshots, MCP orchestration tools - Update README.md: add roles + co-change to features table, differentiators, commands section, agent template, common flags, comparison table; update MCP tool count 18 → 19 - Update docs/recommended-practices.md: update MCP tool count 
and tool list, add roles/co-change/path to CLAUDE.md template and developer workflow, add "Understand architectural roles" and "Surface hidden coupling" sections, add co-change step to setup checklist - Add full examples with real output for roles, co-change, and path to docs/examples/CLI.md and docs/examples/MCP.md - Update GitHub repo description with new capabilities --- README.md | 5 +- docs/examples/CLI.md | 186 +++++++++++++++++ docs/examples/MCP.md | 149 ++++++++++++++ docs/guides/recommended-practices.md | 46 ++++- docs/roadmap/BACKLOG.md | 14 +- docs/use-cases/titan-paradigm.md | 286 +++++++++++++++++++++++++++ 6 files changed, 676 insertions(+), 10 deletions(-) create mode 100644 docs/use-cases/titan-paradigm.md diff --git a/README.md b/README.md index 8e56c2dc..6989c6b2 100644 --- a/README.md +++ b/README.md @@ -319,7 +319,7 @@ codegraph registry remove # Unregister | Flag | Description | |---|---| | `-d, --db ` | Custom path to `graph.db` | -| `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files (available on `fn`, `fn-impact`, `path`, `context`, `explain`, `where`, `diff-impact`, `search`, `map`, `hotspots`, `deps`, `impact`) | +| `-T, --no-tests` | Exclude `.test.`, `.spec.`, `__test__` files (available on `fn`, `fn-impact`, `path`, `context`, `explain`, `where`, `diff-impact`, `search`, `map`, `hotspots`, `roles`, `co-change`, `deps`, `impact`) | | `--depth ` | Transitive trace depth (default varies by command) | | `-j, --json` | Output as JSON | | `-v, --verbose` | Enable debug output | @@ -467,6 +467,9 @@ This project uses codegraph. The database is at `.codegraph/graph.db`. 
- `codegraph fn -T` — function call chain - `codegraph path -T` — shortest call path between two symbols - `codegraph deps ` — file-level dependencies +- `codegraph roles --role dead -T` — find dead code (unreferenced symbols) +- `codegraph roles --role core -T` — find core symbols (high fan-in) +- `codegraph co-change ` — files that historically change together - `codegraph search ""` — semantic search (requires `codegraph embed`) - `codegraph cycles` — check for circular dependencies diff --git a/docs/examples/CLI.md b/docs/examples/CLI.md index 6c3a0f8a..f5795a9a 100644 --- a/docs/examples/CLI.md +++ b/docs/examples/CLI.md @@ -620,6 +620,192 @@ Codegraph Diagnostics --- +## roles — Node role classification + +```bash +codegraph roles -T +``` + +``` +Node roles (639 symbols): + + core: 168 utility: 285 entry: 29 dead: 137 leaf: 20 + +## core (168) + f safePath src/queries.js:14 + f isTestFile src/queries.js:21 + f getClassHierarchy src/queries.js:76 + f findMatchingNodes src/queries.js:127 + f kindIcon src/queries.js:175 + ... +``` + +Filter by role and file: + +```bash +codegraph roles --role dead -T +``` + +``` +Node roles (137 symbols): + + dead: 137 + +## dead (137) + f main crates/codegraph-core/build.rs:3 + - TarjanState crates/codegraph-core/src/cycles.rs:38 + - CSharpExtractor crates/codegraph-core/src/extractors/csharp.rs:6 + o CSharpExtractor.extract crates/codegraph-core/src/extractors/csharp.rs:9 + ... +``` + +```bash +codegraph roles --role entry -T +``` + +``` +Node roles (29 symbols): + + entry: 29 + +## entry (29) + f command:build src/cli.js:89 + f command:query src/cli.js:102 + f command:impact src/cli.js:113 + f command:map src/cli.js:125 + f command:stats src/cli.js:139 + ... 
+``` + +```bash +codegraph roles --role core --file src/queries.js +``` + +``` +Node roles (16 symbols): + + core: 16 + +## core (16) + f safePath src/queries.js:14 + f isTestFile src/queries.js:21 + f getClassHierarchy src/queries.js:76 + f resolveMethodViaHierarchy src/queries.js:97 + f findMatchingNodes src/queries.js:127 + f kindIcon src/queries.js:175 + f moduleMapData src/queries.js:310 + f diffImpactMermaid src/queries.js:766 + ... +``` + +--- + +## co-change — Git co-change analysis + +First, scan git history: + +```bash +codegraph co-change --analyze +``` + +``` +Co-change analysis complete: 173 pairs from 289 commits (since: 1 year ago) +``` + +Then query globally or per file: + +```bash +codegraph co-change +``` + +``` +Top co-change pairs: + + 100% 3 commits src/extractors/csharp.js <-> src/extractors/go.js + 100% 3 commits src/extractors/csharp.js <-> src/extractors/java.js + 100% 3 commits src/extractors/csharp.js <-> src/extractors/php.js + 100% 3 commits src/extractors/csharp.js <-> src/extractors/ruby.js + 100% 3 commits src/extractors/go.js <-> src/extractors/java.js + ... 
+ + Analyzed: 2026-02-26 | Window: 1 year ago +``` + +```bash +codegraph co-change src/queries.js +``` + +``` +Co-change partners for src/queries.js: + + 43% 12 commits src/mcp.js + + Analyzed: 2026-02-26 | Window: 1 year ago +``` + +```bash +codegraph co-change --min-jaccard 0.5 --min-support 5 +``` + +``` +Top co-change pairs: + + 100% 5 commits src/parser.js <-> src/constants.js + 78% 7 commits src/builder.js <-> src/resolve.js + + Analyzed: 2026-02-26 | Window: 1 year ago +``` + +--- + +## path — Shortest path between two symbols + +```bash +codegraph path buildGraph resolveImports -T +``` + +``` +Path: buildGraph → resolveImports (1 hop) + + buildGraph src/builder.js:335 →(calls)→ resolveImports src/resolve.js:42 + + Hops: 1 | Alternate paths: 0 +``` + +```bash +codegraph path buildGraph isTestFile -T +``` + +``` +Path: buildGraph → isTestFile (2 hops) + + buildGraph src/builder.js:335 + →(calls)→ collectFiles src/builder.js:45 + →(calls)→ isTestFile src/queries.js:21 + + Hops: 2 | Alternate paths: 1 +``` + +```bash +codegraph path buildGraph isTestFile -T --json +``` + +```json +{ + "from": "buildGraph", + "to": "isTestFile", + "hops": 2, + "path": [ + { "name": "buildGraph", "file": "src/builder.js", "line": 335 }, + { "name": "collectFiles", "file": "src/builder.js", "line": 45, "edgeKind": "calls" }, + { "name": "isTestFile", "file": "src/queries.js", "line": 21, "edgeKind": "calls" } + ], + "alternatePaths": 1 +} +``` + +--- + ## registry — Multi-repo management ```bash diff --git a/docs/examples/MCP.md b/docs/examples/MCP.md index 8941e30b..e64ce9ac 100644 --- a/docs/examples/MCP.md +++ b/docs/examples/MCP.md @@ -583,6 +583,155 @@ graph LR --- +## node_roles — Node role classification + +```json +{ + "tool": "node_roles", + "arguments": { "no_tests": true } +} +``` + +``` +Node roles (639 symbols): + + core: 168 utility: 285 entry: 29 dead: 137 leaf: 20 + +## core (168) + f safePath src/queries.js:14 + f isTestFile src/queries.js:21 + f getClassHierarchy 
src/queries.js:76 + ... + +## entry (29) + f command:build src/cli.js:89 + f command:query src/cli.js:102 + ... +``` + +Filter by role: + +```json +{ + "tool": "node_roles", + "arguments": { "role": "dead", "no_tests": true } +} +``` + +``` +Node roles (137 symbols): + + dead: 137 + +## dead (137) + f main crates/codegraph-core/build.rs:3 + - TarjanState crates/codegraph-core/src/cycles.rs:38 + - CSharpExtractor crates/codegraph-core/src/extractors/csharp.rs:6 + ... +``` + +Filter by role and file: + +```json +{ + "tool": "node_roles", + "arguments": { "role": "core", "file": "src/queries.js" } +} +``` + +``` +Node roles (16 symbols): + + core: 16 + +## core (16) + f safePath src/queries.js:14 + f isTestFile src/queries.js:21 + f getClassHierarchy src/queries.js:76 + f resolveMethodViaHierarchy src/queries.js:97 + f findMatchingNodes src/queries.js:127 + ... +``` + +--- + +## co_changes — Git co-change analysis + +Query top co-changing file pairs: + +```json +{ + "tool": "co_changes", + "arguments": { "no_tests": true } +} +``` + +``` +Top co-change pairs: + + 100% 3 commits src/extractors/csharp.js <-> src/extractors/go.js + 100% 3 commits src/extractors/csharp.js <-> src/extractors/java.js + 100% 3 commits src/extractors/go.js <-> src/extractors/java.js + ... 
+ + Analyzed: 2026-02-26 | Window: 1 year ago +``` + +Query co-change partners for a specific file: + +```json +{ + "tool": "co_changes", + "arguments": { "file": "src/queries.js" } +} +``` + +``` +Co-change partners for src/queries.js: + + 43% 12 commits src/mcp.js + + Analyzed: 2026-02-26 | Window: 1 year ago +``` + +--- + +## symbol_path — Shortest path between two symbols + +```json +{ + "tool": "symbol_path", + "arguments": { "from": "buildGraph", "to": "resolveImports", "no_tests": true } +} +``` + +``` +Path: buildGraph → resolveImports (1 hop) + + buildGraph src/builder.js:335 →(calls)→ resolveImports src/resolve.js:42 + + Hops: 1 | Alternate paths: 0 +``` + +```json +{ + "tool": "symbol_path", + "arguments": { "from": "buildGraph", "to": "isTestFile", "no_tests": true } +} +``` + +``` +Path: buildGraph → isTestFile (2 hops) + + buildGraph src/builder.js:335 + →(calls)→ collectFiles src/builder.js:45 + →(calls)→ isTestFile src/queries.js:21 + + Hops: 2 | Alternate paths: 1 +``` + +--- + ## list_repos — Multi-repo registry (multi-repo mode only) Only available when the MCP server is started with `--multi-repo`. diff --git a/docs/guides/recommended-practices.md b/docs/guides/recommended-practices.md index 8ae3cb51..15734001 100644 --- a/docs/guides/recommended-practices.md +++ b/docs/guides/recommended-practices.md @@ -143,7 +143,7 @@ By default, the MCP server runs in **single-repo mode** — the AI agent can onl Enable `--multi-repo` to let the agent query any registered repository, or use `--repos` to restrict access to a specific set of repos. -The server exposes 18 tools: `query_function`, `file_deps`, `impact_analysis`, `find_cycles`, `module_map`, `fn_deps`, `fn_impact`, `symbol_path`, `context`, `explain`, `where`, `diff_impact`, `semantic_search`, `export_graph`, `list_functions`, `structure`, `hotspots`, and `list_repos` (multi-repo only). 
See the [AI Agent Guide MCP reference](./ai-agent-guide.md#mcp-server-reference) for the full tool-to-CLI mapping table. +The server exposes 20 tools: `query_function`, `file_deps`, `impact_analysis`, `find_cycles`, `module_map`, `fn_deps`, `fn_impact`, `symbol_path`, `context`, `explain`, `where`, `diff_impact`, `semantic_search`, `export_graph`, `list_functions`, `structure`, `hotspots`, `node_roles`, `co_changes`, and `list_repos` (multi-repo only). See the [AI Agent Guide MCP reference](./ai-agent-guide.md#mcp-server-reference) for the full tool-to-CLI mapping table. ### CLAUDE.md for your project @@ -167,7 +167,11 @@ This project uses codegraph. The database is at `.codegraph/graph.db`. - `codegraph build .` — rebuild the graph (incremental by default) - `codegraph map` — module overview - `codegraph fn -T` — function call chain +- `codegraph path -T` — shortest call path between two symbols - `codegraph deps ` — file-level dependencies +- `codegraph roles --role dead -T` — find dead code (unreferenced symbols) +- `codegraph roles --role core -T` — find core symbols (high fan-in) +- `codegraph co-change ` — files that historically change together - `codegraph search ""` — semantic search (requires `codegraph embed`) - `codegraph cycles` — check for circular dependencies @@ -278,11 +282,14 @@ Changes are picked up incrementally — no manual rebuilds needed. 
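
The co-change coupling listed in the template above is, per the backlog notes, Jaccard similarity over commit history. As a rough mental model only — the helper below is a hypothetical sketch, not the actual `src/cochange.js` API:

```javascript
// Hypothetical sketch of Jaccard-style co-change scoring.
// Each commit is modeled as an array of touched file paths;
// the score is |commits touching both| / |commits touching either|.
function coChangeScore(commits, fileA, fileB) {
  let both = 0;   // commits touching A and B
  let either = 0; // commits touching A or B
  for (const files of commits) {
    const hasA = files.includes(fileA);
    const hasB = files.includes(fileB);
    if (hasA && hasB) both++;
    if (hasA || hasB) either++;
  }
  return either === 0 ? 0 : both / either;
}

// Two files that only ever change together score 1.0
// (rendered as "100%" in the CLI output).
const history = [
  ['src/parser.js', 'src/constants.js'],
  ['src/parser.js', 'src/constants.js', 'README.md'],
  ['src/builder.js'],
];
coChangeScore(history, 'src/parser.js', 'src/constants.js'); // → 1
```

Under this model, `--min-support` would presumably threshold the raw commit count and `--min-jaccard` the returned ratio.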
### Explore before you edit -Before touching a function, check its blast radius: +Before touching a function, understand its role and blast radius: ```bash -codegraph fn myFunction --no-tests # callers, callees, call chain +codegraph where myFunction # where it's defined and used +codegraph roles --file src/utils/auth.ts # role of every symbol in the file (entry/core/utility/dead) +codegraph fn myFunction --no-tests # callers, callees, call chain codegraph fn-impact myFunction --no-tests # what breaks if this changes +codegraph path myFunction otherFunction -T # how two symbols are connected ``` Before touching a file: @@ -290,8 +297,34 @@ Before touching a file: ```bash codegraph deps src/utils/auth.ts # imports and importers codegraph impact src/utils/auth.ts # transitive reverse deps +codegraph co-change src/utils/auth.ts # files that historically change together with this one ``` +### Understand architectural roles + +Every symbol is auto-classified based on its connectivity pattern. Use this to prioritize what to review, find dead code, or understand a module's structure: + +```bash +codegraph roles -T # all roles across the codebase +codegraph roles --role dead -T # unreferenced, non-exported symbols (cleanup candidates) +codegraph roles --role entry -T # entry points (high fan-out, low fan-in) +codegraph roles --role core -T # core symbols (high fan-in — break these, break everything) +codegraph roles --role core --file src/builder.js # core symbols in a specific file +``` + +### Surface hidden coupling with co-change analysis + +Static imports don't tell the full story. Files that always change together in git history are coupled — even if they don't import each other: + +```bash +codegraph co-change --analyze # scan git history (run once, then incremental) +codegraph co-change src/parser.js # what files always change with parser.js? 
+codegraph co-change # top co-changing file pairs globally +codegraph co-change --min-jaccard 0.5 # only strong coupling +``` + +Co-change data is automatically included in `diff-impact` output — historically coupled files appear alongside the static dependency analysis. + ### Find circular dependencies early ```bash @@ -513,9 +546,12 @@ echo "codegraph build" > .husky/pre-commit mkdir -p .github/workflows cp node_modules/@optave/codegraph/.github/workflows/codegraph-impact.yml .github/workflows/ -# 6. (Optional) Build embeddings for semantic search +# 6. (Optional) Scan git history for co-change coupling +codegraph co-change --analyze + +# 7. (Optional) Build embeddings for semantic search codegraph embed -# 7. (Optional) Add CLAUDE.md for AI agents +# 8. (Optional) Add CLAUDE.md for AI agents # See docs/guides/ai-agent-guide.md for the full template ``` diff --git a/docs/roadmap/BACKLOG.md b/docs/roadmap/BACKLOG.md index 18892107..46d40020 100644 --- a/docs/roadmap/BACKLOG.md +++ b/docs/roadmap/BACKLOG.md @@ -26,18 +26,24 @@ Non-breaking, ordered by problem-fit: | ID | Title | Description | Category | Benefit | Zero-dep | Foundation-aligned | Problem-fit (1-5) | Breaking | |----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------| -| 4 | ~~Node classification~~ | ~~Auto-tag symbols as Entry Point / Core / Utility / Adapter based on in-degree/out-degree patterns. High fan-in + low fan-out = Core. Zero fan-in + non-export = Dead. Inspired by arbor.~~ | Intelligence | ~~Agents immediately understand architectural role of any symbol without reading surrounding code — fewer orientation tokens~~ | ✓ | ✓ | 5 | No | **DONE** — `classifyNodeRoles()` in `structure.js` auto-tags every symbol as `entry`/`core`/`utility`/`adapter`/`dead`/`leaf` using median-based fan-in/fan-out thresholds. 
Roles stored in DB (`role` column, migration v5), surfaced in `where`/`explain`/`context`/`stats`/`list-functions`, new `roles` CLI command, new `node_roles` MCP tool (18 tools total). Includes `--role` and `--file` filters. | -| 9 | ~~Git change coupling~~ | ~~Analyze git history for files/functions that always change together. Surfaces hidden dependencies that the static graph can't see. Enhances `diff-impact` with historical co-change data. Inspired by axon.~~ | Analysis | ~~`diff-impact` catches more breakage by including historically coupled files; agents get a more complete blast radius picture~~ | ✓ | ✓ | 5 | No | **DONE** — `src/cochange.js` module with scan, compute, analyze, and query functions. DB migration v5 adds `co_changes` + `co_change_meta` tables. CLI command `codegraph co-change [file]` with `--analyze`, `--since`, `--min-support`, `--min-jaccard`, `--full` options. Integrates into `diff-impact` output via `historicallyCoupled` section. New `co_changes` MCP tool (19 tools total). Uses Jaccard similarity on commit history. | +| 4 | ~~Node classification~~ | ~~Auto-tag symbols as Entry Point / Core / Utility / Adapter based on in-degree/out-degree patterns. High fan-in + low fan-out = Core. Zero fan-in + non-export = Dead. Inspired by arbor.~~ | Intelligence | ~~Agents immediately understand architectural role of any symbol without reading surrounding code — fewer orientation tokens~~ | ✓ | ✓ | 5 | No | **DONE** — `classifyNodeRoles()` in `structure.js` auto-tags every symbol as `entry`/`core`/`utility`/`adapter`/`dead`/`leaf` using median-based fan-in/fan-out thresholds. Roles stored in DB (`role` column, migration v5), surfaced in `where`/`explain`/`context`/`stats`/`list-functions`, new `roles` CLI command, new `node_roles` MCP tool. Includes `--role` and `--file` filters. | +| 9 | ~~Git change coupling~~ | ~~Analyze git history for files/functions that always change together. Surfaces hidden dependencies that the static graph can't see. 
Enhances `diff-impact` with historical co-change data. Inspired by axon.~~ | Analysis | ~~`diff-impact` catches more breakage by including historically coupled files; agents get a more complete blast radius picture~~ | ✓ | ✓ | 5 | No | **DONE** — `src/cochange.js` module with scan, compute, analyze, and query functions. DB migration v5 adds `co_changes` + `co_change_meta` tables. CLI command `codegraph co-change [file]` with `--analyze`, `--since`, `--min-support`, `--min-jaccard`, `--full` options. Integrates into `diff-impact` output via `historicallyCoupled` section. New `co_changes` MCP tool. Uses Jaccard similarity on commit history. | | 1 | ~~Dead code detection~~ | ~~Find symbols with zero incoming edges (excluding entry points and exports). Agents constantly ask "is this used?" — the graph already has the data, we just need to surface it. Inspired by narsil-mcp, axon, codexray, CKB.~~ | Analysis | ~~Agents stop wasting tokens investigating dead code; developers get actionable cleanup lists without external tools~~ | ✓ | ✓ | 4 | No | **DONE** — Delivered as part of node classification (ID 4). `codegraph roles --role dead -T` lists all symbols with zero fan-in that aren't exported. | -| 2 | Shortest path A→B | BFS/Dijkstra on the existing edges table to find how symbol A reaches symbol B. We have `fn` for single-node chains but no A→B pathfinding. Inspired by codexray, arbor. | Navigation | Agents can answer "how does this function reach that one?" in one call instead of manually tracing chains | ✓ | ✓ | 4 | No | -| 12 | Execution flow tracing | Framework-aware entry point detection (Express routes, CLI commands, event handlers) + BFS flow tracing from entry to leaf. Inspired by axon, GitNexus, code-context-mcp. | Navigation | Agents can answer "what happens when a user hits POST /login?" 
by tracing the full execution path in one query | ✓ | ✓ | 4 | No | +| 2 | ~~Shortest path A→B~~ | ~~BFS/Dijkstra on the existing edges table to find how symbol A reaches symbol B. We have `fn` for single-node chains but no A→B pathfinding. Inspired by codexray, arbor.~~ | Navigation | ~~Agents can answer "how does this function reach that one?" in one call instead of manually tracing chains~~ | ✓ | ✓ | 4 | No | **DONE** — `codegraph path ` command with `--reverse`, `--max-depth`, `--kinds` options. BFS pathfinding on the edges table. `symbol_path` MCP tool. | +| 12 | ~~Execution flow tracing~~ | ~~Framework-aware entry point detection (Express routes, CLI commands, event handlers) + BFS flow tracing from entry to leaf. Inspired by axon, GitNexus, code-context-mcp.~~ | Navigation | ~~Agents can answer "what happens when a user hits POST /login?" by tracing the full execution path in one query~~ | ✓ | ✓ | 4 | No | **DONE** — `codegraph flow` command with entry point detection and BFS flow tracing. MCP tools `flow` and `entry_points` added. Merged in PR #118. | | 16 | Branch structural diff | Compare code structure between two branches using git worktrees. Show added/removed/changed symbols and their impact. Inspired by axon. | Analysis | Teams can review structural impact of feature branches before merge; agents get branch-aware context | ✓ | ✓ | 4 | No | | 20 | Streaming / chunked results | Support streaming output for large query results so MCP clients and programmatic consumers can process incrementally. | Embeddability | Large codebases don't blow up agent context windows; consumers process results as they arrive instead of waiting for the full payload | ✓ | ✓ | 4 | No | +| 21 | Composite audit command | Single `codegraph audit ` that combines `explain`, `fn-impact`, and code health metrics into one structured report per function. Core version uses graph data; enhanced version includes Phase 3.4 `risk_score`/`complexity_notes`/`side_effects` when available. 
Inspired by [Titan Paradigm](../use-cases/titan-paradigm.md) Gauntlet phase. | Orchestration | Each sub-agent in a multi-agent swarm gets everything it needs to assess a function in one call instead of 3-4 — directly reduces token waste and round-trips | ✓ | ✓ | 4 | No | +| 22 | Batch querying | Accept a list of targets (file or JSON) and return all query results in one JSON payload. Applies to `audit`, `fn-impact`, `context`, and other per-symbol commands. Inspired by [Titan Paradigm](../use-cases/titan-paradigm.md) swarm pattern. | Orchestration | A swarm of 20+ agents auditing different files can be fed from a single orchestrator call instead of N sequential invocations — reduces overhead and enables parallel dispatch | ✓ | ✓ | 4 | No | +| 23 | Triage priority queue | Single `codegraph triage` command that merges `map` connectivity, `hotspots` fan-in/fan-out, node roles, and optionally git churn + `risk_score` into one ranked audit queue. Inspired by [Titan Paradigm](../use-cases/titan-paradigm.md) RECON phase. | Orchestration | Orchestrating agent gets a single prioritized list of what to audit first — replaces manual synthesis of 3+ commands, saves the RECON phase from burning tokens on orientation | ✓ | ✓ | 4 | No | +| 24 | Change validation predicates | `codegraph check --staged` with configurable predicates: `--no-new-cycles`, `--max-blast-radius N`, `--no-signature-changes`, `--no-boundary-violations`. Returns exit code 0/1 for CI gates and state machines. Inspired by [Titan Paradigm](../use-cases/titan-paradigm.md) STATE MACHINE phase. | CI | Automated rollback triggers without parsing JSON — orchestrators and CI pipelines get first-class pass/fail signals for blast radius, cycles, and contract changes | ✓ | ✓ | 4 | No | +| 26 | MCP orchestration tools | Expose `audit`, `triage`, and `check` as MCP tools alongside existing tools.
Enables multi-agent orchestrators (Claude Code agent teams, custom MCP clients) to run the full Titan Paradigm loop through the MCP protocol without CLI overhead. Inspired by [Titan Paradigm](../use-cases/titan-paradigm.md). | Embeddability | Agents query the graph through MCP with zero CLI overhead — fewer tokens, faster round-trips, native integration with AI agent frameworks | ✓ | ✓ | 4 | No | | 5 | TF-IDF lightweight search | SQLite FTS5 + TF-IDF as a middle tier (~50MB) between "no search" and full transformer embeddings (~500MB). Provides decent keyword search with near-zero overhead. Inspired by codexray. | Search | Users get useful search without the 500MB embedding model download; faster startup for small projects | ✓ | ✓ | 3 | No | | 13 | Architecture boundary rules | User-defined rules for allowed/forbidden dependencies between modules (e.g., "controllers must not import from other controllers"). Violations flagged in `diff-impact` and CI. Inspired by codegraph-rust, stratify. | Architecture | Prevents architectural decay in CI; agents are warned before introducing forbidden cross-module dependencies | ✓ | ✓ | 3 | No | | 15 | Hybrid BM25 + semantic search | Combine BM25 keyword matching with embedding-based semantic search using Reciprocal Rank Fusion. Better recall than either approach alone. Inspired by GitNexus, claude-context-local. | Search | Search results improve dramatically — keyword matches catch exact names, embeddings catch conceptual matches, RRF merges both | ✓ | ✓ | 3 | No | | 18 | CODEOWNERS integration | Map graph nodes to CODEOWNERS entries. Show who owns each function, surface ownership boundaries in `diff-impact`. Inspired by CKB. | Developer Experience | `diff-impact` tells agents which teams to notify; ownership-aware impact analysis reduces missed reviews | ✓ | ✓ | 3 | No | | 22 | Manifesto-driven pass/fail | User-defined rule engine with custom thresholds (e.g.
"cognitive > 15 = fail", "cyclomatic > 10 = fail", "imports > 10 = decompose"). Outputs pass/fail per function/file. Generalizes ID 13 (boundary rules) into a generic rule system. | Analysis | Enables autonomous multi-agent audit workflows (GAUNTLET pattern); CI integration for code health gates with configurable thresholds | ✓ | ✓ | 3 | No | +| 25 | Graph snapshots | `codegraph snapshot save ` / `codegraph snapshot restore ` for lightweight SQLite DB backup and restore. Enables orchestrators to checkpoint before refactoring passes and instantly roll back without rebuilding. After Phase 4, also preserves embeddings and semantic metadata. Inspired by [Titan Paradigm](../use-cases/titan-paradigm.md) STATE MACHINE phase. | Orchestration | Multi-agent workflows get instant rollback without re-running expensive builds or LLM calls — orchestrator checkpoints before each pass and restores on failure | ✓ | ✓ | 3 | No | | 6 | Formal code health metrics | Cyclomatic complexity, Maintainability Index, and Halstead metrics per function — we already parse the AST, the data is there. Inspired by code-health-meter (published in ACM TOSEM 2025). | Analysis | Agents can prioritize refactoring targets; `hotspots` becomes richer with quantitative health scores per function | ✓ | ✓ | 2 | No | | 7 | OWASP/CWE pattern detection | Security pattern scanning on the existing AST — hardcoded secrets, SQL injection patterns, eval usage, XSS sinks. Lightweight static rules, not full taint analysis. Inspired by narsil-mcp, CKB. | Security | Catches low-hanging security issues during `diff-impact`; agents can flag risky patterns before they're committed | ✓ | ✓ | 2 | No | | 11 | Community detection | Leiden/Louvain algorithm to discover natural module boundaries vs actual file organization. Reveals which symbols are tightly coupled and whether the directory structure matches. Inspired by axon, GitNexus, CodeGraphMCPServer.
| Intelligence | Surfaces architectural drift — when directory structure no longer matches actual dependency clusters; guides refactoring | ✓ | ✓ | 2 | No | diff --git a/docs/use-cases/titan-paradigm.md b/docs/use-cases/titan-paradigm.md new file mode 100644 index 00000000..9a0962f3 --- /dev/null +++ b/docs/use-cases/titan-paradigm.md @@ -0,0 +1,286 @@ +# Use Case: The Titan Paradigm — Autonomous Codebase Cleanup + +> How codegraph powers the RECON, GAUNTLET, GLOBAL SYNC, and STATE MACHINE phases of multi-agent codebase refactoring. + +--- + +## The Problem + +In a [viral LinkedIn post](https://www.linkedin.com/posts/johannesr314_claude-vibecoding-activity-7432157088828678144-CiI_), **Johannes R.**, Senior Software Engineer at Google, described the #1 challenge of "vibe coding": keeping a fast-moving codebase from rotting. + +His answer isn't a better prompt. It's a different architecture. + +He calls it the **Titan Paradigm** — moving from a single chat to an autonomous multi-agent orchestration. It is, in his words, *"the only way I've found to fully autonomously get a massive codebase into Google-standard shape."* + +### The architecture + +| Phase | What it does | +|-------|-------------| +| **RECON** | One agent maps the dependency graph. It identifies "high-traffic" files and audits them first to prevent logic drift downstream | +| **THE GAUNTLET** | A swarm of sub-agents audits every file against a strict manifesto. Complexity > 7 is a failure. Nesting > 3 is a failure. If it needs 10+ mocks to test, it gets decomposed | +| **GLOBAL SYNC** | A lead agent identifies overlapping fixes across the repo to build shared abstractions before the swarm starts coding | +| **STATE MACHINE** | Everything is tracked in a JSON state file. If a change breaks the build or fails a linter, the system auto-rolls back. Your intent survives even if the session resets | + +The insight is powerful: a single AI agent chatting with you cannot maintain a large codebase. 
You need **structure** — a dependency-aware orchestration layer that tells agents *where* to look, *what* to prioritize, and *what breaks* when they change things. + +That's exactly what codegraph provides. + +--- + +## How Codegraph Helps — Today + +### RECON: Map the dependency graph, prioritize high-traffic files + +This is codegraph's bread and butter. The RECON phase needs a dependency graph — codegraph **is** a dependency graph. + +```bash +# Build the graph (sub-second incremental rebuilds after the first run) +codegraph build . + +# Identify high-traffic files — most-connected modules, ranked +codegraph map --limit 30 --no-tests + +# Find structural hotspots — extreme fan-in, fan-out, coupling +codegraph hotspots --no-tests + +# Graph health overview — node/edge counts, quality score +codegraph stats +``` + +An orchestrating agent can use `map` and `hotspots` to build a priority queue: audit the most-connected files first, because changes there have the highest blast radius. The `--json` flag on every command makes it trivial to feed results into a state file or orchestration script. + +```bash +# JSON output for programmatic consumption +codegraph map --limit 50 --no-tests --json > recon-priority.json +codegraph hotspots --no-tests --json >> recon-priority.json +``` + +For deeper structural understanding before touching anything: + +```bash +# Structural summary of a high-traffic file — public API, internals, data flow +codegraph explain src/builder.js + +# Understand a specific function before auditing it +codegraph context buildGraph -T + +# Where is a symbol defined and who uses it? +codegraph where resolveImports +``` + +### THE GAUNTLET: Audit every file against strict standards + +The Gauntlet needs each sub-agent to understand what a file does, what depends on it, and how risky changes are. Codegraph gives each agent full context without burning tokens on `grep`/`find`/`cat`: + +```bash +# For each file the sub-agent is auditing: + +# 1. 
What does this file export, import, and contain? +codegraph explain src/parser.js + +# 2. For each function that might need decomposition: +# Full context — source, deps, callers, signature +codegraph context wasmExtractSymbols -T + +# 3. How many callers? What's the blast radius if we refactor? +codegraph fn-impact wasmExtractSymbols -T + +# 4. What's the full call chain? +codegraph fn wasmExtractSymbols -T --depth 5 +``` + +When a sub-agent decides a function needs decomposition (complexity > 7, nesting > 3, 10+ mocks), it needs to know what breaks. `fn-impact` gives the complete blast radius **before** the agent writes a single line of code. + +The `--json` flag lets the orchestrator aggregate results across all sub-agents: + +```bash +# Each sub-agent reports its audit findings as JSON +codegraph fn-impact parseConfig -T --json > audit/parser.json +``` + +### GLOBAL SYNC: Identify overlapping fixes, build shared abstractions + +Before the swarm starts coding, a lead agent needs to see the big picture: which files are tightly coupled, where circular dependencies exist, and what shared abstractions could be extracted. + +```bash +# Detect circular dependencies — these are prime candidates for abstraction +codegraph cycles +codegraph cycles --functions # Function-level cycles + +# Find how two symbols are connected — reveals shared dependencies +codegraph path parseConfig loadConfig -T +codegraph path buildGraph resolveImports -T + +# File-level dependency map — what does this file import and what imports it? +codegraph deps src/builder.js + +# Semantic search to find related code across the codebase +codegraph search "config loading; settings parsing; env resolution" + +# Directory-level cohesion — which directories are well-organized vs tangled? +codegraph structure +``` + +The lead agent can use `cycles` to identify dependency knots, `path` to understand how modules relate, and `structure` to assess directory cohesion. 
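
The `path` queries above are, per the patch description, BFS shortest-path search on the call graph. A simplified, self-contained sketch of that idea (hypothetical helper — the real command reads edges from SQLite and also reports edge kinds and alternate-path counts):

```javascript
// Hypothetical sketch: BFS shortest path over a flat call-edge list.
function shortestPath(edges, from, to) {
  // Build an adjacency map: caller -> [callees].
  const adj = new Map();
  for (const [src, dst] of edges) {
    if (!adj.has(src)) adj.set(src, []);
    adj.get(src).push(dst);
  }
  // Standard BFS: the first time we dequeue `to`, the path is shortest.
  const queue = [[from]];
  const seen = new Set([from]);
  while (queue.length) {
    const path = queue.shift();
    const node = path[path.length - 1];
    if (node === to) return path;
    for (const next of adj.get(node) || []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push([...path, next]);
      }
    }
  }
  return null; // no route (reported as found: false)
}

const edges = [
  ['buildGraph', 'collectFiles'],
  ['collectFiles', 'isTestFile'],
  ['buildGraph', 'resolveImports'],
];
shortestPath(edges, 'buildGraph', 'isTestFile');
// → ['buildGraph', 'collectFiles', 'isTestFile'] (2 hops)
```

A `--reverse` search would follow the same procedure with each edge flipped.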
This analysis informs which shared abstractions to build before individual agents start their refactoring work. + +### STATE MACHINE: Track changes, verify impact, enable rollback + +The State Machine phase needs to validate that every change is safe. Codegraph's `diff-impact` is purpose-built for this: + +```bash +# After a sub-agent makes changes and stages them: +codegraph diff-impact --staged -T + +# Output: which functions changed, which callers are affected, +# full transitive blast radius — all in one call + +# Compare current branch against main to see cumulative impact +codegraph diff-impact main -T + +# Visual blast radius as a Mermaid diagram +codegraph diff-impact --staged --format mermaid -T + +# JSON for the state machine to parse and validate +codegraph diff-impact --staged -T --json > state/impact-check.json +``` + +The orchestrator can gate every commit: run `diff-impact --staged --json`, check that the blast radius is within acceptable bounds, and auto-rollback if it exceeds thresholds. Combined with `codegraph watch` for real-time graph updates, the state machine always has a current picture of the codebase. + +```bash +# Watch mode — graph updates automatically as agents edit files +codegraph watch . + +# After rollback, verify the graph is back to expected state +codegraph stats --json +``` + +--- + +## What's on the Roadmap + +Several planned features would make codegraph even more powerful for the Titan Paradigm. These are tracked in the [roadmap](../../roadmap/ROADMAP.md) and [backlog](../../roadmap/BACKLOG.md): + +### For RECON + +| Feature | Status | How it helps | +|---------|--------|-------------| +| **Node classification** ([Backlog #4](../../roadmap/BACKLOG.md)) | **Done** | Auto-tags every symbol as Entry Point, Core, Utility, or Adapter based on fan-in/fan-out. 
Available via `codegraph roles`, `where`, `explain`, `context`, and the `node_roles` MCP tool |
+| **Git change coupling** ([Backlog #9](../../roadmap/BACKLOG.md)) | **Done** | `codegraph co-change` analyzes git history for files that frequently change together. Integrated into `diff-impact` output via `historicallyCoupled` section. MCP tool `co_changes` |
+
+### For THE GAUNTLET
+
+| Feature | Status | How it helps |
+|---------|--------|-------------|
+| **Formal code health metrics** ([Backlog #6](../../roadmap/BACKLOG.md)) | Planned | Cyclomatic complexity, Maintainability Index, and Halstead metrics per function — directly maps to the Gauntlet's "complexity > 7 is a failure" rule. Computed from the AST we already parse |
+| **Build-time semantic metadata** ([Roadmap Phase 3.4](../../roadmap/ROADMAP.md#34--build-time-semantic-metadata)) | Planned | LLM-generated `complexity_notes`, `risk_score`, and `side_effects` per function. A sub-agent could query `codegraph assess <function>` and get "3 responsibilities, low cohesion — consider splitting" without analyzing the code itself |
+| **Community detection** ([Backlog #11](../../roadmap/BACKLOG.md)) | Planned | Leiden/Louvain community detection to discover natural module boundaries vs actual file organization. Reveals which functions are tightly coupled and whether decomposition should follow the directory structure or propose a new one |
+
+### For GLOBAL SYNC
+
+| Feature | Status | How it helps |
+|---------|--------|-------------|
+| **Architecture boundary rules** ([Backlog #13](../../roadmap/BACKLOG.md)) | Planned | User-defined rules for allowed/forbidden dependencies between modules (e.g., "controllers must not import from other controllers"). 
The GLOBAL SYNC agent can enforce architectural standards automatically | +| **Refactoring analysis** ([Roadmap Phase 7.5](../../roadmap/ROADMAP.md#75--refactoring-analysis)) | Planned | `split_analysis`, `extraction_candidates`, `boundary_analysis` — LLM-powered structural analysis that identifies exactly where shared abstractions should be created | +| **Dead code detection** ([Backlog #1](../../roadmap/BACKLOG.md)) | **Done** | `codegraph roles --role dead -T` lists all symbols with zero fan-in that aren't exported. Delivered as part of node classification | + +### For STATE MACHINE + +| Feature | Status | How it helps | +|---------|--------|-------------| +| **Branch structural diff** ([Backlog #16](../../roadmap/BACKLOG.md)) | Planned | Compare code structure between two branches using git worktrees. Shows added/removed/changed symbols and their impact — perfect for validating that a refactoring branch hasn't broken the structural contract | +| **GitHub Action + CI integration** ([Roadmap Phase 6](../../roadmap/ROADMAP.md#phase-6--github-integration--ci)) | Planned | Reusable GitHub Action that runs `diff-impact` on every PR, posts visual impact graphs, and fails if thresholds are exceeded — the STATE MACHINE becomes a CI gate | +| **Streaming / chunked results** ([Backlog #20](../../roadmap/BACKLOG.md)) | Planned | Large codebases don't blow up agent context windows; consumers process results as they arrive instead of waiting for the full payload | + +--- + +## Recommendations: Making Codegraph Even Better for This Use Case + +The features above cover what codegraph can do today and what's already planned. Beyond those, the Titan Paradigm points to a class of enhancements that would naturally follow the [LLM integration work](../../roadmap/ROADMAP.md#phase-3--intelligent-embeddings) (Roadmap Phase 3) — combining codegraph's structural graph with LLM intelligence to serve multi-agent orchestration directly. + +### 1. 
`codegraph audit` — one-call file assessment + +Once [build-time semantic metadata](../../roadmap/ROADMAP.md#34--build-time-semantic-metadata) (Phase 3.4) lands, codegraph will have `risk_score`, `complexity_notes`, and `side_effects` per function. A natural next step is a single `audit` command that combines these with `explain` and `fn-impact` into one structured report — exactly what each Gauntlet sub-agent needs. + +```bash +# One call per file, everything a sub-agent needs to decide pass/fail +codegraph audit src/parser.js --json +# → { functions: [{ name, complexity, nesting_depth, fan_in, fan_out, +# risk_score, side_effects, callers_count, decomposition_hint }] } +``` + +With LLM-generated `complexity_notes`, the `decomposition_hint` could go beyond numbers ("complexity > 7") to actionable guidance ("3 responsibilities — split validation from persistence from notification"). + +### 2. Batch querying for swarm agents + +Today, each query is a separate CLI invocation. For a swarm of 20+ sub-agents each auditing different files, a batch mode that accepts a list of targets and returns all results in one JSON payload would dramatically reduce overhead. + +```bash +# Orchestrator sends one request, gets audit results for all targets +codegraph audit --batch targets.json --json > audit-results.json +``` + +This becomes especially powerful after [module summaries](../../roadmap/ROADMAP.md#35--module-summaries) (Phase 3.5) — the batch output can include file-level narratives alongside function-level metrics, so sub-agents understand the module's role before diving into individual functions. + +### 3. `codegraph triage` — orchestrator-friendly priority queue + +`map` and `hotspots` give ranked lists, but the Titan Paradigm needs a single prioritized audit queue. 
After LLM integration, codegraph could combine graph centrality, `risk_score`, [git change coupling](../../roadmap/BACKLOG.md) (Backlog #9), and LLM-assessed complexity into one ranked list: + +```bash +codegraph triage --limit 50 -T --json +# → Ranked list: highest-risk, most-connected, most-churned files first +# → Each entry includes: connectivity rank, risk_score, churn frequency, +# coupling cluster, estimated refactoring complexity +``` + +This replaces the RECON agent's synthesis work with a single call. + +### 4. `codegraph check` — change validation predicates + +The STATE MACHINE needs yes/no answers: "Did this change introduce a cycle?" "Did blast radius exceed N?" "Did any public API signature change?" Today this requires parsing JSON output. First-class exit codes or a `check` command with configurable predicates would make the state machine trivially scriptable: + +```bash +# Exit code 1 if any predicate fails — perfect for CI gates and rollback triggers +codegraph check --staged --no-new-cycles --max-blast-radius 20 --no-signature-changes +``` + +After [architecture boundary rules](../../roadmap/BACKLOG.md) (Backlog #13), this could also enforce "no new cross-boundary violations." + +### 5. Session-aware graph snapshots + +The STATE MACHINE tracks state across agent sessions. If codegraph could snapshot and restore graph states (lightweight — just the SQLite DB), the orchestrator could take a snapshot before each refactoring pass and restore on rollback, without rebuilding: + +```bash +codegraph snapshot save pre-gauntlet +# ... agents make changes ... +codegraph snapshot restore pre-gauntlet # instant rollback +``` + +After LLM integration, snapshots would also preserve embeddings, descriptions, and semantic metadata — so rolling back doesn't require re-running expensive LLM calls. + +### 6. 
MCP-native orchestration
+
+The Titan Paradigm's agents could run entirely through codegraph's [MCP server](../examples/MCP.md) instead of shelling out to the CLI. With 18 tools already exposed, the main gap is the `audit`/`triage`/`check` commands described above. After Phase 3, adding these as MCP tools — alongside [`ask_codebase`](../../roadmap/ROADMAP.md#43--mcp-integration) (Phase 4.3) for natural-language queries — would let orchestrators like Claude Code's agent teams query the graph with zero CLI overhead. The RECON agent asks the MCP server "what are the riskiest files?", each Gauntlet agent asks "should this function be decomposed?", and the STATE MACHINE asks "is this change safe?" — all through the same protocol.
+
+---
+
+## Getting Started
+
+To try the Titan Paradigm with codegraph today:
+
+```bash
+npm install -g @optave/codegraph
+cd your-project
+codegraph build
+```
+
+Then wire your orchestrator's RECON phase to start with:
+
+```bash
+codegraph map --limit 50 -T --json # Priority queue
+codegraph hotspots -T --json # Risk signals
+codegraph stats --json # Health baseline
+```
+
+Feed the results to your sub-agents, give each one `codegraph context` and `codegraph fn-impact`, and gate every commit through `codegraph diff-impact --staged --json`.
+
+For the full agent integration guide, see [AI Agent Guide](../guides/ai-agent-guide.md). For MCP server setup, see [MCP Examples](../examples/MCP.md). 
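As a closing sketch, the commit gate described above can be a few lines of Node. The payload fields used here (`changedFunctions`, `affected`) are illustrative assumptions, not the documented `diff-impact` schema; inspect the actual `codegraph diff-impact --staged --json` output before wiring this up:

```javascript
// Hypothetical commit gate for an orchestrator: decide pass/fail from
// parsed diff-impact JSON. Field names below are assumptions for illustration.
function gateCommit(impact, { maxBlastRadius = 20 } = {}) {
  const changed = impact.changedFunctions ?? [];
  const affected = impact.affected ?? [];
  const ok = affected.length <= maxBlastRadius;
  return {
    ok,
    changed: changed.length,
    blastRadius: affected.length,
    reason: ok
      ? 'within threshold'
      : `blast radius ${affected.length} exceeds ${maxBlastRadius}`,
  };
}

// Illustrative payload; a real orchestrator would JSON.parse the CLI output.
const sample = {
  changedFunctions: ['parseConfig'],
  affected: ['loadConfig', 'buildGraph', 'cli.main'],
};
console.log(gateCommit(sample));
```

An orchestrator would pipe `codegraph diff-impact --staged --json` into a gate like this and trigger rollback whenever `ok` is false.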
From 53f9a83beb0af80118dada643c934d78b0cdc170 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Thu, 26 Feb 2026 00:59:54 -0700 Subject: [PATCH 3/8] docs: restore Architecture Refactoring phase, fix references - Restore Phase 3 (Architectural Refactoring) to ROADMAP - Renumber phases 4-8 and all cross-references - Fix MCP tool count per Greptile review --- docs/guides/recommended-practices.md | 2 +- docs/roadmap/BACKLOG.md | 2 +- docs/roadmap/ROADMAP.md | 389 +++++++++++++++++++++++---- docs/use-cases/titan-paradigm.md | 14 +- 4 files changed, 348 insertions(+), 59 deletions(-) diff --git a/docs/guides/recommended-practices.md b/docs/guides/recommended-practices.md index 15734001..1eae4fe7 100644 --- a/docs/guides/recommended-practices.md +++ b/docs/guides/recommended-practices.md @@ -143,7 +143,7 @@ By default, the MCP server runs in **single-repo mode** — the AI agent can onl Enable `--multi-repo` to let the agent query any registered repository, or use `--repos` to restrict access to a specific set of repos. -The server exposes 20 tools: `query_function`, `file_deps`, `impact_analysis`, `find_cycles`, `module_map`, `fn_deps`, `fn_impact`, `symbol_path`, `context`, `explain`, `where`, `diff_impact`, `semantic_search`, `export_graph`, `list_functions`, `structure`, `hotspots`, `node_roles`, `co_changes`, and `list_repos` (multi-repo only). See the [AI Agent Guide MCP reference](./ai-agent-guide.md#mcp-server-reference) for the full tool-to-CLI mapping table. +The server exposes 19 tools (20 in multi-repo mode): `query_function`, `file_deps`, `impact_analysis`, `find_cycles`, `module_map`, `fn_deps`, `fn_impact`, `symbol_path`, `context`, `explain`, `where`, `diff_impact`, `semantic_search`, `export_graph`, `list_functions`, `structure`, `hotspots`, `node_roles`, `co_changes`, and `list_repos` (multi-repo only). See the [AI Agent Guide MCP reference](./ai-agent-guide.md#mcp-server-reference) for the full tool-to-CLI mapping table. 
### CLAUDE.md for your project diff --git a/docs/roadmap/BACKLOG.md b/docs/roadmap/BACKLOG.md index 46d40020..39004f1e 100644 --- a/docs/roadmap/BACKLOG.md +++ b/docs/roadmap/BACKLOG.md @@ -33,7 +33,7 @@ Non-breaking, ordered by problem-fit: | 12 | ~~Execution flow tracing~~ | ~~Framework-aware entry point detection (Express routes, CLI commands, event handlers) + BFS flow tracing from entry to leaf. Inspired by axon, GitNexus, code-context-mcp.~~ | Navigation | ~~Agents can answer "what happens when a user hits POST /login?" by tracing the full execution path in one query~~ | ✓ | ✓ | 4 | No | **DONE** — `codegraph flow` command with entry point detection and BFS flow tracing. MCP tools `flow` and `entry_points` added. Merged in PR #118. | | 16 | Branch structural diff | Compare code structure between two branches using git worktrees. Show added/removed/changed symbols and their impact. Inspired by axon. | Analysis | Teams can review structural impact of feature branches before merge; agents get branch-aware context | ✓ | ✓ | 4 | No | | 20 | Streaming / chunked results | Support streaming output for large query results so MCP clients and programmatic consumers can process incrementally. | Embeddability | Large codebases don't blow up agent context windows; consumers process results as they arrive instead of waiting for the full payload | ✓ | ✓ | 4 | No | -| 21 | Composite audit command | Single `codegraph audit ` that combines `explain`, `fn-impact`, and code health metrics into one structured report per function. Core version uses graph data; enhanced version includes Phase 3.4 `risk_score`/`complexity_notes`/`side_effects` when available. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) Gauntlet phase. 
| Orchestration | Each sub-agent in a multi-agent swarm gets everything it needs to assess a function in one call instead of 3-4 — directly reduces token waste and round-trips | ✓ | ✓ | 4 | No | +| 21 | Composite audit command | Single `codegraph audit ` that combines `explain`, `fn-impact`, and code health metrics into one structured report per function. Core version uses graph data; enhanced version includes Phase 4.4 `risk_score`/`complexity_notes`/`side_effects` when available. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) Gauntlet phase. | Orchestration | Each sub-agent in a multi-agent swarm gets everything it needs to assess a function in one call instead of 3-4 — directly reduces token waste and round-trips | ✓ | ✓ | 4 | No | | 22 | Batch querying | Accept a list of targets (file or JSON) and return all query results in one JSON payload. Applies to `audit`, `fn-impact`, `context`, and other per-symbol commands. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) swarm pattern. | Orchestration | A swarm of 20+ agents auditing different files can be fed from a single orchestrator call instead of N sequential invocations — reduces overhead and enables parallel dispatch | ✓ | ✓ | 4 | No | | 23 | Triage priority queue | Single `codegraph triage` command that merges `map` connectivity, `hotspots` fan-in/fan-out, node roles, and optionally git churn + `risk_score` into one ranked audit queue. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) RECON phase. | Orchestration | Orchestrating agent gets a single prioritized list of what to audit first — replaces manual synthesis of 3+ commands, saves RECON phase from burning tokens on orientation | ✓ | ✓ | 4 | No | | 24 | Change validation predicates | `codegraph check --staged` with configurable predicates: `--no-new-cycles`, `--max-blast-radius N`, `--no-signature-changes`, `--no-boundary-violations`. Returns exit code 0/1 for CI gates and state machines. 
Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) STATE MACHINE phase. | CI | Automated rollback triggers without parsing JSON — orchestrators and CI pipelines get first-class pass/fail signals for blast radius, cycles, and contract changes | ✓ | ✓ | 4 | No | diff --git a/docs/roadmap/ROADMAP.md b/docs/roadmap/ROADMAP.md index a351059c..8d25f2bd 100644 --- a/docs/roadmap/ROADMAP.md +++ b/docs/roadmap/ROADMAP.md @@ -2,7 +2,7 @@ > **Current version:** 1.4.0 | **Status:** Active development | **Updated:** February 2026 -Codegraph is a strong local-first code graph CLI. This roadmap describes planned improvements across seven phases — closing gaps with commercial code intelligence platforms while preserving codegraph's core strengths: fully local, open source, zero cloud dependency by default. +Codegraph is a strong local-first code graph CLI. This roadmap describes planned improvements across eight phases — closing gaps with commercial code intelligence platforms while preserving codegraph's core strengths: fully local, open source, zero cloud dependency by default. **LLM strategy:** All LLM-powered features are **optional enhancements**. Everything works without an API key. When configured (OpenAI, Anthropic, Ollama, or any OpenAI-compatible endpoint), users unlock richer semantic search and natural language queries. @@ -14,21 +14,23 @@ Codegraph is a strong local-first code graph CLI. 
This roadmap describes planned |-------|-------|-----------------|--------| | [**1**](#phase-1--rust-core) | Rust Core | Rust parsing engine via napi-rs, parallel parsing, incremental tree-sitter, JS orchestration layer | **Complete** (v1.3.0) | | [**2**](#phase-2--foundation-hardening) | Foundation Hardening | Parser registry, complete MCP, test coverage, enhanced config, multi-repo MCP | **Complete** (v1.4.0) | -| [**3**](#phase-3--intelligent-embeddings) | Intelligent Embeddings | LLM-generated descriptions, hybrid search, build-time semantic metadata, module summaries | Planned | -| [**4**](#phase-4--natural-language-queries) | Natural Language Queries | `ask` command, conversational sessions, LLM-narrated graph queries, onboarding tools | Planned | -| [**5**](#phase-5--expanded-language-support) | Expanded Language Support | 8 new languages (12 → 20), parser utilities | Planned | -| [**6**](#phase-6--github-integration--ci) | GitHub Integration & CI | Reusable GitHub Action, LLM-enhanced PR review, visual impact graphs, SARIF output | Planned | -| [**7**](#phase-7--interactive-visualization--advanced-features) | Visualization & Advanced | Web UI, dead code detection, monorepo, agentic search, refactoring analysis | Planned | +| [**3**](#phase-3--architectural-refactoring) | Architectural Refactoring | Parser plugin system, repository pattern, pipeline builder, engine strategy, analysis/formatting split, domain errors, CLI commands, composable MCP, curated API | Planned | +| [**4**](#phase-4--intelligent-embeddings) | Intelligent Embeddings | LLM-generated descriptions, hybrid search, build-time semantic metadata, module summaries | Planned | +| [**5**](#phase-5--natural-language-queries) | Natural Language Queries | `ask` command, conversational sessions, LLM-narrated graph queries, onboarding tools | Planned | +| [**6**](#phase-6--expanded-language-support) | Expanded Language Support | 8 new languages (12 → 20), parser utilities | Planned | +| 
[**7**](#phase-7--github-integration--ci) | GitHub Integration & CI | Reusable GitHub Action, LLM-enhanced PR review, visual impact graphs, SARIF output | Planned | +| [**8**](#phase-8--interactive-visualization--advanced-features) | Visualization & Advanced | Web UI, dead code detection, monorepo, agentic search, refactoring analysis | Planned | ### Dependency graph ``` Phase 1 (Rust Core) └──→ Phase 2 (Foundation Hardening) - ├──→ Phase 3 (Embeddings + Metadata) ──→ Phase 4 (NL Queries + Narration) - ├──→ Phase 5 (Languages) - └──→ Phase 6 (GitHub/CI) ←── Phase 3 (risk_score, side_effects) -Phases 1-4 ──→ Phase 7 (Visualization + Refactoring Analysis) + └──→ Phase 3 (Architectural Refactoring) + ├──→ Phase 4 (Embeddings + Metadata) ──→ Phase 5 (NL Queries + Narration) + ├──→ Phase 6 (Languages) + └──→ Phase 7 (GitHub/CI) ←── Phase 4 (risk_score, side_effects) +Phases 1-5 ──→ Phase 8 (Visualization + Refactoring Analysis) ``` --- @@ -187,11 +189,297 @@ Support querying multiple codebases from a single MCP server instance. --- -## Phase 3 — Intelligent Embeddings +## Phase 3 — Architectural Refactoring + +**Goal:** Restructure the codebase for modularity, testability, and long-term maintainability. These are internal improvements — no new user-facing features, but they make every subsequent phase easier to build and maintain. + +> Reference: [generated/architecture.md](../generated/architecture.md) — full analysis with code examples and rationale. + +### 3.1 — Parser Plugin System + +Split `parser.js` (2,200+ lines) into a modular directory structure with isolated per-language extractors. 
+ +``` +src/parser/ + index.js # Public API: parseFileAuto, parseFilesAuto + registry.js # LANGUAGE_REGISTRY + extension mapping + engine.js # Native/WASM init, engine resolution, grammar loading + tree-utils.js # findChild, findParentClass, walkTree helpers + base-extractor.js # Shared walk loop + accumulator framework + extractors/ + javascript.js # JS/TS/TSX + python.js + go.js + rust.js + java.js + csharp.js + ruby.js + php.js + hcl.js +``` + +Introduce a `BaseExtractor` that owns the tree walk loop. Each language extractor declares a `nodeType → handler` map instead of reimplementing the traversal. Eliminates repeated walk-and-switch boilerplate across 9+ extractors. + +**Affected files:** `src/parser.js` → split into `src/parser/` + +### 3.2 — Repository Pattern for Data Access + +Consolidate all SQL into a single `Repository` class. Currently SQL is scattered across `builder.js`, `queries.js`, `embedder.js`, `watcher.js`, and `cycles.js`. + +``` +src/db/ + connection.js # Open, WAL mode, pragma tuning + migrations.js # Schema versions + repository.js # ALL data access methods (reads + writes) +``` + +All prepared statements, index tuning, and schema knowledge live in one place. Consumers never see SQL. Enables an `InMemoryRepository` for fast unit tests. + +**Affected files:** `src/db.js` → split into `src/db/`, SQL extracted from `builder.js`, `queries.js`, `embedder.js`, `watcher.js`, `cycles.js` + +### 3.3 — Analysis / Formatting Separation + +Split `queries.js` (800+ lines) into pure analysis modules and presentation formatters. + +``` +src/analysis/ # Pure data: take repository, return typed results + impact.js + call-chain.js + diff-impact.js + module-map.js + class-hierarchy.js + +src/formatters/ # Presentation: take data, produce strings + cli-formatter.js + json-formatter.js + table-formatter.js +``` + +Analysis modules return pure data. The CLI, MCP server, and programmatic API each pick their own formatter (or none). 
Eliminates the `*Data()` / `*()` dual-function pattern. + +**Affected files:** `src/queries.js` → split into `src/analysis/` + `src/formatters/` + +### 3.4 — Builder Pipeline Architecture + +Refactor `buildGraph()` from a monolithic mega-function into explicit, independently testable pipeline stages. + +```js +const pipeline = [ + collectFiles, // (rootDir, config) => filePaths[] + detectChanges, // (filePaths, db) => { changed, removed, isFullBuild } + parseFiles, // (filePaths, engineOpts) => Map + insertNodes, // (symbolMap, db) => nodeIndex + resolveImports, // (symbolMap, rootDir, aliases) => importEdges[] + buildCallEdges, // (symbolMap, nodeIndex) => callEdges[] + buildClassEdges, // (symbolMap, nodeIndex) => classEdges[] + resolveBarrels, // (edges, symbolMap) => resolvedEdges[] + insertEdges, // (allEdges, db) => stats +] +``` + +Watch mode reuses the same stages (triggered per-file instead of per-project), eliminating the divergence between `watcher.js` and `builder.js` where bug fixes must be applied separately. + +**Affected files:** `src/builder.js`, `src/watcher.js` + +### 3.5 — Unified Engine Interface + +Replace scattered `engine.name === 'native'` branching with a Strategy pattern. Every consumer receives an engine object with the same API regardless of backend. + +```js +const engine = createEngine(opts) // returns same interface for native or WASM +engine.parseFile(path, source) +engine.resolveImports(batch, rootDir, aliases) +engine.detectCycles(db) +``` + +Consumers never branch on native vs WASM. Adding a third backend (e.g., remote parsing service) requires zero consumer changes. + +**Affected files:** `src/parser.js`, `src/resolve.js`, `src/cycles.js`, `src/builder.js`, `src/native.js` + +### 3.6 — Qualified Names & Hierarchical Scoping + +Enrich the node model with scope information to reduce ambiguity. 
+ +```sql +ALTER TABLE nodes ADD COLUMN qualified_name TEXT; -- 'DateHelper.format' +ALTER TABLE nodes ADD COLUMN scope TEXT; -- 'DateHelper' +ALTER TABLE nodes ADD COLUMN visibility TEXT; -- 'public' | 'private' | 'protected' +``` + +Enables queries like "all methods of class X" without traversing edges. Reduces reliance on heuristic confidence scoring for name collisions. + +**Affected files:** `src/db.js`, `src/parser.js` (extractors), `src/queries.js`, `src/builder.js` + +### 3.7 — Composable MCP Tool Registry + +Replace the monolithic `TOOLS` array + `switch` dispatch in `mcp.js` with self-contained tool modules. + +``` +src/mcp/ + server.js # MCP server setup, transport, lifecycle + tool-registry.js # Dynamic tool registration + auto-discovery + tools/ + query-function.js # { schema, handler } per tool + file-deps.js + impact-analysis.js + ... +``` + +Adding a new MCP tool = adding a file. No other files change. + +**Affected files:** `src/mcp.js` → split into `src/mcp/` + +### 3.8 — CLI Command Objects + +Move from inline Commander chains in `cli.js` to self-contained command modules. + +``` +src/cli/ + index.js # Commander setup, auto-discover commands + commands/ + build.js # { name, description, options, validate, execute } + query.js + impact.js + ... +``` + +Each command is independently testable by calling `execute()` directly. The CLI index auto-discovers and registers them. + +**Affected files:** `src/cli.js` → split into `src/cli/` + +### 3.9 — Domain Error Hierarchy + +Replace ad-hoc error handling (mix of thrown `Error`, returned `null`, `logger.warn()`, `process.exit(1)`) with structured domain errors. + +```js +class CodegraphError extends Error { constructor(message, { code, file, cause }) { ... 
} } +class ParseError extends CodegraphError { code = 'PARSE_FAILED' } +class DbError extends CodegraphError { code = 'DB_ERROR' } +class ConfigError extends CodegraphError { code = 'CONFIG_INVALID' } +class ResolutionError extends CodegraphError { code = 'RESOLUTION_FAILED' } +class EngineError extends CodegraphError { code = 'ENGINE_UNAVAILABLE' } +``` + +CLI catches domain errors and formats for humans. MCP returns structured error responses. No more `process.exit()` from library code. + +**New file:** `src/errors.js` + +### 3.10 — Curated Public API Surface + +Reduce `index.js` from ~40 re-exports to a curated public API. Use `package.json` `exports` field to enforce module boundaries. + +```json +{ "exports": { ".": "./src/index.js", "./cli": "./src/cli.js" } } +``` + +Internal modules become truly internal. Consumers can only import from documented entry points. + +**Affected files:** `src/index.js`, `package.json` + +### 3.11 — Embedder Subsystem Extraction + +Restructure `embedder.js` (525 lines) into a standalone subsystem with pluggable vector storage. + +``` +src/embeddings/ + index.js # Public API + model-registry.js # Model definitions, batch sizes, loading + generator.js # Source → text preparation → batch embedding + store.js # Vector storage (pluggable: SQLite blob, HNSW index) + search.js # Similarity search, RRF multi-query fusion +``` + +Decouples embedding schema from the graph DB. The pluggable store interface enables future O(log n) ANN search (e.g., `hnswlib-node`) when symbol counts reach 50K+. + +**Affected files:** `src/embedder.js` → split into `src/embeddings/` + +### 3.12 — Testing Pyramid + +Add proper unit test layer below the existing integration tests. 
+ +- Pure unit tests for extractors (pass AST node, assert symbols — no file I/O) +- Pure unit tests for BFS/Tarjan algorithms (pass adjacency list, assert result) +- Pure unit tests for confidence scoring (pass parameters, assert score) +- Repository mock for query tests (in-memory data, no SQLite) +- E2E tests that invoke the CLI binary and assert exit codes + stdout + +The repository pattern (3.2) directly enables this: unit tests use `InMemoryRepository`, integration tests use `SqliteRepository`. + +### 3.13 — Event-Driven Pipeline + +Add an event/streaming architecture to the build pipeline for progress reporting, cancellation, and large-repo support. + +```js +pipeline.on('file:parsed', (file, symbols) => { /* progress */ }) +pipeline.on('file:indexed', (file, nodeCount) => { /* progress */ }) +pipeline.on('build:complete', (stats) => { /* summary */ }) +pipeline.on('error', (file, err) => { /* continue or abort */ }) +await pipeline.run(rootDir) +``` + +Unifies build and watch code paths. Large builds stream results to the DB incrementally instead of buffering in memory. + +**Affected files:** `src/builder.js`, `src/watcher.js`, `src/cli.js` + +### 3.14 — Subgraph Export Filtering + +Add focus/filter options to the export module so visualizations are usable for real projects. + +```bash +codegraph export --format dot --focus src/builder.js --depth 2 +codegraph export --format mermaid --filter "src/api/**" --kind function +codegraph export --format json --changed +``` + +The export module receives a subgraph specification (focus node + depth, file pattern, kind filter) and extracts the relevant subgraph before formatting. + +**Affected files:** `src/export.js`, `src/cli.js` + +### 3.15 — Transitive Import-Aware Confidence + +Before falling back to proximity heuristics, walk the import graph from the caller file. If any import path (even indirect through barrel files) reaches a candidate, score it 0.9. Only fall back to proximity when no import path exists. 
+ +**Affected files:** `src/resolve.js`, `src/builder.js` + +### 3.16 — Query Result Caching + +Add a TTL/LRU cache between the analysis layer and the repository. Particularly valuable for MCP where an agent session may repeatedly query related symbols. + +```js +class QueryCache { + constructor(db, maxAge = 60_000) { ... } + get(key) { ... } // key = query name + args hash + set(key, value) { ... } + invalidate() { ... } // called after any DB mutation +} +``` + +### 3.17 — Configuration Profiles + +Support profile-based configuration for monorepos with multiple services. + +```json +{ + "profiles": { + "backend": { "include": ["services/api/**"], "build": { "dbPath": ".codegraph/api.db" } }, + "frontend": { "include": ["apps/web/**"], "build": { "dbPath": ".codegraph/web.db" } } + } +} +``` + +```bash +codegraph build --profile backend +``` + +**Affected files:** `src/config.js`, `src/cli.js` + +--- + +## Phase 4 — Intelligent Embeddings **Goal:** Dramatically improve semantic search quality by embedding natural-language descriptions instead of raw code. -### 3.1 — LLM Description Generator +### 4.1 — LLM Description Generator For each function/method/class node, generate a concise natural-language description: @@ -219,7 +507,7 @@ For each function/method/class node, generate a concise natural-language descrip **New file:** `src/describer.js` -### 3.2 — Enhanced Embedding Pipeline +### 4.2 — Enhanced Embedding Pipeline - When descriptions exist, embed the description text instead of raw code - Keep raw code as fallback when no description is available @@ -230,7 +518,7 @@ For each function/method/class node, generate a concise natural-language descrip **Affected files:** `src/embedder.js` -### 3.3 — Hybrid Search +### 4.3 — Hybrid Search Combine vector similarity with keyword matching. @@ -243,7 +531,7 @@ Combine vector similarity with keyword matching. 
**Affected files:** `src/embedder.js`, `src/db.js` -### 3.4 — Build-time Semantic Metadata +### 4.4 — Build-time Semantic Metadata Enrich nodes with LLM-generated metadata beyond descriptions. Computed incrementally at build time (only for changed nodes), stored as columns on the `nodes` table. @@ -256,9 +544,9 @@ Enrich nodes with LLM-generated metadata beyond descriptions. Computed increment - MCP tool: `assess ` — returns complexity rating + specific concerns - Cascade invalidation: when a node changes, mark dependents for re-enrichment -**Depends on:** 3.1 (LLM provider abstraction) +**Depends on:** 4.1 (LLM provider abstraction) -### 3.5 — Module Summaries +### 4.5 — Module Summaries Aggregate function descriptions + dependency direction into file-level narratives. @@ -266,17 +554,17 @@ Aggregate function descriptions + dependency direction into file-level narrative - MCP tool: `explain_module ` — returns module purpose, key exports, role in the system - `naming_conventions` metadata per module — detected patterns (camelCase, snake_case, verb-first), flag outliers -**Depends on:** 3.1 (function-level descriptions must exist first) +**Depends on:** 4.1 (function-level descriptions must exist first) > **Full spec:** See [llm-integration.md](./llm-integration.md) for detailed architecture, infrastructure table, and prompt design. --- -## Phase 4 — Natural Language Queries +## Phase 5 — Natural Language Queries **Goal:** Allow developers to ask questions about their codebase in plain English. -### 4.1 — Query Engine +### 5.1 — Query Engine ```bash codegraph ask "How does the authentication flow work?" @@ -302,7 +590,7 @@ codegraph ask "How does the authentication flow work?" **New file:** `src/nlquery.js` -### 4.2 — Conversational Sessions +### 5.2 — Conversational Sessions Multi-turn conversations with session memory. 
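A minimal in-memory sketch of the prompt-assembly side; the real design would persist turns to the SQLite `sessions` table, and `Session` here is an illustrative name:

```js
// In-memory sketch of multi-turn session memory; the real design would
// persist turns to the SQLite `sessions` table. Names are illustrative.
class Session {
  constructor(id) {
    this.id = id;
    this.turns = []; // { question, answer }, oldest first
  }
  record(question, answer) {
    this.turns.push({ question, answer });
  }
  // Prior Q&A pairs are prepended so follow-ups like "what calls it?"
  // resolve against earlier answers.
  buildPrompt(question, maxTurns = 5) {
    const history = this.turns
      .slice(-maxTurns)
      .map(t => `Q: ${t.question}\nA: ${t.answer}`)
      .join('\n');
    return history ? `${history}\nQ: ${question}` : `Q: ${question}`;
  }
}

const s = new Session('demo');
s.record('Where is auth handled?', 'src/auth.js, in verifyToken()');
const prompt = s.buildPrompt('What calls it?');
```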
@@ -316,7 +604,7 @@ codegraph sessions clear - Store conversation history in SQLite table `sessions` - Include prior Q&A pairs in subsequent prompts -### 4.3 — MCP Integration +### 5.3 — MCP Integration New MCP tool: `ask_codebase` — natural language query via MCP. @@ -324,7 +612,7 @@ Enables AI coding agents (Claude Code, Cursor, etc.) to ask codegraph questions **Affected files:** `src/mcp.js` -### 4.4 — LLM-Narrated Graph Queries +### 5.4 — LLM-Narrated Graph Queries Graph traversal + LLM narration for questions that require both structural data and natural-language explanation. Each query walks the graph first, then sends the structural result to the LLM for narration. @@ -337,9 +625,9 @@ Graph traversal + LLM narration for questions that require both structural data Pre-computed `flow_narratives` table caches results for key entry points at build time, invalidated when any node in the chain changes. -**Depends on:** 3.4 (`side_effects` metadata), 3.1 (descriptions for narration context) +**Depends on:** 4.4 (`side_effects` metadata), 4.1 (descriptions for narration context) -### 4.5 — Onboarding & Navigation Tools +### 5.5 — Onboarding & Navigation Tools Help new contributors and AI agents orient in an unfamiliar codebase. @@ -348,15 +636,15 @@ Help new contributors and AI agents orient in an unfamiliar codebase. - MCP tool: `get_started` — returns ordered list: "start here, then read this, then this" - `change_plan ` — LLM reads description, graph identifies relevant modules, returns touch points and test coverage gaps -**Depends on:** 3.5 (module summaries for context), 4.1 (query engine) +**Depends on:** 4.5 (module summaries for context), 5.1 (query engine) --- -## Phase 5 — Expanded Language Support +## Phase 6 — Expanded Language Support **Goal:** Go from 12 → 20 supported languages. 
-### 5.1 — Batch 1: High Demand +### 6.1 — Batch 1: High Demand | Language | Extensions | Grammar | Effort | |----------|-----------|---------|--------| @@ -365,7 +653,7 @@ Help new contributors and AI agents orient in an unfamiliar codebase. | Kotlin | `.kt`, `.kts` | `tree-sitter-kotlin` | Low | | Swift | `.swift` | `tree-sitter-swift` | Medium | -### 5.2 — Batch 2: Growing Ecosystems +### 6.2 — Batch 2: Growing Ecosystems | Language | Extensions | Grammar | Effort | |----------|-----------|---------|--------| @@ -374,7 +662,7 @@ Help new contributors and AI agents orient in an unfamiliar codebase. | Lua | `.lua` | `tree-sitter-lua` | Low | | Zig | `.zig` | `tree-sitter-zig` | Low | -### 5.3 — Parser Abstraction Layer +### 6.3 — Parser Abstraction Layer Extract shared patterns from existing extractors into reusable helpers. @@ -390,11 +678,11 @@ Extract shared patterns from existing extractors into reusable helpers. --- -## Phase 6 — GitHub Integration & CI +## Phase 7 — GitHub Integration & CI **Goal:** Bring codegraph's analysis into pull request workflows. -### 6.1 — Reusable GitHub Action +### 7.1 — Reusable GitHub Action A reusable GitHub Action that runs on PRs: @@ -416,7 +704,7 @@ A reusable GitHub Action that runs on PRs: **New file:** `.github/actions/codegraph-ci/action.yml` -### 6.2 — PR Review Integration +### 7.2 — PR Review Integration ```bash codegraph review --pr @@ -439,7 +727,7 @@ Requires `gh` CLI. For each changed function: **New file:** `src/github.js` -### 6.3 — Visual Impact Graphs for PRs +### 7.3 — Visual Impact Graphs for PRs Extend the existing `diff-impact --format mermaid` foundation with CI automation and LLM annotations. 
@@ -460,9 +748,9 @@ Extend the existing `diff-impact --format mermaid` foundation with CI automation - Highlight fragile nodes: high churn + high fan-in = high breakage risk - Track blast radius trends: "this PR's blast radius is 2× larger than your average" -**Depends on:** 6.1 (GitHub Action), 3.4 (`risk_score`, `side_effects`) +**Depends on:** 7.1 (GitHub Action), 4.4 (`risk_score`, `side_effects`) -### 6.4 — SARIF Output +### 7.4 — SARIF Output Add SARIF output format for cycle detection. SARIF integrates with GitHub Code Scanning, showing issues inline in the PR. @@ -470,9 +758,9 @@ Add SARIF output format for cycle detection. SARIF integrates with GitHub Code S --- -## Phase 7 — Interactive Visualization & Advanced Features +## Phase 8 — Interactive Visualization & Advanced Features -### 7.1 — Interactive Web Visualization +### 8.1 — Interactive Web Visualization ```bash codegraph viz @@ -492,7 +780,7 @@ Opens a local web UI at `localhost:3000` with: **New file:** `src/visualizer.js` -### 7.2 — Dead Code Detection +### 8.2 — Dead Code Detection ```bash codegraph dead @@ -503,7 +791,7 @@ Find functions/methods/classes with zero incoming edges (never called). Filters **Affected files:** `src/queries.js` -### 7.3 — Cross-Repository Support (Monorepo) +### 8.3 — Cross-Repository Support (Monorepo) Support multi-package monorepos with cross-package edges. @@ -513,7 +801,7 @@ Support multi-package monorepos with cross-package edges. - `codegraph build --workspace` to scan all packages - Impact analysis across package boundaries -### 7.4 — Agentic Search +### 8.4 — Agentic Search Recursive reference-following search that traces connections. @@ -535,7 +823,7 @@ codegraph agent-search "payment processing" **New file:** `src/agentic-search.js` -### 7.5 — Refactoring Analysis +### 8.5 — Refactoring Analysis LLM-powered structural analysis that identifies refactoring opportunities. The graph provides the structural data; the LLM interprets it. 
@@ -548,9 +836,9 @@ LLM-powered structural analysis that identifies refactoring opportunities. The g | `hotspots` | High fan-in + high fan-out + on many paths | Ranked fragility report with explanations, `risk_score` per node | | `boundary_analysis` | Graph clustering (tightly-coupled groups spanning modules) | Reorganization suggestions: "these 4 functions in 3 files all deal with auth" | -**Depends on:** 3.4 (`risk_score`, `complexity_notes`), 3.5 (module summaries) +**Depends on:** 4.4 (`risk_score`, `complexity_notes`), 4.5 (module summaries) -### 7.6 — Auto-generated Docstrings +### 8.6 — Auto-generated Docstrings ```bash codegraph annotate @@ -559,7 +847,7 @@ codegraph annotate --changed-only LLM-generated docstrings aware of callers, callees, and types. Diff-aware: only regenerate for functions whose code or dependencies changed. Stores in `docstrings` column on nodes table — does not modify source files unless explicitly requested. -**Depends on:** 3.1 (LLM provider abstraction), 3.4 (side effects context) +**Depends on:** 4.1 (LLM provider abstraction), 4.4 (side effects context) > **Full spec:** See [llm-integration.md](./llm-integration.md) for detailed architecture, infrastructure tables, and prompt design for all LLM-powered features. 
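The diff-aware rule in 8.6 ("only regenerate for functions whose code or dependencies changed") reduces to a staleness check over content hashes. A minimal sketch with hypothetical names and data shapes:

```js
// Hypothetical staleness check for docstring regeneration; names and data
// shapes are illustrative, not the actual implementation.
// nodes/prevHashes: Map<nodeId, contentHash>; deps: Map<nodeId, nodeId[]>.
function staleDocstrings(nodes, deps, prevHashes) {
  const changed = new Set(
    [...nodes].filter(([id, hash]) => prevHashes.get(id) !== hash).map(([id]) => id)
  );
  const stale = new Set(changed);
  for (const [id] of nodes) {
    // A direct dependency changed, so this function's context changed too.
    for (const dep of deps.get(id) ?? []) {
      if (changed.has(dep)) stale.add(id);
    }
  }
  return stale;
}
```

Everything outside the returned set keeps its cached docstring, which is what keeps annotation runs cheap on incremental builds.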
@@ -573,11 +861,12 @@ Each phase includes targeted verification: |-------|-------------| | **1** | Benchmark native vs WASM parsing on a large repo, verify identical output from both engines | | **2** | `npm test`, manual MCP client test for all tools, config loading tests | -| **3** | Compare `codegraph search` quality before/after descriptions; verify `side_effects` and `risk_score` populated for LLM-enriched builds | -| **4** | `codegraph ask "How does import resolution work?"` against codegraph itself; verify `trace_flow` and `get_started` produce coherent narration | -| **5** | Parse sample files for each new language, verify definitions/calls/imports | -| **6** | Test PR in a fork, verify GitHub Action comment with Mermaid graph and risk labels is posted | -| **7** | `codegraph viz` loads; `hotspots` returns ranked list; `split_analysis` produces actionable output | +| **3** | All existing tests pass; each refactored module produces identical output to the pre-refactoring version; unit tests for pure analysis modules | +| **4** | Compare `codegraph search` quality before/after descriptions; verify `side_effects` and `risk_score` populated for LLM-enriched builds | +| **5** | `codegraph ask "How does import resolution work?"` against codegraph itself; verify `trace_flow` and `get_started` produce coherent narration | +| **6** | Parse sample files for each new language, verify definitions/calls/imports | +| **7** | Test PR in a fork, verify GitHub Action comment with Mermaid graph and risk labels is posted | +| **8** | `codegraph viz` loads; `hotspots` returns ranked list; `split_analysis` produces actionable output | **Full integration test** after all phases: diff --git a/docs/use-cases/titan-paradigm.md b/docs/use-cases/titan-paradigm.md index 9a0962f3..2103f054 100644 --- a/docs/use-cases/titan-paradigm.md +++ b/docs/use-cases/titan-paradigm.md @@ -172,7 +172,7 @@ Several planned features would make codegraph even more powerful for the Titan P | Feature | 
Status | How it helps | |---------|--------|-------------| | **Formal code health metrics** ([Backlog #6](../../roadmap/BACKLOG.md)) | Planned | Cyclomatic complexity, Maintainability Index, and Halstead metrics per function — directly maps to the Gauntlet's "complexity > 7 is a failure" rule. Computed from the AST we already parse | -| **Build-time semantic metadata** ([Roadmap Phase 3.4](../../roadmap/ROADMAP.md#34--build-time-semantic-metadata)) | Planned | LLM-generated `complexity_notes`, `risk_score`, and `side_effects` per function. A sub-agent could query `codegraph assess ` and get "3 responsibilities, low cohesion — consider splitting" without analyzing the code itself | +| **Build-time semantic metadata** ([Roadmap Phase 4.4](../../roadmap/ROADMAP.md#44--build-time-semantic-metadata)) | Planned | LLM-generated `complexity_notes`, `risk_score`, and `side_effects` per function. A sub-agent could query `codegraph assess ` and get "3 responsibilities, low cohesion — consider splitting" without analyzing the code itself | | **Community detection** ([Backlog #11](../../roadmap/BACKLOG.md)) | Planned | Leiden/Louvain algorithm to discover natural module boundaries vs actual file organization. Reveals which functions are tightly coupled and whether decomposition should follow the directory structure or propose a new one | ### For GLOBAL SYNC @@ -180,7 +180,7 @@ Several planned features would make codegraph even more powerful for the Titan P | Feature | Status | How it helps | |---------|--------|-------------| | **Architecture boundary rules** ([Backlog #13](../../roadmap/BACKLOG.md)) | Planned | User-defined rules for allowed/forbidden dependencies between modules (e.g., "controllers must not import from other controllers"). 
The GLOBAL SYNC agent can enforce architectural standards automatically | -| **Refactoring analysis** ([Roadmap Phase 7.5](../../roadmap/ROADMAP.md#75--refactoring-analysis)) | Planned | `split_analysis`, `extraction_candidates`, `boundary_analysis` — LLM-powered structural analysis that identifies exactly where shared abstractions should be created | +| **Refactoring analysis** ([Roadmap Phase 8.5](../../roadmap/ROADMAP.md#85--refactoring-analysis)) | Planned | `split_analysis`, `extraction_candidates`, `boundary_analysis` — LLM-powered structural analysis that identifies exactly where shared abstractions should be created | | **Dead code detection** ([Backlog #1](../../roadmap/BACKLOG.md)) | **Done** | `codegraph roles --role dead -T` lists all symbols with zero fan-in that aren't exported. Delivered as part of node classification | ### For STATE MACHINE @@ -188,18 +188,18 @@ Several planned features would make codegraph even more powerful for the Titan P | Feature | Status | How it helps | |---------|--------|-------------| | **Branch structural diff** ([Backlog #16](../../roadmap/BACKLOG.md)) | Planned | Compare code structure between two branches using git worktrees. 
Shows added/removed/changed symbols and their impact — perfect for validating that a refactoring branch hasn't broken the structural contract | -| **GitHub Action + CI integration** ([Roadmap Phase 6](../../roadmap/ROADMAP.md#phase-6--github-integration--ci)) | Planned | Reusable GitHub Action that runs `diff-impact` on every PR, posts visual impact graphs, and fails if thresholds are exceeded — the STATE MACHINE becomes a CI gate | +| **GitHub Action + CI integration** ([Roadmap Phase 7](../../roadmap/ROADMAP.md#phase-7--github-integration--ci)) | Planned | Reusable GitHub Action that runs `diff-impact` on every PR, posts visual impact graphs, and fails if thresholds are exceeded — the STATE MACHINE becomes a CI gate | | **Streaming / chunked results** ([Backlog #20](../../roadmap/BACKLOG.md)) | Planned | Large codebases don't blow up agent context windows; consumers process results as they arrive instead of waiting for the full payload | --- ## Recommendations: Making Codegraph Even Better for This Use Case -The features above cover what codegraph can do today and what's already planned. Beyond those, the Titan Paradigm points to a class of enhancements that would naturally follow the [LLM integration work](../../roadmap/ROADMAP.md#phase-3--intelligent-embeddings) (Roadmap Phase 3) — combining codegraph's structural graph with LLM intelligence to serve multi-agent orchestration directly. +The features above cover what codegraph can do today and what's already planned. Beyond those, the Titan Paradigm points to a class of enhancements that would naturally follow the [LLM integration work](../../roadmap/ROADMAP.md#phase-4--intelligent-embeddings) (Roadmap Phase 4) — combining codegraph's structural graph with LLM intelligence to serve multi-agent orchestration directly. ### 1. 
`codegraph audit` — one-call file assessment -Once [build-time semantic metadata](../../roadmap/ROADMAP.md#34--build-time-semantic-metadata) (Phase 3.4) lands, codegraph will have `risk_score`, `complexity_notes`, and `side_effects` per function. A natural next step is a single `audit` command that combines these with `explain` and `fn-impact` into one structured report — exactly what each Gauntlet sub-agent needs. +Once [build-time semantic metadata](../../roadmap/ROADMAP.md#44--build-time-semantic-metadata) (Phase 4.4) lands, codegraph will have `risk_score`, `complexity_notes`, and `side_effects` per function. A natural next step is a single `audit` command that combines these with `explain` and `fn-impact` into one structured report — exactly what each Gauntlet sub-agent needs. ```bash # One call per file, everything a sub-agent needs to decide pass/fail @@ -219,7 +219,7 @@ Today, each query is a separate CLI invocation. For a swarm of 20+ sub-agents ea codegraph audit --batch targets.json --json > audit-results.json ``` -This becomes especially powerful after [module summaries](../../roadmap/ROADMAP.md#35--module-summaries) (Phase 3.5) — the batch output can include file-level narratives alongside function-level metrics, so sub-agents understand the module's role before diving into individual functions. +This becomes especially powerful after [module summaries](../../roadmap/ROADMAP.md#45--module-summaries) (Phase 4.5) — the batch output can include file-level narratives alongside function-level metrics, so sub-agents understand the module's role before diving into individual functions. ### 3. `codegraph triage` — orchestrator-friendly priority queue @@ -259,7 +259,7 @@ After LLM integration, snapshots would also preserve embeddings, descriptions, a ### 6. MCP-native orchestration -The Titan Paradigm's agents could run entirely through codegraph's [MCP server](../examples/MCP.md) instead of shelling out to the CLI. 
With 18 tools already exposed, the main gap is the `audit`/`triage`/`check` commands described above. After Phase 3, adding these as MCP tools — alongside [`ask_codebase`](../../roadmap/ROADMAP.md#43--mcp-integration) (Phase 4.3) for natural-language queries — would let orchestrators like Claude Code's agent teams query the graph with zero CLI overhead. The RECON agent asks the MCP server "what are the riskiest files?", each Gauntlet agent asks "should this function be decomposed?", and the STATE MACHINE asks "is this change safe?" — all through the same protocol. +The Titan Paradigm's agents could run entirely through codegraph's [MCP server](../examples/MCP.md) instead of shelling out to the CLI. With 18 tools already exposed, the main gap is the `audit`/`triage`/`check` commands described above. After Phase 4, adding these as MCP tools — alongside [`ask_codebase`](../../roadmap/ROADMAP.md#53--mcp-integration) (Phase 5.3) for natural-language queries — would let orchestrators like Claude Code's agent teams query the graph with zero CLI overhead. The RECON agent asks the MCP server "what are the riskiest files?", each Gauntlet agent asks "should this function be decomposed?", and the STATE MACHINE asks "is this change safe?" — all through the same protocol. 
--- From ab57fb6d4b27e489759feada49fdd7a17f47a6e6 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Thu, 26 Feb 2026 01:29:26 -0700 Subject: [PATCH 4/8] fix: correct MCP tool counts and backlog ID collisions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Address Greptile review comments on #121: - Update MCP tool counts from 18/19 to 21 (22 in multi-repo mode) across README, recommended-practices, dogfood skill, titan-paradigm - Add missing execution_flow and list_entry_points to tool enumeration - Renumber new backlog items 21-26 → 27-32 to avoid collision with existing items 21-22 --- .claude/skills/dogfood/SKILL.md | 2 +- README.md | 4 ++-- docs/guides/recommended-practices.md | 2 +- docs/roadmap/BACKLOG.md | 12 ++++++------ docs/use-cases/titan-paradigm.md | 2 +- 5 files changed, 11 insertions(+), 11 deletions(-) diff --git a/.claude/skills/dogfood/SKILL.md b/.claude/skills/dogfood/SKILL.md index c740849c..1cce1479 100644 --- a/.claude/skills/dogfood/SKILL.md +++ b/.claude/skills/dogfood/SKILL.md @@ -203,7 +203,7 @@ Before writing the report, **stop and think** about: - What testing approaches am I missing? - **Cross-command pipelines:** Have I tested `build` → `embed` → `search` → modify → `build` → `search`? Have I tested `watch` detecting changes then `diff-impact`? -- **MCP server:** Have I tested the `mcp` command? Initialize via JSON-RPC on stdin, send `tools/list`, verify all 18 tools are present. Test single-repo mode (default — `list_repos` should be absent, no `repo` parameter on tools) vs `--multi-repo` mode. +- **MCP server:** Have I tested the `mcp` command? Initialize via JSON-RPC on stdin, send `tools/list`, verify all 21 tools are present. Test single-repo mode (default — `list_repos` should be absent, no `repo` parameter on tools) vs `--multi-repo` mode. - **Programmatic API:** Have I tested `require('@optave/codegraph')` or `import` from `index.js`? 
Key exports to verify: `buildGraph`, `loadConfig`, `openDb`, `findDbPath`, `contextData`, `explainData`, `whereData`, `fnDepsData`, `diffImpactData`, `statsData`, `isNativeAvailable`, `EXTENSIONS`, `IGNORE_DIRS`, `ALL_SYMBOL_KINDS`, `MODELS`. - **Config options:** Have I tested `.codegraphrc.json`? Create one with `include`/`exclude` patterns, custom `aliases`, `build.incremental: false`, `query.defaultDepth`, `search.defaultMinScore`. Verify overrides work. - **Env var overrides:** `CODEGRAPH_LLM_PROVIDER`, `CODEGRAPH_LLM_API_KEY`, `CODEGRAPH_LLM_MODEL`, `CODEGRAPH_REGISTRY_PATH`. diff --git a/README.md b/README.md index 6989c6b2..f9db3379 100644 --- a/README.md +++ b/README.md @@ -55,7 +55,7 @@ cd your-project codegraph build ``` -That's it. No config files, no Docker, no JVM, no API keys, no accounts. The graph is ready to query. Add `codegraph mcp` to your AI agent's config and it has full access to your dependency graph through 19 MCP tools. +That's it. No config files, no Docker, no JVM, no API keys, no accounts. The graph is ready to query. Add `codegraph mcp` to your AI agent's config and it has full access to your dependency graph through 21 MCP tools (22 in multi-repo mode). 
### Why it matters @@ -431,7 +431,7 @@ Optional: `@huggingface/transformers` (semantic search), `@modelcontextprotocol/ ### MCP Server -Codegraph includes a built-in [Model Context Protocol](https://modelcontextprotocol.io/) server with 19 tools, so AI assistants can query your dependency graph directly: +Codegraph includes a built-in [Model Context Protocol](https://modelcontextprotocol.io/) server with 21 tools (22 in multi-repo mode), so AI assistants can query your dependency graph directly: ```bash codegraph mcp # Single-repo mode (default) — only local project diff --git a/docs/guides/recommended-practices.md b/docs/guides/recommended-practices.md index 1eae4fe7..102b0070 100644 --- a/docs/guides/recommended-practices.md +++ b/docs/guides/recommended-practices.md @@ -143,7 +143,7 @@ By default, the MCP server runs in **single-repo mode** — the AI agent can onl Enable `--multi-repo` to let the agent query any registered repository, or use `--repos` to restrict access to a specific set of repos. -The server exposes 19 tools (20 in multi-repo mode): `query_function`, `file_deps`, `impact_analysis`, `find_cycles`, `module_map`, `fn_deps`, `fn_impact`, `symbol_path`, `context`, `explain`, `where`, `diff_impact`, `semantic_search`, `export_graph`, `list_functions`, `structure`, `hotspots`, `node_roles`, `co_changes`, and `list_repos` (multi-repo only). See the [AI Agent Guide MCP reference](./ai-agent-guide.md#mcp-server-reference) for the full tool-to-CLI mapping table. +The server exposes 21 tools (22 in multi-repo mode): `query_function`, `file_deps`, `impact_analysis`, `find_cycles`, `module_map`, `fn_deps`, `fn_impact`, `symbol_path`, `context`, `explain`, `where`, `diff_impact`, `semantic_search`, `export_graph`, `list_functions`, `structure`, `hotspots`, `node_roles`, `co_changes`, `execution_flow`, `list_entry_points`, and `list_repos` (multi-repo only). 
See the [AI Agent Guide MCP reference](./ai-agent-guide.md#mcp-server-reference) for the full tool-to-CLI mapping table. ### CLAUDE.md for your project diff --git a/docs/roadmap/BACKLOG.md b/docs/roadmap/BACKLOG.md index 39004f1e..e2d8e418 100644 --- a/docs/roadmap/BACKLOG.md +++ b/docs/roadmap/BACKLOG.md @@ -33,17 +33,17 @@ Non-breaking, ordered by problem-fit: | 12 | ~~Execution flow tracing~~ | ~~Framework-aware entry point detection (Express routes, CLI commands, event handlers) + BFS flow tracing from entry to leaf. Inspired by axon, GitNexus, code-context-mcp.~~ | Navigation | ~~Agents can answer "what happens when a user hits POST /login?" by tracing the full execution path in one query~~ | ✓ | ✓ | 4 | No | **DONE** — `codegraph flow` command with entry point detection and BFS flow tracing. MCP tools `flow` and `entry_points` added. Merged in PR #118. | | 16 | Branch structural diff | Compare code structure between two branches using git worktrees. Show added/removed/changed symbols and their impact. Inspired by axon. | Analysis | Teams can review structural impact of feature branches before merge; agents get branch-aware context | ✓ | ✓ | 4 | No | | 20 | Streaming / chunked results | Support streaming output for large query results so MCP clients and programmatic consumers can process incrementally. | Embeddability | Large codebases don't blow up agent context windows; consumers process results as they arrive instead of waiting for the full payload | ✓ | ✓ | 4 | No | -| 21 | Composite audit command | Single `codegraph audit ` that combines `explain`, `fn-impact`, and code health metrics into one structured report per function. Core version uses graph data; enhanced version includes Phase 4.4 `risk_score`/`complexity_notes`/`side_effects` when available. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) Gauntlet phase. 
| Orchestration | Each sub-agent in a multi-agent swarm gets everything it needs to assess a function in one call instead of 3-4 — directly reduces token waste and round-trips | ✓ | ✓ | 4 | No | -| 22 | Batch querying | Accept a list of targets (file or JSON) and return all query results in one JSON payload. Applies to `audit`, `fn-impact`, `context`, and other per-symbol commands. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) swarm pattern. | Orchestration | A swarm of 20+ agents auditing different files can be fed from a single orchestrator call instead of N sequential invocations — reduces overhead and enables parallel dispatch | ✓ | ✓ | 4 | No | -| 23 | Triage priority queue | Single `codegraph triage` command that merges `map` connectivity, `hotspots` fan-in/fan-out, node roles, and optionally git churn + `risk_score` into one ranked audit queue. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) RECON phase. | Orchestration | Orchestrating agent gets a single prioritized list of what to audit first — replaces manual synthesis of 3+ commands, saves RECON phase from burning tokens on orientation | ✓ | ✓ | 4 | No | -| 24 | Change validation predicates | `codegraph check --staged` with configurable predicates: `--no-new-cycles`, `--max-blast-radius N`, `--no-signature-changes`, `--no-boundary-violations`. Returns exit code 0/1 for CI gates and state machines. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) STATE MACHINE phase. | CI | Automated rollback triggers without parsing JSON — orchestrators and CI pipelines get first-class pass/fail signals for blast radius, cycles, and contract changes | ✓ | ✓ | 4 | No | -| 26 | MCP orchestration tools | Expose `audit`, `triage`, and `check` as MCP tools alongside existing tools. Enables multi-agent orchestrators (Claude Code agent teams, custom MCP clients) to run the full Titan Paradigm loop through the MCP protocol without CLI overhead. 
Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md). | Embeddability | Agents query the graph through MCP with zero CLI overhead — fewer tokens, faster round-trips, native integration with AI agent frameworks | ✓ | ✓ | 4 | No | +| 27 | Composite audit command | Single `codegraph audit ` that combines `explain`, `fn-impact`, and code health metrics into one structured report per function. Core version uses graph data; enhanced version includes Phase 4.4 `risk_score`/`complexity_notes`/`side_effects` when available. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) Gauntlet phase. | Orchestration | Each sub-agent in a multi-agent swarm gets everything it needs to assess a function in one call instead of 3-4 — directly reduces token waste and round-trips | ✓ | ✓ | 4 | No | +| 28 | Batch querying | Accept a list of targets (file or JSON) and return all query results in one JSON payload. Applies to `audit`, `fn-impact`, `context`, and other per-symbol commands. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) swarm pattern. | Orchestration | A swarm of 20+ agents auditing different files can be fed from a single orchestrator call instead of N sequential invocations — reduces overhead and enables parallel dispatch | ✓ | ✓ | 4 | No | +| 29 | Triage priority queue | Single `codegraph triage` command that merges `map` connectivity, `hotspots` fan-in/fan-out, node roles, and optionally git churn + `risk_score` into one ranked audit queue. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) RECON phase. | Orchestration | Orchestrating agent gets a single prioritized list of what to audit first — replaces manual synthesis of 3+ commands, saves RECON phase from burning tokens on orientation | ✓ | ✓ | 4 | No | +| 30 | Change validation predicates | `codegraph check --staged` with configurable predicates: `--no-new-cycles`, `--max-blast-radius N`, `--no-signature-changes`, `--no-boundary-violations`. 
Returns exit code 0/1 for CI gates and state machines. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) STATE MACHINE phase. | CI | Automated rollback triggers without parsing JSON — orchestrators and CI pipelines get first-class pass/fail signals for blast radius, cycles, and contract changes | ✓ | ✓ | 4 | No | +| 32 | MCP orchestration tools | Expose `audit`, `triage`, and `check` as MCP tools alongside existing tools. Enables multi-agent orchestrators (Claude Code agent teams, custom MCP clients) to run the full Titan Paradigm loop through the MCP protocol without CLI overhead. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md). | Embeddability | Agents query the graph through MCP with zero CLI overhead — fewer tokens, faster round-trips, native integration with AI agent frameworks | ✓ | ✓ | 4 | No | | 5 | TF-IDF lightweight search | SQLite FTS5 + TF-IDF as a middle tier (~50MB) between "no search" and full transformer embeddings (~500MB). Provides decent keyword search with near-zero overhead. Inspired by codexray. | Search | Users get useful search without the 500MB embedding model download; faster startup for small projects | ✓ | ✓ | 3 | No | | 13 | Architecture boundary rules | User-defined rules for allowed/forbidden dependencies between modules (e.g., "controllers must not import from other controllers"). Violations flagged in `diff-impact` and CI. Inspired by codegraph-rust, stratify. | Architecture | Prevents architectural decay in CI; agents are warned before introducing forbidden cross-module dependencies | ✓ | ✓ | 3 | No | | 15 | Hybrid BM25 + semantic search | Combine BM25 keyword matching with embedding-based semantic search using Reciprocal Rank Fusion. Better recall than either approach alone. Inspired by GitNexus, claude-context-local. 
| Search | Search results improve dramatically — keyword matches catch exact names, embeddings catch conceptual matches, RRF merges both | ✓ | ✓ | 3 | No | | 18 | CODEOWNERS integration | Map graph nodes to CODEOWNERS entries. Show who owns each function, surface ownership boundaries in `diff-impact`. Inspired by CKB. | Developer Experience | `diff-impact` tells agents which teams to notify; ownership-aware impact analysis reduces missed reviews | ✓ | ✓ | 3 | No | | 22 | Manifesto-driven pass/fail | User-defined rule engine with custom thresholds (e.g. "cognitive > 15 = fail", "cyclomatic > 10 = fail", "imports > 10 = decompose"). Outputs pass/fail per function/file. Generalizes ID 13 (boundary rules) into a generic rule system. | Analysis | Enables autonomous multi-agent audit workflows (GAUNTLET pattern); CI integration for code health gates with configurable thresholds | ✓ | ✓ | 3 | No | -| 25 | Graph snapshots | `codegraph snapshot save ` / `codegraph snapshot restore ` for lightweight SQLite DB backup and restore. Enables orchestrators to checkpoint before refactoring passes and instantly rollback without rebuilding. After Phase 4, also preserves embeddings and semantic metadata. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) STATE MACHINE phase. | Orchestration | Multi-agent workflows get instant rollback without re-running expensive builds or LLM calls — orchestrator checkpoints before each pass and restores on failure | ✓ | ✓ | 3 | No | +| 31 | Graph snapshots | `codegraph snapshot save ` / `codegraph snapshot restore ` for lightweight SQLite DB backup and restore. Enables orchestrators to checkpoint before refactoring passes and instantly rollback without rebuilding. After Phase 4, also preserves embeddings and semantic metadata. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) STATE MACHINE phase. 
| Orchestration | Multi-agent workflows get instant rollback without re-running expensive builds or LLM calls — orchestrator checkpoints before each pass and restores on failure | ✓ | ✓ | 3 | No | | 6 | Formal code health metrics | Cyclomatic complexity, Maintainability Index, and Halstead metrics per function — we already parse the AST, the data is there. Inspired by code-health-meter (published in ACM TOSEM 2025). | Analysis | Agents can prioritize refactoring targets; `hotspots` becomes richer with quantitative health scores per function | ✓ | ✓ | 2 | No | | 7 | OWASP/CWE pattern detection | Security pattern scanning on the existing AST — hardcoded secrets, SQL injection patterns, eval usage, XSS sinks. Lightweight static rules, not full taint analysis. Inspired by narsil-mcp, CKB. | Security | Catches low-hanging security issues during `diff-impact`; agents can flag risky patterns before they're committed | ✓ | ✓ | 2 | No | | 11 | Community detection | Leiden/Louvain algorithm to discover natural module boundaries vs actual file organization. Reveals which symbols are tightly coupled and whether the directory structure matches. Inspired by axon, GitNexus, CodeGraphMCPServer. | Intelligence | Surfaces architectural drift — when directory structure no longer matches actual dependency clusters; guides refactoring | ✓ | ✓ | 2 | No | diff --git a/docs/use-cases/titan-paradigm.md b/docs/use-cases/titan-paradigm.md index 2103f054..d9d1a6e9 100644 --- a/docs/use-cases/titan-paradigm.md +++ b/docs/use-cases/titan-paradigm.md @@ -259,7 +259,7 @@ After LLM integration, snapshots would also preserve embeddings, descriptions, a ### 6. MCP-native orchestration -The Titan Paradigm's agents could run entirely through codegraph's [MCP server](../examples/MCP.md) instead of shelling out to the CLI. With 18 tools already exposed, the main gap is the `audit`/`triage`/`check` commands described above. 
After Phase 4, adding these as MCP tools — alongside [`ask_codebase`](../../roadmap/ROADMAP.md#53--mcp-integration) (Phase 5.3) for natural-language queries — would let orchestrators like Claude Code's agent teams query the graph with zero CLI overhead. The RECON agent asks the MCP server "what are the riskiest files?", each Gauntlet agent asks "should this function be decomposed?", and the STATE MACHINE asks "is this change safe?" — all through the same protocol. +The Titan Paradigm's agents could run entirely through codegraph's [MCP server](../examples/MCP.md) instead of shelling out to the CLI. With 21 tools already exposed, the main gap is the `audit`/`triage`/`check` commands described above. After Phase 4, adding these as MCP tools — alongside [`ask_codebase`](../../roadmap/ROADMAP.md#53--mcp-integration) (Phase 5.3) for natural-language queries — would let orchestrators like Claude Code's agent teams query the graph with zero CLI overhead. The RECON agent asks the MCP server "what are the riskiest files?", each Gauntlet agent asks "should this function be decomposed?", and the STATE MACHINE asks "is this change safe?" — all through the same protocol. --- From 51fedb439a46faf5e7379f8036edac87ed848f60 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Thu, 26 Feb 2026 01:40:11 -0700 Subject: [PATCH 5/8] feat: add token savings benchmark (codegraph vs raw navigation) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds a benchmark suite that measures how much codegraph reduces token usage when AI agents navigate the Next.js codebase (~4k TS files). 
- scripts/token-benchmark-issues.js: 5 real Next.js PRs as test cases - scripts/token-benchmark.js: runner using Claude Agent SDK (baseline vs codegraph MCP), with --perf flag for build/query benchmarks - scripts/update-token-report.js: JSON → markdown report generator - docs/benchmarks/: methodology docs and placeholder report Impact: 21 functions changed, 7 affected --- docs/benchmarks/README.md | 129 +++++++ docs/benchmarks/TOKEN-SAVINGS.md | 17 + scripts/token-benchmark-issues.js | 224 +++++++++++ scripts/token-benchmark.js | 610 ++++++++++++++++++++++++++++++ scripts/update-token-report.js | 322 ++++++++++++++++ 5 files changed, 1302 insertions(+) create mode 100644 docs/benchmarks/README.md create mode 100644 docs/benchmarks/TOKEN-SAVINGS.md create mode 100644 scripts/token-benchmark-issues.js create mode 100644 scripts/token-benchmark.js create mode 100644 scripts/update-token-report.js diff --git a/docs/benchmarks/README.md b/docs/benchmarks/README.md new file mode 100644 index 00000000..003cc9c6 --- /dev/null +++ b/docs/benchmarks/README.md @@ -0,0 +1,129 @@ +# Token Savings Benchmark + +Quantifies how much codegraph reduces token usage when AI agents navigate large codebases, compared to raw file exploration (Glob/Grep/Read/Bash). + +## Prerequisites + +1. **Claude Agent SDK** + ```bash + npm install @anthropic-ai/claude-agent-sdk + ``` + +2. **API key** + ```bash + export ANTHROPIC_API_KEY=sk-ant-... + ``` + +3. **Git** (for cloning Next.js) + +4. 
**codegraph** installed in this repo (`npm install`)
+
+## Quick Start
+
+```bash
+# Smoke test — 1 issue, 1 run (~$2-4)
+node scripts/token-benchmark.js --issues csrf-case-insensitive --runs 1 > result.json
+
+# View the JSON
+cat result.json | jq .aggregate
+
+# Generate the markdown report
+node scripts/update-token-report.js result.json
+cat docs/benchmarks/TOKEN-SAVINGS.md
+```
+
+## Full Run
+
+```bash
+# All 5 issues × 3 runs (~$30-60)
+node scripts/token-benchmark.js > result.json
+node scripts/update-token-report.js result.json
+```
+
+## CLI Flags
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--runs <n>` | `3` | Number of runs per issue (medians used) |
+| `--model <name>` | `sonnet` | Claude model to use |
+| `--issues <ids>` | all | Comma-separated subset of issue IDs |
+| `--nextjs-dir <dir>` | `$TMPDIR/...` | Reuse existing Next.js clone |
+| `--skip-graph` | `false` | Skip codegraph rebuild (use existing DB) |
+| `--max-turns <n>` | `50` | Max agent turns per session |
+| `--max-budget <$>` | `2.00` | Max USD per session |
+| `--perf` | `false` | Also run build/query perf benchmarks on the Next.js graph |
+
+## Available Issues
+
+| ID | Difficulty | PR | Description |
+|----|:----------:|---:|-------------|
+| `csrf-case-insensitive` | Easy | #89127 | Case-insensitive CSRF origin matching |
+| `ready-in-time` | Medium | #88589 | Incorrect "Ready in" time display |
+| `aggregate-error-inspect` | Medium | #88999 | AggregateError.errors missing in output |
+| `otel-propagation` | Hard | #90181 | OTEL trace context propagation broken |
+| `static-rsc-payloads` | Hard | #89202 | Static RSC payloads not emitted/served |
+
+## Methodology
+
+### Setup
+- **Target repo:** [vercel/next.js](https://github.com/vercel/next.js) (~4,000 TypeScript files)
+- Each issue is a real closed PR with a known set of affected source files
+
+### Two conditions (identical except codegraph access)
+
+**Baseline:** Agent has `Glob`, `Grep`, `Read`, `Bash` tools.
No codegraph.
+
+**Codegraph:** Agent has the same tools **plus** a codegraph MCP server providing structural navigation (symbol search, dependency tracking, impact analysis, call chains).
+
+### Controls
+- Same model for both conditions
+- Same issue prompt (bug description only — no hints about the solution)
+- Checkout pinned to the commit *before* the fix (agent can't see the answer in git history)
+- Same `maxTurns` and `maxBudgetUsd` budget caps
+
+### Metrics
+- **Input tokens:** Total tokens sent to the model (primary metric)
+- **Cost:** USD cost of the session
+- **Turns:** Number of agent turns (tool-use round-trips)
+- **Hit rate:** Percentage of ground-truth files correctly identified
+- **Tool calls:** Breakdown by tool type
+
+### Statistical handling
+- N runs per issue (default 3), median used to handle non-determinism
+- Error runs are excluded from aggregation
+
+## Cost Estimate
+
+| Scenario | Approximate cost |
+|----------|----------------:|
+| 1 issue × 1 run | $2-4 |
+| 1 issue × 3 runs | $6-12 |
+| 5 issues × 3 runs | $30-60 |
+
+Costs depend on model choice and issue difficulty. The `--max-budget` flag caps individual sessions.
+
+## Adding New Issues
+
+Edit `scripts/token-benchmark-issues.js` and add an entry to the `ISSUES` array:
+
+```js
+{
+  id: 'short-slug',
+  difficulty: 'easy|medium|hard',
+  pr: 12345,
+  title: 'PR title',
+  description: 'Bug description for the agent (no solution hints)',
+  commitBefore: 'abc123def...', // SHA before the fix
+  expectedFiles: ['packages/next/src/path/to/file.ts'],
+}
+```
+
+Requirements:
+- Use a real closed PR with a clear bug description
+- `commitBefore` must be the parent of the merge commit (not the merge itself)
+- `expectedFiles` should list only source files, not tests
+- Verify the SHA exists: `git log --oneline -1 <sha>` in the Next.js repo
+
+## Output Format
+
+The runner outputs JSON to stdout. See [TOKEN-SAVINGS.md](TOKEN-SAVINGS.md) for the generated report.
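The hit-rate metric's path-suffix matching can be sketched in a few lines. This is an illustrative sketch, not the benchmark's actual code — the real logic lives in `validateResult` in `scripts/token-benchmark-issues.js`, and the helper names here (`isHit`, `hitRate`) are hypothetical:

```js
// Sketch (hypothetical helpers): a reported path counts as a hit when it equals
// the expected repo-relative path or matches it on a '/'-aligned suffix.
function isHit(reported, expected) {
  const f = reported.replace(/\\/g, '/');
  const e = expected.replace(/\\/g, '/');
  return f === e || e.endsWith('/' + f) || f.endsWith('/' + e);
}

// Hit rate: percentage of expected files matched by at least one reported path.
function hitRate(reported, expectedFiles) {
  const hits = expectedFiles.filter((e) => reported.some((f) => isHit(f, e)));
  return expectedFiles.length
    ? Math.round((hits.length / expectedFiles.length) * 100)
    : 0;
}
```

So `server/lib/trace/tracer.ts` counts as a hit for `packages/next/src/server/lib/trace/tracer.ts`, and a bare filename like `tracer.ts` would too — suffix matching trades some precision for robustness to agents omitting the `packages/next/` prefix.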
diff --git a/docs/benchmarks/TOKEN-SAVINGS.md b/docs/benchmarks/TOKEN-SAVINGS.md new file mode 100644 index 00000000..4f3dc642 --- /dev/null +++ b/docs/benchmarks/TOKEN-SAVINGS.md @@ -0,0 +1,17 @@ +# Token Savings Benchmark: codegraph vs Raw Navigation + +Measures how much codegraph reduces token usage when an AI agent navigates +the [Next.js](https://github.com/vercel/next.js) codebase (~4,000 TypeScript files). + +*No benchmark data yet. Run the benchmark to populate this report:* + +```bash +node scripts/token-benchmark.js > result.json +node scripts/update-token-report.js result.json +``` + +See [README.md](README.md) for full instructions. + + diff --git a/scripts/token-benchmark-issues.js b/scripts/token-benchmark-issues.js new file mode 100644 index 00000000..5dc10171 --- /dev/null +++ b/scripts/token-benchmark-issues.js @@ -0,0 +1,224 @@ +/** + * Issue catalog for the token savings benchmark. + * + * Each entry is a real closed Next.js PR, varying in difficulty. The agent + * is given the bug description (never the solution) and asked to identify + * which source files need modification. + * + * Ground truth (`expectedFiles`) lists only *source* files that were changed + * in the actual fix — test files are excluded. 
+ */ + +/** @typedef {'easy'|'medium'|'hard'} Difficulty */ + +/** + * @typedef {object} BenchmarkIssue + * @property {string} id + * @property {Difficulty} difficulty + * @property {number} pr — Next.js PR number + * @property {string} title + * @property {string} description — bug description for the agent prompt + * @property {string} commitBefore — base SHA (before the fix) + * @property {string[]} expectedFiles — source files changed in the fix + */ + +/** @type {BenchmarkIssue[]} */ +export const ISSUES = [ + // ── Easy (1 source file) ────────────────────────────────────────────── + { + id: 'csrf-case-insensitive', + difficulty: 'easy', + pr: 89127, + title: 'CSRF origin matching should be case-insensitive', + description: + 'The isCsrfOriginAllowed function used for Server Actions CSRF protection ' + + 'performs case-sensitive domain matching. However, DNS names are case-insensitive ' + + 'per RFC 1035. Requests with uppercase Origin headers (e.g. sub.VERCEL.com) fail ' + + 'CSRF checks against configured patterns like *.vercel.com, causing legitimate ' + + 'Server Action requests to be rejected when serverActions.allowedOrigins is set ' + + 'in next.config.js.', + commitBefore: '59c48b73b4a01b4b5b9277eff1e62d75097ba812', + expectedFiles: ['packages/next/src/server/app-render/csrf-protection.ts'], + }, + + // ── Medium (2 source files) ─────────────────────────────────────────── + { + id: 'ready-in-time', + difficulty: 'medium', + pr: 88589, + title: 'Fix incorrect "Ready in" time for next start', + description: + 'Running `next start` displays impossibly large "Ready in" times like ' + + '"Ready in 29474457.7min" instead of the actual startup duration. The ' + + 'NEXT_PRIVATE_START_TIME environment variable is not being properly set or ' + + 'propagated when startServer() reads it. When the variable is missing, the code ' + + 'defaults to 0, causing the calculation `Date.now() - 0` to equal the entire ' + + 'Unix timestamp. 
The bug involves two subsystems: the CLI entry point ' + + '(next-start.ts) which should set the env var, and the server startup ' + + '(start-server.ts) which consumes it.', + commitBefore: '52b2b8be6a74b4f65fe595de1d6e3311efd3c446', + expectedFiles: [ + 'packages/next/src/cli/next-start.ts', + 'packages/next/src/server/lib/start-server.ts', + ], + }, + { + id: 'aggregate-error-inspect', + difficulty: 'medium', + pr: 88999, + title: 'Include AggregateError.errors in terminal output', + description: + 'console.error(aggregateError) in Next.js production backends omits the ' + + '[errors] property entirely. Next.js patches util.inspect to rewrite stack ' + + 'traces, but the patch does not handle AggregateError.errors because it is a ' + + 'non-enumerable property. The existing enumerable-property iteration logic ' + + 'skips it. Additionally, the depth calculation in the patch is miscalculated, ' + + 'causing nested Error.cause chains to truncate at the wrong depth.', + commitBefore: '1c73ca5a58e3ec8ab6f1b908f2819245a6147469', + expectedFiles: ['packages/next/src/server/patch-error-inspect.ts'], + }, + + // ── Hard (6-7 source files) ─────────────────────────────────────────── + { + id: 'otel-propagation', + difficulty: 'hard', + pr: 90181, + title: 'Fix OTEL propagation and add direct entrypoint e2e coverage', + description: + 'OpenTelemetry trace context propagation is broken when using the Next.js ' + + 'entrypoint handler directly (without the next-server wrapper). The forced ' + + 'trace context extraction in the tracer drops the active context when no ' + + 'remote span context is present in incoming request headers. Upstream trace ' + + 'contexts are not propagated through app pages, app routes, or pages API ' + + 'routes. 
The bug spans build templates (app-page, app-route, pages-api), the ' + + 'router-server-context, the tracer propagation logic, and next-server ' + + 'initialization across 6 source files in 4 directories.', + commitBefore: '87f609e710650c5b05664ac1da3b2cd35a643d78', + expectedFiles: [ + 'packages/next/src/build/templates/app-page.ts', + 'packages/next/src/build/templates/app-route.ts', + 'packages/next/src/build/templates/pages-api.ts', + 'packages/next/src/server/lib/router-utils/router-server-context.ts', + 'packages/next/src/server/lib/trace/tracer.ts', + 'packages/next/src/server/next-server.ts', + ], + }, + { + id: 'static-rsc-payloads', + difficulty: 'hard', + pr: 89202, + title: 'Fully static pages should emit and serve static RSC payloads', + description: + 'Navigating to fully static PPR (Partial Pre-Rendering) routes in Cache ' + + 'Components triggers unnecessary function invocations instead of serving ' + + 'cached static content. After the Cache Components refactor, ' + + 'prefetchDataRoute entries are no longer populated with .prefetch.rsc values ' + + 'containing static RSC payloads. Static payloads only exist in the .segments ' + + 'directory. When a non-prefetch RSC request occurs (prefetch not completed or ' + + 'prefetch={false}), it routes to an empty fallback and invokes a function ' + + 'instead of serving static content. 
The fix requires changes across the build ' +
+      'pipeline, export system, build adapter, and incremental cache — 7 source ' +
+      'files in 5 directories.',
+    commitBefore: '0e457e95a96089eea85159635d7b75838699dd87',
+    expectedFiles: [
+      'packages/next/src/build/adapter/build-complete.ts',
+      'packages/next/src/build/index.ts',
+      'packages/next/src/build/templates/app-page.ts',
+      'packages/next/src/export/index.ts',
+      'packages/next/src/export/routes/app-page.ts',
+      'packages/next/src/export/types.ts',
+      'packages/next/src/server/lib/incremental-cache/file-system-cache.ts',
+    ],
+  },
+];
+
+/**
+ * Compute hit rate — percentage of expected files the agent identified.
+ *
+ * Uses path suffix matching so the agent doesn't need to produce the exact
+ * repo-relative path (e.g. "src/server/tracer.ts" won't match
+ * "packages/next/src/server/lib/trace/tracer.ts", but
+ * "server/lib/trace/tracer.ts" will).
+ *
+ * @param {string} issueId
+ * @param {string[]} filesIdentified — files the agent reported
+ * @returns {{ hits: number, total: number, hitRate: number, matched: string[], missed: string[] }}
+ */
+export function validateResult(issueId, filesIdentified) {
+  const issue = ISSUES.find((i) => i.id === issueId);
+  if (!issue) throw new Error(`Unknown issue: ${issueId}`);
+
+  const normalize = (f) => f.replace(/\\/g, '/');
+  const identified = filesIdentified.map(normalize);
+
+  const matched = [];
+  const missed = [];
+
+  for (const expected of issue.expectedFiles) {
+    const norm = normalize(expected);
+    const found = identified.some(
+      (f) => f === norm || f.endsWith('/' + norm) || norm.endsWith('/' + f),
+    );
+    if (found) {
+      matched.push(expected);
+    } else {
+      missed.push(expected);
+    }
+  }
+
+  const total = issue.expectedFiles.length;
+  const hits = matched.length;
+  return {
+    hits,
+    total,
+    hitRate: total > 0 ? Math.round((hits / total) * 100) : 0,
+    matched,
+    missed,
+  };
+}
+
+/**
+ * Extract the agent's structured output from its messages.
+ * + * Looks for a fenced JSON block containing `{ "files": [...] }`. + * + * @param {Array<{ role: string, content: string|Array }>} messages + * @returns {{ files: string[], explanation: string } | null} + */ +export function extractAgentOutput(messages) { + for (let i = messages.length - 1; i >= 0; i--) { + const msg = messages[i]; + if (msg.role !== 'assistant') continue; + + const text = + typeof msg.content === 'string' + ? msg.content + : Array.isArray(msg.content) + ? msg.content + .filter((b) => b.type === 'text') + .map((b) => b.text) + .join('\n') + : ''; + + // Match ```json ... ``` or bare { "files": ... } + const fenced = text.match(/```(?:json)?\s*(\{[\s\S]*?"files"[\s\S]*?\})\s*```/); + if (fenced) { + try { + return JSON.parse(fenced[1]); + } catch { + /* try next pattern */ + } + } + + const bare = text.match(/(\{[\s\S]*?"files"\s*:\s*\[[\s\S]*?\][\s\S]*?\})/); + if (bare) { + try { + return JSON.parse(bare[1]); + } catch { + /* skip */ + } + } + } + + return null; +} diff --git a/scripts/token-benchmark.js b/scripts/token-benchmark.js new file mode 100644 index 00000000..7c20996e --- /dev/null +++ b/scripts/token-benchmark.js @@ -0,0 +1,610 @@ +#!/usr/bin/env node + +/** + * Token savings benchmark — measures codegraph's navigation advantage. + * + * Runs controlled experiments: same Next.js issues, same model — one agent + * navigates with codegraph (via MCP), one without. Outputs JSON to stdout + * with per-issue and aggregate token/cost savings. 
+ * + * Prerequisites: + * npm install @anthropic-ai/claude-agent-sdk + * ANTHROPIC_API_KEY set in environment + * + * Usage: + * node scripts/token-benchmark.js > result.json + * node scripts/token-benchmark.js --runs 1 --issues csrf-case-insensitive + * node scripts/token-benchmark.js --nextjs-dir /tmp/next.js --skip-graph + */ + +import { execFileSync, execSync } from 'node:child_process'; +import fs from 'node:fs'; +import os from 'node:os'; +import path from 'node:path'; +import { performance } from 'node:perf_hooks'; +import { fileURLToPath } from 'node:url'; +import { parseArgs } from 'node:util'; + +import { ISSUES, extractAgentOutput, validateResult } from './token-benchmark-issues.js'; + +const __dirname = path.dirname(fileURLToPath(import.meta.url)); +const root = path.resolve(__dirname, '..'); +const pkg = JSON.parse(fs.readFileSync(path.join(root, 'package.json'), 'utf8')); + +// Redirect console.log to stderr so only JSON goes to stdout +const origLog = console.log; +console.log = (...args) => console.error(...args); + +// ── CLI flags ───────────────────────────────────────────────────────────── + +const { values: flags } = parseArgs({ + options: { + runs: { type: 'string', default: '3' }, + model: { type: 'string', default: 'sonnet' }, + issues: { type: 'string', default: '' }, + 'nextjs-dir': { type: 'string', default: '' }, + 'skip-graph': { type: 'boolean', default: false }, + 'max-turns': { type: 'string', default: '50' }, + 'max-budget': { type: 'string', default: '2.00' }, + perf: { type: 'boolean', default: false }, + }, + strict: false, +}); + +const RUNS = parseInt(flags.runs, 10) || 3; +const MODEL = flags.model; +const MAX_TURNS = parseInt(flags['max-turns'], 10) || 50; +const MAX_BUDGET = parseFloat(flags['max-budget']) || 2.0; +const SKIP_GRAPH = flags['skip-graph']; +const RUN_PERF = flags.perf; + +const selectedIssueIds = flags.issues + ? 
flags.issues.split(',').map((s) => s.trim()) + : ISSUES.map((i) => i.id); + +const selectedIssues = selectedIssueIds.map((id) => { + const issue = ISSUES.find((i) => i.id === id); + if (!issue) { + console.error(`Unknown issue: ${id}`); + console.error(`Available: ${ISSUES.map((i) => i.id).join(', ')}`); + process.exit(1); + } + return issue; +}); + +// ── Helpers ─────────────────────────────────────────────────────────────── + +function median(arr) { + if (arr.length === 0) return 0; + const sorted = [...arr].sort((a, b) => a - b); + const mid = Math.floor(sorted.length / 2); + return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2; +} + +function round2(n) { + return Math.round(n * 100) / 100; +} + +function git(args, cwd) { + return execFileSync('git', args, { cwd, stdio: 'pipe', encoding: 'utf8' }).trim(); +} + +// ── Prompts ─────────────────────────────────────────────────────────────── + +const BASELINE_PROMPT = `You are an expert debugging agent navigating a large codebase. +You have access to Glob, Grep, Read, and Bash tools to explore the code. +Use these to search for relevant files, read their contents, and trace +call chains. Be systematic and thorough.`; + +const CODEGRAPH_PROMPT = `You are an expert debugging agent navigating a large codebase. +You have access to a codegraph MCP server that provides structural code +navigation tools (symbol search, dependency tracking, impact analysis, +call chains). Use these tools to efficiently find relevant code. +You also have Glob, Grep, Read, and Bash tools for additional exploration. +Prefer codegraph tools for structural navigation — they are faster than +manual grep/read exploration.`; + +function makeIssuePrompt(issue) { + return `You are debugging a bug in the Next.js codebase (vercel/next.js). + +**Bug:** ${issue.title} + +**Description:** ${issue.description} + +Your task: Identify which source files need to be modified to fix this bug. 
+Explain the root cause and the fix approach. + +IMPORTANT: Output your answer as a JSON code block with this exact format: +\`\`\`json +{ + "files": ["path/to/file1.ts", "path/to/file2.ts"], + "explanation": "Brief explanation of the root cause and fix approach" +} +\`\`\` + +Only include source files that need modification — not test files.`; +} + +// ── Next.js setup ───────────────────────────────────────────────────────── + +async function ensureNextjsClone(targetDir) { + if (fs.existsSync(path.join(targetDir, '.git'))) { + console.error(`Reusing existing Next.js clone at ${targetDir}`); + git(['fetch', 'origin'], targetDir); + return; + } + + console.error(`Cloning Next.js to ${targetDir} (shallow)...`); + fs.mkdirSync(targetDir, { recursive: true }); + execFileSync( + 'git', + ['clone', '--filter=blob:none', 'https://github.com/vercel/next.js.git', targetDir], + { stdio: 'inherit' }, + ); +} + +function checkoutCommit(nextjsDir, sha) { + console.error(`Checking out ${sha.slice(0, 10)}...`); + git(['checkout', sha, '--force'], nextjsDir); +} + +// ── Graph building ──────────────────────────────────────────────────────── + +async function buildCodegraph(nextjsDir) { + const cliPath = path.join(root, 'src', 'cli.js'); + console.error('Building codegraph graph for Next.js...'); + const start = performance.now(); + execFileSync('node', [cliPath, 'build', nextjsDir], { + cwd: nextjsDir, + stdio: 'pipe', + timeout: 600_000, // 10 min + }); + const elapsed = Math.round(performance.now() - start); + console.error(`Graph built in ${elapsed}ms`); +} + +// ── Session runner ──────────────────────────────────────────────────────── + +/** + * Run a single agent session using the Claude Agent SDK. 
+ * + * @param {'baseline'|'codegraph'} mode + * @param {import('./token-benchmark-issues.js').BenchmarkIssue} issue + * @param {string} nextjsDir + * @returns {Promise} session metrics + */ +async function runSession(mode, issue, nextjsDir) { + // Lazy-load the SDK + const { query } = await import('@anthropic-ai/claude-agent-sdk'); + + const dbPath = path.join(nextjsDir, '.codegraph', 'graph.db'); + const cliPath = path.join(root, 'src', 'cli.js'); + const issuePrompt = makeIssuePrompt(issue); + + const options = { + cwd: nextjsDir, + model: MODEL, + allowedTools: ['Glob', 'Grep', 'Read', 'Bash'], + permissionMode: 'bypassPermissions', + maxTurns: MAX_TURNS, + maxBudgetUsd: MAX_BUDGET, + systemPrompt: mode === 'codegraph' ? CODEGRAPH_PROMPT : BASELINE_PROMPT, + }; + + if (mode === 'codegraph') { + options.mcpServers = { + codegraph: { + type: 'stdio', + command: 'node', + args: [cliPath, 'mcp', '-d', dbPath], + }, + }; + } + + const start = performance.now(); + const result = await query({ prompt: issuePrompt, options }); + const durationMs = Math.round(performance.now() - start); + + // Extract metrics from the SDK result + const usage = result.usage || {}; + const inputTokens = usage.input_tokens || usage.inputTokens || 0; + const outputTokens = usage.output_tokens || usage.outputTokens || 0; + const cacheReadInputTokens = + usage.cache_read_input_tokens || usage.cacheReadInputTokens || 0; + const totalCostUsd = usage.total_cost_usd || usage.totalCostUsd || 0; + const numTurns = result.num_turns || result.numTurns || 0; + + // Count tool calls by type + const messages = result.messages || []; + const toolCalls = {}; + let uniqueFilesRead = new Set(); + + for (const msg of messages) { + if (msg.role !== 'assistant') continue; + const blocks = Array.isArray(msg.content) ? 
msg.content : []; + for (const block of blocks) { + if (block.type === 'tool_use') { + const name = block.name || 'unknown'; + toolCalls[name] = (toolCalls[name] || 0) + 1; + // Track unique files read + if (name === 'Read' && block.input?.file_path) { + uniqueFilesRead.add(block.input.file_path); + } + } + } + } + + // Extract identified files from agent output + const agentOutput = extractAgentOutput(messages); + const filesIdentified = agentOutput?.files || []; + const validation = validateResult(issue.id, filesIdentified); + + return { + inputTokens, + outputTokens, + cacheReadInputTokens, + totalCostUsd: round2(totalCostUsd), + numTurns, + durationMs, + toolCalls, + uniqueFilesRead: uniqueFilesRead.size, + filesIdentified, + hitRate: validation.hitRate, + matched: validation.matched, + missed: validation.missed, + }; +} + +// ── Performance benchmarks (build/query on the large graph) ────────────── + +const PERF_RUNS = 3; + +function round1(n) { + return Math.round(n * 10) / 10; +} + +/** + * Run build/query/stats benchmarks against the Next.js graph. + * Reuses the same codegraph APIs as the existing benchmark scripts. + */ +async function runPerfBenchmarks(nextjsDir) { + const { pathToFileURL } = await import('node:url'); + const { buildGraph } = await import( + pathToFileURL(path.join(root, 'src', 'builder.js')).href + ); + const { fnDepsData, fnImpactData, statsData } = await import( + pathToFileURL(path.join(root, 'src', 'queries.js')).href + ); + const { isNativeAvailable } = await import( + pathToFileURL(path.join(root, 'src', 'native.js')).href + ); + + const dbPath = path.join(nextjsDir, '.codegraph', 'graph.db'); + + console.error('\n── Performance benchmarks ──'); + + // ── Build benchmarks ────────────────────────────────────────────── + const buildResults = {}; + for (const engine of ['wasm', ...(isNativeAvailable() ? 
['native'] : [])]) { + console.error(` Full build (${engine})...`); + const timings = []; + for (let i = 0; i < PERF_RUNS; i++) { + if (fs.existsSync(dbPath)) fs.unlinkSync(dbPath); + const start = performance.now(); + await buildGraph(nextjsDir, { engine, incremental: false }); + timings.push(performance.now() - start); + } + const fullBuildMs = Math.round(median(timings)); + + // No-op rebuild + console.error(` No-op rebuild (${engine})...`); + const noopTimings = []; + for (let i = 0; i < PERF_RUNS; i++) { + const start = performance.now(); + await buildGraph(nextjsDir, { engine, incremental: true }); + noopTimings.push(performance.now() - start); + } + const noopRebuildMs = Math.round(median(noopTimings)); + + buildResults[engine] = { fullBuildMs, noopRebuildMs }; + console.error(` full=${fullBuildMs}ms noop=${noopRebuildMs}ms`); + } + + // ── Stats ───────────────────────────────────────────────────────── + // Ensure we have a graph (rebuild with wasm if needed) + if (!fs.existsSync(dbPath)) { + await buildGraph(nextjsDir, { engine: 'wasm', incremental: false }); + } + const stats = statsData(dbPath); + const graphStats = { + files: stats.files.total, + nodes: stats.nodes.total, + edges: stats.edges.total, + dbSizeBytes: fs.existsSync(dbPath) ? 
fs.statSync(dbPath).size : 0, + }; + console.error( + ` Stats: ${graphStats.files} files, ${graphStats.nodes} nodes, ${graphStats.edges} edges`, + ); + + // ── Query benchmarks ────────────────────────────────────────────── + // Find a hub node (most connected) for query benchmarks + const { default: Database } = await import('better-sqlite3'); + const db = new Database(dbPath, { readonly: true }); + const hubRow = db + .prepare( + `SELECT n.name, COUNT(e.id) AS cnt + FROM nodes n + JOIN edges e ON e.source_id = n.id OR e.target_id = n.id + WHERE n.file NOT LIKE '%test%' AND n.file NOT LIKE '%spec%' + GROUP BY n.id + ORDER BY cnt DESC + LIMIT 1`, + ) + .get(); + db.close(); + + const hubName = hubRow?.name || null; + const queryResults = {}; + + if (hubName) { + console.error(` Query target (hub): ${hubName}`); + + for (const depth of [1, 3, 5]) { + // fnDeps + const depsTimings = []; + for (let i = 0; i < PERF_RUNS; i++) { + const start = performance.now(); + fnDepsData(hubName, dbPath, { depth, noTests: true }); + depsTimings.push(performance.now() - start); + } + + // fnImpact + const impactTimings = []; + for (let i = 0; i < PERF_RUNS; i++) { + const start = performance.now(); + fnImpactData(hubName, dbPath, { depth, noTests: true }); + impactTimings.push(performance.now() - start); + } + + queryResults[`fnDeps_depth${depth}Ms`] = round1(median(depsTimings)); + queryResults[`fnImpact_depth${depth}Ms`] = round1(median(impactTimings)); + } + + console.error( + ` fnDeps: d1=${queryResults.fnDeps_depth1Ms}ms d3=${queryResults.fnDeps_depth3Ms}ms d5=${queryResults.fnDeps_depth5Ms}ms`, + ); + console.error( + ` fnImpact: d1=${queryResults.fnImpact_depth1Ms}ms d3=${queryResults.fnImpact_depth3Ms}ms d5=${queryResults.fnImpact_depth5Ms}ms`, + ); + } + + return { + repo: 'vercel/next.js', + stats: graphStats, + build: buildResults, + query: { hub: hubName, ...queryResults }, + }; +} + +// ── Main ────────────────────────────────────────────────────────────────── + +async 
function main() { + // Resolve Next.js directory + const nextjsDir = flags['nextjs-dir'] + ? path.resolve(flags['nextjs-dir']) + : path.join(os.tmpdir(), 'codegraph-bench-nextjs'); + + console.error(`Token Savings Benchmark`); + console.error(` Model: ${MODEL}`); + console.error(` Runs per issue: ${RUNS}`); + console.error(` Issues: ${selectedIssues.map((i) => i.id).join(', ')}`); + console.error(` Max turns: ${MAX_TURNS}`); + console.error(` Max budget: $${MAX_BUDGET}`); + console.error(` Next.js dir: ${nextjsDir}`); + console.error(''); + + // Clone / fetch Next.js + await ensureNextjsClone(nextjsDir); + + const results = []; + + for (const issue of selectedIssues) { + console.error(`\n── ${issue.id} (${issue.difficulty}) ──`); + console.error(`PR #${issue.pr}: ${issue.title}`); + + // Checkout the commit before the fix + checkoutCommit(nextjsDir, issue.commitBefore); + + // Build codegraph (unless skipped) + if (!SKIP_GRAPH) { + await buildCodegraph(nextjsDir); + } + + const baselineRuns = []; + const codegraphRuns = []; + + // Run baseline sessions + for (let r = 0; r < RUNS; r++) { + console.error(` Baseline run ${r + 1}/${RUNS}...`); + try { + const metrics = await runSession('baseline', issue, nextjsDir); + baselineRuns.push(metrics); + console.error( + ` ${metrics.inputTokens} input tokens, $${metrics.totalCostUsd}, ` + + `${metrics.numTurns} turns, hit rate: ${metrics.hitRate}%`, + ); + } catch (err) { + console.error(` ERROR: ${err.message}`); + baselineRuns.push({ error: err.message }); + } + } + + // Run codegraph sessions + for (let r = 0; r < RUNS; r++) { + console.error(` Codegraph run ${r + 1}/${RUNS}...`); + try { + const metrics = await runSession('codegraph', issue, nextjsDir); + codegraphRuns.push(metrics); + console.error( + ` ${metrics.inputTokens} input tokens, $${metrics.totalCostUsd}, ` + + `${metrics.numTurns} turns, hit rate: ${metrics.hitRate}%`, + ); + } catch (err) { + console.error(` ERROR: ${err.message}`); + codegraphRuns.push({ 
error: err.message }); + } + } + + // Compute medians (filter out errored runs) + const validBaseline = baselineRuns.filter((r) => !r.error); + const validCodegraph = codegraphRuns.filter((r) => !r.error); + + const medianOf = (runs, key) => median(runs.map((r) => r[key])); + + const baselineMedian = + validBaseline.length > 0 + ? { + inputTokens: medianOf(validBaseline, 'inputTokens'), + outputTokens: medianOf(validBaseline, 'outputTokens'), + cacheReadInputTokens: medianOf(validBaseline, 'cacheReadInputTokens'), + totalCostUsd: round2(medianOf(validBaseline, 'totalCostUsd')), + numTurns: medianOf(validBaseline, 'numTurns'), + durationMs: medianOf(validBaseline, 'durationMs'), + uniqueFilesRead: medianOf(validBaseline, 'uniqueFilesRead'), + hitRate: medianOf(validBaseline, 'hitRate'), + } + : null; + + const codegraphMedian = + validCodegraph.length > 0 + ? { + inputTokens: medianOf(validCodegraph, 'inputTokens'), + outputTokens: medianOf(validCodegraph, 'outputTokens'), + cacheReadInputTokens: medianOf(validCodegraph, 'cacheReadInputTokens'), + totalCostUsd: round2(medianOf(validCodegraph, 'totalCostUsd')), + numTurns: medianOf(validCodegraph, 'numTurns'), + durationMs: medianOf(validCodegraph, 'durationMs'), + uniqueFilesRead: medianOf(validCodegraph, 'uniqueFilesRead'), + hitRate: medianOf(validCodegraph, 'hitRate'), + } + : null; + + // Compute savings + let savings = null; + if (baselineMedian && codegraphMedian && baselineMedian.inputTokens > 0) { + const tokenSavings = + ((baselineMedian.inputTokens - codegraphMedian.inputTokens) / + baselineMedian.inputTokens) * + 100; + const costSavings = + baselineMedian.totalCostUsd > 0 + ? 
((baselineMedian.totalCostUsd - codegraphMedian.totalCostUsd) / + baselineMedian.totalCostUsd) * + 100 + : 0; + savings = { + inputTokensPct: Math.round(tokenSavings), + costPct: Math.round(costSavings), + }; + } + + results.push({ + id: issue.id, + difficulty: issue.difficulty, + pr: issue.pr, + baseline: { median: baselineMedian, runs: baselineRuns }, + codegraph: { median: codegraphMedian, runs: codegraphRuns }, + savings, + }); + + if (savings) { + console.error( + ` Savings: ${savings.inputTokensPct}% tokens, ${savings.costPct}% cost`, + ); + } + } + + // ── Aggregate ─────────────────────────────────────────────────────── + + const validResults = results.filter( + (r) => r.baseline.median && r.codegraph.median && r.savings, + ); + + let aggregate = null; + if (validResults.length > 0) { + const totalBaselineTokens = validResults.reduce( + (s, r) => s + r.baseline.median.inputTokens, + 0, + ); + const totalCodegraphTokens = validResults.reduce( + (s, r) => s + r.codegraph.median.inputTokens, + 0, + ); + const totalBaselineCost = validResults.reduce( + (s, r) => s + r.baseline.median.totalCostUsd, + 0, + ); + const totalCodegraphCost = validResults.reduce( + (s, r) => s + r.codegraph.median.totalCostUsd, + 0, + ); + + aggregate = { + savings: { + inputTokensPct: + totalBaselineTokens > 0 + ? Math.round( + ((totalBaselineTokens - totalCodegraphTokens) / totalBaselineTokens) * 100, + ) + : 0, + costPct: + totalBaselineCost > 0 + ? 
Math.round( + ((totalBaselineCost - totalCodegraphCost) / totalBaselineCost) * 100, + ) + : 0, + }, + baselineAvgHitRate: Math.round( + validResults.reduce((s, r) => s + r.baseline.median.hitRate, 0) / + validResults.length, + ), + codegraphAvgHitRate: Math.round( + validResults.reduce((s, r) => s + r.codegraph.median.hitRate, 0) / + validResults.length, + ), + }; + } + + // ── Performance benchmarks (optional) ──────────────────────────────── + + let perfBenchmarks = null; + if (RUN_PERF) { + // Checkout latest commit from the first issue for a stable snapshot + checkoutCommit(nextjsDir, selectedIssues[0].commitBefore); + perfBenchmarks = await runPerfBenchmarks(nextjsDir); + } + + // ── Output ────────────────────────────────────────────────────────── + + // Restore console.log for JSON output + console.log = origLog; + + const output = { + version: pkg.version, + date: new Date().toISOString().slice(0, 10), + model: MODEL, + runsPerIssue: RUNS, + maxTurns: MAX_TURNS, + maxBudgetUsd: MAX_BUDGET, + issues: results, + aggregate, + perfBenchmarks, + }; + + console.log(JSON.stringify(output, null, 2)); +} + +main().catch((err) => { + console.error(`Fatal: ${err.message}`); + process.exit(1); +}); diff --git a/scripts/update-token-report.js b/scripts/update-token-report.js new file mode 100644 index 00000000..a2d4f966 --- /dev/null +++ b/scripts/update-token-report.js @@ -0,0 +1,322 @@ +#!/usr/bin/env node + +/** + * Update token savings report — reads benchmark JSON and generates + * docs/benchmarks/TOKEN-SAVINGS.md with summary tables, per-issue + * breakdowns, difficulty averages, and historical trends. 
+ *
+ * Usage:
+ *   node scripts/update-token-report.js token-result.json
+ *   node scripts/token-benchmark.js | node scripts/update-token-report.js
+ */
+
+import fs from 'node:fs';
+import path from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+const root = path.resolve(__dirname, '..');
+
+// ── Read benchmark JSON from file arg or stdin ───────────────────────────
+let jsonText;
+const arg = process.argv[2];
+if (arg) {
+  jsonText = fs.readFileSync(path.resolve(arg), 'utf8');
+} else {
+  jsonText = fs.readFileSync('/dev/stdin', 'utf8');
+}
+const entry = JSON.parse(jsonText);
+
+// ── Paths ────────────────────────────────────────────────────────────────
+const reportPath = path.join(root, 'docs', 'benchmarks', 'TOKEN-SAVINGS.md');
+
+// ── Load existing history ────────────────────────────────────────────────
+let history = [];
+if (fs.existsSync(reportPath)) {
+  const content = fs.readFileSync(reportPath, 'utf8');
+  // Parse history embedded in an HTML comment (marker name assumed)
+  const match = content.match(/<!-- benchmark-data\n([\s\S]*?)\n-->/);
+  if (match) {
+    try {
+      history = JSON.parse(match[1]);
+    } catch {
+      /* start fresh if corrupt */
+    }
+  }
+}
+
+// Add new entry (deduplicate by version)
+const idx = history.findIndex((h) => h.version === entry.version);
+if (idx >= 0) {
+  history[idx] = entry;
+} else {
+  history.unshift(entry);
+}
+
+// ── Helpers ──────────────────────────────────────────────────────────────
+
+function trend(current, previous, lowerIsBetter = true) {
+  if (previous == null) return '';
+  const pct = ((current - previous) / previous) * 100;
+  if (Math.abs(pct) < 2) return ' ~';
+  if (lowerIsBetter) {
+    return pct < 0 ? ` ↓${Math.abs(Math.round(pct))}%` : ` ↑${Math.round(pct)}%`;
+  }
+  return pct > 0 ?
` ↑${Math.round(pct)}%` : ` ↓${Math.abs(Math.round(pct))}%`; +} + +function formatTokens(n) { + if (n >= 1_000_000) return `${(n / 1_000_000).toFixed(1)}M`; + if (n >= 1000) return `${(n / 1000).toFixed(1)}k`; + return String(n); +} + +function formatCost(n) { + return `$${n.toFixed(2)}`; +} + +function formatMs(ms) { + if (ms >= 1000) return `${(ms / 1000).toFixed(1)}s`; + return `${Math.round(ms)}ms`; +} + +function formatBytes(bytes) { + if (bytes >= 1048576) return `${(bytes / 1048576).toFixed(1)} MB`; + if (bytes >= 1024) return `${(bytes / 1024).toFixed(0)} KB`; + return `${bytes} B`; +} + +function difficultyEmoji(d) { + if (d === 'easy') return '🟢'; + if (d === 'medium') return '🟡'; + return '🔴'; +} + +// ── Build report ───────────────────────────────────────────────────────── + +const latest = history[0]; +const prev = history[1] || null; + +let md = '# Token Savings Benchmark: codegraph vs Raw Navigation\n\n'; +md += 'Measures how much codegraph reduces token usage when an AI agent navigates\n'; +md += 'the [Next.js](https://github.com/vercel/next.js) codebase (~4,000 TypeScript files).\n\n'; +md += `**Model:** ${latest.model} | **Runs per issue:** ${latest.runsPerIssue} | `; +md += `**codegraph version:** ${latest.version} | **Date:** ${latest.date}\n\n`; + +// ── Summary table ──────────────────────────────────────────────────────── + +if (latest.aggregate) { + md += '## Summary\n\n'; + md += '| Metric | Baseline | Codegraph | Savings |\n'; + md += '|--------|--------:|---------:|--------:|\n'; + + const validIssues = latest.issues.filter((i) => i.baseline?.median && i.codegraph?.median); + + if (validIssues.length > 0) { + const totalBaselineTokens = validIssues.reduce( + (s, i) => s + i.baseline.median.inputTokens, + 0, + ); + const totalCodegraphTokens = validIssues.reduce( + (s, i) => s + i.codegraph.median.inputTokens, + 0, + ); + const totalBaselineCost = validIssues.reduce( + (s, i) => s + i.baseline.median.totalCostUsd, + 0, + ); + const 
totalCodegraphCost = validIssues.reduce( + (s, i) => s + i.codegraph.median.totalCostUsd, + 0, + ); + const avgBaselineTurns = + validIssues.reduce((s, i) => s + i.baseline.median.numTurns, 0) / validIssues.length; + const avgCodegraphTurns = + validIssues.reduce((s, i) => s + i.codegraph.median.numTurns, 0) / validIssues.length; + + md += `| Input tokens (total) | ${formatTokens(totalBaselineTokens)} | ${formatTokens(totalCodegraphTokens)} | **${latest.aggregate.savings.inputTokensPct}%** |\n`; + md += `| Cost (total) | ${formatCost(totalBaselineCost)} | ${formatCost(totalCodegraphCost)} | **${latest.aggregate.savings.costPct}%** |\n`; + md += `| Avg turns/issue | ${avgBaselineTurns.toFixed(1)} | ${avgCodegraphTurns.toFixed(1)} | — |\n`; + md += `| Avg hit rate | ${latest.aggregate.baselineAvgHitRate}% | ${latest.aggregate.codegraphAvgHitRate}% | — |\n`; + } + + md += '\n'; +} + +// ── Per-issue breakdown ────────────────────────────────────────────────── + +md += '## Per-Issue Breakdown\n\n'; +md += '| Issue | Diff | Baseline tokens | CG tokens | Token savings | Baseline cost | CG cost | Cost savings | Hit rate (B/CG) |\n'; +md += '|-------|:----:|----------------:|----------:|--------------:|--------------:|--------:|-------------:|----------------:|\n'; + +for (const issue of latest.issues) { + const emoji = difficultyEmoji(issue.difficulty); + const b = issue.baseline?.median; + const c = issue.codegraph?.median; + + if (!b || !c) { + md += `| ${issue.id} | ${emoji} | — | — | — | — | — | — | — |\n`; + continue; + } + + const savingsStr = issue.savings + ? `**${issue.savings.inputTokensPct}%**` + : '—'; + const costSavingsStr = issue.savings + ? 
`**${issue.savings.costPct}%**` + : '—'; + + md += `| ${issue.id} | ${emoji} | ${formatTokens(b.inputTokens)} | ${formatTokens(c.inputTokens)} | ${savingsStr} | ${formatCost(b.totalCostUsd)} | ${formatCost(c.totalCostUsd)} | ${costSavingsStr} | ${b.hitRate}% / ${c.hitRate}% |\n`; +} + +md += '\n'; +md += 'Difficulty: 🟢 Easy (1 file) · 🟡 Medium (1-2 files) · 🔴 Hard (5-7 files)\n\n'; + +// ── By-difficulty averages ─────────────────────────────────────────────── + +md += '## By Difficulty\n\n'; +md += '| Difficulty | Issues | Avg token savings | Avg cost savings | Avg hit rate (B/CG) |\n'; +md += '|------------|-------:|------------------:|-----------------:|--------------------:|\n'; + +for (const difficulty of ['easy', 'medium', 'hard']) { + const issues = latest.issues.filter( + (i) => i.difficulty === difficulty && i.savings, + ); + + if (issues.length === 0) { + md += `| ${difficulty} | 0 | — | — | — |\n`; + continue; + } + + const avgTokenSavings = Math.round( + issues.reduce((s, i) => s + i.savings.inputTokensPct, 0) / issues.length, + ); + const avgCostSavings = Math.round( + issues.reduce((s, i) => s + i.savings.costPct, 0) / issues.length, + ); + const avgBaselineHit = Math.round( + issues.reduce((s, i) => s + i.baseline.median.hitRate, 0) / issues.length, + ); + const avgCgHit = Math.round( + issues.reduce((s, i) => s + i.codegraph.median.hitRate, 0) / issues.length, + ); + + md += `| ${difficulty} | ${issues.length} | **${avgTokenSavings}%** | **${avgCostSavings}%** | ${avgBaselineHit}% / ${avgCgHit}% |\n`; +} + +md += '\n'; + +// ── Historical trend ───────────────────────────────────────────────────── + +if (history.length > 1) { + md += '## Historical Trend\n\n'; + md += '| Version | Date | Model | Token savings | Cost savings | Trend |\n'; + md += '|---------|------|-------|-------------:|------------:|------:|\n'; + + for (let i = 0; i < history.length; i++) { + const h = history[i]; + const p = history[i + 1] || null; + if (!h.aggregate) continue; + 
+ const tokenTrend = p?.aggregate + ? trend(h.aggregate.savings.inputTokensPct, p.aggregate.savings.inputTokensPct, false) + : ''; + + md += `| ${h.version} | ${h.date} | ${h.model} | ${h.aggregate.savings.inputTokensPct}% | ${h.aggregate.savings.costPct}% | ${tokenTrend} |\n`; + } + + md += '\n'; +} + +// ── Performance benchmarks (if present) ────────────────────────────────── + +if (latest.perfBenchmarks) { + const perf = latest.perfBenchmarks; + md += '## Codegraph Performance on Next.js\n\n'; + md += `Measured on the **${perf.repo}** codebase during the benchmark run.\n\n`; + + // Graph stats + if (perf.stats) { + const s = perf.stats; + md += '### Graph Stats\n\n'; + md += '| Metric | Value |\n'; + md += '|--------|------:|\n'; + md += `| Files | ${s.files.toLocaleString()} |\n`; + md += `| Nodes | ${s.nodes.toLocaleString()} |\n`; + md += `| Edges | ${s.edges.toLocaleString()} |\n`; + md += `| DB size | ${formatBytes(s.dbSizeBytes)} |\n`; + md += `| Nodes/file | ${s.files > 0 ? (s.nodes / s.files).toFixed(1) : '—'} |\n`; + md += `| Edges/file | ${s.files > 0 ? 
(s.edges / s.files).toFixed(1) : '—'} |\n`; + md += '\n'; + } + + // Build benchmarks + if (perf.build) { + md += '### Build Performance\n\n'; + md += '| Engine | Full build | No-op rebuild |\n'; + md += '|--------|----------:|-------------:|\n'; + for (const [engine, data] of Object.entries(perf.build)) { + md += `| ${engine} | ${formatMs(data.fullBuildMs)} | ${formatMs(data.noopRebuildMs)} |\n`; + } + if (perf.stats?.files > 0) { + md += '\n*Per-file:*\n\n'; + md += '| Engine | Build ms/file |\n'; + md += '|--------|--------------:|\n'; + for (const [engine, data] of Object.entries(perf.build)) { + const perFile = (data.fullBuildMs / perf.stats.files).toFixed(1); + md += `| ${engine} | ${perFile} |\n`; + } + } + md += '\n'; + } + + // Query benchmarks + if (perf.query?.hub) { + md += '### Query Performance\n\n'; + md += `Hub node: \`${perf.query.hub}\`\n\n`; + md += '| Query | Depth 1 | Depth 3 | Depth 5 |\n'; + md += '|-------|--------:|--------:|--------:|\n'; + md += `| fnDeps | ${formatMs(perf.query.fnDeps_depth1Ms || 0)} | ${formatMs(perf.query.fnDeps_depth3Ms || 0)} | ${formatMs(perf.query.fnDeps_depth5Ms || 0)} |\n`; + md += `| fnImpact | ${formatMs(perf.query.fnImpact_depth1Ms || 0)} | ${formatMs(perf.query.fnImpact_depth3Ms || 0)} | ${formatMs(perf.query.fnImpact_depth5Ms || 0)} |\n`; + md += '\n'; + } +} + +// ── Methodology ────────────────────────────────────────────────────────── + +md += '## Methodology\n\n'; +md += '- Each issue is a real closed Next.js PR with known affected files\n'; +md += '- Agent is checked out to the commit *before* the fix (no answer in git history)\n'; +md += '- Baseline: agent uses Glob/Grep/Read/Bash only\n'; +md += '- Codegraph: agent has access to codegraph MCP server (symbol search, deps, impact)\n'; +md += '- Same model, same prompt, same budget cap for both conditions\n'; +md += '- Metrics are median of N runs to handle non-determinism\n'; +md += '- Hit rate = percentage of ground-truth files the agent 
identified\n\n';
+md += 'See [docs/benchmarks/README.md](README.md) for full details.\n\n';
+
+// ── Embedded data ──────────────────────────────────────────────────────
+
+// Embed history JSON in an HTML comment for later runs (marker name assumed)
+md += `<!-- benchmark-data\n${JSON.stringify(history)}\n-->\n`;
+
+// ── Write report ───────────────────────────────────────────────────────
+
+fs.mkdirSync(path.dirname(reportPath), { recursive: true });
+fs.writeFileSync(reportPath, md);
+console.error(`Updated ${path.relative(root, reportPath)}`);
+
+// ── Regression detection ───────────────────────────────────────────────
+const REGRESSION_THRESHOLD = 0.15; // 15%
+
+if (prev?.aggregate && latest.aggregate) {
+  const currentSavings = latest.aggregate.savings.inputTokensPct;
+  const previousSavings = prev.aggregate.savings.inputTokensPct;
+  const drop = previousSavings - currentSavings;
+
+  if (drop > REGRESSION_THRESHOLD * 100) {
+    const msg = `Token savings dropped: ${previousSavings}% → ${currentSavings}% (-${drop}pp, threshold ${Math.round(REGRESSION_THRESHOLD * 100)}pp)`;
+    if (process.env.GITHUB_ACTIONS) {
+      console.error(`::warning title=Token Benchmark Regression::${msg}`);
+    } else {
+      console.error(`⚠ REGRESSION: ${msg}`);
+    }
+  }
+}
From 845e4c971547d67d83cd7dc81032bff8d686068e Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Thu, 26 Feb 2026 01:45:09 -0700
Subject: [PATCH 6/8] feat: extend benchmarks with incremental builds and
 expanded query coverage

benchmark.js now measures no-op rebuilds, 1-file rebuilds, and query latency
(fn-deps, fn-impact, path, roles) alongside full builds.
update-benchmark-report.js renders new Incremental Rebuilds and Query Latency
sections in BUILD-BENCHMARKS.md and adds incremental/query rows to the README
performance table. All new fields are additive for backward compatibility.
Impact: 5 functions changed, 2 affected --- README.md | 9 ++- generated/BUILD-BENCHMARKS.md | 69 +++++++++++++++------ scripts/benchmark.js | 97 +++++++++++++++++++++++++++++- scripts/update-benchmark-report.js | 81 +++++++++++++++++++++++-- 4 files changed, 228 insertions(+), 28 deletions(-) diff --git a/README.md b/README.md index f9db3379..acabc794 100644 --- a/README.md +++ b/README.md @@ -406,10 +406,13 @@ Self-measured on every release via CI ([build benchmarks](generated/BUILD-BENCHM | Metric | Latest | |---|---| -| Build speed (native) | **1.9 ms/file** | -| Build speed (WASM) | **6.6 ms/file** | +| Build speed | **4.6 ms/file** | | Query time | **2ms** | -| ~50,000 files (est.) | **~95.0s build** | +| No-op rebuild | **5ms** | +| 1-file rebuild | **233ms** | +| Query: fn-deps | **1.8ms** | +| Query: path | **0.8ms** | +| ~50,000 files (est.) | **~230.0s build** | Metrics are normalized per file for cross-version comparability. Times above are for a full initial build — incremental rebuilds only re-parse changed files. diff --git a/generated/BUILD-BENCHMARKS.md b/generated/BUILD-BENCHMARKS.md index c12d7638..6106b900 100644 --- a/generated/BUILD-BENCHMARKS.md +++ b/generated/BUILD-BENCHMARKS.md @@ -5,6 +5,7 @@ Metrics are normalized per file for cross-version comparability. | Version | Engine | Date | Files | Build (ms/file) | Query (ms) | Nodes/file | Edges/file | DB (bytes/file) | |---------|--------|------|------:|----------------:|-----------:|-----------:|-----------:|----------------:| +| 2.4.0 | wasm | 2026-02-26 | 112 | 4.6 ↓30% | 1.9 ↓10% | 6 ↑3% | 9.8 ↑8% | 4425 ↑15% | | 2.3.0 | native | 2026-02-24 | 99 | 1.9 ~ | 1.5 ↑7% | 5.8 ↑7% | 9.1 ~ | 3848 ~ | | 2.3.0 | wasm | 2026-02-24 | 99 | 6.6 ~ | 2.1 ↑11% | 5.8 ~ | 9.1 ↑3% | 3848 ~ | | 2.1.0 | native | 2026-02-23 | 92 | 1.9 ↓24% | 1.4 ↑17% | 5.4 ↑6% | 9.1 ↓47% | 3829 ↓14% | @@ -14,27 +15,16 @@ Metrics are normalized per file for cross-version comparability. 
### Raw totals (latest) -#### Native (Rust) - -| Metric | Value | -|--------|-------| -| Build time | 183ms | -| Query time | 2ms | -| Nodes | 575 | -| Edges | 897 | -| DB size | 372 KB | -| Files | 99 | - #### WASM | Metric | Value | |--------|-------| -| Build time | 649ms | +| Build time | 519ms | | Query time | 2ms | -| Nodes | 575 | -| Edges | 897 | -| DB size | 372 KB | -| Files | 99 | +| Nodes | 672 | +| Edges | 1,094 | +| DB size | 484 KB | +| Files | 112 | ### Estimated performance at 50,000 files @@ -42,13 +32,52 @@ Extrapolated linearly from per-file metrics above. | Metric | Native (Rust) | WASM | |--------|---:|---:| -| Build time | 95.0s | 330.0s | -| DB size | 183.5 MB | 183.5 MB | -| Nodes | 290,000 | 290,000 | -| Edges | 455,000 | 455,000 | +| Build time | n/a | 230.0s | +| DB size | n/a | 211.0 MB | +| Nodes | n/a | 300,000 | +| Edges | n/a | 490,000 | + +### Incremental Rebuilds + +| Version | Engine | No-op (ms) | 1-file (ms) | +|---------|--------|----------:|-----------:| +| 2.4.0 | wasm | 5 | 233 | + +### Query Latency + +| Version | Engine | fn-deps (ms) | fn-impact (ms) | path (ms) | roles (ms) | +|---------|--------|------------:|--------------:|----------:|----------:| +| 2.4.0 | wasm | 1.8 | 1.4 | 0.8 | 0.8 | \n`; fs.mkdirSync(path.dirname(benchmarkPath), { recursive: true }); @@ -180,6 +232,12 @@ if (prev) { checkRegression(`${tag} Build ms/file`, e.perFile.buildTimeMs, p.perFile.buildTimeMs); checkRegression(`${tag} Query time`, e.queryTimeMs, p.queryTimeMs); checkRegression(`${tag} DB bytes/file`, e.perFile.dbSizeBytes, p.perFile.dbSizeBytes); + if (e.noopRebuildMs != null && p.noopRebuildMs != null) { + checkRegression(`${tag} No-op rebuild`, e.noopRebuildMs, p.noopRebuildMs); + } + if (e.oneFileRebuildMs != null && p.oneFileRebuildMs != null) { + checkRegression(`${tag} 1-file rebuild`, e.oneFileRebuildMs, p.oneFileRebuildMs); + } } } @@ -188,6 +246,10 @@ if (fs.existsSync(readmePath)) { let readme = fs.readFileSync(readmePath, 
'utf8'); // Build the table rows — show both engines when native is available + // Pick the preferred engine: native when available, WASM as fallback + const pref = latest.native || latest.wasm; + const prefLabel = latest.native ? ' (native)' : ''; + let rows = ''; if (latest.native) { rows += `| Build speed (native) | **${latest.native.perFile.buildTimeMs} ms/file** |\n`; @@ -198,6 +260,18 @@ if (fs.existsSync(readmePath)) { rows += `| Query time | **${formatMs(latest.wasm.queryTimeMs)}** |\n`; } + // Incremental rebuild rows (prefer native, fallback to WASM) + if (pref.noopRebuildMs != null) { + rows += `| No-op rebuild${prefLabel} | **${formatMs(pref.noopRebuildMs)}** |\n`; + rows += `| 1-file rebuild${prefLabel} | **${formatMs(pref.oneFileRebuildMs)}** |\n`; + } + + // Query latency rows (pick two representative queries) + if (pref.queries) { + rows += `| Query: fn-deps | **${pref.queries.fnDepsMs}ms** |\n`; + rows += `| Query: path | **${pref.queries.pathMs}ms** |\n`; + } + // 50k-file estimate const estBuild = latest.native ? formatMs(latest.native.perFile.buildTimeMs * ESTIMATE_FILES) @@ -214,10 +288,9 @@ ${rows} Metrics are normalized per file for cross-version comparability. Times above are for a full initial build — incremental rebuilds only re-parse changed files. `; - // Match the performance section from header to next h2 (## ) header or end. - // The lookahead must reject h3+ (###) so subsections like "### Lightweight - // Footprint" are preserved and not swallowed by the replacement. - const perfRegex = /## 📊 Performance\r?\n[\s\S]*?(?=\r?\n## (?!#)|$)/; + // Match the performance section from header to the next heading (h2 or h3) or end. + // Stops at ### subsections like "Lightweight Footprint" so they are preserved. 
+ const perfRegex = /## 📊 Performance\r?\n[\s\S]*?(?=\r?\n##[# ]|$)/; if (perfRegex.test(readme)) { readme = readme.replace(perfRegex, perfSection); } else { From 1f6a2f405b61fe184bf67502ef81c7da22c53602 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Thu, 26 Feb 2026 01:49:08 -0700 Subject: [PATCH 7/8] ci: include version in automated benchmark commits and PRs Extract version from benchmark result JSON and include it in branch names, commit messages, PR titles, and PR bodies across all 4 benchmark jobs (build, embedding, query, incremental). --- .github/workflows/benchmark.yml | 52 +++++++++++++++++++++++---------- 1 file changed, 36 insertions(+), 16 deletions(-) diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml index 4d5b7d22..7933c256 100644 --- a/.github/workflows/benchmark.yml +++ b/.github/workflows/benchmark.yml @@ -54,25 +54,30 @@ jobs: echo "changed=true" >> "$GITHUB_OUTPUT" fi + - name: Extract version from result + id: version + run: echo "version=$(node -p "require('./benchmark-result.json').version")" >> "$GITHUB_OUTPUT" + - name: Commit and push via PR if: steps.changes.outputs.changed == 'true' env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + VERSION: ${{ steps.version.outputs.version }} run: | git config user.name "github-actions[bot]" git config user.email "github-actions[bot]@users.noreply.github.com" - BRANCH="benchmark/build-$(date +%Y%m%d-%H%M%S)" + BRANCH="benchmark/build-v${VERSION}-$(date +%Y%m%d-%H%M%S)" git checkout -b "$BRANCH" git add generated/BUILD-BENCHMARKS.md README.md - git commit -m "docs: update build performance benchmarks" + git commit -m "docs: update build performance benchmarks (v${VERSION})" git push origin "$BRANCH" gh pr create \ --base main \ --head "$BRANCH" \ - --title "docs: update build performance benchmarks" \ - --body "Automated build benchmark update from workflow run [#${{ github.run_number }}](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id 
}})." + --title "docs: update build performance benchmarks (v${VERSION})" \ + --body "Automated build benchmark update for **v${VERSION}** from workflow run [#${{ github.run_number }}](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})." embedding-benchmark: runs-on: ubuntu-latest @@ -131,25 +136,30 @@ jobs: echo "changed=true" >> "$GITHUB_OUTPUT" fi + - name: Extract version from result + id: version + run: echo "version=$(node -p "require('./embedding-benchmark-result.json').version")" >> "$GITHUB_OUTPUT" + - name: Commit and push via PR if: steps.changes.outputs.changed == 'true' env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + VERSION: ${{ steps.version.outputs.version }} run: | git config user.name "github-actions[bot]" git config user.email "github-actions[bot]@users.noreply.github.com" - BRANCH="benchmark/embedding-$(date +%Y%m%d-%H%M%S)" + BRANCH="benchmark/embedding-v${VERSION}-$(date +%Y%m%d-%H%M%S)" git checkout -b "$BRANCH" git add generated/EMBEDDING-BENCHMARKS.md - git commit -m "docs: update embedding benchmarks" + git commit -m "docs: update embedding benchmarks (v${VERSION})" git push origin "$BRANCH" gh pr create \ --base main \ --head "$BRANCH" \ - --title "docs: update embedding benchmarks" \ - --body "Automated embedding benchmark update from workflow run [#${{ github.run_number }}](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})." + --title "docs: update embedding benchmarks (v${VERSION})" \ + --body "Automated embedding benchmark update for **v${VERSION}** from workflow run [#${{ github.run_number }}](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})." 
query-benchmark: runs-on: ubuntu-latest @@ -196,25 +206,30 @@ jobs: echo "changed=true" >> "$GITHUB_OUTPUT" fi + - name: Extract version from result + id: version + run: echo "version=$(node -p "require('./query-benchmark-result.json').version")" >> "$GITHUB_OUTPUT" + - name: Commit and push via PR if: steps.changes.outputs.changed == 'true' env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + VERSION: ${{ steps.version.outputs.version }} run: | git config user.name "github-actions[bot]" git config user.email "github-actions[bot]@users.noreply.github.com" - BRANCH="benchmark/query-$(date +%Y%m%d-%H%M%S)" + BRANCH="benchmark/query-v${VERSION}-$(date +%Y%m%d-%H%M%S)" git checkout -b "$BRANCH" git add generated/QUERY-BENCHMARKS.md - git commit -m "docs: update query benchmarks" + git commit -m "docs: update query benchmarks (v${VERSION})" git push origin "$BRANCH" gh pr create \ --base main \ --head "$BRANCH" \ - --title "docs: update query benchmarks" \ - --body "Automated query benchmark update from workflow run [#${{ github.run_number }}](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})." + --title "docs: update query benchmarks (v${VERSION})" \ + --body "Automated query benchmark update for **v${VERSION}** from workflow run [#${{ github.run_number }}](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})." 
incremental-benchmark: runs-on: ubuntu-latest @@ -261,22 +276,27 @@ jobs: echo "changed=true" >> "$GITHUB_OUTPUT" fi + - name: Extract version from result + id: version + run: echo "version=$(node -p "require('./incremental-benchmark-result.json').version")" >> "$GITHUB_OUTPUT" + - name: Commit and push via PR if: steps.changes.outputs.changed == 'true' env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + VERSION: ${{ steps.version.outputs.version }} run: | git config user.name "github-actions[bot]" git config user.email "github-actions[bot]@users.noreply.github.com" - BRANCH="benchmark/incremental-$(date +%Y%m%d-%H%M%S)" + BRANCH="benchmark/incremental-v${VERSION}-$(date +%Y%m%d-%H%M%S)" git checkout -b "$BRANCH" git add generated/INCREMENTAL-BENCHMARKS.md - git commit -m "docs: update incremental benchmarks" + git commit -m "docs: update incremental benchmarks (v${VERSION})" git push origin "$BRANCH" gh pr create \ --base main \ --head "$BRANCH" \ - --title "docs: update incremental benchmarks" \ - --body "Automated incremental benchmark update from workflow run [#${{ github.run_number }}](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})." + --title "docs: update incremental benchmarks (v${VERSION})" \ + --body "Automated incremental benchmark update for **v${VERSION}** from workflow run [#${{ github.run_number }}](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})." From b273961388dcb2b7851b535e6fe95603f40adfa8 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Thu, 26 Feb 2026 01:57:21 -0700 Subject: [PATCH 8/8] fix: update remaining 19-tool references to 21-tool in README --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index acabc794..14813f7a 100644 --- a/README.md +++ b/README.md @@ -97,7 +97,7 @@ That's it. No config files, no Docker, no JVM, no API keys, no accounts. 
The gra | **🔓** | **Zero-cost core, LLM-enhanced when you want** | Full graph analysis with no API keys, no accounts, no cost. Optionally bring your own LLM provider — your code only goes where you choose | | **🔬** | **Function-level, not just files** | Traces `handleAuth()` → `validateToken()` → `decryptJWT()` and shows 14 callers across 9 files break if `decryptJWT` changes | | **🏷️** | **Role classification** | Every symbol auto-tagged as `entry`/`core`/`utility`/`adapter`/`dead`/`leaf` — agents instantly know what they're looking at | -| **🤖** | **Built for AI agents** | 19-tool [MCP server](https://modelcontextprotocol.io/) — AI assistants query your graph directly. Single-repo by default | +| **🤖** | **Built for AI agents** | 21-tool [MCP server](https://modelcontextprotocol.io/) — AI assistants query your graph directly. Single-repo by default | | **🌐** | **Multi-language, one CLI** | JS/TS + Python + Go + Rust + Java + C# + PHP + Ruby + HCL in a single graph | | **💥** | **Git diff impact** | `codegraph diff-impact` shows changed functions, their callers, and full blast radius — enriched with historically coupled files from git co-change analysis. 
Ships with a GitHub Actions workflow | | **🧠** | **Semantic search** | Local embeddings by default, LLM-powered when opted in — multi-query with RRF ranking via `"auth; token; JWT"` | @@ -144,7 +144,7 @@ After modifying code: Or connect directly via MCP: ```bash -codegraph mcp # 19-tool MCP server — AI queries the graph directly +codegraph mcp # 21-tool MCP server — AI queries the graph directly ``` Full agent setup: [AI Agent Guide](docs/guides/ai-agent-guide.md) · [CLAUDE.md template](docs/guides/ai-agent-guide.md#claudemd-template) @@ -170,7 +170,7 @@ Full agent setup: [AI Agent Guide](docs/guides/ai-agent-guide.md) · [CLAU | 📤 | **Export** | DOT (Graphviz), Mermaid, and JSON graph export | | 🧠 | **Semantic search** | Embeddings-powered natural language search with multi-query RRF ranking | | 👀 | **Watch mode** | Incrementally update the graph as files change | -| 🤖 | **MCP server** | 19-tool MCP server for AI assistants; single-repo by default, opt-in multi-repo | +| 🤖 | **MCP server** | 21-tool MCP server for AI assistants; single-repo by default, opt-in multi-repo | | ⚡ | **Always fresh** | Three-tier incremental detection — sub-second rebuilds even on large codebases | See [docs/examples](docs/examples) for real-world CLI and MCP usage examples.
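For reference, every savings figure in the benchmark tables above follows the same shape; a minimal sketch (the function name is illustrative, the formula matches the inline computation in `token-benchmark.js`):

```javascript
// Sketch of the savings calculation used by token-benchmark.js above:
// percent saved = (baseline - codegraph) / baseline * 100, rounded,
// guarding against a zero baseline. (Function name is illustrative.)
function savingsPct(baseline, codegraph) {
  if (baseline <= 0) return 0;
  return Math.round(((baseline - codegraph) / baseline) * 100);
}

console.log(savingsPct(100_000, 40_000)); // → 60
```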