feat: add PageRank ranking, architecture summary, and token-budgeted responses by maplenk · Pull Request #147 · DeusData/codebase-memory-mcp

maplenk · 2026-03-26T06:52:32Z

Summary

Adds structural importance ranking (PageRank), a one-call architecture overview tool, and token-budgeted responses to prevent context window overflow.

New tools

get_architecture_summary — Structured markdown overview of the project: top files by connectivity, route→controller→service chains, Louvain clusters, high fan-in functions, entry points. Supports max_tokens for output size control and focus for narrowing to a specific area.
get_key_symbols — Returns top-K functions/classes ranked by PageRank. Enables "what are the most important functions in this codebase?" queries.

Enhanced tools

search_graph — New ranked parameter (default true). When enabled, results are sorted by PageRank score. PageRank included in response JSON.
trace_call_path — New ranked parameter. BFS results post-sorted by PageRank when enabled.
search_graph, trace_call_path, query_graph — New max_tokens parameter. Two-tier truncation: top 5 results in full detail, remainder as compact signatures. Emits truncated, total_results, shown metadata.
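
The two-tier truncation described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `result_t`, `trunc_meta_t`, `render_results`, and the 4-characters-per-token estimate are all assumptions made for the example. It follows the build-then-check idea: render everything in full first, and only fall back to "top 5 full, rest compact" when the estimate exceeds `max_tokens`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define FULL_DETAIL_COUNT 5 /* "top 5 results in full detail" per the PR text */
#define CHARS_PER_TOKEN 4   /* assumption: crude token estimate */

/* Hypothetical result record; the real tool output has more fields. */
typedef struct {
    const char *name;
    const char *signature;
    const char *body;
} result_t;

/* Mirrors the truncated / total_results / shown metadata. */
typedef struct {
    bool truncated;
    size_t total_results;
    size_t shown;
} trunc_meta_t;

static size_t estimate_tokens(size_t chars)
{
    return (chars + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN;
}

/* Writes one entry (or only measures it when dst is NULL and room is 0). */
static int write_entry(char *dst, size_t room, const result_t *r, bool full)
{
    return full ? snprintf(dst, room, "%s: %s\n%s\n", r->name, r->signature, r->body)
                : snprintf(dst, room, "%s\n", r->signature);
}

/* Renders results into out; max_tokens == 0 means "no budget".
 * Returns the number of characters written (excluding the terminator). */
size_t render_results(const result_t *res, size_t n, size_t max_tokens,
                      char *out, size_t cap, trunc_meta_t *meta)
{
    size_t len = 0, i;
    meta->truncated = false;
    meta->total_results = n;
    meta->shown = 0;
    if (cap == 0) return 0;
    out[0] = '\0';

    /* Pass 1: build the full response, then check it against the budget. */
    for (i = 0; i < n; i++) {
        int w = write_entry(out + len, cap - len, &res[i], true);
        if (w < 0 || (size_t)w >= cap - len) break; /* buffer too small */
        len += (size_t)w;
    }
    if (i == n && (max_tokens == 0 || estimate_tokens(len) <= max_tokens)) {
        meta->shown = n;
        return len;
    }

    /* Pass 2: over budget, so emit the first few in full and the rest
     * as compact signatures until the budget is used up. */
    size_t budget = max_tokens ? max_tokens * CHARS_PER_TOKEN : cap - 1;
    meta->truncated = true;
    len = 0;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        bool full = i < FULL_DETAIL_COUNT;
        int w = write_entry(NULL, 0, &res[i], full); /* measure only */
        if (w < 0 || len + (size_t)w > budget || len + (size_t)w >= cap) break;
        write_entry(out + len, cap - len, &res[i], full);
        len += (size_t)w;
        meta->shown++;
    }
    return len;
}
```

In this sketch the common case (response already within budget) costs only the final size check, which is one way to read the "zero overhead on happy path" claim.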

Implementation details

PageRank: standard iterative algorithm (d=0.85, 20 iterations) with dangling node handling. Persisted in node_scores table. Runs as pipeline post-processing step. Non-fatal on failure.
Architecture summary: SQL queries against existing graph — no new indexing. Hash table lookups for O(1) file resolution. yyjson route property extraction.
Token budget: build-then-check approach (zero overhead on happy path). Compact chain summaries (A → ... (3 more) → Z) for truncated traces.
WAL-mode fix: read-only query opens use immutable SQLite URIs (fixes corrupt DB misclassification).
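
For readers unfamiliar with the algorithm named above, a minimal sketch of iterative PageRank with dangling-node handling might look like this. It is an illustration under assumptions, not the PR's implementation: `edge_t` and `pagerank` are hypothetical names, nodes are assumed to be dense indices `0..n-1`, and dangling mass is redistributed uniformly (one common convention). The PR's stated parameters would correspond to `damping = 0.85` and `iterations = 20`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical directed edge: src calls dst. */
typedef struct {
    size_t src;
    size_t dst;
} edge_t;

/* Computes PageRank into rank[0..n-1]. Returns 0 on success, -1 on
 * allocation failure or an out-of-range edge endpoint. */
int pagerank(size_t n, const edge_t *edges, size_t m,
             double damping, int iterations, double *rank)
{
    if (n == 0) return 0;

    size_t *out_deg = calloc(n, sizeof *out_deg);
    double *next = malloc(n * sizeof *next);
    if (!out_deg || !next) {
        free(out_deg);
        free(next);
        return -1;
    }

    for (size_t e = 0; e < m; e++) {
        if (edges[e].src >= n || edges[e].dst >= n) {
            free(out_deg);
            free(next);
            return -1;
        }
        out_deg[edges[e].src]++;
    }

    /* Start from a uniform distribution. */
    for (size_t i = 0; i < n; i++) rank[i] = 1.0 / (double)n;

    for (int it = 0; it < iterations; it++) {
        /* Nodes with no outgoing edges would otherwise leak rank mass;
         * collect it and spread it evenly over all nodes. */
        double dangling = 0.0;
        for (size_t i = 0; i < n; i++)
            if (out_deg[i] == 0) dangling += rank[i];

        double base = ((1.0 - damping) + damping * dangling) / (double)n;
        for (size_t i = 0; i < n; i++) next[i] = base;

        /* Each node passes its damped rank equally along its out-edges. */
        for (size_t e = 0; e < m; e++)
            next[edges[e].dst] +=
                damping * rank[edges[e].src] / (double)out_deg[edges[e].src];

        for (size_t i = 0; i < n; i++) rank[i] = next[i];
    }

    free(out_deg);
    free(next);
    return 0;
}
```

With the dangling-node correction the scores keep summing to 1 across iterations. A fixed iteration count trades a convergence check for predictable run time.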

Tests

test_store_arch.c: architecture summary (basic, focus, many_files, cluster_growth)
test_store_search.c: PageRank computation + ranking
test_mcp.c: get_key_symbols, ranked search, truncation for all 3 tools
test_pipeline.c: PageRank in pipeline
test_integration.c: live index tests

Motivation

AI coding agents consume 7–38% of context window per structural query. PageRank ranking ensures the most important results appear first. Token budgets let agents request "give me the answer in under 2000 tokens." Architecture summaries eliminate entire categories of exploratory queries — one call replaces 3–5 tool invocations.

Benchmarked on a 32K-node / 70K-edge production Laravel codebase.

Part 1 of a 4-PR series. PRs 2–4 build on this foundation.

Built with OpenAI Codex and Claude Code.

All install paths, download URLs, self-update checks, CI workflows, and documentation now reference maplenk/codebase-memory-mcp so the fork can operate independently with its own releases while upstream PRs (DeusData#147-DeusData#150) are pending. Upstream attribution in README fork section and LICENSE preserved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

DeusData · 2026-03-26T11:00:22Z

Thanks @maplenk — PageRank for code ranking and architecture summaries is a great idea. Large PR — will review carefully.

Account for optional signatures in the search_graph and trace_call_path size estimators, and improve compact trace chains to report omitted-node counts. This also documents the normal-path output enrichment introduced with Task 4: search_graph results now include file_path, start_line, end_line, and signature, and trace_call_path hop items now include file_path, start_line, and signature.
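
A compact trace chain that reports its omitted-node count, in the `A → ... (3 more) → Z` shape mentioned in the PR description, could be formatted along these lines. This is a sketch only: `format_compact_chain` is a hypothetical helper, and collapsing every chain longer than two nodes is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats a call chain as "first → ... (k more) → last", where k is the
 * number of omitted intermediate nodes. Chains of one or two nodes are
 * printed whole. Returns the length written, or -1 if out is too small. */
int format_compact_chain(const char *const *names, size_t n,
                         char *out, size_t cap)
{
    int w;
    if (n == 0)
        w = snprintf(out, cap, "%s", "");
    else if (n == 1)
        w = snprintf(out, cap, "%s", names[0]);
    else if (n == 2)
        w = snprintf(out, cap, "%s → %s", names[0], names[1]);
    else
        w = snprintf(out, cap, "%s → ... (%zu more) → %s",
                     names[0], n - 2, names[n - 1]);
    return (w >= 0 && (size_t)w < cap) ? w : -1;
}
```

For a five-hop chain `A, B, C, D, Z` this yields `A → ... (3 more) → Z`, so the reader can tell how much of the path was dropped.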

- Guard cbm_mcp_text_result() against NULL text
- Fix memory leak in handle_get_key_symbols() REQUIRE_STORE path (focus not freed)
- Wire qn_pattern through handle_search_graph()
- Fix OOM infinite loop in markdown_builder_reserve()
- Return 0 instead of CBM_STORE_ERR from summary_count_nodes() on prepare fail

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

DeusData · 2026-03-27T12:54:59Z

Thanks for the effort here, @maplenk. I want to give honest feedback on the core premise before we go further.

PageRank is the wrong algorithm for code graphs. PageRank measures "if you randomly follow edges, where do you end up?" On the web, being linked-to is an editorial signal. In a call graph, being called by many things means you're a leaf utility — log.Error(), fmt.Sprintf(), strings.Contains(). These would rank highest, which is the opposite of architecturally important code. Handlers, orchestrators, and pipeline stages — the code that actually matters — typically have few callers but many callees. PageRank would rank them low.

We already expose min_degree/max_degree on search_graph, which gives you direct fan-in/fan-out filtering with zero computational overhead. That covers the "find heavily-connected code" use case without the conceptual mismatch.

The architecture summary and token-budget features are separate ideas worth discussing on their own merits — but they're bundled here with PageRank as the foundation, which makes it hard to evaluate them independently. Could you split those into standalone PRs?

Also noting: this PR modifies store.c (+1,587 lines) and mcp.c (+944 lines), which are core files. Changes of that magnitude to the store and MCP layers need very careful review, especially since this is part 1 of 4 — I need to understand the full scope before committing to a direction.

maplenk · 2026-03-27T13:11:55Z

Hey @DeusData
Thanks for the details.

Will split the other features first and check on the PageRank algorithm as well!

DeusData · 2026-04-02T23:36:45Z

Thanks @maplenk for the thorough work here — the benchmarking on a real Laravel codebase and the detailed writeup are appreciated.

After evaluation, we're going to pass on this for now. Here's our reasoning:

PageRank ranking: For code graphs, simple in-degree counting gives nearly identical results to PageRank because code structure is hierarchical and predictable (unlike web link graphs where transitive weighting matters). If we need result ranking, ORDER BY degree DESC is a 1-line SQL change vs a new pipeline step + table + iterative algorithm.
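
As a rough illustration of the in-degree alternative, the following sketch counts callers per node and sorts in descending order, which is the in-memory analogue of `ORDER BY degree DESC`. The names `call_edge_t`, `ranked_t`, and `rank_by_in_degree` are hypothetical, and nodes are assumed to be dense indices `0..n-1`; the actual store schema is not shown in this thread.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical directed edge: src calls dst. */
typedef struct {
    size_t src;
    size_t dst;
} call_edge_t;

typedef struct {
    size_t node;
    size_t in_degree;
} ranked_t;

/* Sort by in-degree, highest first; break ties by node id for determinism. */
static int by_degree_desc(const void *a, const void *b)
{
    const ranked_t *x = a;
    const ranked_t *y = b;
    if (x->in_degree != y->in_degree)
        return x->in_degree < y->in_degree ? 1 : -1;
    return (x->node > y->node) - (x->node < y->node);
}

/* Fills out[0..n-1] with nodes ordered by descending in-degree.
 * Returns 0 on success, -1 if an edge endpoint is out of range. */
int rank_by_in_degree(size_t n, const call_edge_t *edges, size_t m,
                      ranked_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i].node = i;
        out[i].in_degree = 0;
    }
    for (size_t e = 0; e < m; e++) {
        if (edges[e].src >= n || edges[e].dst >= n) return -1;
        out[edges[e].dst].in_degree++;
    }
    qsort(out, n, sizeof *out, by_degree_desc);
    return 0;
}
```

This is a single pass over the edges plus a sort, with no iteration and no persisted score table, which is the cost difference the comment is pointing at.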

get_architecture_summary / get_key_symbols: These overlap with existing tools — get_architecture already provides project summaries, and search_graph(min_degree=10) finds the most-connected symbols. We're trying to avoid tool inflation (currently at 14) since each tool adds cognitive load for LLMs parsing the tool list at session start.

max_tokens truncation: Agents already control result size via limit, offset, and depth parameters. Server-side truncation with opinionated formatting ("top 5 full, rest compact") removes control from the agent, which knows its own context budget better than we do.

These are reasonable ideas — we may revisit ranking or token budgets if users report specific pain points. For now the existing primitives cover the use cases.

maplenk · 2026-04-03T01:37:11Z

Hey!!

Thanks for the details.
I understand, and I had realised that too. I have made some changes to those and will open a new issue with contribution guidelines soon!!

Please share your inputs on those 😀

This was referenced Mar 26, 2026

feat: add blast radius analysis with risk scoring #148

Closed

feat: add compound query tools (explore, understand, prepare_change) #149

Closed

feat: add session memory tracking, proactive hints, and context recovery #150

Closed

DeusData added the enhancement (New feature or request) label Mar 26, 2026

Naman Khator and others added 6 commits March 27, 2026 18:10

Add architecture summary MCP tool

ac9ce21

Add PageRank ranking to graph tools

b6f16cf

Make PageRank failures non-fatal during indexing

0af23ec

Fix read-only query opens for snapshot DBs

408be51

maplenk force-pushed the feat/pagerank-arch-summary branch from 1e02f10 to f3e93e7 Compare March 27, 2026 12:46

DeusData mentioned this pull request Mar 27, 2026

supercharge codebase-memory-mcp: streamline and consolidate api, autoindexing, PageRank, dependency indexing, speedup, cli config, autotune #151

Closed

DeusData closed this Apr 2, 2026

maplenk wants to merge 6 commits into DeusData:main from maplenk:feat/pagerank-arch-summary
