From 21a670847ccfdf96befc7f3ca2142df1ade8a018 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Sun, 22 Feb 2026 02:49:26 -0700
Subject: [PATCH 1/3] docs: add competitive analysis and foundation principles

Analyze 21 code intelligence tools, rank codegraph #7/22, and establish 8 core principles (zero-infrastructure, dual engine, confidence scoring, incremental builds, embeddable-first, single registry, security defaults, scope boundaries).
---
 COMPETITIVE_ANALYSIS.md | 177 ++++++++++++++++++++++++++++++++++++++++
 FOUNDATION.md           | 169 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 346 insertions(+)
 create mode 100644 COMPETITIVE_ANALYSIS.md
 create mode 100644 FOUNDATION.md

diff --git a/COMPETITIVE_ANALYSIS.md b/COMPETITIVE_ANALYSIS.md
new file mode 100644
index 00000000..d2cf7d94
--- /dev/null
+++ b/COMPETITIVE_ANALYSIS.md
@@ -0,0 +1,177 @@
+# Competitive Analysis — Code Graph / Code Intelligence Tools
+
+**Date:** 2026-02-22
+**Scope:** 21 code analysis tools compared against `@optave/codegraph`
+
+---
+
+## Overall Ranking
+
+Ranked by weighted score across 6 dimensions (each 1–5):
+
+| # | Score | Project | Stars | Lang | License | Summary |
+|---|-------|---------|-------|------|---------|---------|
+| 1 | 4.5 | [vitali87/code-graph-rag](https://github.com/vitali87/code-graph-rag) | 1,916 | Python | MIT | Graph RAG with Memgraph, multi-provider AI, code editing, semantic search, MCP |
+| 2 | 4.2 | [seatedro/glimpse](https://github.com/seatedro/glimpse) | 349 | Rust | MIT | Clipboard-first codebase-to-LLM tool with call graphs, token counting, LSP resolution |
+| 3 | 4.0 | [SimplyLiz/CodeMCP (CKB)](https://github.com/SimplyLiz/CodeMCP) | 59 | Go | Custom | SCIP-based indexing, compound operations (83% token savings), CODEOWNERS, secret scanning |
+| 4 | 3.9 | [harshkedia177/axon](https://github.com/harshkedia177/axon) | 29 | Python | None | 11-phase pipeline, KuzuDB, Leiden community detection, dead code, change coupling |
+| 5 | 3.8 | [anrgct/autodev-codebase](https://github.com/anrgct/autodev-codebase) | 111 | TypeScript | None | 40+ languages, 7 embedding providers, Cytoscape.js visualization, LLM reranking |
+| 6 | 3.7 | [Anandb71/arbor](https://github.com/Anandb71/arbor) | 85 | Rust | MIT | Native GUI, confidence scoring, architectural role classification, fuzzy search, MCP |
+| **7** | **3.6** | **[@optave/codegraph](https://github.com/optave/codegraph)** | — | **JS/Rust** | **Apache-2.0** | **Dual engine (native Rust + WASM), 11 languages, SQLite, MCP, semantic search, zero-cloud** |
+| 8 | 3.4 | [Durafen/Claude-code-memory](https://github.com/Durafen/Claude-code-memory) | 72 | Python | None | Memory Guard quality gate, persistent codebase memory, Voyage AI + Qdrant |
+| 9 | 3.3 | [NeuralRays/codexray](https://github.com/NeuralRays/codexray) | 2 | TypeScript | MIT | 16 MCP tools, TF-IDF semantic search (~50MB), dead code, complexity, path finding |
+| 10 | 3.2 | [al1-nasir/codegraph-cli](https://github.com/al1-nasir/codegraph-cli) | 11 | Python | MIT | CrewAI multi-agent system, 6 LLM providers, browser explorer, DOCX export |
+| 11 | 3.1 | [anasdayeh/claude-context-local](https://github.com/anasdayeh/claude-context-local) | 0 | Python | None | 100% local, Merkle DAG incremental indexing, sharded FAISS, hybrid BM25+vector, GPU accel |
+| 12 | 3.0 | [Vasu014/loregrep](https://github.com/Vasu014/loregrep) | 12 | Rust | Apache-2.0 | In-memory index library, Rust + Python bindings, AI-tool-ready schemas |
+| 13 | 2.9 | [rahulvgmail/CodeInteliMCP](https://github.com/rahulvgmail/CodeInteliMCP) | 8 | Python | None | DuckDB + ChromaDB (zero Docker), multi-repo, lightweight embedded DBs |
+| 14 | 2.8 | [Bikach/codeGraph](https://github.com/Bikach/codeGraph) | 6 | TypeScript | MIT | Neo4j graph, Claude Code slash commands, Kotlin support, 40-50% cost reduction |
+| 15 | 2.7 | [yumeiriowl/repo-graphrag-mcp](https://github.com/yumeiriowl/repo-graphrag-mcp) | 3 | Python | MIT | LightRAG + tree-sitter, entity merge (code ↔ docs), implementation planning tool |
+| 16 | 2.6 | [0xjcf/MCP_CodeAnalysis](https://github.com/0xjcf/MCP_CodeAnalysis) | 7 | Python/TS | None | Stateful tools (XState), Redis sessions, socio-technical analysis, dual language impl |
+| 17 | 2.5 | [RaheesAhmed/code-context-mcp](https://github.com/RaheesAhmed/code-context-mcp) | 0 | Python | MIT | Security pattern detection, auto architecture diagrams, code flow tracing |
+| 18 | 2.4 | [shantham/codegraph](https://github.com/shantham/codegraph) | 0 | TypeScript | MIT | Polished `npx` one-command installer, sqlite-vss, 7 MCP tools |
+| 19 | 2.3 | [0xd219b/codegraph](https://github.com/0xd219b/codegraph) | 0 | Rust | None | Pure Rust, HTTP server mode, Java + Go support |
+| 20 | 2.1 | [floydw1234/badger-graph](https://github.com/floydw1234/badger-graph) | 0 | Python | None | Dgraph backend (Docker), C struct field access tracking |
+| 21 | 2.0 | [khushil/code-graph-rag](https://github.com/khushil/code-graph-rag) | 0 | Python | MIT | Fork of vitali87/code-graph-rag with no modifications |
+| 22 | 1.8 | [m3et/CodeRAG](https://github.com/m3et/CodeRAG) | 0 | Python | None | Iterative RAG with self-reflection, ChromaDB, Azure OpenAI dependent |
+
+---
+
+## Scoring Breakdown
+
+| # | Project | Features | Analysis Depth | Deploy Simplicity | Lang Support | Code Quality | Community |
+|---|---------|----------|---------------|-------------------|-------------|-------------|-----------|
+| 1 | code-graph-rag | 5 | 4 | 3 | 4 | 4 | 5 |
+| 2 | glimpse | 4 | 4 | 5 | 3 | 5 | 5 |
+| 3 | CKB | 5 | 5 | 4 | 3 | 4 | 3 |
+| 4 | axon | 5 | 5 | 4 | 2 | 4 | 2 |
+| 5 | autodev-codebase | 5 | 3 | 3 | 5 | 3 | 4 |
+| 6 | arbor | 4 | 4 | 5 | 4 | 5 | 3 |
+| **7** | **codegraph (us)** | **3** | **3** | **5** | **4** | **4** | **2** |
+| 8 | Claude-code-memory | 4 | 3 | 3 | 3 | 4 | 3 |
+| 9 | codexray | 5 | 4 | 4 | 4 | 3 | 1 |
+| 10 | codegraph-cli | 5 | 3 | 3 | 2 | 3 | 2 |
+| 11 | claude-context-local | 4 | 3 | 3 | 4 | 4 | 1 |
+| 12 | loregrep | 3 | 3 | 4 | 3 | 5 | 2 |
+| 13 | CodeInteliMCP | 3 | 3 | 4 | 3 | 3 | 1 |
+| 14 | Bikach/codeGraph | 3 | 3 | 3 | 2 | 3 | 1 |
+| 15 | repo-graphrag-mcp | 3 | 3 | 3 | 4 | 3 | 1 |
+| 16 | MCP_CodeAnalysis | 4 | 3 | 3 | 2 | 3 | 1 |
+| 17 | code-context-mcp | 4 | 2 | 3 | 2 | 2 | 1 |
+| 18 | shantham/codegraph | 3 | 2 | 4 | 4 | 3 | 1 |
+| 19 | 0xd219b/codegraph | 2 | 3 | 4 | 1 | 4 | 1 |
+| 20 | badger-graph | 2 | 2 | 2 | 1 | 2 | 1 |
+| 21 | khushil/code-graph-rag | 5 | 4 | 3 | 4 | 4 | 1 |
+| 22 | CodeRAG | 3 | 2 | 2 | 1 | 2 | 1 |
+
+**Scoring criteria:**
+- **Features** (1-5): breadth of tools, MCP integration, search, visualization, export
+- **Analysis Depth** (1-5): how deep the code analysis goes (dead code, complexity, flow tracing, coupling)
+- **Deploy Simplicity** (1-5): ease of setup — zero Docker = 5, requires Docker = 3, complex multi-service = 1
+- **Lang Support** (1-5): number of well-supported programming languages
+- **Code Quality** (1-5): architecture, performance characteristics, engineering rigor
+- **Community** (1-5): stars, contributors, activity, documentation quality
+
+---
+
+## Where Codegraph Wins
+
+| Strength | Details |
+|----------|---------|
+| **Zero-dependency deployment** | `npm install` and done. No Docker, no cloud, no API keys needed. Most competitors require Docker (Memgraph, Neo4j, Dgraph, Qdrant) or cloud APIs |
+| **Dual engine architecture** | Only project with native Rust (napi-rs) + automatic WASM fallback. Others are pure Rust OR pure JS/Python — never both |
+| **Single-repo MCP isolation** | Security-conscious default: tools have no `repo` property unless `--multi-repo` is explicitly enabled. Most competitors default to exposing everything |
+| **Incremental builds** | File-hash-based skip of unchanged files. Some competitors re-index everything |
+| **Platform binaries** | Published `@optave/codegraph-{platform}-{arch}` optional packages — true npm-native distribution |
+| **Import resolution depth** | 6-level priority system with confidence scoring — more sophisticated than most competitors' resolution |
+
+---
+
+## Where Codegraph Loses
+
+### vs code-graph-rag (#1, 1916 stars)
+- **Graph query expressiveness**: Memgraph + Cypher enables arbitrary graph traversals; our SQL queries are more rigid
+- **AI-powered code editing**: they can surgically edit functions via AST targeting with visual diffs
+- **Provider flexibility**: they support Gemini/OpenAI/Claude/Ollama and can mix providers per task
+- **Community**: 1,916 stars — orders of magnitude more traction
+
+### vs glimpse (#2, 349 stars)
+- **LLM workflow optimization**: clipboard-first output + token counting + XML output mode — purpose-built for "code → LLM context"
+- **LSP-based call resolution**: compiler-grade accuracy vs our tree-sitter heuristic approach
+- **Web content processing**: can fetch URLs and convert HTML to markdown for context
+
+### vs CKB (#3, 59 stars)
+- **Indexing accuracy**: SCIP provides compiler-grade cross-file references (type-aware), fundamentally more accurate than tree-sitter for supported languages
+- **Compound operations**: `explore`/`understand`/`prepareChange` batch multiple queries into one call — 83% token reduction, 60-70% fewer tool calls
+- **CODEOWNERS + secret scanning**: enterprise features we lack entirely
+
+### vs axon (#4, 29 stars)
+- **Analysis depth**: their 11-phase pipeline includes community detection (Leiden), execution flow tracing, git change coupling, dead code detection — all features we lack
+- **Graph database**: KuzuDB with native Cypher is more expressive for complex graph queries than our SQLite
+- **Branch structural diff**: compares code structure between branches using git worktrees
+
+### vs autodev-codebase (#5, 111 stars)
+- **Language breadth**: 40+ languages vs our 11
+- **Interactive visualization**: Cytoscape.js call graph explorer in the browser — we only have static DOT/Mermaid
+- **LLM reranking**: secondary LLM pass to improve search relevance — more sophisticated retrieval pipeline
+
+### vs arbor (#6, 85 stars)
+- **Native GUI**: desktop app for interactive impact analysis (we're CLI/MCP only)
+- **Confidence scoring surfaced to users**: every result shows High/Medium/Low confidence
+- **Architectural role classification**: auto-tags symbols as Entry Point / Core Logic / Utility / Adapter
+- **Fuzzy symbol search**: typo tolerance with Jaro-Winkler matching
+
+---
+
+## Features to Adopt — Priority Roadmap
+
+### Tier 1: High impact, low effort
+| Feature | Inspired by | Why |
+|---------|------------|-----|
+| **Dead code detection** | axon, codexray, CKB | We have the graph — find nodes with zero incoming edges (minus entry points/exports). Agents constantly ask "is this used?" |
+| **Fuzzy symbol search** | arbor | Add Levenshtein/Jaro-Winkler to `fn` command. Currently requires exact match |
+| **Expose confidence scores** | arbor | Already computed internally in import resolution — just surface them |
+| **Shortest path A→B** | codexray, arbor | BFS on existing edges table. We have `fn` for single chains but no A→B pathfinding |
+
+### Tier 2: High impact, medium effort
+| Feature | Inspired by | Why |
+|---------|------------|-----|
+| **Compound MCP tools** | CKB | `explore`/`understand` meta-tools that batch deps + fn + map into single responses. Biggest token-savings opportunity |
+| **Token counting on responses** | glimpse, arbor | tiktoken-based counts so agents know context budget consumed |
+| **Node classification** | arbor | Auto-tag Entry Point / Core / Utility / Adapter from in-degree/out-degree patterns |
+| **TF-IDF lightweight search** | codexray | SQLite FTS5 + TF-IDF as a middle tier (~50MB) between "no search" and full transformers (~500MB) |
+
+### Tier 3: High impact, high effort
+| Feature | Inspired by | Why |
+|---------|------------|-----|
+| **Interactive HTML visualization** | autodev-codebase, codegraph-cli | `codegraph viz` → opens interactive vis.js/Cytoscape.js graph in browser |
+| **Git change coupling** | axon | Analyze git history for files that always change together — enhances `diff-impact` |
+| **Community detection** | axon | Leiden algorithm to discover natural module boundaries vs actual file organization |
+| **Execution flow tracing** | axon, code-context-mcp | Framework-aware entry point detection + BFS flow tracing |
+| **Security pattern scanning** | CKB, code-context-mcp | Detect hardcoded secrets, SQL injection patterns, XSS in parsed code |
+
+### Not worth copying
+| Feature | Why skip |
+|---------|----------|
+| Memgraph/Neo4j/KuzuDB | Our SQLite = zero Docker, simpler deployment. Query gap matters less than simplicity |
+| Multi-provider AI | We're deliberately cloud-free — that's a feature, not a limitation |
+| SCIP indexing | Would require maintaining SCIP toolchains per language. Tree-sitter + native Rust is the right bet |
+| CrewAI multi-agent | Overengineered for a code analysis tool. Keep the scope focused |
+| Clipboard/LLM-dump mode | Different product category (glimpse). We're a graph tool, not a context-packer |
+
+---
+
+## Irrelevant Repos (excluded from ranking)
+
+These repos from the initial list were not code analysis / graph tools:
+
+| Repo | What it actually is |
+|------|-------------------|
+| [susliko/tla.nvim](https://github.com/susliko/tla.nvim) | TLA+/PlusCal Neovim plugin for formal verification |
+| [akaash-nigam/AxionApps](https://github.com/akaash-nigam/AxionApps) | Portfolio of 17 Indian social impact mobile apps |
+| [jasonjckn/tree-sitter-clojure](https://github.com/jasonjckn/tree-sitter-clojure) | Fork of Clojure tree-sitter grammar, inactive since 2022 |
+| [omkargade04/sentinel-agent](https://github.com/omkargade04/sentinel-agent) | AI-powered GitHub PR reviewer agent |
+| [rupurt/tree-sitter-graph-nix](https://github.com/rupurt/tree-sitter-graph-nix) | Nix flake packaging for tree-sitter-graph (1.8KB of Nix) |
+| [shandianchengzi/tree_sitter_DataExtractor](https://github.com/shandianchengzi/tree_sitter_DataExtractor) | Academic research on program graph representations for GNNs |
+| [hasssanezzz/GoTypeGraph](https://github.com/hasssanezzz/GoTypeGraph) | Go-only struct/interface relationship visualizer |
+| [romiras/py-cmm-parser](https://github.com/romiras/py-cmm-parser) | Python-only canonical metadata parser with Pyright LSP |
+| [OrkeeAI/orkee](https://github.com/OrkeeAI/orkee) | AI agent orchestration platform (CLI/TUI/Web/Desktop) — adjacent but different category |
diff --git a/FOUNDATION.md b/FOUNDATION.md
new file mode 100644
index 00000000..1e078bf8
--- /dev/null
+++ b/FOUNDATION.md
@@ -0,0 +1,169 @@
+# Codegraph Foundation Document
+
+**Project:** `@optave/codegraph`
+**License:** Apache-2.0
+**Established:** 2026 | Optave AI Solutions Inc.
+
+---
+
+## Why Codegraph Exists
+
+There are 20+ code analysis and code graph tools in the open-source ecosystem. Most require Docker, Python environments, cloud API keys, or external databases. None of them ship as a single npm package with native performance.
+
+Codegraph exists to be **the code intelligence engine for the JavaScript ecosystem** — the one you `npm install` and it just works, on every platform, with nothing else to set up.
+
+---
+
+## Core Principles
+
+These principles define what codegraph is and is not. Every feature decision, PR review, and architectural choice should be measured against them.
+
+### 1. Zero-infrastructure deployment
+
+**Codegraph must never require anything beyond `npm install`.**
+
+No Docker. No external databases. No cloud accounts. No API keys for core functionality. No Python. No Go toolchain. No manual compilation steps.
+
+SQLite is our database because it's embedded. WASM grammars are our fallback because they run everywhere Node.js runs. Optional dependencies (`@huggingface/transformers`, `@modelcontextprotocol/sdk`) are lazy-loaded and degrade gracefully.
+
+This is our single most important differentiator. Every competitor that adds Docker to their install instructions loses users we should capture.
+
+*Test: can a developer on a fresh machine run `npm install @optave/codegraph && codegraph build .` with zero prior setup? If not, we broke this principle.*
+
+### 2. Native speed, universal reach
+
+**The dual engine is our architectural moat.**
+
+Native Rust via napi-rs (rayon-parallelized tree-sitter) for platforms we support. Automatic WASM fallback for everything else. The user never chooses — `--engine auto` detects the right path.
+
+We publish platform-specific optional packages (`@optave/codegraph-{platform}-{arch}`) that npm resolves automatically. This gives us 10-100x parsing speed on supported platforms with zero configuration, while never breaking on unsupported ones.
+
+No other tool in this space has both native performance and universal portability in a single npm package.
+
+*Test: does `codegraph build .` work on macOS ARM, macOS x64, Linux x64, and Windows x64 with native speed — and still work (slower) on any other Node.js-capable platform?*
+
+### 3. Confidence over noise
+
+**Every result should tell you how much to trust it.**
+
+Our 6-level import resolution scores every edge 0.0-1.0. Most tools return all matches (noise) or pick the first one (often wrong). We quantify uncertainty.
+
+This principle extends beyond import resolution. When we add features — dead code detection, impact analysis, search results — they should include confidence or relevance scores. AI agents and developers both benefit from ranked, scored results over raw dumps.
+
+*Test: does every query result include enough context for the consumer to judge its reliability?*
+
+### 4. Incremental by default
+
+**Never re-parse what hasn't changed.**
+
+File-level MD5 hashing tracks what changed between builds. Only modified files get re-parsed, and their stale nodes/edges are cleaned before re-insertion. This makes watch-mode and AI-agent loops practical — rebuilds drop from seconds to milliseconds.
+
+This is not a feature flag. It's the default behavior. The graph is always fresh with minimum work.
+
+*Test: after changing one file in a 1000-file project, does `codegraph build .` complete in under 500ms?*
+
+### 5. Embeddable first, CLI second
+
+**Codegraph is a library that happens to have a CLI, not the other way around.**
+
+Every capability is available through the programmatic API (`src/index.js`). The CLI (`src/cli.js`) and MCP server (`src/mcp.js`) are thin wrappers. This means codegraph can be imported into VS Code extensions, Electron apps, CI pipelines, other MCP servers, and any JavaScript tooling.
+
+Most competitors are CLI-first or server-first. We are library-first. The API surface is the product; the CLI is a convenience.
+
+*Test: can another npm package `import { buildGraph, queryFunction } from '@optave/codegraph'` and use the full feature set programmatically?*
+
+### 6. One registry, one schema, no magic
+
+**Adding a language is one data entry, not an architecture change.**
+
+`LANGUAGE_REGISTRY` in `parser.js` is a declarative list mapping each language to `{ id, extensions, grammarFile, extractor, required }`. `EXTENSIONS` in `constants.js` is derived from it. `SYMBOL_KINDS` in `queries.js` is the exhaustive list of node types.
+
+No language gets special-cased. No hidden configuration. No scattered if-else chains. When someone wants to add Kotlin or Swift support, they add one registry entry and one extractor function.
+
+*Test: can a contributor add a new language in under 100 lines of code, touching at most 2 files?*
+
+### 7. Security-conscious defaults
+
+**Multi-repo access is opt-in, never opt-on.**
+
+The MCP server defaults to single-repo mode. Tools have no `repo` property and `list_repos` is not exposed. Only explicit `--multi-repo` or `--repos` flags enable cross-repo access. `allowedRepos` restricts what an MCP client can see.
+
+Credentials are resolved through `apiKeyCommand` (shelling out to external secret managers via `execFileSync` with no shell) — never stored in config files.
+
+This matters because codegraph runs inside AI agents that have broad tool access. Leaking cross-repo data or credentials through an MCP server is a real attack surface.
+
+*Test: does a default `codegraph mcp` invocation expose only the single repo it was pointed at?*
+
+### 8. Honest about what we're not
+
+**We are not a graph database. We are not a RAG system. We are not an AI agent.**
+
+We use SQLite, not Neo4j/Memgraph/KuzuDB. Our queries are hand-written SQL, not Cypher. This is intentional — it keeps us at zero infrastructure.
+
+We offer semantic search via optional embeddings, but we are not a RAG pipeline. We don't generate code, answer questions, or translate natural language to queries.
+
+We expose tools to AI agents via MCP, but we are not an agent ourselves. We don't make decisions, run multi-step workflows, or modify code.
+
+Staying in our lane means we can be embedded inside tools that do those things — without competing with them or duplicating their responsibilities.
+
+---
+
+## What We Build vs. What We Don't
+
+### We will build
+
+- Features that deepen **structural code understanding**: dead code detection, complexity metrics, path finding, community detection — all derivable from our existing graph
+- Features that improve **result quality**: fuzzy search, confidence scoring, node classification, compound queries that reduce agent round-trips
+- Features that improve **speed**: faster native parsing, smarter incremental builds, lighter-weight search alternatives (FTS5/TF-IDF alongside full embeddings)
+- Features that improve **embeddability**: better programmatic API, streaming results, output format options
+
+### We will not build
+
+- External database backends (Memgraph, Neo4j, Qdrant, etc.) — violates Principle 1
+- Cloud API integrations for core functionality — violates Principle 1
+- AI-powered code generation or editing — violates Principle 8
+- Multi-agent orchestration — violates Principle 8
+- Native desktop GUI — outside our lane; we're a library
+- Features that require non-npm dependencies — violates Principle 1
+
+---
+
+## Competitive Position
+
+As of February 2026, codegraph is **#7 out of 22** in the code intelligence tool space (see [COMPETITIVE_ANALYSIS.md](./COMPETITIVE_ANALYSIS.md)).
+
+Six tools rank above us on feature breadth and community size. But none of them occupy our niche: **the npm-native, zero-config, dual-engine code intelligence library.**
+
+| What competitors need | What codegraph needs |
+|-----------------------|----------------------|
+| Docker (Memgraph, Neo4j, Qdrant, Dgraph) | Nothing |
+| Python environment | Nothing |
+| Cloud API keys (OpenAI, Gemini, Voyage AI) | Nothing |
+| Manual Rust/Go compilation | Nothing |
+| External secret management setup | Nothing |
+| `npm install @optave/codegraph` | That's it |
+
+Our path to #1 is not feature parity with every competitor. It's making codegraph **the obvious default for any JavaScript developer or tool that needs code intelligence** — because it's the only one that doesn't ask them to leave the npm ecosystem.
+
+---
+
+## Landscape License Overview
+
+How the competitive field is licensed (relevant for understanding what's available to learn from, fork, or integrate):
+
+| License | Count | Projects |
+|---------|-------|----------|
+| **MIT** | 10 | [code-graph-rag](https://github.com/vitali87/code-graph-rag), [glimpse](https://github.com/seatedro/glimpse), [arbor](https://github.com/Anandb71/arbor), [codexray](https://github.com/NeuralRays/codexray), [codegraph-cli](https://github.com/al1-nasir/codegraph-cli), [Bikach/codeGraph](https://github.com/Bikach/codeGraph), [repo-graphrag-mcp](https://github.com/yumeiriowl/repo-graphrag-mcp), [code-context-mcp](https://github.com/RaheesAhmed/code-context-mcp), [shantham/codegraph](https://github.com/shantham/codegraph), [khushil/code-graph-rag](https://github.com/khushil/code-graph-rag) |
+| **Apache-2.0** | 2 | **[@optave/codegraph](https://github.com/optave/codegraph)** (us), [loregrep](https://github.com/Vasu014/loregrep) |
+| **Custom/Other** | 1 | [CodeMCP/CKB](https://github.com/SimplyLiz/CodeMCP) (non-standard license) |
+| **No license** | 9 | [axon](https://github.com/harshkedia177/axon), [autodev-codebase](https://github.com/anrgct/autodev-codebase), [Claude-code-memory](https://github.com/Durafen/Claude-code-memory), [claude-context-local](https://github.com/anasdayeh/claude-context-local), [CodeInteliMCP](https://github.com/rahulvgmail/CodeInteliMCP), [MCP_CodeAnalysis](https://github.com/0xjcf/MCP_CodeAnalysis), [0xd219b/codegraph](https://github.com/0xd219b/codegraph), [badger-graph](https://github.com/floydw1234/badger-graph), [CodeRAG](https://github.com/m3et/CodeRAG) |
+
+**Key implications:**
+- MIT-licensed projects (10/22) are fully open — their approaches, algorithms, and code can be studied and adapted freely
+- 9 projects have **no license at all**, meaning they are proprietary by default under copyright law — their code cannot legally be copied or forked, even though it's publicly visible on GitHub
+- CKB (CodeMCP) has a custom license that should be reviewed before any integration or inspiration
+- Our Apache-2.0 license provides patent protection to users (stronger than MIT) while remaining fully open source — a deliberate choice for enterprise adoption
+
+---
+
+*This document should be revisited when the competitive landscape shifts meaningfully, or when a proposed feature contradicts one of the core principles above.*
From aacb44c9f560b3a26269b50cbd5b50839cdeeb3a Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Sun, 22 Feb 2026 02:54:05 -0700
Subject: [PATCH 2/3] docs: add v1.5.0 release notes to CHANGELOG

---
 CHANGELOG.md | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e1d53e5e..d81a8f2d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,42 @@
 
 All notable changes to this project will be documented in this file. See [commit-and-tag-version](https://github.com/absolute-version/commit-and-tag-version) for commit guidelines.
+## [1.5.0](https://github.com/optave/codegraph/compare/v1.4.0...v1.5.0) (2026-02-22)
+
+**Phase 2.5 — Multi-Repo MCP & Structural Analysis.** This release adds multi-repo support for AI agents, structural analysis with architectural metrics, and hardens security across the MCP server and SQL layers.
+
+### ⚠ BREAKING CHANGES
+
+* **parser:** Node kinds now use language-native types — Go structs → `struct`, Rust structs/enums/traits → `struct`/`enum`/`trait`, Java enums → `enum`, C# structs/records/enums → `struct`/`record`/`enum`, PHP traits/enums → `trait`/`enum`, Ruby modules → `module`. Rebuild required: `codegraph build --no-incremental`. ([72535fb](https://github.com/optave/codegraph/commit/72535fba44e56312fb8d5b21e19bdcbec1ea9f5e))
+
+### Features
+
+* **mcp:** add multi-repo MCP support with global registry at `~/.codegraph/registry.json` — optional `repo` param on all 11 tools, new `list_repos` tool, auto-register on build ([54ea9f6](https://github.com/optave/codegraph/commit/54ea9f6c497f1c7ad4c2f0199b4a951af0a51c62))
+* **mcp:** default MCP server to single-repo mode for security isolation — multi-repo access requires explicit `--multi-repo` or `--repos` opt-in ([49c07ad](https://github.com/optave/codegraph/commit/49c07ad725421710af3dd3cce5b3fc7028ab94a8))
+* **registry:** harden multi-repo registry — `pruneRegistry()` removes stale entries, `--repos` allowlist for repo-level access control, auto-suffix name collisions ([a413ea7](https://github.com/optave/codegraph/commit/a413ea73ff2ab12b4d500d07bd7f71bc319c9f54))
+* **structure:** add structural analysis with directory nodes, containment edges, and metrics (symbol density, avg fan-out, cohesion scores) ([a413ea7](https://github.com/optave/codegraph/commit/a413ea73ff2ab12b4d500d07bd7f71bc319c9f54))
+* **cli:** add `codegraph structure [dir]`, `codegraph hotspots`, and `codegraph registry list|add|remove|prune` commands ([a413ea7](https://github.com/optave/codegraph/commit/a413ea73ff2ab12b4d500d07bd7f71bc319c9f54))
+* **export:** extend DOT/Mermaid export with directory clusters ([a413ea7](https://github.com/optave/codegraph/commit/a413ea73ff2ab12b4d500d07bd7f71bc319c9f54))
+* **parser:** add `SYMBOL_KINDS` constant and granular node types across both WASM and native Rust extractors ([72535fb](https://github.com/optave/codegraph/commit/72535fba44e56312fb8d5b21e19bdcbec1ea9f5e))
+
+### Bug Fixes
+
+* **security:** eliminate SQL interpolation in `hotspotsData` — replace dynamic string interpolation with static map of pre-built prepared statements ([f8790d7](https://github.com/optave/codegraph/commit/f8790d772989070903adbeeb30720789890591d9))
+* **parser:** break `parser.js` ↔ `constants.js` circular dependency by inlining path normalization ([36239e9](https://github.com/optave/codegraph/commit/36239e91de43a6c6747951a84072953ea05e2321))
+* **structure:** add `NULLS LAST` to hotspots `ORDER BY` clause ([a41668f](https://github.com/optave/codegraph/commit/a41668f55ff8c18acb6dde883b9e98c3113abf7d))
+* **ci:** add license scan allowlist for `@img/sharp-*` dual-licensed packages ([9fbb084](https://github.com/optave/codegraph/commit/9fbb0848b4523baca71b94e7bceeb569773c8b45))
+
+### Testing
+
+* add 18 unit tests for registry, 4 MCP integration tests, 4 CLI integration tests for multi-repo ([54ea9f6](https://github.com/optave/codegraph/commit/54ea9f6c497f1c7ad4c2f0199b4a951af0a51c62))
+* add 277 unit tests and 182 integration tests for structural analysis ([a413ea7](https://github.com/optave/codegraph/commit/a413ea73ff2ab12b4d500d07bd7f71bc319c9f54))
+* add MCP single-repo / multi-repo mode tests ([49c07ad](https://github.com/optave/codegraph/commit/49c07ad725421710af3dd3cce5b3fc7028ab94a8))
+* add registry hardening tests (pruning, allowlist, name collision) ([a413ea7](https://github.com/optave/codegraph/commit/a413ea73ff2ab12b4d500d07bd7f71bc319c9f54))
+
+### Documentation
+
+* add dogfooding guide for self-analysis with codegraph ([36239e9](https://github.com/optave/codegraph/commit/36239e91de43a6c6747951a84072953ea05e2321))
+
 ## [1.4.0](https://github.com/optave/codegraph/compare/v1.3.0...v1.4.0) (2026-02-22)
 
 **Phase 2 — Foundation Hardening** is complete. This release hardens the core infrastructure: a declarative parser registry, a full MCP server, significantly improved test coverage, and secure credential management.
@@ -31,7 +67,6 @@ All notable changes to this project will be documented in this file. See [commit
 * add license compliance workflow and CI testing pipeline ([eeeb68b](https://github.com/optave/codegraph/commit/eeeb68b))
 * add OIDC trusted publishing with `--provenance` for npm packages ([bc595f7](https://github.com/optave/codegraph/commit/bc595f7))
 * add automated semantic versioning and commit enforcement ([b8e5277](https://github.com/optave/codegraph/commit/b8e5277))
-* add Claude Code review action for PRs ([eb5d9f2](https://github.com/optave/codegraph/commit/eb5d9f2))
 * add Biome linter and formatter ([a6e6bd4](https://github.com/optave/codegraph/commit/a6e6bd4))
 
 ### Bug Fixes
From 1571f2a864cca2ae812327b180d63b46ab465077 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Sun, 22 Feb 2026 03:11:27 -0700
Subject: [PATCH 3/3] fix: harden publish workflow version resolution

The release trigger had no access to version-override inputs, causing commit-and-tag-version to fall through to auto-detect which silently produced the stale version. Now extracts version from the release tag, verifies the bump actually happened, and checks npm registry before publishing to catch version conflicts early.
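
The resolution rules described in this message can be sketched in JavaScript as follows. This is only an illustration under stated assumptions: the workflow itself implements the logic in shell, and the helper names `versionFromTag`, `resolveOverride`, and `assertBumped` are hypothetical, chosen here for readability.

```javascript
// Illustrative sketch only: these helper names are hypothetical and do not
// come from the workflow, from commit-and-tag-version, or from npm.

// Strip a single leading "v" from a release tag, e.g. "v1.5.0" -> "1.5.0".
function versionFromTag(tagName) {
  return tagName.replace(/^v/, "");
}

// Choose the version override: the release tag for "release" events,
// otherwise the manual workflow_dispatch input (which may be empty).
function resolveOverride(eventName, tagName, manualInput) {
  return eventName === "release" ? versionFromTag(tagName) : (manualInput || "");
}

// Fail when the version did not change, unless it already matched the override.
function assertBumped(current, next, override) {
  if (next === current && current !== override) {
    throw new Error(`Version was not bumped (still ${current})`);
  }
  return next;
}
```

With these helpers, a release event tagged `v1.5.0` resolves to the override `1.5.0`, and an unchanged version is rejected unless the package was already at that override.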
---
 .github/workflows/publish.yml | 33 ++++++++++++++++++++++++++--
 README.md                     | 41 +++++++++++++++++------------------
 2 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index cd51fdf0..11ab9062 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -125,7 +125,15 @@ jobs:
         run: |
           git checkout -- package-lock.json
           CURRENT=$(node -p "require('./package.json').version")
-          OVERRIDE="${{ inputs.version-override }}"
+
+          # For release trigger, extract version from tag; for workflow_dispatch, use input
+          if [ "${{ github.event_name }}" = "release" ]; then
+            OVERRIDE=$(echo "${{ github.event.release.tag_name }}" | sed 's/^v//')
+            echo "Release trigger — using version from tag: $OVERRIDE"
+          else
+            OVERRIDE="${{ inputs.version-override }}"
+          fi
+
           if [ -n "$OVERRIDE" ] && [ "$CURRENT" = "$OVERRIDE" ]; then
             echo "Version already at $OVERRIDE — skipping bump"
           elif [ -n "$OVERRIDE" ]; then
@@ -133,13 +141,34 @@ jobs:
           else
             npx commit-and-tag-version
           fi
-          echo "new_version=$(node -p "require('./package.json').version")" >> "$GITHUB_OUTPUT"
+
+          NEW_VERSION=$(node -p "require('./package.json').version")
+          echo "new_version=$NEW_VERSION" >> "$GITHUB_OUTPUT"
+
+          # Verify the version was actually bumped (unless it already matched the override)
+          if [ "$NEW_VERSION" = "$CURRENT" ] && [ "$CURRENT" != "$OVERRIDE" ]; then
+            echo "::error::Version was not bumped (still $CURRENT). Check commit history or provide a version-override."
+            exit 1
+          fi
+
+          echo "Will publish version $NEW_VERSION (was $CURRENT)"
 
       - name: Download native artifacts
         uses: actions/download-artifact@v4
         with:
           path: artifacts/
 
+      - name: Verify version not already on npm
+        run: |
+          VERSION="${{ steps.version.outputs.new_version }}"
+          PKG="@optave/codegraph"
+          echo "Checking if $PKG@$VERSION already exists on npm..."
+          if npm view "$PKG@$VERSION" version 2>/dev/null; then
+            echo "::error::$PKG@$VERSION is already published on npm. Bump to a higher version."
+            exit 1
+          fi
+          echo "$PKG@$VERSION is not yet published — proceeding"
+
       - name: Publish platform packages
         shell: bash
         run: |
diff --git a/README.md b/README.md
index b1f23880..49d12e8e 100644
--- a/README.md
+++ b/README.md
@@ -45,20 +45,19 @@ Most dependency graph tools only tell you which **files** import which — codeg
 
 ### Feature comparison
 
-| Capability | codegraph | Madge | dep-cruiser | Skott | Nx graph | Sourcetrail | GitNexus |
-|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-| Function-level analysis | **Yes** | — | — | — | — | **Yes** | **Yes** |
-| Multi-language | **11** | 1 | 1 | 1 | Any (project) | 4 | 9 |
-| Semantic search | **Yes** | — | — | — | — | — | **Yes** |
-| MCP / AI agent support | **Yes** | — | — | — | — | — | **Yes** |
-| Git diff impact | **Yes** | — | — | — | Partial | — | **Yes** |
-| Persistent database | **Yes** | — | — | — | — | Yes | **Yes** |
-| Watch mode | **Yes** | — | — | — | Daemon | — | — |
-| CI workflow included | **Yes** | — | Rules | — | Yes | — | — |
-| Cycle detection | **Yes** | Yes | Yes | Yes | — | — | — |
-| Zero config | **Yes** | Yes | — | Yes | — | — | **Yes** |
-| Fully local / no telemetry | **Yes** | Yes | Yes | Yes | Partial | Yes | **Yes** |
-| Free & open source | **Yes** | Yes | Yes | Yes | Partial | Archived | No |
+| Capability | codegraph | [code-graph-rag](https://github.com/vitali87/code-graph-rag) | [glimpse](https://github.com/seatedro/glimpse) | [CodeMCP](https://github.com/SimplyLiz/CodeMCP) | [axon](https://github.com/harshkedia177/axon) | [autodev-codebase](https://github.com/anrgct/autodev-codebase) | [arbor](https://github.com/Anandb71/arbor) | [Claude-code-memory](https://github.com/Durafen/Claude-code-memory) |
+|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| Function-level analysis | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | **Yes** | — |
+| Multi-language | **11** | Multi | Multi | SCIP langs | Few | **40+** | Multi | — |
+| Semantic search | **Yes** | **Yes** | — | — | — | **Yes** | **Yes** | **Yes** |
+| MCP / AI agent support | **Yes** | **Yes** | — | **Yes** | — | — | **Yes** | **Yes** |
+| Git diff impact | **Yes** | — | — | — | **Yes** | — | — | — |
+| Watch mode | **Yes** | — | — | — | — | — | — | — |
+| CI workflow included | **Yes** | — | — | — | — | — | — | — |
+| Cycle detection | **Yes** | — | — | — | **Yes** | — | — | — |
+| Zero config | **Yes** | — | **Yes** | — | — | — | **Yes** | — |
+| Fully local / no telemetry | **Yes** | Partial | **Yes** | **Yes** | **Yes** | Partial | **Yes** | — |
+| Free & open source | **Yes** | Yes | Yes | Custom | — | — | Yes | — |
 
 ### What makes codegraph different
 
@@ -78,17 +77,17 @@ Many tools in this space are cloud-based or SaaS — meaning your code leaves yo
 
 | Tool | What it does well | Where it falls short |
 |---|---|---|
+| [code-graph-rag](https://github.com/vitali87/code-graph-rag) | Graph RAG with Memgraph, multi-provider AI, semantic search, code editing via AST | Requires Docker (Memgraph), depends on cloud AI providers, complex setup |
+| [glimpse](https://github.com/seatedro/glimpse) | Clipboard-first LLM context tool, call graphs, LSP resolution, token counting | Context-packing tool, not a dependency graph — no persistence, no queries |
+| [CodeMCP](https://github.com/SimplyLiz/CodeMCP) | SCIP compiler-grade indexing, compound operations (83% token savings), secret scanning | Custom license, requires SCIP toolchains per language, limited language coverage |
+| [axon](https://github.com/harshkedia177/axon) | 11-phase pipeline, KuzuDB, community detection, dead code, change coupling | No license, Python-focused, limited language support |
+| [autodev-codebase](https://github.com/anrgct/autodev-codebase) | 40+ languages, interactive Cytoscape.js visualization, LLM reranking | No license, some embedding providers require cloud APIs, complex setup |
+| [arbor](https://github.com/Anandb71/arbor) | Native GUI, confidence scoring, architectural role classification, fuzzy search | GUI-focused — no CLI pipeline, no watch mode, no CI integration |
+| [Claude-code-memory](https://github.com/Durafen/Claude-code-memory) | Persistent codebase memory for Claude Code, Memory Guard quality gate | Cloud-dependent (Voyage AI), requires Qdrant, not a code analysis tool |
 | [Madge](https://github.com/pahen/madge) | Simple file-level JS/TS dependency graphs | No function-level analysis, no impact tracing, JS/TS only |
 | [dependency-cruiser](https://github.com/sverweij/dependency-cruiser) | Architectural rule validation for JS/TS | Module-level only (function-level explicitly out of scope), requires config |
-| [Skott](https://github.com/antoine-music/skott) | Module graph with unused code detection | File-level only, JS/TS only, no persistent database |
 | [Nx graph](https://nx.dev/) | Monorepo project-level dependency graph | Requires Nx workspace, project-level only (not file or function) |
-| [Sourcetrail](https://github.com/CoatiSoftware/Sourcetrail) | Rich GUI with symbol-level graphs | Archived/discontinued (2021), no JS/TS, no CLI |
-| [Sourcegraph](https://sourcegraph.com/) | Enterprise code search and navigation | Cloud/SaaS — code sent to servers, $19+/user/mo, no longer open source |
-| [CodeSee](https://www.codesee.io/) | Visual codebase maps | Cloud-based — code leaves your machine, acquired by GitKraken |
-| [Understand](https://scitools.com/) | Deep multi-language static analysis | $100+/month per seat, proprietary, GUI-only, no CI or AI integration |
-| [Snyk Code](https://snyk.io/) | AI-powered security scanning | Cloud-based — code sent to Snyk servers for analysis, not a dependency graph tool |
 | [pyan](https://github.com/Technologicat/pyan) / [cflow](https://www.gnu.org/software/cflow/) | Function-level call graphs | Single-language each (Python / C only), no persistence, no queries |
-| [GitNexus](https://gitnexus.dev/) | Function-level graph with hybrid search and MCP | PolyForm Noncommercial license, no watch mode, no cycle detection, no CI workflow |
 
 ---