Enterprise-grade Code Intelligence Platform: make AI agents understand your codebase through semantic navigation, not grep.
codeindex generates AI-readable documentation with a two-phase pipeline: structural indexing (AST parsing via tree-sitter) plus AI-powered module descriptions. AI agents can browse the README_AI.md hierarchy, see module purposes at a glance, and navigate directly to the right code across Python, PHP, Java, TypeScript, JavaScript, Swift, and Objective-C. Designed for enterprise environments with intranet isolation.
🏢 Enterprise Ready: ✅ Intranet compatible ✅ Self-contained ✅ Version stable ✅ Data sovereignty
For LoomGraph Developers:
FOR_LOOMGRAPH.md (quick start) | docs/guides/loomgraph-integration.md (full guide)
- Two-phase documentation pipeline (v0.23.0): Phase 1 generates structural README_AI.md via SmartWriter; in Phase 2, AI generates one-line functional descriptions per module. AI agents can browse the README_AI.md hierarchy and find the right module without grep.
- Smart indexing: Tiered documentation (overview → navigation → detailed) optimized for AI agents, ≤50KB per file
- Auto-AI enrichment: When `ai_command` is configured, `scan-all` automatically enables AI module descriptions. Use `--no-ai` to opt out
- Auto-update hooks: Post-commit hook automatically regenerates README_AI.md for changed directories. Thin wrapper pattern: `pip upgrade` auto-updates hook logic
- Multi-language AST parsing: Python, PHP, Java, TypeScript, JavaScript, Swift, Objective-C via tree-sitter (Go, Rust, C# planned)
- Call relationship extraction: Function/method call graphs across Python, Java, PHP, TypeScript, JavaScript
- Inheritance extraction: Class hierarchy and interface relationships
- Framework route extraction: ThinkPHP and Spring Boot route tables (more planned)
- Technical debt analysis: Detect large files, god classes, symbol overload, test smells
- Single file parse: `codeindex parse <file>` with JSON output for tool integration
- Structured JSON output: `--output json` for CI/CD, knowledge graphs, and downstream tools
- Adaptive symbol extraction: Dynamic 5–150 symbols per file based on size
- CLAUDE.md injection: `codeindex init` auto-configures Claude Code integration
- Auto-update guide: Post-install hook automatically updates `~/.claude/CLAUDE.md` after `pip upgrade`
- Template-based test generation: YAML + Jinja2 for rapid language support (88–91% time savings)
- Parallel scanning: Concurrent directory processing with configurable workers
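To illustrate the parallel scanning feature, here is a minimal sketch of concurrent directory processing with a configurable worker count. `scan_directory` and `scan_all` are hypothetical names chosen for this example, not codeindex's actual API, and the per-directory work is reduced to a simple file count.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def scan_directory(directory: Path) -> tuple[str, int]:
    """Stand-in for per-directory indexing: count the files directly inside."""
    return str(directory), sum(1 for p in directory.iterdir() if p.is_file())


def scan_all(directories: list[Path], workers: int = 4) -> dict[str, int]:
    """Process directories concurrently; `workers` caps the thread pool size."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the output is deterministic.
        return dict(pool.map(scan_directory, directories))
```

Threads suit this kind of I/O-bound work; a CPU-heavy parser might call for a process pool instead.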
Without external tools: When Serena MCP or other cloud-based code intelligence tools are unavailable due to network isolation or security policies, codeindex becomes the primary code understanding tool.
# Enterprise developer workflow
git clone <internal-repo>
codeindex init # Configure project
codeindex scan-all # Structural + AI descriptions (auto)
# AI agent reads README_AI.md → sees module purposes → navigates directly
# No grep needed for code discovery
codeindex tech-debt src/ --output review.md # Code quality analysis
Why enterprises choose codeindex:
- ✅ Semantic navigation: AI agents understand module purposes from the README_AI.md hierarchy
- ✅ Intranet compatible: no external dependencies, fully offline
- ✅ Self-contained: no upstream MCP servers required
- ✅ Version stable: enterprise-controlled release cycle
- ✅ Data sovereignty: code never leaves the internal network
For enterprise teams: codeindex serves as the core data source for LoomGraph knowledge graphs, enabling semantic code search across the organization.
# Data pipeline
codeindex scan --output json > parse_results.json
loomgraph inject parse_results.json # Build knowledge graph
# Team can now search code using natural language
Three-repo architecture:
codeindex (Parse)  →  LoomGraph (Orchestrate)  →  LightRAG (Store)
   ↓ ParseResult         ↓ Embeddings               ↓ Semantic Search
   AST extraction        Knowledge Graph            Vector + Graph DB
Without codeindex, LoomGraph cannot function. See LoomGraph Integration Guide.
With Serena MCP: For individual developers using Claude Code + Serena MCP, codeindex provides complementary value:
- codeindex (build-time): Semantic architecture map (README_AI.md with module descriptions) + quality analysis
- Serena (real-time): Precise symbol navigation (`find_symbol`, `find_referencing_symbols`)
# Personal developer workflow
codeindex init # Setup CLAUDE.md integration
codeindex scan-all # Structural + AI descriptions (auto)
codeindex hooks install post-commit # Auto-update on commit
# Claude Code reads README_AI.md → understands module purpose → uses Serena for details
Relationship: codeindex provides the "map with labels," Serena provides the "GPS navigation."
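The post-commit auto-update regenerates README_AI.md only for directories that changed. A rough sketch of that mapping step follows, assuming the changed paths come from something like `git diff --name-only HEAD~1 HEAD`. `changed_directories` is a hypothetical helper, not the hook's real implementation.

```python
from pathlib import PurePosixPath


def changed_directories(changed_files: list[str]) -> list[str]:
    """Return the sorted, de-duplicated parent directories of changed files."""
    dirs = {str(PurePosixPath(path).parent) for path in changed_files}
    return sorted(dirs)
```

Files at the repository root map to `"."`, so a root-level README_AI.md would be refreshed as well under this assumption.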
codeindex uses lazy loading: language parsers are only imported when needed.
# All languages (recommended)
pip install ai-codeindex[all]
# Or specific languages only
pip install ai-codeindex[python]
pip install ai-codeindex[php]
pip install ai-codeindex[java]
pip install ai-codeindex[typescript]
pip install ai-codeindex[python,php]
pip install ai-codeindex[swift]
pip install ai-codeindex[ios] # Swift + Objective-C
# Or install with pipx
pipx install ai-codeindex[all]
# Or install from source
git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[all]"
# Initialize in your project
cd /your/project
codeindex init
This creates:
- `.codeindex.yaml`: scan configuration (languages, include/exclude patterns)
- `CLAUDE.md`: injects codeindex instructions so Claude Code uses README_AI.md automatically
- `CODEINDEX.md`: project-level documentation reference
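As an illustration of how include/exclude patterns could filter paths, here is a hedged sketch built on Python's standard `fnmatch` module. The function name `is_included` and the matching semantics are assumptions made for this example; the actual `.codeindex.yaml` keys and rules may differ.

```python
from fnmatch import fnmatch


def is_included(path: str, include: list[str], exclude: list[str]) -> bool:
    """A path is scanned if it matches any include pattern and no exclude pattern."""
    # Exclusions win: check them first so an excluded path is never scanned.
    if any(fnmatch(path, pattern) for pattern in exclude):
        return False
    return any(fnmatch(path, pattern) for pattern in include)
```

Note that in `fnmatch` the `*` wildcard also matches path separators, so `src/*` covers nested files such as `src/auth/login.py`.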
# Scan all directories
# When ai_command is configured → auto Phase 1 (structural) + Phase 2 (AI descriptions)
# Without ai_command → Phase 1 only (structural)
codeindex scan-all
# Structural only (skip AI enrichment)
codeindex scan-all --no-ai
# Scan a single directory
codeindex scan ./src/auth
# Full AI-generated README for a single directory
codeindex scan ./src/auth --ai
# Preview AI prompt without executing
codeindex scan ./src/auth --ai --dry-run
codeindex status
Indexing Status
------------------------------
✅ src/auth/
✅ src/utils/
⚠️ src/api/ (no README_AI.md)
Indexed: 2/3 (67%)
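The status summary above boils down to counting directories that contain a README_AI.md. A minimal sketch, assuming the list of directories to check is already known; `index_coverage` is a hypothetical helper rather than part of the codeindex CLI.

```python
from pathlib import Path


def index_coverage(directories: list[Path]) -> tuple[int, int, int]:
    """Return (indexed, total, percent) for directories containing README_AI.md."""
    total = len(directories)
    indexed = sum(1 for d in directories if (d / "README_AI.md").is_file())
    # Guard against division by zero when no directories are configured.
    percent = round(100 * indexed / total) if total else 0
    return indexed, total, percent
```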
# Global symbol index (PROJECT_SYMBOLS.md)
codeindex symbols
# Module overview (PROJECT_INDEX.md)
codeindex index
# Git change impact analysis
codeindex affected --since HEAD~5

| Command | Description | Guide |
|---|---|---|
| `codeindex scan --output json` | JSON output for tools | JSON Output Guide |
| `codeindex parse <file>` | Parse single file to JSON | LoomGraph Integration |
| `codeindex tech-debt ./src` | Code quality analysis (debt + test smells) | Enhanced in v0.22.0 |
| `codeindex debt-scan ./src` | Alias for tech-debt | Backward compatibility |
| `codeindex hooks install` | Git hooks for auto-update | Git Hooks Guide |
| `codeindex config explain <param>` | Parameter help | Configuration Guide |
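For tool integration, the JSON-producing commands in the table can be wrapped from a script. The sketch below assumes the command prints its JSON to standard output, which this README does not confirm; `run_json_command` is a hypothetical helper, and nothing is assumed about the shape of the returned data.

```python
import json
import subprocess


def run_json_command(command: list[str]) -> object:
    """Run a CLI command and decode its standard output as JSON."""
    # check=True raises CalledProcessError if the command exits non-zero.
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

Under that assumption, a call might look like `run_json_command(["codeindex", "parse", "src/auth/login.py"])`.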
For personal developers using Claude Code + Serena MCP:
Since v0.17.0, `codeindex init` automatically injects instructions into your project's CLAUDE.md, so Claude Code reads README_AI.md files first; no manual setup is required.
# One command sets everything up
codeindex init
# Claude Code will now:
# ✅ Read README_AI.md for architecture understanding
# ✅ Use Serena MCP tools for precise navigation (find_symbol, etc.)
# ✅ Apply tech-debt analysis for code quality checks
For enterprise users without Serena: README_AI.md and PROJECT_SYMBOLS.md become your primary code navigation tools.
For manual setup, MCP skills (/mo:arch, /mo:index), and Git hooks integration, see the Claude Code Integration Guide.
| Language | Status | Since | Key Features |
|---|---|---|---|
| Python | ✅ Supported | v0.1.0 | Classes, functions, methods, imports, docstrings, inheritance, calls |
| PHP | ✅ Supported | v0.5.0 | Classes (extends/implements), methods, properties, PHPDoc, inheritance, calls |
| Java | ✅ Supported | v0.7.0 | Classes, interfaces, enums, records, annotations, Spring routes, Lombok, calls |
| TypeScript/JS | ✅ Supported | v0.19.0 | Classes, interfaces, enums, type aliases, arrow functions, JSX/TSX, imports/exports, calls |
| Swift | ✅ Supported | v0.21.0 | Classes, structs, enums, protocols, extensions, methods, properties |
| Objective-C | ✅ Supported | v0.21.0 | Classes, protocols, categories, properties, methods (instance/class) |
| Go | Planned | - | Packages, interfaces, struct methods |
| Rust | Planned | - | Structs, traits, modules |
| C# | Planned | - | Classes, interfaces, .NET projects |
Want to add a language? The template-based test system lets you contribute by writing YAML specs, with no Python knowledge required. See CONTRIBUTING.md for details.
| Framework | Language | Status |
|---|---|---|
| ThinkPHP | PHP | ✅ Stable (v0.5.0) |
| Spring Boot | Java | ✅ Stable (v0.8.0) |
| Laravel | PHP | Planned |
| FastAPI | Python | Planned |
| Django | Python | Planned |
| Express.js | JS/TS | Planned |
The tech-debt command provides comprehensive code quality analysis, now including test smells detection:
# JSON output (for LoomGraph integration)
codeindex tech-debt ./src --format json > debt-data.json
# Markdown report (for documentation)
codeindex tech-debt ./src --format markdown > report.md
# Console output (for quick checks)
codeindex tech-debt ./src --format console
# Alias: debt-scan also works (backward compatibility)
codeindex debt-scan ./src --format json
What it detects:
- 🔴 Super large files (>5000 lines), Large files (>2000 lines)
- 🔴 God Classes (>50 methods)
- 🔴 Long methods (>80/150 lines)
- 🟡 High coupling (>8 internal imports)
- 🟡 Symbol overload (>100 symbols, high noise ratio)
- 🧪 Test smells (skipped tests, giant test files), new in v0.22.0
- Quality scoring (0-100 scale per file)
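The thresholds listed above can be read as simple rule checks. Below is a minimal sketch of that idea using the numbers from the list; the function name, parameters, and issue labels are hypothetical, and the real analyzer (including its long-method and scoring logic) is not shown in this README.

```python
def detect_issues(
    lines: int, max_class_methods: int, internal_imports: int, symbols: int
) -> list[str]:
    """Apply the documented thresholds to one file's metrics."""
    issues = []
    # File size: the two tiers are mutually exclusive.
    if lines > 5000:
        issues.append("super_large_file")
    elif lines > 2000:
        issues.append("large_file")
    if max_class_methods > 50:
        issues.append("god_class")
    if internal_imports > 8:
        issues.append("high_coupling")
    if symbols > 100:
        issues.append("symbol_overload")
    return issues
```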
Enhanced JSON output (v0.22.0):
{
"timestamp": "2026-03-06T13:45:39Z",
"summary": {
"total_files": 97,
"giant_files": 0,
"giant_functions": 3,
"test_smells": 64,
"avg_maintainability": 9.9
},
"total_files": 97,
"average_quality_score": 99.4,
"giant_files": [],
"giant_functions": [...],
"test_smells": [
{
"path": "tests/test_example.py",
"type": "skipped_test",
"details": "Skipped test detected: @pytest.mark.skip at line 42",
"line_number": 42
}
],
"file_reports": [...]
}
Key features:
- ✅ Unified command: Single entry point for all quality checks
- ✅ Backward compatible: All existing JSON fields preserved
- ✅ LoomGraph ready: Enhanced summary for knowledge graph integration
- ✅ Framework-agnostic: Detects test smells across Jest, pytest, JUnit, etc.
- ✅ KISS design: 90% code reuse, simple regex patterns for test detection
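A downstream tool might consume the JSON report like this. The sketch relies only on the `test_smells` entries and their `type` field as they appear in the sample output above; `summarize_test_smells` is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize_test_smells(report_json: str) -> Counter:
    """Count test smells in a tech-debt JSON report by their "type" field."""
    report = json.loads(report_json)
    # Treat a missing "test_smells" key as an empty list.
    return Counter(smell["type"] for smell in report.get("test_smells", []))
```

For example, after `codeindex tech-debt ./src --format json > debt-data.json`, the contents of `debt-data.json` could be passed to this function as a string.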
Phase 1 (Structural):
Directory → Scanner → Parser (tree-sitter) → SmartWriter → README_AI.md
Phase 2 (AI Enrichment, automatic when ai_command configured):
README_AI.md → symbol names + file names → AI → one-line description → blockquote injection
Phase 1: Structural generation (always runs)
- Scanner: walks directories, filters by config patterns
- Parser: extracts symbols (classes, functions, imports, calls, inheritance) via tree-sitter
- SmartWriter: generates tiered documentation with size limits (≤50KB)
- Output: `README_AI.md` optimized for AI consumption, or JSON for tool integration
Phase 2: AI enrichment (auto-enabled when ai_command configured)
- Generates a one-line functional description for each non-leaf module
- Writes as blockquote: `> Membership tiers, points redemption, benefit cards and coupons`
- ~200-400 tokens per directory, 10-20x cheaper than full AI generation
- Parent directories read child descriptions for hierarchical navigation
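The blockquote injection step of Phase 2 could be sketched as follows. This assumes the description is placed directly after the first Markdown heading, which is a guess rather than documented behavior; `inject_description` is a hypothetical function, not SmartWriter's API.

```python
def inject_description(readme_text: str, description: str) -> str:
    """Insert `> description` after the first heading.

    Any blockquote already sitting right after that heading is replaced,
    so re-running the enrichment does not stack duplicate descriptions.
    """
    lines = readme_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("#"):
            rest = lines[i + 1:]
            # Drop a previously injected blockquote and surrounding blank lines.
            while rest and (rest[0].startswith(">") or not rest[0].strip()):
                rest.pop(0)
            body = lines[: i + 1] + ["", f"> {description}", ""] + rest
            return "\n".join(body) + "\n"
    # No heading found: put the description at the top.
    return f"> {description}\n\n" + readme_text
```

Replacing rather than appending keeps repeated scans idempotent, at the cost of overwriting any hand-written blockquote in that position.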
Before (structural only):
├── Application/
    ├── Vip/              → 48 files | 386 symbols   ← AI agent cannot determine purpose
    ├── Pay/              → 23 files | 178 symbols
    └── SmallProgramApi/  → 31 files | 245 symbols
After (structural + AI enrichment):
├── Application/
    ├── Vip/              → Membership tiers, points redemption, benefit cards and coupons | 48 files
    ├── Pay/              → Payment gateway (Alipay/WeChat/refunds) | 23 files
    └── SmallProgramApi/  → Mini-program API (login, avatars, etc.) | 31 files
                            ↑ AI agent can navigate directly
Enterprise Intranet Environment
  Code Repository (Git)
          ↓
  codeindex (Parse Layer)
  ├── scan --output json → ParseResult
  ├── README_AI.md → architecture docs
  └── tech-debt → comprehensive quality scan
          ↓
  LoomGraph (Orchestration Layer)
  ├── inject ParseResult
  ├── generate embeddings
  └── build knowledge graph
          ↓
  LightRAG (Storage Layer)
  ├── PostgreSQL (graph data)
  ├── Vector DB (embeddings)
  └── Query API (semantic search)
          ↓
  AI Agents (Claude Code, Internal Chat)
  └── Natural language code search
codeindex role: Bottom layer (data collection & parsing). The entire system depends on codeindex providing structured ParseResult data.
| Guide | Description |
|---|---|
| Getting Started | Installation and first scan |
| Configuration Guide | All config options explained |
| Advanced Usage | Parallel scanning, custom prompts |
| Git Hooks Integration | Automated quality checks and doc updates |
| Claude Code Integration | AI agent setup and MCP skills |
| JSON Output Integration | Machine-readable output for tools |
| LoomGraph Integration | Knowledge graph data pipeline |
| Guide | Description |
|---|---|
| CONTRIBUTING.md | Development setup, TDD workflow, code style |
| CLAUDE.md | Quick reference for Claude Code and contributors |
| Design Philosophy | Core design principles and architecture |
| Release Automation | 5-minute automated release workflow |
| Multi-Language Support | Adding new language parsers |
| Language Support Contribution | Template-based test generation for new languages |
- Strategic Roadmap: long-term vision and priorities
- Changelog: version history and breaking changes
We welcome contributions! See CONTRIBUTING.md for guidelines.
git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[dev,all]"
make install-hooks
make test
make release VERSION=0.17.0
# GitHub Actions: tests → PyPI publish → GitHub Release
See Release Automation Guide for details.
Current version: v0.23.2
Recent milestones:
- v0.23.0: AI-Enhanced Module Descriptions (two-phase pipeline, auto-AI enrichment, post-commit thin wrapper)
- v0.22.2: Auto-update CLAUDE.md on `pip upgrade`, `/codeindex-update-guide` skill
- v0.22.0: Unified tech-debt + test smells analysis
- v0.21.0: Swift & Objective-C language support
- v0.19.0: TypeScript/JavaScript support with call extraction
Next:
- Framework routes expansion: Express, Laravel, FastAPI, Django (Epic 17)
- Go, Rust, C# language support
Moved to LoomGraph:
- Code similarity search, refactoring suggestions, team collaboration, IDE integration
See Strategic Roadmap for detailed plans.
MIT License: see LICENSE file for details.
- tree-sitter: fast, incremental parsing
- Claude CLI: AI integration inspiration
- All contributors and users
- Questions: GitHub Discussions
- Bugs: GitHub Issues
- Feature Requests: GitHub Issues
Made with ❤️ by the codeindex team