An accurate Retrieval-Augmented Generation (RAG) system that analyzes multi-language codebases using Tree-sitter, builds comprehensive knowledge graphs, and enables natural language querying of codebase structure and relationships, along with code-editing capabilities.
Use the Makefile for:
- `make install`: Install project dependencies with full language support.
- `make python`: Install dependencies for Python only.
- `make dev`: Set up the dev environment (install deps + pre-commit hooks).
- `make test`: Run all tests.
- `make clean`: Clean up build artifacts and cache.
- `make help`: Show available commands.
- Multi-Language Support: Supports Python, JavaScript, TypeScript, Rust, Go, Scala, Java, C++, and C codebases
- Tree-sitter Parsing: Uses Tree-sitter for robust, language-agnostic AST parsing
- Knowledge Graph Storage: Uses Memgraph to store codebase structure as an interconnected graph
- Natural Language Querying: Ask questions about your codebase in plain English
- AI-Powered Cypher Generation: Supports multiple AI providers (Google Gemini, OpenAI, Anthropic Claude, and local models via Ollama) for natural language to Cypher translation
- Multi-Provider Flexibility: Seamlessly switch between providers based on your needs and preferences
- Code Snippet Retrieval: Retrieves actual source code snippets for found functions/methods
- Advanced File Editing: Surgical code replacement with AST-based function targeting, visual diff previews, and exact code block modifications
- Shell Command Execution: Can execute terminal commands for tasks like running tests or using CLI tools
- Interactive Code Optimization: AI-powered codebase optimization with language-specific best practices and an interactive approval workflow
- Reference-Guided Optimization: Use your own coding standards and architectural documents to guide optimization suggestions
- Dependency Analysis: Parses `pyproject.toml` to understand external dependencies
- Nested Function Support: Handles complex nested functions and class hierarchies
- Language-Agnostic Design: Unified graph schema across all supported languages
- MCP Server: Model Context Protocol server for AI agent integration
- Parallel Processing: Multi-core parallel file parsing with configurable worker pools
- Memory Optimization: Streaming parsers and memory-mapped file handling for large codebases
- Graph Indexing: Automatic index creation for 20+ common query patterns
- Query Caching: LRU cache with configurable TTL for faster repeated queries
- Progress Reporting: Real-time progress tracking with ETA for large operations
- C Language Support: Full support for C, including:
- Function pointers and callbacks
- Macros and preprocessor directives
- Structs, unions, and enums
- Linux kernel patterns (syscalls, exports, locks)
- Test Framework Integration: Automatic detection and parsing of tests:
- Python: pytest, unittest
- JavaScript/TypeScript: Jest, Mocha, Jasmine
- C: Unity, Check, CMocka
- Rust: cargo test
- Go: testing package, Ginkgo
- Java: JUnit, TestNG
- BDD Support: Parse and link Gherkin feature files to implementations
- Security Vulnerability Detection:
- SQL/Command injection detection
- XSS vulnerability scanning
- Hardcoded secrets detection
- Buffer overflow analysis (C/C++)
- Taint flow tracking
- Data Flow Analysis: Track variable usage and data movement through code
- Inheritance Analysis: Full OOP relationship tracking (inheritance, interfaces, overrides)
- Test Coverage Analysis: Find untested code and calculate coverage metrics
- Circular Dependency Detection: Identify and visualize circular imports
- Git Repository Analysis: Full Git history integration with commit and author tracking
- Contributor Analysis: Track top contributors, commit patterns, and expertise areas
- Change History: Query files by modification date and track change frequency
- Commit Metadata: Access commit messages, author information, and parent relationships
- Blame Information: Line-by-line authorship tracking for accountability
- File History: Complete modification history for each file
- Configuration File Parsing: Support for JSON, YAML, TOML, INI, and .env files
- Dependency Extraction: Extract dependencies from config files
- Build Script Analysis: Parse npm scripts, Makefiles, Dockerfiles, and Kconfig
- Environment Detection: Identify environment-specific configurations
- Config References: Track relationships between configuration files
- Secret Detection: Find API keys, tokens, and sensitive configuration
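The circular dependency detection above amounts to cycle-finding over the module import graph. As a minimal, self-contained sketch (not the project's actual implementation), a depth-first search that records back edges:

```python
def find_import_cycles(imports: dict[str, list[str]]) -> list[list[str]]:
    """Return cycles in a module -> imported-modules adjacency map."""
    cycles: list[list[str]] = []
    path: list[str] = []
    state: dict[str, int] = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(module: str) -> None:
        state[module] = 1
        path.append(module)
        for dep in imports.get(module, []):
            if state.get(dep) == 1:  # back edge: dep is already on the current path
                cycles.append(path[path.index(dep):] + [dep])
            elif dep not in state:
                visit(dep)
        path.pop()
        state[module] = 2

    for module in imports:
        if module not in state:
            visit(module)
    return cycles

# Example: modules a and b import each other
print(find_import_cycles({"a": ["b"], "b": ["a"], "c": ["a"]}))
# → [['a', 'b', 'a']]
```

The real system works over graph edges in Memgraph rather than an in-memory dict, but the underlying idea is the same.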
The system consists of three main components:
- Multi-language Parser: Tree-sitter based parsing system that analyzes codebases and ingests data into Memgraph
- RAG System (`codebase_rag/`): Interactive CLI for querying the stored knowledge graph
- MCP Server (`mcp_server/`): Model Context Protocol server enabling AI agents to interact with the system
- Python 3.12+
- Docker & Docker Compose (for Memgraph)
- For AI providers: At least one API key from:
- Google Gemini
- OpenAI
- Anthropic Claude
- Or Ollama for local models
- `uv` package manager
For a complete step-by-step setup guide, see SETUP.md.
```bash
git clone https://github.com/vitali87/code-graph-rag.git
cd code-graph-rag
```

- Install dependencies:

For basic Python support:

```bash
uv sync
```

For full multi-language support:

```bash
uv sync --extra treesitter-full
```

For development (including tests and pre-commit hooks):

```bash
make dev
```

This installs all dependencies and sets up pre-commit hooks automatically.
This installs Tree-sitter grammars for all supported languages (see Multi-Language Support section).
- Set up environment variables:
```bash
cp .env.example .env
# Edit .env with your configuration (see options below)
```

Configure one or more AI providers in your `.env` file:
```bash
# .env file
GEMINI_API_KEY=your_gemini_api_key_here
```

Get your free API key from Google AI Studio.
```bash
# .env file
OPENAI_API_KEY=your_openai_api_key_here
```

Get your API key from the OpenAI Platform.
```bash
# .env file
ANTHROPIC_API_KEY=your_anthropic_api_key_here
```

Get your API key from the Anthropic Console.
```bash
# .env file
LOCAL_MODEL_ENDPOINT=http://localhost:11434/v1
LOCAL_ORCHESTRATOR_MODEL_ID=llama3
LOCAL_CYPHER_MODEL_ID=llama3
LOCAL_MODEL_API_KEY=ollama
```

Install and run Ollama:
```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull required models
ollama pull llama3

# Or try other models like:
# ollama pull llama3.1
# ollama pull mistral
# ollama pull codellama

# Ollama will automatically start serving on localhost:11434
```

Note: Local models provide privacy and zero API costs, but they may be less accurate than cloud models such as Gemini.
- Start Memgraph database:
```bash
docker-compose up -d
```

The Graph-Code system offers five main modes of operation:
- Parse & Ingest: Build knowledge graph from your codebase
- Interactive Query: Ask questions about your code in natural language
- Export & Analyze: Export graph data for programmatic analysis
- AI Optimization: Get AI-powered optimization suggestions for your code.
- Editing: Perform surgical code replacements and modifications with precise targeting.
Parse and ingest a multi-language repository into the knowledge graph:
For the first repository (clean start):
```bash
python -m codebase_rag.main start --repo-path /path/to/repo1 --update-graph --clean
```

For additional repositories (preserve existing data):

```bash
python -m codebase_rag.main start --repo-path /path/to/repo2 --update-graph
python -m codebase_rag.main start --repo-path /path/to/repo3 --update-graph
```

Performance options:
```bash
# Enable parallel processing with automatic worker detection
python -m codebase_rag.main start --repo-path /path/to/repo --update-graph --parallel

# Specify the number of worker processes
python -m codebase_rag.main start --repo-path /path/to/repo --update-graph --parallel --workers 8

# Process only specific folders
python -m codebase_rag.main start --repo-path /path/to/repo --update-graph --folder-filter "src,lib,tests"

# Filter files by pattern
python -m codebase_rag.main start --repo-path /path/to/repo --update-graph --file-pattern "*.py,*.js"

# Skip test files for faster processing
python -m codebase_rag.main start --repo-path /path/to/repo --update-graph --skip-tests

# Combine options for large codebases
python -m codebase_rag.main start --repo-path /path/to/linux-kernel \
  --update-graph --clean \
  --parallel --workers 16 \
  --folder-filter "drivers,kernel,fs" \
  --skip-tests
```

The system automatically detects and processes files for all supported languages (see the Multi-Language Support section).
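The `--folder-filter` and `--file-pattern` flags above act as include-filters over the repository tree. A rough illustration of that kind of filtering (a hypothetical helper, not the CLI's actual code) using the standard library's `fnmatch`:

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

def should_process(path: str, folder_filter: str = "", file_pattern: str = "") -> bool:
    """Mimic --folder-filter "src,lib" and --file-pattern "*.py,*.js" semantics."""
    parts = PurePosixPath(path).parts
    if folder_filter:
        folders = [f.strip() for f in folder_filter.split(",")]
        # Keep the file only if one of its parent folders is in the filter
        if not any(f in parts[:-1] for f in folders):
            return False
    if file_pattern:
        patterns = [p.strip() for p in file_pattern.split(",")]
        # Keep the file only if its name matches one of the glob patterns
        if not any(fnmatch(parts[-1], p) for p in patterns):
            return False
    return True

print(should_process("src/app/main.py", "src,lib", "*.py,*.js"))  # → True
print(should_process("docs/guide.md", "src,lib", "*.py,*.js"))    # → False
```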
Start the interactive RAG CLI:
```bash
python -m codebase_rag.main start --repo-path /path/to/your/repo
```

You can switch between providers and models at runtime using CLI arguments:
Use specific models directly (auto-detects provider):
```bash
# Gemini models
python -m codebase_rag.main start --repo-path /path/to/your/repo \
  --orchestrator-model gemini-2.5-pro \
  --cypher-model gemini-2.5-flash-lite-preview-06-17

# OpenAI models
python -m codebase_rag.main start --repo-path /path/to/your/repo \
  --orchestrator-model gpt-4o \
  --cypher-model gpt-4o-mini

# Anthropic models
python -m codebase_rag.main start --repo-path /path/to/your/repo \
  --orchestrator-model claude-3-5-sonnet-20241022 \
  --cypher-model claude-3-5-haiku-20241022

# Local models (Ollama)
python -m codebase_rag.main start --repo-path /path/to/your/repo \
  --orchestrator-model llama3.1 \
  --cypher-model codellama
```

Mix providers for different tasks:

```bash
# Use Claude for orchestration, Gemini for Cypher generation
python -m codebase_rag.main start --repo-path /path/to/your/repo \
  --orchestrator-model claude-3-5-sonnet-20241022 \
  --cypher-model gemini-2.5-flash-lite-preview-06-17
```

Example queries (works across all supported languages):
Basic Structure Queries:
- "Show me all classes that contain 'user' in their name"
- "Find functions related to database operations"
- "What methods does the User class have?"
- "Show me all TypeScript interfaces"
- "Find Rust structs and their methods"
Security & Quality Queries:
- "Show me all SQL injection vulnerabilities"
- "Find hardcoded passwords or API keys"
- "What functions haven't been tested?"
- "Show me high-severity security issues"
- "Find buffer overflow vulnerabilities in C code"
Architecture & Dependencies:
- "Show me circular dependencies"
- "What external packages do we depend on?"
- "Find classes that inherit from BaseModel"
- "Which modules import the database layer?"
- "Show me all abstract classes"
Version Control & History:
- "Who are the top contributors to this project?"
- "What files changed in the last week?"
- "Show me the most frequently modified files"
- "Who last modified the authentication module?"
Configuration & Testing:
- "Show me all configuration files"
- "Find database connection settings"
- "List all test suites and their coverage"
- "Show me BDD scenarios for user authentication"
- "Find all npm scripts in package.json files"
Code Modification Examples:
- "Add logging to all database connection functions"
- "Refactor the User class to use dependency injection"
- "Convert these Python functions to async/await pattern"
- "Add error handling to authentication methods"
- "Optimize this function for better performance"
For programmatic access and integration with other tools, you can export the entire knowledge graph to JSON:
Export during graph update:

```bash
python -m codebase_rag.main start --repo-path /path/to/repo --update-graph --clean -o my_graph.json
```

Export existing graph without updating:

```bash
python -m codebase_rag.main export -o my_graph.json
```

Working with exported data:
```python
from codebase_rag.graph_loader import load_graph

# Load the exported graph
graph = load_graph("my_graph.json")

# Get summary statistics
summary = graph.summary()
print(f"Total nodes: {summary['total_nodes']}")
print(f"Total relationships: {summary['total_relationships']}")

# Find specific node types
functions = graph.find_nodes_by_label("Function")
classes = graph.find_nodes_by_label("Class")

# Analyze relationships
for func in functions[:5]:
    relationships = graph.get_relationships_for_node(func.node_id)
    print(f"Function {func.properties['name']} has {len(relationships)} relationships")
```

Example analysis scripts:
```bash
# Basic graph analysis
python examples/graph_export_example.py my_graph.json

# Process large codebases with parallel processing
python examples/large_codebase_example.py /path/to/linux-kernel --workers 16 --folder-filter "drivers,kernel"

# Analyze C code and test coverage
python examples/c_and_test_analysis_example.py --report
```

This provides a reliable, programmatic way to access your codebase structure without LLM restrictions, perfect for:
- Integration with other tools
- Custom analysis scripts
- Building documentation generators
- Creating code metrics dashboards
- Processing million-line codebases efficiently
- Analyzing test coverage and quality
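As a concrete example of a custom metrics script, node and relationship counts can be tallied straight from the exported JSON. The sketch below assumes the export has top-level `"nodes"` (each with a `"labels"` list) and `"relationships"` (each with a `"type"`) keys — verify against your own export before relying on this shape:

```python
from collections import Counter

def graph_metrics(graph: dict) -> dict:
    """Tally node labels and relationship types from an exported graph dict."""
    labels = Counter(label for node in graph["nodes"] for label in node["labels"])
    rel_types = Counter(rel["type"] for rel in graph["relationships"])
    return {"labels": dict(labels), "relationship_types": dict(rel_types)}

# Tiny in-memory stand-in for json.load(open("my_graph.json"))
sample = {
    "nodes": [{"labels": ["Function"]}, {"labels": ["Function"]}, {"labels": ["Class"]}],
    "relationships": [{"type": "CALLS"}, {"type": "DEFINES"}, {"type": "CALLS"}],
}
print(graph_metrics(sample))
# → {'labels': {'Function': 2, 'Class': 1}, 'relationship_types': {'CALLS': 2, 'DEFINES': 1}}
```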
For AI-powered codebase optimization with best practices guidance:
Basic optimization for a specific language:

```bash
python -m codebase_rag.main optimize python --repo-path /path/to/your/repo
```

Optimization with reference documentation:

```bash
python -m codebase_rag.main optimize python \
  --repo-path /path/to/your/repo \
  --reference-document /path/to/best_practices.md
```

Using specific models for optimization:

```bash
python -m codebase_rag.main optimize javascript \
  --repo-path /path/to/frontend \
  --llm-provider gemini \
  --orchestrator-model gemini-2.0-flash-thinking-exp-01-21
```

Supported languages for optimization: `python`, `javascript`, `typescript`, `rust`, `go`, `java`, `scala`, `cpp`
How It Works:
- Analysis Phase: The agent analyzes your codebase structure using the knowledge graph
- Pattern Recognition: Identifies common anti-patterns, performance issues, and improvement opportunities
- Best Practices Application: Applies language-specific best practices and patterns
- Interactive Approval: Presents each optimization suggestion for your approval before implementation
- Guided Implementation: Implements approved changes with detailed explanations
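The interactive approval step is essentially a gate between proposed and applied changes. A schematic version (the real session wires this to the LLM agent and a terminal `[y/n]` prompt; the suggestion fields here are illustrative):

```python
from typing import Callable

def run_optimization_session(
    suggestions: list[dict],
    approve: Callable[[dict], bool],
) -> list[dict]:
    """Apply only the suggestions the user approves; return what was applied."""
    applied = []
    for suggestion in suggestions:
        print(f"Suggestion for {suggestion['file']}: {suggestion['issue']}")
        if approve(suggestion):  # in the real CLI this is an interactive prompt
            applied.append(suggestion)
    return applied

suggestions = [
    {"file": "src/data_processor.py", "issue": "list comprehension in a loop"},
    {"file": "src/api.py", "issue": "missing error handling"},
]
# Approve everything except changes to api.py
applied = run_optimization_session(suggestions, lambda s: s["file"] != "src/api.py")
print([s["file"] for s in applied])  # → ['src/data_processor.py']
```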
Example Optimization Session:

```text
Starting python optimization session...
────────────────────────────────────────────────────────────────────────
 The agent will analyze your python codebase and propose specific
 optimizations. You'll be asked to approve each suggestion before
 implementation. Type 'exit' or 'quit' to end the session.
────────────────────────────────────────────────────────────────────────

Analyzing codebase structure...
Found 23 Python modules with potential optimizations

Optimization Suggestion #1:
File: src/data_processor.py
Issue: Using list comprehension in a loop can be optimized
Suggestion: Replace with generator expression for memory efficiency

[y/n] Do you approve this optimization?
```
Reference Document Support: You can provide reference documentation (like coding standards, architectural guidelines, or best practices documents) to guide the optimization process:
```bash
# Use company coding standards
python -m codebase_rag.main optimize python \
  --reference-document ./docs/coding_standards.md

# Use architectural guidelines
python -m codebase_rag.main optimize java \
  --reference-document ./ARCHITECTURE.md

# Use performance best practices
python -m codebase_rag.main optimize rust \
  --reference-document ./docs/performance_guide.md
```

The agent incorporates guidance from your reference documents when suggesting optimizations, ensuring they align with your project's standards and architectural decisions.
Common CLI Arguments:
- `--llm-provider`: Choose `gemini` or `local` models
- `--orchestrator-model`: Specify the model for main operations
- `--cypher-model`: Specify the model for graph queries
- `--repo-path`: Path to the repository (defaults to the current directory)
- `--reference-document`: Path to reference documentation (optimization only)
The knowledge graph uses the following node types and relationships:
- Project: Root node representing the entire repository
- Package: Language packages (Python: `__init__.py`, etc.)
- Module: Individual source code files (`.py`, `.js`, `.jsx`, `.ts`, `.tsx`, `.rs`, `.go`, `.scala`, `.sc`, `.java`, `.c`, `.h`)
- Class: Class/struct/enum definitions across all languages
- Function: Module-level functions and standalone functions
- Method: Class methods and associated functions
- Folder: Regular directories
- File: All files (source code and others)
- ExternalPackage: External dependencies
- TestSuite: Test suite containers (pytest classes, Jest describe blocks, etc.)
- TestCase: Individual test cases across frameworks
- Assertion: Test assertions with expected/actual values
- BDDFeature: Gherkin feature definitions
- BDDScenario: BDD scenarios within features
- BDDStep: Given/When/Then steps
- Macro: C preprocessor macros
- Struct: C struct definitions
- GlobalVariable: C global variables
- Pointer: C pointer variables
- Typedef: C type definitions
- Syscall: Linux kernel syscall definitions
- KernelExport: Kernel exported symbols
- Vulnerability: Security vulnerabilities with severity and CWE IDs
- Author: Git commit authors with contribution statistics
- Commit: Git commits with metadata and relationships
- ConfigFile: Configuration files (YAML, JSON, INI, etc.)
- ConfigValue: Individual configuration settings
- Python: `function_definition`, `class_definition`
- JavaScript/TypeScript: `function_declaration`, `arrow_function`, `class_declaration`
- Rust: `function_item`, `struct_item`, `enum_item`, `impl_item`
- Go: `function_declaration`, `method_declaration`, `type_declaration`
- Scala: `function_definition`, `class_definition`, `object_definition`, `trait_definition`
- Java: `method_declaration`, `class_declaration`, `interface_declaration`, `enum_declaration`
- C++: `function_definition`, `constructor_definition`, `destructor_definition`, `class_specifier`, `struct_specifier`, `union_specifier`, `enum_specifier`
- C: `function_definition`, `struct_specifier`, `union_specifier`, `enum_specifier`, `declaration` (for typedefs and variables)
- `CONTAINS_PACKAGE`: Project or Package contains Package nodes
- `CONTAINS_FOLDER`: Project, Package, or Folder contains Folder nodes
- `CONTAINS_FILE`: Project, Package, or Folder contains File nodes
- `CONTAINS_MODULE`: Project, Package, or Folder contains Module nodes
- `DEFINES`: Module defines classes/functions
- `DEFINES_METHOD`: Class defines methods
- `DEPENDS_ON_EXTERNAL`: Project depends on external packages
- `CALLS`: Function or Method calls other functions/methods
- `POINTS_TO`: Pointer points to a variable or function
- `ASSIGNS_FP`: Function pointer assignment
- `INVOKES_FP`: Function pointer invocation
- `LOCKS`/`UNLOCKS`: Concurrency primitive usage
- `EXPANDS_TO`: Macro expansion relationships
- `TESTS`: Test case tests a function/method
- `ASSERTS`: Assertion validates behavior
- `IN_SUITE`: Test case belongs to a test suite
- `IN_TEST`: Assertion belongs to a test case
- `IN_FEATURE`: Scenario belongs to a BDD feature
- `IN_SCENARIO`: Step belongs to a BDD scenario
- `IMPLEMENTS_STEP`: Function implements a BDD step
- `GIVEN_LINKS_TO`/`WHEN_LINKS_TO`/`THEN_LINKS_TO`: BDD step linkages
- `FLOWS_TO`: Data flow between variables
- `INHERITS_FROM`/`IMPLEMENTS`: OOP inheritance relationships
- `OVERRIDES`: Method override relationships
- `HAS_VULNERABILITY`: Code element has a security vulnerability
- `TAINT_FLOW`: Tainted data flow paths
- `AUTHORED_BY`: Commit authored by a contributor
- `PARENT_OF`: Commit parent relationships
- `MODIFIED_IN`/`ADDED_IN`/`REMOVED_IN`: File modification in commits
- `CONFIGURES`: Configuration file configures a module
- `INCLUDES_CONFIG`: Configuration file includes another
- `REFERENCES_CONFIG`: Code references configuration
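With this schema, natural-language questions map onto short Cypher patterns. For example, "find classes that inherit from BaseModel" might translate to a query over `Class` nodes and `INHERITS_FROM` edges — though the exact property names (such as `name`) are assumptions here, not confirmed schema details:

```python
def subclasses_query(base_class: str) -> tuple[str, dict]:
    """Build a parameterized Cypher query for classes inheriting from a base class."""
    query = (
        "MATCH (c:Class)-[:INHERITS_FROM]->(base:Class {name: $base}) "
        "RETURN c.name AS subclass"
    )
    return query, {"base": base_class}

query, params = subclasses_query("BaseModel")
print(query)
print(params)  # → {'base': 'BaseModel'}
```

Passing the value as a `$base` parameter instead of interpolating it into the string keeps the query safe regardless of what the user typed.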
Configuration is managed through environment variables in the `.env` file:
- `GEMINI_API_KEY`: Required when using Google Gemini models
- `GEMINI_MODEL_ID`: Main model for orchestration (default: `gemini-2.5-pro`)
- `MODEL_CYPHER_ID`: Model for Cypher generation (default: `gemini-2.5-flash-lite-preview-06-17`)

- `OPENAI_API_KEY`: Required when using OpenAI models
- `OPENAI_ORCHESTRATOR_MODEL_ID`: Model for orchestration (default: `gpt-4o-mini`)
- `OPENAI_CYPHER_MODEL_ID`: Model for Cypher generation (default: `gpt-4o-mini`)

- `ANTHROPIC_API_KEY`: Required when using Anthropic models
- `ANTHROPIC_ORCHESTRATOR_MODEL_ID`: Model for orchestration (default: `claude-3-5-sonnet-20241022`)
- `ANTHROPIC_CYPHER_MODEL_ID`: Model for Cypher generation (default: `claude-3-5-haiku-20241022`)

- `LOCAL_MODEL_ENDPOINT`: Ollama endpoint (default: `http://localhost:11434/v1`)
- `LOCAL_ORCHESTRATOR_MODEL_ID`: Model for main RAG orchestration (default: `llama3`)
- `LOCAL_CYPHER_MODEL_ID`: Model for Cypher query generation (default: `llama3`)
- `LOCAL_MODEL_API_KEY`: API key for local models (default: `ollama`)

- `MEMGRAPH_HOST`: Memgraph hostname (default: `localhost`)
- `MEMGRAPH_PORT`: Memgraph port (default: `7687`)
- `TARGET_REPO_PATH`: Default repository path (default: `.`)
- `tree-sitter`: Core Tree-sitter library for language-agnostic parsing
- `tree-sitter-{language}`: Language-specific grammars (Python, JS, TS, Rust, Go, Scala, Java, C++, C)
- `pydantic-ai`: AI agent framework for RAG orchestration
- `pymgclient`: Memgraph Python client for graph database operations
- `loguru`: Advanced logging with structured output
- `python-dotenv`: Environment variable management
- `tqdm`: Progress bars for large operations
- `psutil`: Memory and system monitoring
- `multiprocessing` (standard library): Parallel processing support
The agent is designed with a deliberate workflow to ensure it acts with context and precision, especially when modifying the file system.
The agent has access to a suite of tools to understand and interact with the codebase:
- `query_codebase_knowledge_graph`: The primary tool for understanding the repository. It queries the graph database to find files, functions, classes, and their relationships based on natural language.
- `get_code_snippet`: Retrieves the exact source code for a specific function or class.
- `read_file_content`: Reads the entire content of a specified file.
- `create_new_file`: Creates a new file with specified content.
- `replace_code_surgically`: Surgically replaces specific code blocks in files. Requires the exact target code and its replacement; only the specified block is modified, leaving the rest of the file unchanged.
- `execute_shell_command`: Executes a shell command in the project's environment.
The agent uses AST-based function targeting with Tree-sitter for precise code modifications. Features include:
- Visual diff preview before changes
- Surgical patching that only modifies target code blocks
- Multi-language support across all supported languages
- Security sandbox preventing edits outside project directory
- Smart function matching with qualified names and line numbers
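Conceptually, a surgical edit is an exact-match, single-occurrence replacement plus a diff preview. A stripped-down illustration using only the standard library (the real tool additionally resolves targets via Tree-sitter ASTs and enforces the sandbox):

```python
import difflib

def surgical_replace(source: str, target: str, replacement: str) -> tuple[str, str]:
    """Replace exactly one occurrence of target; return (new_source, unified_diff)."""
    if source.count(target) != 1:
        raise ValueError("target block must match exactly once")
    updated = source.replace(target, replacement, 1)
    diff = "\n".join(
        difflib.unified_diff(
            source.splitlines(), updated.splitlines(),
            fromfile="before", tofile="after", lineterm="",
        )
    )
    return updated, diff

code = "def greet():\n    print('hi')\n"
updated, diff = surgical_replace(code, "print('hi')", "print('hello')")
print(diff)  # shows the -/+ lines of the change, like the visual preview
```

Requiring exactly one match is what makes the edit "surgical": an ambiguous target fails loudly instead of silently patching the wrong block.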
| Language | Extensions | Functions | Classes/Structs | Modules | Package Detection |
|---|---|---|---|---|---|
| Python | `.py` | ✅ | ✅ | ✅ | `__init__.py` |
| JavaScript | `.js`, `.jsx` | ✅ | ✅ | ✅ | - |
| TypeScript | `.ts`, `.tsx` | ✅ | ✅ | ✅ | - |
| Rust | `.rs` | ✅ | ✅ (structs/enums) | ✅ | - |
| Go | `.go` | ✅ | ✅ (structs) | ✅ | - |
| Scala | `.scala`, `.sc` | ✅ | ✅ (classes/objects/traits) | ✅ | package declarations |
| Java | `.java` | ✅ | ✅ (classes/interfaces/enums) | ✅ | package declarations |
| C++ | `.cpp`, `.h`, `.hpp`, `.cc`, `.cxx`, `.hxx`, `.hh` | ✅ | ✅ (classes/structs/unions/enums) | ✅ | - |
| C | `.c`, `.h` | ✅ | ✅ (structs/unions/enums) | ✅ | - |
- Python: Full support including nested functions, methods, classes, and package structure
- JavaScript/TypeScript: Functions, arrow functions, classes, and method definitions
- Rust: Functions, structs, enums, impl blocks, and associated functions
- Go: Functions, methods, type declarations, and struct definitions
- Scala: Functions, methods, classes, objects, traits, case classes, and Scala 3 syntax
- Java: Methods, constructors, classes, interfaces, enums, and annotation types
- C++: Functions, classes, structs, and methods
- C: Functions, structs, unions, enums, typedefs, macros, function pointers, and kernel-specific constructs
The system uses a configuration-driven approach for language support. Each language is defined in codebase_rag/language_config.py.
The Graph-Code RAG system includes a Model Context Protocol (MCP) server that enables AI agents and LLMs to interact with codebases programmatically.
- Standardized Protocol: Compatible with Claude Desktop, OpenAI assistants, and custom AI agents
- Rich Tool Set: Load repositories, query graphs, analyze security, check coverage, and more
- Real-time Analysis: Stream analysis results directly to AI conversations
- Secure Access: Built-in security features and rate limiting
- High Performance: Caching, parallel processing, and optimized queries
- Install MCP dependencies:

```bash
pip install "mcp>=0.9.0"
```

- Start the MCP server:

```bash
python -m mcp_server.server
# or use the launcher script
./mcp_server/launch.sh
```

- Configure Claude Desktop (add to its config):
```json
{
  "mcpServers": {
    "code-graph-rag": {
      "command": "python",
      "args": ["-m", "mcp_server.server"],
      "cwd": "/path/to/code-graph-rag"
    }
  }
}
```

Available tools:

- `load_repository`: Load and analyze a codebase
- `query_graph`: Query using natural language or Cypher
- `analyze_security`: Find security vulnerabilities
- `analyze_test_coverage`: Check test coverage
- `get_code_metrics`: Get complexity and quality metrics
- `analyze_git_history`: Analyze contributors and commits
- `find_code_patterns`: Find patterns or anti-patterns
- `export_graph`: Export graph data
```python
# AI agent analyzing code quality
async def analyze_codebase():
    # Load repository
    await mcp.load_repository("/path/to/repo")

    # Check security
    vulns = await mcp.analyze_security()

    # Get test coverage
    coverage = await mcp.analyze_test_coverage()

    # Generate recommendations
    if vulns["critical"] > 0:
        print("Fix critical security issues first!")
```

For detailed MCP documentation, see mcp_server/README.md.
You can build a binary of the application using the build_binary.py script. This script uses PyInstaller to package the application and its dependencies into a single executable.
```bash
python build_binary.py
```

The resulting binary will be located in the `dist` directory.
- Check Memgraph connection:
  - Ensure Docker containers are running: `docker-compose ps`
  - Verify Memgraph is accessible on port 7687
- View the database in Memgraph Lab:
  - Open http://localhost:3000
  - Connect to `memgraph:7687`
- For local models:
  - Verify Ollama is running: `ollama list`
  - Check that models are downloaded: `ollama pull llama3`
  - Test the Ollama API: `curl http://localhost:11434/v1/models`
  - Check Ollama logs: `ollama logs`
Please see CONTRIBUTING.md for detailed contribution guidelines.
Good first PRs can be found among the TODO issues.
For issues or questions:
- Check the logs for error details
- Verify Memgraph connection
- Ensure all environment variables are set
- Review the graph schema matches your expectations
- SETUP.md - Complete step-by-step setup guide
- ADVANCED_FEATURES.md - Comprehensive guide to all advanced features
- ADVANCED_USAGE.md - Advanced usage scenarios for large-scale analysis
- QUERY_COOKBOOK.md - Practical query examples and patterns
- MIGRATION.md - Guide for upgrading to latest features
- CHANGELOG.md - Detailed list of changes and new features
- CONTRIBUTING.md - Guidelines for contributors
- Examples - Sample scripts demonstrating features:
  - `graph_export_example.py`: Basic graph analysis
  - `large_codebase_example.py`: Parallel processing demo
  - `c_and_test_analysis_example.py`: C language and test analysis
  - `vcs_and_config_example.py`: Git history and configuration analysis
  - `security_and_test_example.py`: Security scanning and test coverage
  - `comprehensive_analysis_example.py`: All features integrated
  - `kernel_analysis_example.py`: Linux kernel code analysis
  - `large_scale_management.py`: Managing million-line codebases
  - `multi_repo_analysis.py`: Ecosystem and cross-repository analysis
  - `performance_benchmark.py`: Performance testing and optimization
- Security Analysis - Vulnerability detection and security scanning
- Test Coverage - Test analysis and coverage metrics
- Data Flow - Variable tracking and taint analysis
- Git Integration - Version control history and blame
- Configuration Parsing - Config file analysis
- Query Templates - Pre-built Cypher query templates
- Query Cookbook - Extensive collection of query examples