4 changes: 3 additions & 1 deletion .gitignore
@@ -21,6 +21,7 @@ node_modules/
## Build
build/
dist/
*.egg-info/

## Logs
logs/
@@ -41,4 +42,5 @@ data/
## .cursor
.cursor/
docs/
tests/
tests/
.aider*
165 changes: 165 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,165 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Code Graph Knowledge System is a Neo4j-based intelligent knowledge management system that combines vector search, graph databases, and LLM integration for document processing and RAG (Retrieval Augmented Generation). The system processes documents into a knowledge graph and provides intelligent querying capabilities.

## Architecture

### Core Components
- **FastAPI Application** (`main.py`, `core/app.py`): Main web server with async request handling
- **Neo4j Knowledge Service** (`services/neo4j_knowledge_service.py`): Primary service handling LlamaIndex + Neo4j integration for knowledge graph operations
- **SQL Parsers** (`services/sql_parser.py`, `services/universal_sql_schema_parser.py`): Database schema analysis and parsing
- **Task Queue System** (`services/task_queue.py`, `monitoring/task_monitor.py`): Async background processing with web monitoring
- **MCP Server** (`mcp_server.py`, `start_mcp.py`): Model Context Protocol integration for AI assistants

### Multi-Provider LLM Support
The system supports multiple LLM and embedding providers:
- **Ollama**: Local LLM hosting (default)
- **OpenAI**: GPT models and embeddings
- **Google Gemini**: Gemini models and embeddings
- **OpenRouter**: Access to multiple model providers
- **HuggingFace**: Local embedding models

Configuration is handled via environment variables in the `.env` file (see `env.example`).

## Development Commands

### Running the Application
```bash
# Start main application
python start.py

# Start MCP server (for AI assistant integration)
python start_mcp.py

# Using script entry points (after uv sync)
uv run server
uv run mcp_client

# Direct FastAPI startup
python main.py
```

### Testing
```bash
# Run tests
pytest tests/

# Run with coverage
pytest tests/ --cov

# Run specific test file
pytest tests/test_specific.py
```

### Code Quality
```bash
# Format code
black .
isort .

# Lint code
ruff check .
```

### Dependencies
```bash
# Install dependencies
pip install -e .

# Using uv (recommended)
uv pip install -e .
```

## Configuration

### Environment Setup
1. Copy `env.example` to `.env`
2. Configure Neo4j connection: `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`
3. Choose LLM provider: `LLM_PROVIDER` (ollama/openai/gemini/openrouter)
4. Set embedding provider: `EMBEDDING_PROVIDER`
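
A minimal `.env` sketch covering the variables named above (values are illustrative; `env.example` is authoritative):

```bash
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=changeme
LLM_PROVIDER=ollama
EMBEDDING_PROVIDER=ollama
```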

### Neo4j Requirements
- Neo4j 5.0+ with APOC plugin
- Default connection: `bolt://localhost:7687`
- Database: `neo4j` (default)

### Service Dependencies
The application checks service health on startup via `start.py:check_dependencies()`. Required services:
- Neo4j database connection
- LLM provider (Ollama/OpenAI/etc.)

## Key Development Patterns

### Service Initialization
All services use async initialization patterns. The `Neo4jKnowledgeService` must be initialized before use:
```python
await knowledge_service.initialize()
```

### Error Handling
Services return structured responses with `success` field and error details:
```python
result = await service.operation()
if not result.get("success"):
    # Handle error from result["error"], e.g. log it or re-raise
    raise RuntimeError(result["error"])
```

### Timeout Management
Operations use configurable timeouts from `config.py`:
- `connection_timeout`: Database connections
- `operation_timeout`: Standard operations
- `large_document_timeout`: Large document processing
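
The pattern can be sketched with `asyncio.wait_for`; the constant below is an illustrative stand-in for the actual `config.py` setting, and the error shape mirrors the structured-response convention above:

```python
import asyncio

# Illustrative value; the real setting lives in config.py.
OPERATION_TIMEOUT = 30.0

async def run_with_timeout(coro, timeout: float):
    """Return the coroutine's result, or a structured error on timeout."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        return {"success": False, "error": f"operation exceeded {timeout}s"}

async def fast_op():
    return {"success": True}

result = asyncio.run(run_with_timeout(fast_op(), OPERATION_TIMEOUT))
```

A timed-out operation thus surfaces through the same `success`/`error` fields callers already check, rather than raising into the request handler.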

### LlamaIndex Integration
The system uses LlamaIndex's `KnowledgeGraphIndex` with Neo4j backend. Global settings are configured in `services/neo4j_knowledge_service.py:initialize()`.

## API Structure

### Main Endpoints
- `/api/v1/health`: Service health check
- `/api/v1/knowledge/query`: Query knowledge base with RAG
- `/api/v1/knowledge/search`: Vector similarity search
- `/api/v1/documents/*`: Document management
- `/api/v1/sql/*`: SQL parsing and analysis

### Real-time Task Monitoring
The system provides multiple approaches for real-time task monitoring:

#### Web UI Monitoring (`/ui/monitor`)
When `ENABLE_MONITORING=true`, a NiceGUI monitoring interface is available with:
- Real-time task status updates via WebSocket
- File upload functionality (50KB size limit)
- Directory batch processing
- Task progress visualization

#### Server-Sent Events (SSE) API
SSE endpoints for streaming real-time updates:
- `/api/v1/sse/task/{task_id}`: Monitor single task progress
- `/api/v1/sse/tasks`: Monitor all tasks with optional status filtering
- `/api/v1/sse/stats`: Get active SSE connection statistics
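
A hedged sketch of consuming these streams: the parser below assumes the endpoints emit standard SSE `data:` lines carrying JSON payloads (the HTTP transport itself, e.g. an `httpx` streaming request, is omitted):

```python
import json

def parse_sse_events(raw: str) -> list:
    """Parse a decoded SSE stream into JSON payloads.

    Minimal sketch: handles only `data:` fields, which is what the
    /api/v1/sse/* endpoints are assumed to emit.
    """
    events = []
    for block in raw.split("\n\n"):          # events are blank-line separated
        for line in block.splitlines():
            if line.startswith("data:"):
                events.append(json.loads(line[len("data:"):].strip()))
    return events
```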

#### MCP Real-time Tools
MCP server provides real-time monitoring tools:
- `watch_task`: Monitor single task with progress history
- `watch_tasks`: Monitor multiple tasks until completion
- Supports custom timeouts and update intervals
- **Note**: These are MCP protocol tools, not HTTP endpoints

#### Client Implementation Examples
- `examples/pure_mcp_client.py`: Pure MCP client using `watch_task` tools
- `examples/hybrid_http_sse_client.py`: HTTP + SSE hybrid approach

### Large File Handling Strategy
The system handles large documents through multiple approaches:
- **Small files (<10KB)**: Direct synchronous processing
- **Medium files (10-50KB)**: Temporary file strategy with background processing
- **Large files (>50KB)**: UI prompts for directory processing or MCP client usage
- **MCP client**: Automatic temporary file creation for large documents
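
The tiering can be sketched as a simple dispatcher; only the thresholds come from the list above, the strategy names are illustrative:

```python
SMALL_LIMIT = 10 * 1024   # below this: direct synchronous processing
LARGE_LIMIT = 50 * 1024   # above this: directory processing or MCP client

def choose_strategy(size_bytes: int) -> str:
    """Map a document size to the documented processing tier."""
    if size_bytes < SMALL_LIMIT:
        return "sync"
    if size_bytes <= LARGE_LIMIT:
        return "temp_file_background"
    return "directory_or_mcp"
```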

## Testing Approach

Tests are located in `tests/` directory. The system includes comprehensive testing for SQL parsing functionality. Use `pytest` for running tests.
68 changes: 66 additions & 2 deletions README.md
@@ -14,11 +14,15 @@ Code Graph Knowledge System is an enterprise-grade solution that transforms unst
- **Universal SQL Schema Parser**: Configurable database schema analysis with industry-specific templates
- **Intelligent Query Engine**: Hybrid search combining vector similarity and graph traversal
- **Asynchronous Task Processing**: Background processing for large document collections with real-time monitoring
- **Web-based Monitoring Dashboard**: Real-time task queue monitoring with NiceGUI interface
- **Real-time Task Monitoring**: Multiple real-time monitoring solutions
- Web UI Monitoring: NiceGUI interface with file upload and directory batch processing
- SSE Streaming API: HTTP Server-Sent Events for real-time task progress updates
- MCP Real-time Tools: AI assistant integrated task monitoring tools
- **Multi-Database Support**: Oracle, MySQL, PostgreSQL, SQL Server schema parsing and analysis
- **RESTful API**: Complete API endpoints for document management and knowledge querying
- **MCP Protocol Support**: Model Context Protocol integration for AI assistant compatibility
- **Multi-provider LLM Support**: Compatible with Ollama, OpenAI, and Gemini models
- **Multi-provider LLM Support**: Compatible with Ollama, OpenAI, Gemini, and OpenRouter models
- **Large File Handling Strategy**: Intelligent file size detection with multiple processing approaches

### Technical Architecture
- **FastAPI Backend**: High-performance async web framework
@@ -105,12 +109,21 @@ Code Graph Knowledge System is an enterprise-grade solution that transforms unst

5. **Run the Application**
```bash
# Start main service
python start.py
# or use script entry points
uv run server

# Start MCP service (optional)
python start_mcp.py
# or use script entry points
uv run mcp_client
```

6. **Access the Interface**
- API Documentation: http://localhost:8000/docs
- Task Monitor: http://localhost:8000/ui/monitor
- Real-time SSE Monitor: http://localhost:8000/api/v1/sse/tasks
- Health Check: http://localhost:8000/api/v1/health

## API Usage
@@ -154,6 +167,53 @@ response = httpx.post("http://localhost:8000/api/v1/knowledge/search", json={
})
```

## Real-time Task Monitoring

The system provides three real-time task monitoring approaches:

### 1. Web UI Monitoring Interface
Access http://localhost:8000/ui/monitor for graphical monitoring:
- Real-time task status updates
- File upload functionality (50KB size limit)
- Directory batch processing
- Task progress visualization

### 2. Server-Sent Events (SSE) API
Real-time monitoring via HTTP streaming endpoints:

```javascript
// Monitor single task
const eventSource = new EventSource('/api/v1/sse/task/task-id');
eventSource.onmessage = function(event) {
  const data = JSON.parse(event.data);
  console.log('Task progress:', data.progress);
};

// Monitor all tasks
const allTasksSource = new EventSource('/api/v1/sse/tasks');
```

### 3. MCP Real-time Tools
Task monitoring via MCP protocol:

```python
# Use pure MCP client monitoring
# See examples/pure_mcp_client.py

# Monitor single task
result = await session.call_tool("watch_task", {
"task_id": task_id,
"timeout": 300,
"interval": 1.0
})

# Monitor multiple tasks
result = await session.call_tool("watch_tasks", {
"task_ids": [task1, task2, task3],
"timeout": 300
})
```

## MCP Integration

The system supports Model Context Protocol (MCP) for seamless integration with AI assistants:
@@ -174,6 +234,10 @@ python start_mcp.py
}
```

### Client Implementation Examples
- `examples/pure_mcp_client.py`: Pure MCP client using MCP tools for monitoring
- `examples/hybrid_http_sse_client.py`: HTTP + SSE hybrid approach

## Configuration

Key configuration options in `.env`:
68 changes: 66 additions & 2 deletions README_CN.md
@@ -13,10 +13,14 @@
- **Neo4j GraphRAG 集成**:使用 Neo4j 原生向量索引的高级图检索增强生成
- **智能查询引擎**:结合向量相似度和图遍历的混合搜索
- **异步任务处理**:支持大型文档集合的后台处理和实时监控
- **基于Web的监控仪表板**:使用 NiceGUI 界面进行实时任务队列监控
- **实时任务监控**:多种实时监控方案
- Web UI监控:NiceGUI界面,支持文件上传和目录批处理
- SSE流式API:HTTP Server-Sent Events实时任务进度推送
- MCP实时工具:AI助手集成的任务监控工具
- **RESTful API**:完整的文档管理和知识查询 API 端点
- **MCP 协议支持**:模型上下文协议集成,兼容 AI 助手
- **多提供商LLM支持**:兼容 Ollama、OpenAI 和 Gemini 模型
- **多提供商LLM支持**:兼容 Ollama、OpenAI、Gemini 和 OpenRouter 模型
- **大文件处理策略**:智能文件大小检测和多种处理方案

### 技术架构
- **FastAPI 后端**:高性能异步网络框架
@@ -92,12 +96,21 @@

5. **运行应用程序**
```bash
# 启动主服务
python start.py
# 或使用脚本入口点
uv run server

# 启动MCP服务(可选)
python start_mcp.py
# 或使用脚本入口点
uv run mcp_client
```

6. **访问界面**
- API 文档:http://localhost:8000/docs
- 任务监控:http://localhost:8000/ui/monitor
- 实时监控SSE:http://localhost:8000/api/v1/sse/tasks
- 健康检查:http://localhost:8000/api/v1/health

## API 使用
@@ -141,6 +154,53 @@ response = httpx.post("http://localhost:8000/api/v1/knowledge/search", json={
})
```

## 实时任务监控

系统提供三种实时任务监控方案:

### 1. Web UI 监控界面
访问 http://localhost:8000/ui/monitor 使用图形界面:
- 实时任务状态更新
- 文件上传功能(50KB大小限制)
- 目录批量处理
- 任务进度可视化

### 2. Server-Sent Events (SSE) API
通过 HTTP 流式端点进行实时监控:

```javascript
// 监控单个任务
const eventSource = new EventSource('/api/v1/sse/task/task-id');
eventSource.onmessage = function(event) {
  const data = JSON.parse(event.data);
  console.log('Task progress:', data.progress);
};

// 监控所有任务
const allTasksSource = new EventSource('/api/v1/sse/tasks');
```

### 3. MCP 实时工具
通过 MCP 协议进行任务监控:

```python
# 使用纯MCP客户端监控
# 参见 examples/pure_mcp_client.py

# 监控单个任务
result = await session.call_tool("watch_task", {
"task_id": task_id,
"timeout": 300,
"interval": 1.0
})

# 监控多个任务
result = await session.call_tool("watch_tasks", {
"task_ids": [task1, task2, task3],
"timeout": 300
})
```

## MCP 集成

系统支持模型上下文协议(MCP),可与 AI 助手无缝集成:
@@ -161,6 +221,10 @@ python start_mcp.py
}
```

### 客户端实现示例
- `examples/pure_mcp_client.py`: 纯MCP客户端,使用MCP工具进行监控
- `examples/hybrid_http_sse_client.py`: HTTP + SSE 混合方案

## 配置

`.env` 文件中的关键配置选项: