4 changes: 3 additions & 1 deletion .gitignore
@@ -21,6 +21,7 @@ node_modules/
## Build
build/
dist/
*.egg-info/

## Logs
logs/
@@ -41,4 +42,5 @@ data/
## .cursor
.cursor/
docs/
tests/
tests/
.aider*
165 changes: 165 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,165 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Code Graph Knowledge System is a Neo4j-based intelligent knowledge management system that combines vector search, graph databases, and LLM integration for document processing and RAG (Retrieval Augmented Generation). The system processes documents into a knowledge graph and provides intelligent querying capabilities.

## Architecture

### Core Components
- **FastAPI Application** (`main.py`, `core/app.py`): Main web server with async request handling
- **Neo4j Knowledge Service** (`services/neo4j_knowledge_service.py`): Primary service handling LlamaIndex + Neo4j integration for knowledge graph operations
- **SQL Parsers** (`services/sql_parser.py`, `services/universal_sql_schema_parser.py`): Database schema analysis and parsing
- **Task Queue System** (`services/task_queue.py`, `monitoring/task_monitor.py`): Async background processing with web monitoring
- **MCP Server** (`mcp_server.py`, `start_mcp.py`): Model Context Protocol integration for AI assistants

### Multi-Provider LLM Support
The system supports multiple LLM and embedding providers:
- **Ollama**: Local LLM hosting (default)
- **OpenAI**: GPT models and embeddings
- **Google Gemini**: Gemini models and embeddings
- **OpenRouter**: Access to multiple model providers
- **HuggingFace**: Local embedding models

Configuration is handled via environment variables in the `.env` file (see `env.example`).

## Development Commands

### Running the Application
```bash
# Start main application
python start.py

# Start MCP server (for AI assistant integration)
python start_mcp.py

# Using script entry points (after uv sync)
uv run server
uv run mcp_client

# Direct FastAPI startup
python main.py
```

### Testing
```bash
# Run tests
pytest tests/

# Run with coverage
pytest tests/ --cov

# Run specific test file
pytest tests/test_specific.py
```

### Code Quality
```bash
# Format code
black .
isort .

# Lint code
ruff check .
```

### Dependencies
```bash
# Install dependencies
pip install -e .

# Using uv (recommended)
uv pip install -e .
```

## Configuration

### Environment Setup
1. Copy `env.example` to `.env`
2. Configure Neo4j connection: `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`
3. Choose LLM provider: `LLM_PROVIDER` (ollama/openai/gemini/openrouter)
4. Set embedding provider: `EMBEDDING_PROVIDER`
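
A minimal `.env` sketch covering the variables named above (values are illustrative; `env.example` is authoritative):

```bash
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=changeme
LLM_PROVIDER=ollama
EMBEDDING_PROVIDER=ollama
```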

### Neo4j Requirements
- Neo4j 5.0+ with APOC plugin
- Default connection: `bolt://localhost:7687`
- Database: `neo4j` (default)

### Service Dependencies
The application checks service health on startup via `start.py:check_dependencies()`. Required services:
- Neo4j database connection
- LLM provider (Ollama/OpenAI/etc.)

## Key Development Patterns

### Service Initialization
All services use async initialization patterns. The `Neo4jKnowledgeService` must be initialized before use:
```python
await knowledge_service.initialize()
```

### Error Handling
Services return structured responses with `success` field and error details:
```python
result = await service.operation()
if not result.get("success"):
    # Handle error from result["error"], e.g. log it or re-raise
    raise RuntimeError(result["error"])
```

### Timeout Management
Operations use configurable timeouts from `config.py`:
- `connection_timeout`: Database connections
- `operation_timeout`: Standard operations
- `large_document_timeout`: Large document processing
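
The pattern can be sketched with `asyncio.wait_for`; the constant below is an illustrative stand-in for the actual `config.py` setting, and the error shape mirrors the structured-response convention above:

```python
import asyncio

# Illustrative value; the real setting lives in config.py.
OPERATION_TIMEOUT = 30.0

async def run_with_timeout(coro, timeout: float):
    """Return the coroutine's result, or a structured error on timeout."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        return {"success": False, "error": f"operation exceeded {timeout}s"}

async def fast_op():
    return {"success": True}

result = asyncio.run(run_with_timeout(fast_op(), OPERATION_TIMEOUT))
```

A timed-out operation thus surfaces through the same `success`/`error` fields callers already check, rather than raising into the request handler.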

### LlamaIndex Integration
The system uses LlamaIndex's `KnowledgeGraphIndex` with Neo4j backend. Global settings are configured in `services/neo4j_knowledge_service.py:initialize()`.

## API Structure

### Main Endpoints
- `/api/v1/health`: Service health check
- `/api/v1/knowledge/query`: Query knowledge base with RAG
- `/api/v1/knowledge/search`: Vector similarity search
- `/api/v1/documents/*`: Document management
- `/api/v1/sql/*`: SQL parsing and analysis

### Real-time Task Monitoring
The system provides multiple approaches for real-time task monitoring:

#### Web UI Monitoring (`/ui/monitor`)
When `ENABLE_MONITORING=true`, a NiceGUI monitoring interface is available with:
- Real-time task status updates via WebSocket
- File upload functionality (50KB size limit)
- Directory batch processing
- Task progress visualization

#### Server-Sent Events (SSE) API
SSE endpoints for streaming real-time updates:
- `/api/v1/sse/task/{task_id}`: Monitor single task progress
- `/api/v1/sse/tasks`: Monitor all tasks with optional status filtering
- `/api/v1/sse/stats`: Get active SSE connection statistics
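
A hedged sketch of consuming these streams: the parser below assumes the endpoints emit standard SSE `data:` lines carrying JSON payloads (the HTTP transport itself, e.g. an `httpx` streaming request, is omitted):

```python
import json

def parse_sse_events(raw: str) -> list:
    """Parse a decoded SSE stream into JSON payloads.

    Minimal sketch: handles only `data:` fields, which is what the
    /api/v1/sse/* endpoints are assumed to emit.
    """
    events = []
    for block in raw.split("\n\n"):          # events are blank-line separated
        for line in block.splitlines():
            if line.startswith("data:"):
                events.append(json.loads(line[len("data:"):].strip()))
    return events
```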

#### MCP Real-time Tools
MCP server provides real-time monitoring tools:
- `watch_task`: Monitor single task with progress history
- `watch_tasks`: Monitor multiple tasks until completion
- Supports custom timeouts and update intervals
- **Note**: These are MCP protocol tools, not HTTP endpoints

#### Client Implementation Examples
- `examples/pure_mcp_client.py`: Pure MCP client using `watch_task` tools
- `examples/hybrid_http_sse_client.py`: HTTP + SSE hybrid approach

### Large File Handling Strategy
The system handles large documents through multiple approaches:
- **Small files (<10KB)**: Direct synchronous processing
- **Medium files (10-50KB)**: Temporary file strategy with background processing
- **Large files (>50KB)**: UI prompts for directory processing or MCP client usage
- **MCP client**: Automatic temporary file creation for large documents
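
The tiering can be sketched as a simple dispatcher; only the thresholds come from the list above, the strategy names are illustrative:

```python
SMALL_LIMIT = 10 * 1024   # below this: direct synchronous processing
LARGE_LIMIT = 50 * 1024   # above this: directory processing or MCP client

def choose_strategy(size_bytes: int) -> str:
    """Map a document size to the documented processing tier."""
    if size_bytes < SMALL_LIMIT:
        return "sync"
    if size_bytes <= LARGE_LIMIT:
        return "temp_file_background"
    return "directory_or_mcp"
```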

## Testing Approach

Tests are located in `tests/` directory. The system includes comprehensive testing for SQL parsing functionality. Use `pytest` for running tests.
68 changes: 66 additions & 2 deletions README.md
@@ -14,11 +14,15 @@ Code Graph Knowledge System is an enterprise-grade solution that transforms unst
- **Universal SQL Schema Parser**: Configurable database schema analysis with industry-specific templates
- **Intelligent Query Engine**: Hybrid search combining vector similarity and graph traversal
- **Asynchronous Task Processing**: Background processing for large document collections with real-time monitoring
- **Web-based Monitoring Dashboard**: Real-time task queue monitoring with NiceGUI interface
- **Real-time Task Monitoring**: Multiple real-time monitoring solutions
- Web UI Monitoring: NiceGUI interface with file upload and directory batch processing
- SSE Streaming API: HTTP Server-Sent Events for real-time task progress updates
- MCP Real-time Tools: AI assistant integrated task monitoring tools
- **Multi-Database Support**: Oracle, MySQL, PostgreSQL, SQL Server schema parsing and analysis
- **RESTful API**: Complete API endpoints for document management and knowledge querying
- **MCP Protocol Support**: Model Context Protocol integration for AI assistant compatibility
- **Multi-provider LLM Support**: Compatible with Ollama, OpenAI, and Gemini models
- **Multi-provider LLM Support**: Compatible with Ollama, OpenAI, Gemini, and OpenRouter models
- **Large File Handling Strategy**: Intelligent file size detection with multiple processing approaches

### Technical Architecture
- **FastAPI Backend**: High-performance async web framework
@@ -105,12 +109,21 @@ Code Graph Knowledge System is an enterprise-grade solution that transforms unst

5. **Run the Application**
```bash
# Start main service
python start.py
# or use script entry points
uv run server

# Start MCP service (optional)
python start_mcp.py
# or use script entry points
uv run mcp_client
```

6. **Access the Interface**
- API Documentation: http://localhost:8000/docs
- Task Monitor: http://localhost:8000/ui/monitor
- Real-time SSE Monitor: http://localhost:8000/api/v1/sse/tasks
- Health Check: http://localhost:8000/api/v1/health

## API Usage
@@ -154,6 +167,53 @@ response = httpx.post("http://localhost:8000/api/v1/knowledge/search", json={
})
```

## Real-time Task Monitoring

The system provides three real-time task monitoring approaches:

### 1. Web UI Monitoring Interface
Access http://localhost:8000/ui/monitor for graphical monitoring:
- Real-time task status updates
- File upload functionality (50KB size limit)
- Directory batch processing
- Task progress visualization

### 2. Server-Sent Events (SSE) API
Real-time monitoring via HTTP streaming endpoints:

```javascript
// Monitor single task
const eventSource = new EventSource('/api/v1/sse/task/task-id');
eventSource.onmessage = function(event) {
  const data = JSON.parse(event.data);
  console.log('Task progress:', data.progress);
};

// Monitor all tasks
const allTasksSource = new EventSource('/api/v1/sse/tasks');
```

### 3. MCP Real-time Tools
Task monitoring via MCP protocol:

```python
# Use pure MCP client monitoring
# See examples/pure_mcp_client.py

# Monitor single task
result = await session.call_tool("watch_task", {
"task_id": task_id,
"timeout": 300,
"interval": 1.0
})

# Monitor multiple tasks
result = await session.call_tool("watch_tasks", {
"task_ids": [task1, task2, task3],
"timeout": 300
})
```

## MCP Integration

The system supports Model Context Protocol (MCP) for seamless integration with AI assistants:
@@ -174,6 +234,10 @@ python start_mcp.py
}
```

### Client Implementation Examples
- `examples/pure_mcp_client.py`: Pure MCP client using MCP tools for monitoring
- `examples/hybrid_http_sse_client.py`: HTTP + SSE hybrid approach

## Configuration

Key configuration options in `.env`:
68 changes: 66 additions & 2 deletions README_CN.md
@@ -13,10 +13,14 @@
- **Neo4j GraphRAG 集成**:使用 Neo4j 原生向量索引的高级图检索增强生成
- **智能查询引擎**:结合向量相似度和图遍历的混合搜索
- **异步任务处理**:支持大型文档集合的后台处理和实时监控
- **基于Web的监控仪表板**:使用 NiceGUI 界面进行实时任务队列监控
- **实时任务监控**:多种实时监控方案
- Web UI监控:NiceGUI界面,支持文件上传和目录批处理
- SSE流式API:HTTP Server-Sent Events实时任务进度推送
- MCP实时工具:AI助手集成的任务监控工具
- **RESTful API**:完整的文档管理和知识查询 API 端点
- **MCP 协议支持**:模型上下文协议集成,兼容 AI 助手
- **多提供商LLM支持**:兼容 Ollama、OpenAI 和 Gemini 模型
- **多提供商LLM支持**:兼容 Ollama、OpenAI、Gemini 和 OpenRouter 模型
- **大文件处理策略**:智能文件大小检测和多种处理方案

### 技术架构
- **FastAPI 后端**:高性能异步网络框架
@@ -92,12 +96,21 @@

5. **运行应用程序**
```bash
# 启动主服务
python start.py
# 或使用脚本入口点
uv run server

# 启动MCP服务(可选)
python start_mcp.py
# 或使用脚本入口点
uv run mcp_client
```

6. **访问界面**
- API 文档:http://localhost:8000/docs
- 任务监控:http://localhost:8000/ui/monitor
- 实时监控SSE:http://localhost:8000/api/v1/sse/tasks
- 健康检查:http://localhost:8000/api/v1/health

## API 使用
@@ -141,6 +154,53 @@ response = httpx.post("http://localhost:8000/api/v1/knowledge/search", json={
})
```

## 实时任务监控

系统提供三种实时任务监控方案:

### 1. Web UI 监控界面
访问 http://localhost:8000/ui/monitor 使用图形界面:
- 实时任务状态更新
- 文件上传功能(50KB大小限制)
- 目录批量处理
- 任务进度可视化

### 2. Server-Sent Events (SSE) API
通过 HTTP 流式端点进行实时监控:

```javascript
// 监控单个任务
const eventSource = new EventSource('/api/v1/sse/task/task-id');
eventSource.onmessage = function(event) {
  const data = JSON.parse(event.data);
  console.log('Task progress:', data.progress);
};

// 监控所有任务
const allTasksSource = new EventSource('/api/v1/sse/tasks');
```

### 3. MCP 实时工具
通过 MCP 协议进行任务监控:

```python
# 使用纯MCP客户端监控
# 参见 examples/pure_mcp_client.py

# 监控单个任务
result = await session.call_tool("watch_task", {
"task_id": task_id,
"timeout": 300,
"interval": 1.0
})

# 监控多个任务
result = await session.call_tool("watch_tasks", {
"task_ids": [task1, task2, task3],
"timeout": 300
})
```

## MCP 集成

系统支持模型上下文协议(MCP),可与 AI 助手无缝集成:
@@ -161,6 +221,10 @@ python start_mcp.py
}
```

### 客户端实现示例
- `examples/pure_mcp_client.py`: 纯MCP客户端,使用MCP工具进行监控
- `examples/hybrid_http_sse_client.py`: HTTP + SSE 混合方案

## 配置

`.env` 文件中的关键配置选项: