DeepDocs is an AI-powered developer onboarding tool that ingests GitHub repositories (via ZIP upload), indexes source code into a vector database, and generates comprehensive documentation including architecture diagrams, API references, setup instructions, and contribution guidelines.
- GitHub Repository Ingestion: Upload repository ZIP files (no GitHub token required)
- Smart Code Analysis: Automatic code parsing, chunking, and semantic indexing
- AI-Powered Documentation: Generate detailed architecture docs, onboarding guides, and API references
- Vector Search: Qdrant-based semantic search for intelligent code exploration
- Flexible LLM Support:
  - Local inference with Ollama (open-source models)
  - OpenAI API integration
- Interactive Web UI: Modern React/TypeScript frontend for document visualization
- Production Ready: Docker containerization, structured logging, and comprehensive testing
```
DeepDocs/
├── app/                      # FastAPI backend
│   ├── main.py               # API endpoints
│   ├── core/                 # Configuration and logging
│   ├── models/               # Pydantic schemas
│   └── services/             # Business logic
│       ├── ingest.py         # Repository extraction
│       ├── chunk.py          # Code chunking
│       ├── embeddings.py     # Embedding generation
│       ├── vectorstore.py    # Qdrant integration
│       ├── analyze.py        # Code analysis
│       ├── docgen.py         # Documentation generation
│       ├── polish.py         # Markdown refinement
│       └── render.py         # Diagram rendering
├── web/                      # React + TypeScript frontend
│   ├── src/
│   │   ├── DeepDoc.tsx       # Main component
│   │   └── main.tsx          # Entry point
│   └── Dockerfile.prod       # Production build
├── tests/                    # Unit and integration tests
├── artifacts/                # Generated documentation (gitignored)
├── docker-compose.yml        # Multi-container orchestration
└── requirements.txt          # Python dependencies
```
- Docker and Docker Compose
- (Optional) Ollama installed locally for open-source LLM support
- (Optional) OpenAI API key for GPT-based generation
1. Clone the repository

   ```bash
   git clone https://github.com/rileyafox/DeepDocs.git
   cd DeepDocs
   ```

2. Configure environment variables

   ```bash
   cp .env.example .env
   ```

   Edit `.env` and choose your LLM provider:

   Option A: Ollama (Local)

   ```env
   LLM_PROVIDER=ollama
   OLLAMA_BASE_URL=http://host.docker.internal:11434
   OLLAMA_MODEL=qwen2.5:14b
   ```

   Option B: OpenAI

   ```env
   LLM_PROVIDER=openai
   OPENAI_API_KEY=sk-proj-your-api-key
   OPENAI_PROJECT=proj-your-project-id
   ```

3. Start the services

   ```bash
   docker-compose up --build
   ```
Services will be available at:
- API: http://localhost:8000/docs (FastAPI Swagger UI)
- Web UI: http://localhost:5173
- Qdrant: http://localhost:6333/dashboard
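Once the containers are up, you can sanity-check each service from the host. These checks assume the default ports above; the Ollama check applies only to Option A:

```bash
curl -sf http://localhost:8000/docs > /dev/null && echo "API up"
curl -sf http://localhost:6333/collections   # Qdrant REST API
curl -sf http://localhost:11434/api/tags     # Ollama model list (Option A only)
```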
- Open http://localhost:5173
- Upload a repository ZIP file
- Wait for processing to complete
- View generated documentation, architecture diagrams, and dependency graphs
1. Ingest a repository

   ```bash
   curl -X POST "http://localhost:8000/ingest" \
     -F "file=@yourrepo.zip"
   ```

   Response:

   ```json
   {
     "project_id": "abc123def456",
     "message": "Repository ingested successfully"
   }
   ```

2. Retrieve documentation

   ```bash
   curl "http://localhost:8000/docs/{project_id}"
   ```

   Generated artifacts are saved to `artifacts/{project_id}/`:

   - `ONBOARDING.md` - Developer onboarding guide
   - `ARCHITECTURE.md` - System architecture documentation
   - `architecture.svg` - Visual architecture diagram
   - `dependency_graph.json` - Code dependency graph
   - `repo_map.json` - Repository structure map
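The same two endpoints can be scripted from Python. A minimal sketch with `requests` (the endpoint shapes are as shown above; whether `/docs/{project_id}` responds immediately or only after processing finishes is an assumption to verify against the API):

```python
import requests

BASE_URL = "http://localhost:8000"

# Upload a repository ZIP for ingestion
with open("yourrepo.zip", "rb") as f:
    resp = requests.post(f"{BASE_URL}/ingest", files={"file": f})
resp.raise_for_status()
project_id = resp.json()["project_id"]
print(f"Ingested project: {project_id}")

# Fetch the generated documentation bundle
docs = requests.get(f"{BASE_URL}/docs/{project_id}")
docs.raise_for_status()
print(docs.text[:500])  # preview the response
```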
| Variable | Default | Description |
|---|---|---|
| `LLM_PROVIDER` | `ollama` | LLM backend: `ollama` or `openai` |
| `OLLAMA_BASE_URL` | `http://host.docker.internal:11434` | Ollama API endpoint |
| `OLLAMA_MODEL` | `qwen2.5:14b` | Ollama model name |
| `OPENAI_API_KEY` | - | OpenAI API key (when using OpenAI) |
| `OPENAI_PROJECT` | - | OpenAI project ID |
| `EMBEDDING_MODEL` | `text-embedding-3-large` | OpenAI embedding model |
| `GENERATION_MODEL` | `gpt-4o-mini` | OpenAI generation model |
| `QDRANT_HOST` | `qdrant` | Qdrant service hostname |
| `QDRANT_PORT` | `6333` | Qdrant service port |
| `MAX_FILE_BYTES` | `10485760` | Max file size (10 MB) |
| `MAX_EMBED_CHUNKS` | `100000` | Max chunks to embed |
| `ANALYSIS_TOPN` | `20` | Top N files for detailed analysis |
| `POLISH_MD` | `true` | Enable Markdown polishing |
| `MERMAID_RENDER_URL` | `https://kroki.io/mermaid/svg` | Mermaid diagram renderer |
See `.env.example` for complete configuration options.
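These variables are loaded from the environment at startup by the configuration module under `app/core/`. As an illustration only (the actual settings class may differ), a `pydantic-settings` model covering a subset of them could look like this:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Illustrative subset of DeepDocs configuration, not the real app/core module."""

    model_config = SettingsConfigDict(env_file=".env")

    llm_provider: str = "ollama"
    ollama_base_url: str = "http://host.docker.internal:11434"
    ollama_model: str = "qwen2.5:14b"
    qdrant_host: str = "qdrant"
    qdrant_port: int = 6333
    max_file_bytes: int = 10_485_760  # 10 MB upload cap


settings = Settings()  # e.g. LLM_PROVIDER=openai in .env overrides the default
```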
1. Backend

   ```bash
   python -m venv venv
   source venv/bin/activate  # Windows: venv\Scripts\activate
   pip install -r requirements.txt
   uvicorn app.main:app --reload
   ```

2. Frontend

   ```bash
   cd web
   npm install
   npm run dev
   ```

3. Qdrant (run separately)

   ```bash
   docker run -p 6333:6333 qdrant/qdrant
   ```
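When the backend runs on the host rather than inside Compose, the `qdrant` hostname from the table above will not resolve; point it at the local container instead (assuming the same variables apply outside Docker):

```bash
export QDRANT_HOST=localhost
export QDRANT_PORT=6333
uvicorn app.main:app --reload
```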
Run the test suite:

```bash
# Run tests
pytest tests/

# Run with coverage
pytest --cov=app tests/
```

Code-quality tasks are wrapped in the Makefile:

```bash
# Format code
make format

# Lint code
make lint

# Type check
make type-check
```

From upload to finished documentation, the pipeline runs in nine stages:

1. Ingestion: User uploads a repository ZIP file
2. Extraction: Files are extracted and filtered by type (code, docs, configs)
3. Chunking: Code is split into semantic chunks (functions, classes, modules)
4. Embedding: Chunks are converted to vector embeddings using the configured embedding model
5. Indexing: Vectors are stored in Qdrant for semantic search (a sketch of steps 3-5 follows this list)
6. Analysis: Repository structure, dependencies, and key modules are analyzed
7. Generation: AI generates comprehensive documentation based on code context
8. Rendering: Mermaid diagrams are converted to SVG for visualization
9. Output: Documentation bundle is returned and saved to `artifacts/`
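To make the chunk, embed, and index steps concrete, here is a minimal, self-contained sketch using the OpenAI SDK and `qdrant-client`. The collection name, chunks, and payload shape are illustrative assumptions, not DeepDocs internals:

```python
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

# Illustrative chunks; DeepDocs derives these from parsed functions/classes/modules
chunks = [
    "def ingest(zip_path): ...  # extracts and filters repository files",
    "class VectorStore: ...     # wraps Qdrant collection operations",
]

# Embed: text-embedding-3-large produces 3072-dimensional vectors
oai = OpenAI()  # reads OPENAI_API_KEY from the environment
vectors = [
    d.embedding
    for d in oai.embeddings.create(model="text-embedding-3-large", input=chunks).data
]

# Index: store the vectors in Qdrant with the source text as payload
qdrant = QdrantClient(host="localhost", port=6333)
qdrant.create_collection(
    collection_name="demo_chunks",  # hypothetical collection name
    vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
)
qdrant.upsert(
    collection_name="demo_chunks",
    points=[
        PointStruct(id=i, vector=v, payload={"text": c})
        for i, (c, v) in enumerate(zip(chunks, vectors))
    ],
)

# Search: embed a query and retrieve the nearest chunks
query_vec = oai.embeddings.create(
    model="text-embedding-3-large", input=["how are files ingested?"]
).data[0].embedding
for hit in qdrant.search(collection_name="demo_chunks", query_vector=query_vec, limit=2):
    print(hit.score, hit.payload["text"])
```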
- Input validation and sanitization for uploaded ZIPs (see the extraction sketch after this list)
- Temporary extraction workspace isolated from application code
- Whitelist-based file type filtering
- No arbitrary code execution
- Environment-based configuration (12-factor app)
- Structured logging for audit trails
- Docker isolation and resource limits
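As a concrete illustration of the ZIP-handling safeguards (a sketch only; the real `app/services/ingest.py` may differ), a whitelist-filtered extraction with a zip-slip guard and the 10 MB per-file cap could look like this:

```python
import zipfile
from pathlib import Path

# Illustrative values; DeepDocs reads MAX_FILE_BYTES and its whitelist from configuration
MAX_FILE_BYTES = 10_485_760
ALLOWED_SUFFIXES = {".py", ".ts", ".tsx", ".md", ".json", ".yml", ".yaml", ".toml"}


def safe_extract(zip_path: str, workspace: str) -> list[Path]:
    """Extract only whitelisted, size-capped files; refuse paths escaping the workspace."""
    root = Path(workspace).resolve()
    extracted: list[Path] = []
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.infolist():
            if member.is_dir() or member.file_size > MAX_FILE_BYTES:
                continue
            if Path(member.filename).suffix.lower() not in ALLOWED_SUFFIXES:
                continue  # whitelist-based file type filtering
            target = (root / member.filename).resolve()
            if not target.is_relative_to(root):  # zip-slip / path traversal guard
                raise ValueError(f"Blocked path traversal: {member.filename}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(member))  # file contents are never executed
            extracted.append(target)
    return extracted
```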