DeepDocs is an AI-powered developer onboarding tool that ingests GitHub repositories (via ZIP upload), indexes source code into a vector database, and generates comprehensive documentation including architecture diagrams, API references, setup instructions, and contribution guidelines.
- GitHub Repository Ingestion: Upload repository ZIP files (no GitHub token required)
- Smart Code Analysis: Automatic code parsing, chunking, and semantic indexing
- AI-Powered Documentation: Generate detailed architecture docs, onboarding guides, and API references
- Vector Search: Qdrant-based semantic search for intelligent code exploration
- Flexible LLM Support:
  - Local inference with Ollama (open-source models)
  - OpenAI API integration
- Interactive Web UI: Modern React/TypeScript frontend for document visualization
- Production Ready: Docker containerization, structured logging, and comprehensive testing
```
DeepDocs/
├── app/                      # FastAPI backend
│   ├── main.py               # API endpoints
│   ├── core/                 # Configuration and logging
│   ├── models/               # Pydantic schemas
│   └── services/             # Business logic
│       ├── ingest.py         # Repository extraction
│       ├── chunk.py          # Code chunking
│       ├── embeddings.py     # Embedding generation
│       ├── vectorstore.py    # Qdrant integration
│       ├── analyze.py        # Code analysis
│       ├── docgen.py         # Documentation generation
│       ├── polish.py         # Markdown refinement
│       └── render.py         # Diagram rendering
├── web/                      # React + TypeScript frontend
│   ├── src/
│   │   ├── DeepDoc.tsx       # Main component
│   │   └── main.tsx          # Entry point
│   └── Dockerfile.prod       # Production build
├── tests/                    # Unit and integration tests
├── artifacts/                # Generated documentation (gitignored)
├── docker-compose.yml        # Multi-container orchestration
└── requirements.txt          # Python dependencies
```
- Docker and Docker Compose
- (Optional) Ollama installed locally for open-source LLM support
- (Optional) OpenAI API key for GPT-based generation
1. Clone the repository

   ```bash
   git clone https://github.com/rileyafox/DeepDocs.git
   cd DeepDocs
   ```

2. Configure environment variables

   ```bash
   cp .env.example .env
   ```

   Edit `.env` and choose your LLM provider:

   Option A: Ollama (Local)

   ```env
   LLM_PROVIDER=ollama
   OLLAMA_BASE_URL=http://host.docker.internal:11434
   OLLAMA_MODEL=qwen2.5:14b
   ```

   Option B: OpenAI

   ```env
   LLM_PROVIDER=openai
   OPENAI_API_KEY=sk-proj-your-api-key
   OPENAI_PROJECT=proj-your-project-id
   ```

3. Start the services

   ```bash
   docker-compose up --build
   ```
Services will be available at:
- API: http://localhost:8000/docs (FastAPI Swagger UI)
- Web UI: http://localhost:5173
- Qdrant: http://localhost:6333/dashboard
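Once the containers are up, you can sanity-check each service from the host. These checks assume the default ports above; the Ollama check applies only to Option A:

```bash
curl -sf http://localhost:8000/docs > /dev/null && echo "API up"
curl -sf http://localhost:6333/collections   # Qdrant REST API
curl -sf http://localhost:11434/api/tags     # Ollama model list (Option A only)
```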
- Open http://localhost:5173
- Upload a repository ZIP file
- Wait for processing to complete
- View generated documentation, architecture diagrams, and dependency graphs
1. Ingest a repository

   ```bash
   curl -X POST "http://localhost:8000/ingest" \
     -F "file=@yourrepo.zip"
   ```

   Response:

   ```json
   {
     "project_id": "abc123def456",
     "message": "Repository ingested successfully"
   }
   ```

2. Retrieve documentation

   ```bash
   curl "http://localhost:8000/docs/{project_id}"
   ```

   Generated artifacts are saved to `artifacts/{project_id}/`:

   - `ONBOARDING.md` - Developer onboarding guide
   - `ARCHITECTURE.md` - System architecture documentation
   - `architecture.svg` - Visual architecture diagram
   - `dependency_graph.json` - Code dependency graph
   - `repo_map.json` - Repository structure map
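The same two endpoints can be scripted from Python. A minimal sketch with `requests` (the endpoint shapes are as shown above; whether `/docs/{project_id}` responds immediately or only after processing finishes is an assumption to verify against the API):

```python
import requests

BASE_URL = "http://localhost:8000"

# Upload a repository ZIP for ingestion
with open("yourrepo.zip", "rb") as f:
    resp = requests.post(f"{BASE_URL}/ingest", files={"file": f})
resp.raise_for_status()
project_id = resp.json()["project_id"]
print(f"Ingested project: {project_id}")

# Fetch the generated documentation bundle
docs = requests.get(f"{BASE_URL}/docs/{project_id}")
docs.raise_for_status()
print(docs.text[:500])  # preview the response
```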
| Variable | Default | Description |
|---|---|---|
| `LLM_PROVIDER` | `ollama` | LLM backend: `ollama` or `openai` |
| `OLLAMA_BASE_URL` | `http://host.docker.internal:11434` | Ollama API endpoint |
| `OLLAMA_MODEL` | `qwen2.5:14b` | Ollama model name |
| `OPENAI_API_KEY` | - | OpenAI API key (when using OpenAI) |
| `OPENAI_PROJECT` | - | OpenAI project ID |
| `EMBEDDING_MODEL` | `text-embedding-3-large` | OpenAI embedding model |
| `GENERATION_MODEL` | `gpt-4o-mini` | OpenAI generation model |
| `QDRANT_HOST` | `qdrant` | Qdrant service hostname |
| `QDRANT_PORT` | `6333` | Qdrant service port |
| `MAX_FILE_BYTES` | `10485760` | Max file size (10 MB) |
| `MAX_EMBED_CHUNKS` | `100000` | Max chunks to embed |
| `ANALYSIS_TOPN` | `20` | Top N files for detailed analysis |
| `POLISH_MD` | `true` | Enable Markdown polishing |
| `MERMAID_RENDER_URL` | `https://kroki.io/mermaid/svg` | Mermaid diagram renderer |
See `.env.example` for complete configuration options.
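These variables are loaded from the environment at startup by the configuration module under `app/core/`. As an illustration only (the actual settings class may differ), a `pydantic-settings` model covering a subset of them could look like this:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Illustrative subset of DeepDocs configuration, not the real app/core module."""

    model_config = SettingsConfigDict(env_file=".env")

    llm_provider: str = "ollama"
    ollama_base_url: str = "http://host.docker.internal:11434"
    ollama_model: str = "qwen2.5:14b"
    qdrant_host: str = "qdrant"
    qdrant_port: int = 6333
    max_file_bytes: int = 10_485_760  # 10 MB upload cap


settings = Settings()  # e.g. LLM_PROVIDER=openai in .env overrides the default
```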
1. Backend

   ```bash
   python -m venv venv
   source venv/bin/activate  # Windows: venv\Scripts\activate
   pip install -r requirements.txt
   uvicorn app.main:app --reload
   ```

2. Frontend

   ```bash
   cd web
   npm install
   npm run dev
   ```

3. Qdrant (run separately)

   ```bash
   docker run -p 6333:6333 qdrant/qdrant
   ```
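When the backend runs on the host rather than inside Compose, the `qdrant` hostname from the table above will not resolve; point it at the local container instead (assuming the same variables apply outside Docker):

```bash
export QDRANT_HOST=localhost
export QDRANT_PORT=6333
uvicorn app.main:app --reload
```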
Run the test suite:

```bash
# Run tests
pytest tests/

# Run with coverage
pytest --cov=app tests/
```

Code-quality tasks are wrapped in the Makefile:

```bash
# Format code
make format

# Lint code
make lint

# Type check
make type-check
```

From upload to finished documentation, the pipeline runs in nine stages:

1. Ingestion: User uploads a repository ZIP file
2. Extraction: Files are extracted and filtered by type (code, docs, configs)
3. Chunking: Code is split into semantic chunks (functions, classes, modules)
4. Embedding: Chunks are converted to vector embeddings using the configured embedding model
5. Indexing: Vectors are stored in Qdrant for semantic search (a sketch of steps 3-5 follows this list)
6. Analysis: Repository structure, dependencies, and key modules are analyzed
7. Generation: AI generates comprehensive documentation based on code context
8. Rendering: Mermaid diagrams are converted to SVG for visualization
9. Output: Documentation bundle is returned and saved to `artifacts/`
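To make the chunk, embed, and index steps concrete, here is a minimal, self-contained sketch using the OpenAI SDK and `qdrant-client`. The collection name, chunks, and payload shape are illustrative assumptions, not DeepDocs internals:

```python
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

# Illustrative chunks; DeepDocs derives these from parsed functions/classes/modules
chunks = [
    "def ingest(zip_path): ...  # extracts and filters repository files",
    "class VectorStore: ...     # wraps Qdrant collection operations",
]

# Embed: text-embedding-3-large produces 3072-dimensional vectors
oai = OpenAI()  # reads OPENAI_API_KEY from the environment
vectors = [
    d.embedding
    for d in oai.embeddings.create(model="text-embedding-3-large", input=chunks).data
]

# Index: store the vectors in Qdrant with the source text as payload
qdrant = QdrantClient(host="localhost", port=6333)
qdrant.create_collection(
    collection_name="demo_chunks",  # hypothetical collection name
    vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
)
qdrant.upsert(
    collection_name="demo_chunks",
    points=[
        PointStruct(id=i, vector=v, payload={"text": c})
        for i, (c, v) in enumerate(zip(chunks, vectors))
    ],
)

# Search: embed a query and retrieve the nearest chunks
query_vec = oai.embeddings.create(
    model="text-embedding-3-large", input=["how are files ingested?"]
).data[0].embedding
for hit in qdrant.search(collection_name="demo_chunks", query_vector=query_vec, limit=2):
    print(hit.score, hit.payload["text"])
```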
- Input validation and sanitization for uploaded ZIPs (see the extraction sketch after this list)
- Temporary extraction workspace isolated from application code
- Whitelist-based file type filtering
- No arbitrary code execution
- Environment-based configuration (12-factor app)
- Structured logging for audit trails
- Docker isolation and resource limits
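As a concrete illustration of the ZIP-handling safeguards (a sketch only; the real `app/services/ingest.py` may differ), a whitelist-filtered extraction with a zip-slip guard and the 10 MB per-file cap could look like this:

```python
import zipfile
from pathlib import Path

# Illustrative values; DeepDocs reads MAX_FILE_BYTES and its whitelist from configuration
MAX_FILE_BYTES = 10_485_760
ALLOWED_SUFFIXES = {".py", ".ts", ".tsx", ".md", ".json", ".yml", ".yaml", ".toml"}


def safe_extract(zip_path: str, workspace: str) -> list[Path]:
    """Extract only whitelisted, size-capped files; refuse paths escaping the workspace."""
    root = Path(workspace).resolve()
    extracted: list[Path] = []
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.infolist():
            if member.is_dir() or member.file_size > MAX_FILE_BYTES:
                continue
            if Path(member.filename).suffix.lower() not in ALLOWED_SUFFIXES:
                continue  # whitelist-based file type filtering
            target = (root / member.filename).resolve()
            if not target.is_relative_to(root):  # zip-slip / path traversal guard
                raise ValueError(f"Blocked path traversal: {member.filename}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(member))  # file contents are never executed
            extracted.append(target)
    return extracted
```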