DeepDocs

DeepDocs is an AI-powered developer onboarding tool that ingests GitHub repositories (via ZIP upload), indexes source code into a vector database, and generates comprehensive documentation including architecture diagrams, API references, setup instructions, and contribution guidelines.

Features

  • GitHub Repository Ingestion: Upload repository ZIP files (no GitHub token required)
  • Smart Code Analysis: Automatic code parsing, chunking, and semantic indexing
  • AI-Powered Documentation: Generate detailed architecture docs, onboarding guides, and API references
  • Vector Search: Qdrant-based semantic search for intelligent code exploration
  • Flexible LLM Support:
    • Local inference with Ollama (open-source models)
    • OpenAI API integration
  • Interactive Web UI: Modern React/TypeScript frontend for document visualization
  • Production Ready: Docker containerization, structured logging, and comprehensive testing

Architecture

DeepDocs/
├── app/                      # FastAPI backend
│   ├── main.py              # API endpoints
│   ├── core/                # Configuration and logging
│   ├── models/              # Pydantic schemas
│   └── services/            # Business logic
│       ├── ingest.py        # Repository extraction
│       ├── chunk.py         # Code chunking
│       ├── embeddings.py    # Embedding generation
│       ├── vectorstore.py   # Qdrant integration
│       ├── analyze.py       # Code analysis
│       ├── docgen.py        # Documentation generation
│       ├── polish.py        # Markdown refinement
│       └── render.py        # Diagram rendering
├── web/                     # React + TypeScript frontend
│   ├── src/
│   │   ├── DeepDoc.tsx     # Main component
│   │   └── main.tsx        # Entry point
│   └── Dockerfile.prod     # Production build
├── tests/                   # Unit and integration tests
├── artifacts/               # Generated documentation (gitignored)
├── docker-compose.yml       # Multi-container orchestration
└── requirements.txt         # Python dependencies
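
The API surface is intentionally small. As a rough sketch of how the upload route in app/main.py could be shaped (hypothetical code, not the actual implementation; the real route delegates to the services under app/services/):

    # Illustrative sketch of the /ingest route; names and id scheme are assumptions.
    import hashlib

    from fastapi import FastAPI, File, UploadFile

    app = FastAPI()

    @app.post("/ingest")
    async def ingest(file: UploadFile = File(...)):
        data = await file.read()
        # Derive a short, stable project id from the upload contents (illustrative).
        project_id = hashlib.sha256(data).hexdigest()[:12]
        # Real pipeline: extract -> chunk -> embed -> index (see app/services/).
        return {"project_id": project_id, "message": "Repository ingested successfully"}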

Quick Start

Prerequisites

  • Docker and Docker Compose
  • (Optional) Ollama installed locally for open-source LLM support
  • (Optional) OpenAI API key for GPT-based generation

Setup

  1. Clone the repository

    git clone https://github.com/rileyafox/DeepDocs.git
    cd DeepDocs
  2. Configure environment variables

    cp .env.example .env

    Edit .env and choose your LLM provider:

    Option A: Ollama (Local)

    LLM_PROVIDER=ollama
    OLLAMA_BASE_URL=http://host.docker.internal:11434
    OLLAMA_MODEL=qwen2.5:14b

    Option B: OpenAI

    LLM_PROVIDER=openai
    OPENAI_API_KEY=sk-proj-your-api-key
    OPENAI_PROJECT=proj-your-project-id
  3. Start the services

    docker-compose up --build

    Services will be available at:

    • Web UI: http://localhost:5173
    • API: http://localhost:8000
    • Qdrant: http://localhost:6333

Usage

Method 1: Web UI (Recommended)

  1. Open http://localhost:5173
  2. Upload a repository ZIP file
  3. Wait for processing to complete
  4. View generated documentation, architecture diagrams, and dependency graphs

Method 2: API

  1. Ingest a repository

    curl -X POST "http://localhost:8000/ingest" \
      -F "file=@yourrepo.zip"

    Response:

    {
      "project_id": "abc123def456",
      "message": "Repository ingested successfully"
    }
  2. Retrieve documentation

    curl "http://localhost:8000/docs/{project_id}"

    Generated artifacts are saved to artifacts/{project_id}/:

    • ONBOARDING.md - Developer onboarding guide
    • ARCHITECTURE.md - System architecture documentation
    • architecture.svg - Visual architecture diagram
    • dependency_graph.json - Code dependency graph
    • repo_map.json - Repository structure map
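
The same flow works from a script. A minimal sketch using the requests package, relying only on the two endpoints shown above:

    # Ingest a ZIP, then fetch the generated documentation.
    import requests

    BASE = "http://localhost:8000"

    with open("yourrepo.zip", "rb") as f:
        resp = requests.post(f"{BASE}/ingest", files={"file": f})
    resp.raise_for_status()
    project_id = resp.json()["project_id"]

    docs = requests.get(f"{BASE}/docs/{project_id}")
    docs.raise_for_status()
    print(docs.text)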

Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| LLM_PROVIDER | ollama | LLM backend: ollama or openai |
| OLLAMA_BASE_URL | http://host.docker.internal:11434 | Ollama API endpoint |
| OLLAMA_MODEL | qwen2.5:14b | Ollama model name |
| OPENAI_API_KEY | - | OpenAI API key (when using OpenAI) |
| OPENAI_PROJECT | - | OpenAI project ID |
| EMBEDDING_MODEL | text-embedding-3-large | OpenAI embedding model |
| GENERATION_MODEL | gpt-4o-mini | OpenAI generation model |
| QDRANT_HOST | qdrant | Qdrant service hostname |
| QDRANT_PORT | 6333 | Qdrant service port |
| MAX_FILE_BYTES | 10485760 | Max file size (10 MB) |
| MAX_EMBED_CHUNKS | 100000 | Max chunks to embed |
| ANALYSIS_TOPN | 20 | Top N files for detailed analysis |
| POLISH_MD | true | Enable Markdown polishing |
| MERMAID_RENDER_URL | https://kroki.io/mermaid/svg | Mermaid diagram renderer |

See .env.example for complete configuration options.
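
As a rough sketch of how such variables can be loaded (the actual loader lives in app/core/ and may differ), using pydantic-settings with a subset of the fields from the table:

    # Illustrative settings loader; field names mirror the table above.
    from pydantic_settings import BaseSettings, SettingsConfigDict

    class Settings(BaseSettings):
        model_config = SettingsConfigDict(env_file=".env", extra="ignore")

        llm_provider: str = "ollama"
        ollama_base_url: str = "http://host.docker.internal:11434"
        ollama_model: str = "qwen2.5:14b"
        openai_api_key: str | None = None
        qdrant_host: str = "qdrant"
        qdrant_port: int = 6333
        max_file_bytes: int = 10_485_760
        polish_md: bool = True

    settings = Settings()  # reads .env, then environment variables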

Development

Running Locally (Without Docker)

  1. Backend

    python -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
    pip install -r requirements.txt
    uvicorn app.main:app --reload
  2. Frontend

    cd web
    npm install
    npm run dev
  3. Qdrant (run separately)

    docker run -p 6333:6333 qdrant/qdrant
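
To confirm Qdrant is reachable before starting the backend, a quick check with the qdrant-client package:

    # Connectivity check against the local Qdrant started above.
    from qdrant_client import QdrantClient

    client = QdrantClient(host="localhost", port=6333)
    print(client.get_collections())  # empty list on a fresh instance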

Testing

# Run tests
pytest tests/

# Run with coverage
pytest --cov=app tests/
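
A sketch of what an endpoint test might look like with FastAPI's TestClient; illustrative only, since the real tests may stub the embedding and LLM services:

    # tests/test_ingest_sketch.py - illustrative endpoint test.
    import io
    import zipfile

    from fastapi.testclient import TestClient

    from app.main import app

    client = TestClient(app)

    def test_ingest_returns_project_id():
        # Build a one-file repository ZIP in memory.
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w") as zf:
            zf.writestr("hello.py", "print('hi')\n")
        buf.seek(0)
        resp = client.post("/ingest", files={"file": ("repo.zip", buf, "application/zip")})
        assert resp.status_code == 200
        assert "project_id" in resp.json()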

Code Quality

# Format code
make format

# Lint code
make lint

# Type check
make type-check

How It Works

  1. Ingestion: User uploads a repository ZIP file
  2. Extraction: Files are extracted and filtered by type (code, docs, configs)
  3. Chunking: Code is split into semantic chunks (functions, classes, modules)
  4. Embedding: Chunks are converted to vector embeddings using the configured embedding model
  5. Indexing: Vectors are stored in Qdrant for semantic search
  6. Analysis: Repository structure, dependencies, and key modules are analyzed
  7. Generation: AI generates comprehensive documentation based on code context
  8. Rendering: Mermaid diagrams are converted to SVG for visualization
  9. Output: Documentation bundle is returned and saved to artifacts
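
To make step 3 concrete, here is a minimal, self-contained sketch of function- and class-level chunking for Python sources using the standard library's ast module (the strategy in app/services/chunk.py may differ):

    # Split a Python source file into top-level function/class chunks (illustrative).
    import ast

    def chunk_python_source(source: str) -> list[dict]:
        tree = ast.parse(source)
        chunks = []
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                chunks.append({
                    "name": node.name,
                    "kind": type(node).__name__,
                    "start": node.lineno,
                    "end": node.end_lineno,
                    "text": ast.get_source_segment(source, node),
                })
        return chunks

    if __name__ == "__main__":
        src = "def hello():\n    return 'world'\n"
        print(chunk_python_source(src))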

Security & Best Practices

  • Input validation and sanitization for uploaded ZIPs
  • Temporary extraction workspace isolated from application code
  • Whitelist-based file type filtering
  • No arbitrary code execution
  • Environment-based configuration (12-factor app)
  • Structured logging for audit trails
  • Docker isolation and resource limits
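
The first three bullets combine naturally into a zip-slip guard plus an extension allow-list. A hedged sketch (the allow-list contents and helper name are illustrative; the real filter lives in app/services/ingest.py):

    # Illustrative safe extraction: reject path traversal, keep allow-listed files only.
    import zipfile
    from pathlib import Path

    ALLOWED_SUFFIXES = {".py", ".ts", ".tsx", ".md", ".json", ".yml", ".yaml", ".toml"}

    def safe_extract(zip_path: str, dest: str) -> list[Path]:
        dest_dir = Path(dest).resolve()
        kept = []
        with zipfile.ZipFile(zip_path) as zf:
            for info in zf.infolist():
                target = (dest_dir / info.filename).resolve()
                if not target.is_relative_to(dest_dir):  # zip-slip guard
                    raise ValueError(f"Unsafe path in archive: {info.filename}")
                if info.is_dir() or target.suffix.lower() not in ALLOWED_SUFFIXES:
                    continue
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(zf.read(info))
                kept.append(target)
        return kept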
