API Map
loglux edited this page Feb 6, 2026
Base URL: http://localhost:8004/api/v1
/api/v1
│
├── /health GET ✓ Check API health
├── /ready GET ✓ Check dependencies
├── /info GET ✓ Get API info
│
├── /knowledge-bases
│ ├── / POST ✓ Create KB
│ ├── / GET ✓ List KBs (paginated)
│ ├── /{kb_id} GET ✓ Get KB details
│ ├── /{kb_id} PUT ✓ Update KB
│ ├── /{kb_id}/retrieval-settings GET ✓ Get KB retrieval settings
│ ├── /{kb_id}/retrieval-settings PUT ✓ Update KB retrieval settings
│ ├── /{kb_id}/retrieval-settings DELETE ✓ Clear KB retrieval settings
│ ├── /{kb_id} DELETE ✓ Delete KB (soft)
│ ├── /{kb_id}/reprocess POST ✓ Reprocess all docs
│ ├── /{kb_id}/regenerate_chat_titles POST 🤖 Regenerate chat titles
│ └── /{kb_id}/cleanup-orphaned-chunks POST ✓ Clean orphans
│
├── /documents
│ ├── / POST ✓ Upload document (multipart/form-data)
│ ├── / GET ✓ List documents (paginated, filterable)
│ ├── /{doc_id} GET ✓ Get document + content
│ ├── /{doc_id} DELETE ✓ Delete document + vectors
│ ├── /{doc_id}/status GET ⚡ Get processing status (optimized for polling)
│ ├── /{doc_id}/reprocess POST ✓ Reprocess document
│ ├── /{doc_id}/analyze POST 🤖 Analyze structure (LLM)
│ ├── /{doc_id}/structure/apply POST ✓ Apply structure
│ └── /{doc_id}/structure GET ✓ Get structure
│
├── /chat
│ ├── / POST 🤖 Query KB (RAG)
│ ├── /knowledge-bases/{kb_id}/stats GET ✓ Get chat stats
│ ├── /conversations GET ✓ List conversations
│ ├── /conversations/{id} GET ✓ Get conversation
│ ├── /conversations/{id} PATCH ✓ Update conversation (title)
│ ├── /conversations/{id} DELETE ✓ Delete conversation
│ ├── /conversations/{id}/settings PATCH ✓ Update settings
│ └── /conversations/{id}/messages GET ✓ Get messages
│
├── /retrieve
│ └── / POST ✓ Retrieve chunks (no LLM)
│
├── /prompts
│ ├── / GET ✓ List chat prompt versions
│ ├── / POST ✓ Create chat prompt version
│ ├── /active GET ✓ Get active chat prompt
│ ├── /{id} GET ✓ Get chat prompt version
│ ├── /{id}/activate POST ✓ Activate chat prompt
│ ├── /self-check GET ✓ List self-check prompts
│ ├── /self-check POST ✓ Create self-check prompt
│ ├── /self-check/active GET ✓ Get active self-check prompt
│ ├── /self-check/{id} GET ✓ Get self-check prompt version
│ └── /self-check/{id}/activate POST ✓ Activate self-check prompt
│
├── /embeddings
│ ├── /models GET ✓ List all models
│ ├── /models/{name} GET ✓ Get model details
│ ├── /providers GET ✓ List providers
│ └── /providers/{provider}/models GET ✓ Get provider models
│
├── /llm
│ ├── /models GET ✓ List LLM models
│ └── /providers GET ✓ List LLM providers
│
├── /ollama
│ ├── /status GET ✓ Check Ollama status
│ ├── /models GET ✓ List all Ollama models
│ ├── /models/embeddings GET ✓ List embedding models
│ └── /models/llm GET ✓ List LLM models
│
└── /settings
├── / GET ✓ Get app settings
├── / PUT ✓ Update settings
├── /reset POST ✓ Reset to defaults
└── /metadata GET ✓ Get metadata (options)
- ✓ Standard CRUD operation
- ⚡ Optimized for polling/real-time updates
- 🤖 Uses LLM/AI processing
- 📊 Returns statistics/analytics
┌─────────────┐
│ Client │
└──────┬──────┘
│ POST /documents/
│ (multipart/form-data)
├─ file
├─ knowledge_base_id
└─ filename (optional)
│
▼
┌──────────────────────┐
│ Document Upload │
│ (documents.py) │
└──────┬───────────────┘
│ Returns: 201 Created
│ {id, status: "pending", progress: 0}
│
▼
┌──────────────────────┐
│ Background Task │
│ (document_processor) │
└──────┬───────────────┘
│
├─ 5% "Loading document..."
├─ 15% "Preparing to chunk..."
├─ 30% "Chunking completed (N chunks)"
├─ 35% "Generating embeddings (0/N)"
│ ├─ Batch 1 processed
│ ├─ 48% "Generating embeddings (100/N)"
│ ├─ Batch 2 processed
│ └─ 62% "Generating embeddings (200/N)"
├─ 75% "Embeddings created (N)"
├─ 80% "Indexing in Qdrant..."
├─ 85% "Qdrant indexing completed"
├─ 90% "Indexing BM25..."
├─ 95% "BM25 indexing completed"
└─ 100% "Completed"
│
▼
┌──────────────────────┐
│ Status Polling │
│ GET /documents/{id}/ │
│ status │
└──────┬───────────────┘
│ Poll every 1s
│ Check progress_percentage
│ Check processing_stage
│
▼
status = "completed"
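The progress checkpoints above can be collapsed into a simple lookup the client uses to show a stage label. This is a sketch: the percentages and labels come from the flow diagram, but the exact `processing_stage` strings the API returns may differ.

```javascript
// Checkpoint thresholds taken from the processing flow above.
const STAGES = [
  [5, 'Loading document'],
  [15, 'Preparing to chunk'],
  [30, 'Chunking completed'],
  [35, 'Generating embeddings'],
  [75, 'Embeddings created'],
  [80, 'Indexing in Qdrant'],
  [90, 'Indexing BM25'],
  [100, 'Completed'],
]

// Return the most recently passed stage label for a progress percentage.
function stageFor(pct) {
  let label = 'Pending'
  for (const [threshold, name] of STAGES) {
    if (pct >= threshold) label = name
  }
  return label
}
```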
┌─────────────┐
│ Client │
└──────┬──────┘
│ POST /chat/
│ {
│ question,
│ knowledge_base_id,
│ conversation_id (optional),
│ top_k, temperature, ...
│ }
▼
┌──────────────────────┐
│ Chat Endpoint │
│ (chat.py) │
└──────┬───────────────┘
│
├─ Load KB config
├─ Load/create conversation
│
▼
┌──────────────────────┐
│ Retrieval Engine │
│ (retrieval.py) │
└──────┬───────────────┘
│
├─ Generate query embedding
│
├─ Retrieval Mode?
│ ├─ Dense:
│ │ └─ Qdrant vector search
│ │
│ └─ Hybrid:
│ ├─ Qdrant vector search (dense)
│ ├─ OpenSearch BM25 (lexical)
│ └─ Merge + rerank results
│
├─ Filter by score_threshold
├─ Apply MMR if enabled
└─ Return top_k chunks
│
▼
┌──────────────────────┐
│ LLM Generation │
│ (assistant.py) │
└──────┬───────────────┘
│
├─ Build prompt with context
├─ Call LLM (OpenAI/Ollama)
└─ Generate answer
│
▼
┌──────────────────────┐
│ Save to Database │
│ (conversation msgs) │
└──────┬───────────────┘
│
▼
┌─────────────┐
│ Client │ Returns: {answer, sources, ...}
└─────────────┘
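The "Merge + rerank" step in hybrid mode can be sketched as a weighted sum over normalised dense and lexical scores, using the `hybrid_dense_weight` / `hybrid_lexical_weight` request fields. The actual fusion in retrieval.py may use a different formula (e.g. reciprocal rank fusion); this is only an illustration of the idea.

```javascript
// Weighted-sum fusion sketch: assumes each result list carries
// scores already normalised to a comparable range.
function mergeHybrid(dense, lexical, denseWeight = 0.6, lexicalWeight = 0.4, topK = 5) {
  const scores = new Map()
  for (const { id, score } of dense) {
    scores.set(id, (scores.get(id) || 0) + denseWeight * score)
  }
  for (const { id, score } of lexical) {
    scores.set(id, (scores.get(id) || 0) + lexicalWeight * score)
  }
  // Chunks found by both retrievers accumulate both contributions.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}
```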
┌─────────────┐
│ Client │
└──────┬──────┘
│ POST /knowledge-bases/
│ {name, embedding_model, chunking_strategy, ...}
│
▼
┌──────────────────────┐
│ KB Endpoint │
│ (knowledge_bases.py) │
└──────┬───────────────┘
│
├─ Validate embedding model exists
├─ Get model dimension
├─ Generate collection name (kb_{hash})
│
▼
┌──────────────────────┐
│ Create Collection │
│ in Qdrant │
└──────┬───────────────┘
│ vector_size = embedding_dimension
│ distance = cosine
│
▼
┌──────────────────────┐
│ Save to Database │
│ (PostgreSQL) │
└──────┬───────────────┘
│
▼
┌─────────────┐
│ Client │ Returns: 201 Created {id, ...}
└─────────────┘
Request:
GET /api/v1/documents/?page=2&page_size=10&knowledge_base_id=uuid
page_size default is 10, max is 100.
Response:
{
  "items": [...],
  "total": 150,
  "page": 2,
  "page_size": 10,
  "pages": 15
}

Request:
GET /api/v1/documents/?status=processing&knowledge_base_id=uuid
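The `pages` field in the paginated response is derived from `total` and `page_size` (150 items at 10 per page gives 15 pages). A client can recompute it with a ceiling division:

```javascript
// pages = ceil(total / page_size)
function totalPages(total, pageSize) {
  return Math.ceil(total / pageSize)
}
```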
Polling Loop (Client-side):
async function pollDocumentStatus(docId) {
  const interval = setInterval(async () => {
    const res = await fetch(`/api/v1/documents/${docId}/status`)
    const data = await res.json()

    // Update UI with progress
    updateProgressBar(data.progress_percentage)
    updateStatusText(data.processing_stage)

    // Stop polling once processing has completed or failed
    if (data.status === 'completed' || data.status === 'failed') {
      clearInterval(interval)
    }
  }, 1000) // Poll every 1 second
}

Knowledge Base Platform
│
├─ OpenAI API
│ ├─ Embeddings (text-embedding-3-*)
│ └─ Chat Completions (gpt-4o, gpt-4o-mini)
│
├─ Voyage AI API
│ └─ Embeddings (voyage-4, voyage-code-3)
│
├─ Ollama (Local)
│ ├─ Embeddings (nomic-embed-text, mxbai-embed-large)
│ └─ Chat (llama3.1, mistral, qwen)
│
├─ Qdrant (Vector Store)
│ ├─ Collections management
│ ├─ Vector CRUD
│ └─ Similarity search
│
├─ OpenSearch (Lexical Store)
│ ├─ BM25 indexing
│ └─ Keyword search
│
└─ PostgreSQL (Metadata)
├─ Knowledge Bases
├─ Documents
├─ Conversations
└─ Settings
# Create KB
KB=$(curl -s -X POST http://localhost:8004/api/v1/knowledge-bases/ \
  -H 'Content-Type: application/json' \
  -d '{"name":"Docs","chunking_strategy":"semantic"}' | jq -r .id)

# Upload docs
for file in docs/*.md; do
  curl -s -X POST http://localhost:8004/api/v1/documents/ \
    -F "file=@$file" -F "knowledge_base_id=$KB"
done

# Query
curl -s -X POST http://localhost:8004/api/v1/chat/ \
  -H 'Content-Type: application/json' \
  -d '{"question":"How to install?","knowledge_base_id":"'$KB'"}'

# Upload
DOC=$(curl -s -X POST http://localhost:8004/api/v1/documents/ \
  -F file=@large.pdf -F "knowledge_base_id=$KB" | jq -r .id)

# Watch progress
watch -n 1 "curl -s http://localhost:8004/api/v1/documents/$DOC/status | jq '{progress: .progress_percentage, stage: .processing_stage}'"

# Hybrid retrieval query
curl -s -X POST http://localhost:8004/api/v1/chat/ \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "specific technical term",
    "knowledge_base_id": "'$KB'",
    "retrieval_mode": "hybrid",
    "hybrid_dense_weight": 0.6,
    "hybrid_lexical_weight": 0.4,
    "lexical_top_k": 15,
    "top_k": 5
  }'

# First message
CONV=$(curl -s -X POST http://localhost:8004/api/v1/chat/ \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "What is Python?",
    "knowledge_base_id": "'$KB'"
  }' | jq -r .conversation_id)

# Follow-up (with context)
curl -s -X POST http://localhost:8004/api/v1/chat/ \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "How do I install it?",
    "knowledge_base_id": "'$KB'",
    "conversation_id": "'$CONV'"
  }'

Upload multiple documents without waiting for each to complete.
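Uploading several documents concurrently can be sketched as a small worker pool. `uploadFn` is a placeholder for a function that POSTs one file to /api/v1/documents/ and resolves with the created document id; the concurrency limit keeps the server from being flooded.

```javascript
// Upload all files with at most `concurrency` requests in flight.
// uploadFn is an assumed wrapper around POST /api/v1/documents/.
async function uploadAll(files, uploadFn, concurrency = 4) {
  const results = []
  const queue = [...files]
  async function worker() {
    while (queue.length > 0) {
      const file = queue.shift()
      results.push(await uploadFn(file))
    }
  }
  await Promise.all(Array.from({ length: concurrency }, worker))
  return results
}
```

After the batch resolves, poll each returned id via GET /documents/{doc_id}/status as shown earlier on this page.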
- Small docs (< 10 pages): 500-800 chars
- Medium docs (10-50 pages): 800-1200 chars
- Large docs (50+ pages): 1200-2000 chars
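The chunk-size guidelines above can be wrapped in a small helper. The page-count breakpoints and character ranges are the document's suggestions; the function itself is only illustrative.

```javascript
// Map a document's page count to the suggested chunk-size range (chars).
function suggestedChunkSize(pages) {
  if (pages < 10) return { min: 500, max: 800 }    // small docs
  if (pages <= 50) return { min: 800, max: 1200 }  // medium docs
  return { min: 1200, max: 2000 }                  // large docs
}
```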
- Dense only: General questions, semantic similarity
- Hybrid: Technical terms, specific phrases, entity names
- Poll every 1s during processing
- Stop polling when status = completed/failed
- Use status endpoint (lighter than full document GET)
- Save settings per conversation
- Reuse conversation_id for multi-turn chats
- Use lower temperature (0.5) for factual answers
Last Updated: 2026-02-05 · Version: v1.0
📝 Questions? Open an issue | 🌟 Like it? Star the repo | 📖 API Docs: Swagger UI