API Map
loglux edited this page Feb 6, 2026
Base URL: http://localhost:8004/api/v1
/api/v1
│
├── /health GET ✓ Check API health
├── /ready GET ✓ Check dependencies
├── /info GET ✓ Get API info
│
├── /knowledge-bases
│ ├── / POST ✓ Create KB
│ ├── / GET ✓ List KBs (paginated)
│ ├── /{kb_id} GET ✓ Get KB details
│ ├── /{kb_id} PUT ✓ Update KB
│ ├── /{kb_id}/retrieval-settings GET ✓ Get KB retrieval settings
│ ├── /{kb_id}/retrieval-settings PUT ✓ Update KB retrieval settings
│ ├── /{kb_id}/retrieval-settings DELETE ✓ Clear KB retrieval settings
│ ├── /{kb_id} DELETE ✓ Delete KB (soft)
│ ├── /{kb_id}/reprocess POST ✓ Reprocess all docs
│ ├── /{kb_id}/regenerate_chat_titles POST 🤖 Regenerate chat titles
│ └── /{kb_id}/cleanup-orphaned-chunks POST ✓ Clean orphans
│
├── /documents
│ ├── / POST ✓ Upload document (multipart/form-data)
│ ├── / GET ✓ List documents (paginated, filterable)
│ ├── /{doc_id} GET ✓ Get document + content
│ ├── /{doc_id} DELETE ✓ Delete document + vectors
│ ├── /{doc_id}/status GET ⚡ Get processing status (optimized for polling)
│ ├── /{doc_id}/reprocess POST ✓ Reprocess document
│ ├── /{doc_id}/analyze POST 🤖 Analyze structure (LLM)
│ ├── /{doc_id}/structure/apply POST ✓ Apply structure
│ └── /{doc_id}/structure GET ✓ Get structure
│
├── /chat
│ ├── / POST 🤖 Query KB (RAG)
│ ├── /knowledge-bases/{kb_id}/stats GET ✓ Get chat stats
│ ├── /conversations GET ✓ List conversations
│ ├── /conversations/{id} GET ✓ Get conversation
│ ├── /conversations/{id} PATCH ✓ Update conversation (title)
│ ├── /conversations/{id} DELETE ✓ Delete conversation
│ ├── /conversations/{id}/settings PATCH ✓ Update settings
│ └── /conversations/{id}/messages GET ✓ Get messages
│
├── /retrieve
│ └── / POST ✓ Retrieve chunks (no LLM)
│
├── /prompts
│ ├── / GET ✓ List chat prompt versions
│ ├── / POST ✓ Create chat prompt version
│ ├── /active GET ✓ Get active chat prompt
│ ├── /{id} GET ✓ Get chat prompt version
│ ├── /{id}/activate POST ✓ Activate chat prompt
│ ├── /self-check GET ✓ List self-check prompts
│ ├── /self-check POST ✓ Create self-check prompt
│ ├── /self-check/active GET ✓ Get active self-check prompt
│ ├── /self-check/{id} GET ✓ Get self-check prompt version
│ └── /self-check/{id}/activate POST ✓ Activate self-check prompt
│
├── /embeddings
│ ├── /models GET ✓ List all models
│ ├── /models/{name} GET ✓ Get model details
│ ├── /providers GET ✓ List providers
│ └── /providers/{provider}/models GET ✓ Get provider models
│
├── /llm
│ ├── /models GET ✓ List LLM models
│ └── /providers GET ✓ List LLM providers
│
├── /ollama
│ ├── /status GET ✓ Check Ollama status
│ ├── /models GET ✓ List all Ollama models
│ ├── /models/embeddings GET ✓ List embedding models
│ └── /models/llm GET ✓ List LLM models
│
└── /settings
├── / GET ✓ Get app settings
├── / PUT ✓ Update settings
├── /reset POST ✓ Reset to defaults
└── /metadata GET ✓ Get metadata (options)
- ✓ Standard CRUD operation
- ⚡ Optimized for polling/real-time updates
- 🤖 Uses LLM/AI processing
- 📊 Returns statistics/analytics
┌─────────────┐
│ Client │
└──────┬──────┘
│ POST /documents/
│ (multipart/form-data)
├─ file
├─ knowledge_base_id
└─ filename (optional)
│
▼
┌──────────────────────┐
│ Document Upload │
│ (documents.py) │
└──────┬───────────────┘
│ Returns: 201 Created
│ {id, status: "pending", progress: 0}
│
▼
┌──────────────────────┐
│ Background Task │
│ (document_processor) │
└──────┬───────────────┘
│
├─ 5% "Loading document..."
├─ 15% "Preparing to chunk..."
├─ 30% "Chunking completed (N chunks)"
├─ 35% "Generating embeddings (0/N)"
│ ├─ Batch 1 processed
│ ├─ 48% "Generating embeddings (100/N)"
│ ├─ Batch 2 processed
│ └─ 62% "Generating embeddings (200/N)"
├─ 75% "Embeddings created (N)"
├─ 80% "Indexing in Qdrant..."
├─ 85% "Qdrant indexing completed"
├─ 90% "Indexing BM25..."
├─ 95% "BM25 indexing completed"
└─ 100% "Completed"
│
▼
┌──────────────────────┐
│ Status Polling │
│ GET /documents/{id}/ │
│ status │
└──────┬───────────────┘
│ Poll every 1s
│ Check progress_percentage
│ Check processing_stage
│
▼
status = "completed"
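The progress checkpoints above can be collapsed into a simple lookup the client uses to show a stage label. This is a sketch: the percentages and labels come from the flow diagram, but the exact `processing_stage` strings the API returns may differ.

```javascript
// Checkpoint thresholds taken from the processing flow above.
const STAGES = [
  [5, 'Loading document'],
  [15, 'Preparing to chunk'],
  [30, 'Chunking completed'],
  [35, 'Generating embeddings'],
  [75, 'Embeddings created'],
  [80, 'Indexing in Qdrant'],
  [90, 'Indexing BM25'],
  [100, 'Completed'],
]

// Return the most recently passed stage label for a progress percentage.
function stageFor(pct) {
  let label = 'Pending'
  for (const [threshold, name] of STAGES) {
    if (pct >= threshold) label = name
  }
  return label
}
```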
┌─────────────┐
│ Client │
└──────┬──────┘
│ POST /chat/
│ {
│ question,
│ knowledge_base_id,
│ conversation_id (optional),
│ top_k, temperature, ...
│ }
▼
┌──────────────────────┐
│ Chat Endpoint │
│ (chat.py) │
└──────┬───────────────┘
│
├─ Load KB config
├─ Load/create conversation
│
▼
┌──────────────────────┐
│ Retrieval Engine │
│ (retrieval.py) │
└──────┬───────────────┘
│
├─ Generate query embedding
│
├─ Retrieval Mode?
│ ├─ Dense:
│ │ └─ Qdrant vector search
│ │
│ └─ Hybrid:
│ ├─ Qdrant vector search (dense)
│ ├─ OpenSearch BM25 (lexical)
│ └─ Merge + rerank results
│
├─ Filter by score_threshold
├─ Apply MMR if enabled
└─ Return top_k chunks
│
▼
┌──────────────────────┐
│ LLM Generation │
│ (assistant.py) │
└──────┬───────────────┘
│
├─ Build prompt with context
├─ Call LLM (OpenAI/Ollama)
└─ Generate answer
│
▼
┌──────────────────────┐
│ Save to Database │
│ (conversation msgs) │
└──────┬───────────────┘
│
▼
┌─────────────┐
│ Client │ Returns: {answer, sources, ...}
└─────────────┘
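The "Merge + rerank" step in hybrid mode can be sketched as a weighted sum over normalised dense and lexical scores, using the `hybrid_dense_weight` / `hybrid_lexical_weight` request fields. The actual fusion in retrieval.py may use a different formula (e.g. reciprocal rank fusion); this is only an illustration of the idea.

```javascript
// Weighted-sum fusion sketch: assumes each result list carries
// scores already normalised to a comparable range.
function mergeHybrid(dense, lexical, denseWeight = 0.6, lexicalWeight = 0.4, topK = 5) {
  const scores = new Map()
  for (const { id, score } of dense) {
    scores.set(id, (scores.get(id) || 0) + denseWeight * score)
  }
  for (const { id, score } of lexical) {
    scores.set(id, (scores.get(id) || 0) + lexicalWeight * score)
  }
  // Chunks found by both retrievers accumulate both contributions.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}
```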
┌─────────────┐
│ Client │
└──────┬──────┘
│ POST /knowledge-bases/
│ {name, embedding_model, chunking_strategy, ...}
│
▼
┌──────────────────────┐
│ KB Endpoint │
│ (knowledge_bases.py) │
└──────┬───────────────┘
│
├─ Validate embedding model exists
├─ Get model dimension
├─ Generate collection name (kb_{hash})
│
▼
┌──────────────────────┐
│ Create Collection │
│ in Qdrant │
└──────┬───────────────┘
│ vector_size = embedding_dimension
│ distance = cosine
│
▼
┌──────────────────────┐
│ Save to Database │
│ (PostgreSQL) │
└──────┬───────────────┘
│
▼
┌─────────────┐
│ Client │ Returns: 201 Created {id, ...}
└─────────────┘
Request:
GET /api/v1/documents/?page=2&page_size=10&knowledge_base_id=uuid
page_size default is 10, max is 100.
Response:
{
  "items": [...],
  "total": 150,
  "page": 2,
  "page_size": 10,
  "pages": 15
}

Request:
GET /api/v1/documents/?status=processing&knowledge_base_id=uuid
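The `pages` field in the paginated response is derived from `total` and `page_size` (150 items at 10 per page gives 15 pages). A client can recompute it with a ceiling division:

```javascript
// pages = ceil(total / page_size)
function totalPages(total, pageSize) {
  return Math.ceil(total / pageSize)
}
```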
Polling Loop (Client-side):
async function pollDocumentStatus(docId) {
  const interval = setInterval(async () => {
    const res = await fetch(`/api/v1/documents/${docId}/status`)
    const data = await res.json()

    // Update UI with progress
    updateProgressBar(data.progress_percentage)
    updateStatusText(data.processing_stage)

    // Stop polling once processing has completed or failed
    if (data.status === 'completed' || data.status === 'failed') {
      clearInterval(interval)
    }
  }, 1000) // Poll every 1 second
}

Knowledge Base Platform
│
├─ OpenAI API
│ ├─ Embeddings (text-embedding-3-*)
│ └─ Chat Completions (gpt-4o, gpt-4o-mini)
│
├─ Voyage AI API
│ └─ Embeddings (voyage-4, voyage-code-3)
│
├─ Ollama (Local)
│ ├─ Embeddings (nomic-embed-text, mxbai-embed-large)
│ └─ Chat (llama3.1, mistral, qwen)
│
├─ Qdrant (Vector Store)
│ ├─ Collections management
│ ├─ Vector CRUD
│ └─ Similarity search
│
├─ OpenSearch (Lexical Store)
│ ├─ BM25 indexing
│ └─ Keyword search
│
└─ PostgreSQL (Metadata)
├─ Knowledge Bases
├─ Documents
├─ Conversations
└─ Settings
# Create KB
KB=$(curl -s -X POST http://localhost:8004/api/v1/knowledge-bases/ \
  -H 'Content-Type: application/json' \
  -d '{"name":"Docs","chunking_strategy":"semantic"}' | jq -r .id)

# Upload docs
for file in docs/*.md; do
  curl -s -X POST http://localhost:8004/api/v1/documents/ \
    -F "file=@$file" -F "knowledge_base_id=$KB"
done

# Query
curl -s -X POST http://localhost:8004/api/v1/chat/ \
  -H 'Content-Type: application/json' \
  -d '{"question":"How to install?","knowledge_base_id":"'$KB'"}'

# Upload
DOC=$(curl -s -X POST http://localhost:8004/api/v1/documents/ \
  -F file=@large.pdf -F "knowledge_base_id=$KB" | jq -r .id)

# Watch progress
watch -n 1 "curl -s http://localhost:8004/api/v1/documents/$DOC/status | jq '{progress: .progress_percentage, stage: .processing_stage}'"

# Hybrid retrieval query
curl -s -X POST http://localhost:8004/api/v1/chat/ \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "specific technical term",
    "knowledge_base_id": "'$KB'",
    "retrieval_mode": "hybrid",
    "hybrid_dense_weight": 0.6,
    "hybrid_lexical_weight": 0.4,
    "lexical_top_k": 15,
    "top_k": 5
  }'

# First message
CONV=$(curl -s -X POST http://localhost:8004/api/v1/chat/ \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "What is Python?",
    "knowledge_base_id": "'$KB'"
  }' | jq -r .conversation_id)

# Follow-up (with context)
curl -s -X POST http://localhost:8004/api/v1/chat/ \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "How do I install it?",
    "knowledge_base_id": "'$KB'",
    "conversation_id": "'$CONV'"
  }'

Upload multiple documents without waiting for each to complete.
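Uploading several documents concurrently can be sketched as a small worker pool. `uploadFn` is a placeholder for a function that POSTs one file to /api/v1/documents/ and resolves with the created document id; the concurrency limit keeps the server from being flooded.

```javascript
// Upload all files with at most `concurrency` requests in flight.
// uploadFn is an assumed wrapper around POST /api/v1/documents/.
async function uploadAll(files, uploadFn, concurrency = 4) {
  const results = []
  const queue = [...files]
  async function worker() {
    while (queue.length > 0) {
      const file = queue.shift()
      results.push(await uploadFn(file))
    }
  }
  await Promise.all(Array.from({ length: concurrency }, worker))
  return results
}
```

After the batch resolves, poll each returned id via GET /documents/{doc_id}/status as shown earlier on this page.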
- Small docs (< 10 pages): 500-800 chars
- Medium docs (10-50 pages): 800-1200 chars
- Large docs (50+ pages): 1200-2000 chars
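The chunk-size guidelines above can be wrapped in a small helper. The page-count breakpoints and character ranges are the document's suggestions; the function itself is only illustrative.

```javascript
// Map a document's page count to the suggested chunk-size range (chars).
function suggestedChunkSize(pages) {
  if (pages < 10) return { min: 500, max: 800 }    // small docs
  if (pages <= 50) return { min: 800, max: 1200 }  // medium docs
  return { min: 1200, max: 2000 }                  // large docs
}
```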
- Dense only: General questions, semantic similarity
- Hybrid: Technical terms, specific phrases, entity names
- Poll every 1s during processing
- Stop polling when status = completed/failed
- Use status endpoint (lighter than full document GET)
- Save settings per conversation
- Reuse conversation_id for multi-turn chats
- Use lower temperature (0.5) for factual answers
Last Updated: 2026-02-05 · Version: v1.0
📝 Questions? Open an issue | 🌟 Like it? Star the repo | 📖 API Docs: Swagger UI