ClinIQ — Clinical Q&A AI Assistant

An AI-powered clinical document analysis platform using RAG (Retrieval-Augmented Generation), hybrid search, and intelligent reranking for evidence-based medical question answering. Upload clinical documents (PDF, DOCX, or TXT) and ask questions in natural language — powered by any OpenAI-compatible LLM endpoint, Groq, OpenRouter, or a locally running Ollama model.
ClinIQ is an intelligent clinical question-answering platform that transforms uploaded medical documents into a searchable knowledge base using advanced RAG techniques. Healthcare professionals can ask questions in natural language and receive accurate, evidence-based answers with source citations.
This makes ClinIQ suitable for:
- Enterprise deployments — connect to a GenAI Gateway or any managed LLM API
- Air-gapped environments — run fully offline with Ollama and a locally hosted model
- Local experimentation — quick setup on a laptop with GPU-accelerated inference
- Multi-provider flexibility — switch between OpenAI, Groq, OpenRouter, Ollama, or custom endpoints
1. Document Upload: Users upload clinical documents (PDF, DOCX, or TXT) through the web interface. The system validates file formats and initiates background processing.
2. Intelligent Processing: Documents are extracted, chunked at semantic boundaries (800 tokens with 150-token overlap), and converted to vector embeddings using the configured embedding model.
3. Hybrid Search: When users ask questions, ClinIQ employs a dual-search strategy combining dense vector search (semantic similarity) and sparse BM25 search (keyword matching), fused using Reciprocal Rank Fusion (RRF) for optimal retrieval.
4. Intelligent Reranking: Retrieved chunks are reranked by cosine similarity with the query embedding so that the most relevant context is prioritized.
5. Answer Generation: The top-ranked context is fed to the configured LLM with a carefully designed prompt that enforces evidence-based reasoning, includes source citations, and displays step-by-step thinking when enabled.
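The chunking scheme in step 2 can be sketched as a sliding token window. This is a minimal illustration under stated assumptions: ClinIQ tokenizes with tiktoken first, and the function name here is hypothetical, not the actual code in `document_processor.py`.

```python
def chunk_tokens(tokens, chunk_size=800, overlap=150):
    """Split a token sequence into overlapping windows.

    Mirrors the documented 800-token chunks with 150-token overlap.
    `tokens` would be the tiktoken-encoded document in practice.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap  # 650 tokens of new material per chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window already reached the end
    return chunks
```

The overlap ensures that a sentence straddling a chunk boundary appears intact in at least one chunk, which matters for retrieval quality.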
The platform stores embeddings in ChromaDB for fast retrieval and supports real-time streaming responses for a responsive user experience. All answers include citations linking back to source documents, ensuring clinical traceability.
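The RRF fusion used in the hybrid search step can be illustrated in a few lines. This is a sketch, not ClinIQ's actual implementation; `k=60` is the constant commonly used in the original RRF formulation.

```python
def rrf_fuse(dense_ranking, sparse_ranking, k=60):
    """Fuse two rankings with Reciprocal Rank Fusion.

    Each ranking is a list of document IDs, best first.
    score(d) = sum over rankings of 1 / (k + rank_of_d).
    """
    scores = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear near the top of both rankings (like "b" in `rrf_fuse(["a", "b", "c"], ["b", "c", "d"])`) accumulate score from both lists and rise above documents favored by only one method.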
This application uses a modern microservices architecture with a React frontend, Flask REST API backend, and ChromaDB vector database. The RAG pipeline implements hybrid search combining dense and sparse retrieval methods, followed by intelligent reranking for optimal context selection. The LLM layer is fully pluggable — any OpenAI-compatible remote endpoint, Groq, OpenRouter, or a locally running Ollama instance can be used via environment configuration.
graph TB
subgraph "Client Layer (port 3000)"
A[React Web UI]
A1[Document Upload]
A2[Query Interface]
A3[Real-time Streaming]
end
subgraph "Backend Layer (port 5000)"
B[Flask REST API]
C[RAG Pipeline]
H[Document Processor]
end
subgraph "Search & Retrieval"
D[Dense Search<br/>Vector Similarity]
E[Sparse Search<br/>BM25 Keyword]
F[Hybrid Fusion<br/>RRF Algorithm]
G[Reranker<br/>Cosine Similarity]
end
subgraph "Processing Pipeline"
I[Text Extractor<br/>PDF/DOCX/TXT]
J[Semantic Chunker<br/>tiktoken]
K[Embedding Generator]
end
subgraph "Storage Layer"
L[(ChromaDB<br/>Vector Database)]
M[(File Storage<br/>uploads/)]
end
subgraph "LLM Inference - Option A: Cloud APIs"
N1[OpenAI API]
N2[Groq API]
N3[OpenRouter API]
end
subgraph "LLM Inference - Option B: Local"
O[Ollama<br/>localhost:11434]
end
A1 --> B
A2 --> B
B --> C
B --> H
H --> I
I --> J
J --> K
K -->|Store Embeddings| L
B -->|Save File| M
C -->|Retrieve| D
C -->|Retrieve| E
D --> L
E --> L
D --> F
E --> F
F --> G
G -->|Top Chunks| C
C -->|LLM_PROVIDER=openai| N1
C -->|LLM_PROVIDER=groq| N2
C -->|LLM_PROVIDER=openrouter| N3
C -->|LLM_PROVIDER=ollama| O
K -->|Embedding Request| N1
N1 -->|Streaming Answer| C
N2 -->|Streaming Answer| C
N3 -->|Streaming Answer| C
O -->|Streaming Answer| C
C -->|SSE Stream| B
B -->|Real-time Updates| A3
style A fill:#61dafb
style B fill:#000000,color:#fff
style C fill:#ff6b6b
style D fill:#4ecdc4
style E fill:#4ecdc4
style F fill:#95e1d3
style G fill:#95e1d3
style H fill:#f38181
style I fill:#aa96da
style J fill:#aa96da
style K fill:#aa96da
style L fill:#feca57
style M fill:#feca57
style N1 fill:#10a37f
style N2 fill:#10a37f
style N3 fill:#10a37f
style O fill:#f3e5f5
| Service | Container | Host Port | Description |
|---|---|---|---|
| backend | backend | 5000 | Flask REST API — document processing, RAG pipeline orchestration, streaming responses |
| frontend | frontend | 3000 | React UI — document upload with drag-and-drop, real-time chat, streaming responses, citations |
Core Components:
- React Web UI (Port 3000) - Document upload with drag-and-drop, real-time query interface with streaming responses, chat history with syntax-highlighted citations, and thinking process visualization
- Flask REST API (Port 5000) - API routing and request validation; orchestrates the document processing pipeline, manages ChromaDB connections and operations, streams responses via Server-Sent Events (SSE), and implements background processing for uploads
- RAG Pipeline - Query rewriting with conversation context, hybrid search with RRF fusion, cosine similarity reranking, answer generation with the configured LLM, thinking/answer section parsing, and source citation generation
- Search & Retrieval System:
  - Dense Search: Vector similarity using embeddings for semantic matching
  - Sparse Search: BM25 algorithm for keyword-based retrieval
  - Hybrid Fusion: Reciprocal Rank Fusion (RRF) combines both methods
  - Reranker: Cosine similarity reranking for final context selection
- Document Processing Pipeline:
  - Text Extractor: Supports PDF (PyPDF2), DOCX (python-docx), and TXT
  - Semantic Chunker: Uses tiktoken for token-aware chunking (800 tokens, 150 overlap)
  - Embedding Generator: Creates embeddings via the configured embedding model
- ChromaDB - Persistent vector database storing document embeddings, chunk metadata (source, page numbers, chunk IDs), and BM25 sparse indexes for hybrid search
- File Storage - Manages uploaded document files in the `uploads/` directory
- LLM Inference - Pluggable inference layer supporting OpenAI, Groq, Ollama, OpenRouter, and custom OpenAI-compatible APIs
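The reranker's cosine-similarity scoring amounts to the following. Names are illustrative; the actual code in `vector_store.py` may differ (and would likely use NumPy).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rerank(query_embedding, chunks, top_k=5):
    """Reorder retrieved chunks by similarity to the query embedding.

    `chunks` is a list of (text, embedding) pairs — a hypothetical shape
    for illustration. Returns the top_k most similar chunks.
    """
    ranked = sorted(
        chunks,
        key=lambda c: cosine(query_embedding, c[1]),
        reverse=True,
    )
    return ranked[:top_k]
```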
- User uploads clinical document (PDF/DOCX/TXT) via web UI
- Backend saves file and initiates background processing
- Document processor extracts text and creates semantic chunks
- Embedding generator creates vector embeddings for each chunk
- Embeddings and metadata stored in ChromaDB with BM25 index
- User submits natural language query
- Query is embedded and sent to hybrid search system
- Dense search finds semantically similar chunks via vector similarity
- Sparse search finds keyword-matching chunks via BM25
- RRF algorithm fuses results from both methods
- Reranker applies cosine similarity to prioritize best chunks
- Top context is sent to configured LLM with system prompt
- AI generates answer with thinking process and citations
- Response streams back to user in real-time via SSE
- Citations link to specific source documents and pages
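The real-time streaming in the last steps uses Server-Sent Events, whose wire format is simple to frame by hand. The `type`/`content` field names below are assumptions for illustration, not ClinIQ's exact schema.

```python
import json

def format_sse(payload: dict) -> str:
    """Frame a JSON payload as one SSE message.

    An SSE message is a 'data:' line terminated by a blank line.
    """
    return f"data: {json.dumps(payload)}\n\n"

def stream_answer(token_chunks):
    """Yield SSE frames for each streamed answer fragment, then a done marker."""
    for chunk in token_chunks:
        yield format_sse({"type": "token", "content": chunk})
    yield format_sse({"type": "done"})
```

A Flask route can return such a generator directly with `mimetype="text/event-stream"`, which is the usual way SSE streaming is wired up.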
Before you begin, ensure you have the following installed and configured:
- Docker and Docker Compose (v2)
- An LLM provider — one of:
- OpenAI: Get API Key
- Groq: Get API Key
- OpenRouter: Get API Key
- Ollama installed natively (no API key needed)
- Any custom OpenAI-compatible API endpoint
# Check Docker
docker --version
docker compose version
# Verify Docker is running
docker ps

git clone https://github.com/cld2labs/ClinIQ.git
cd ClinIQ

# Copy the example environment file
cp backend/.env.example backend/.env

Open backend/.env and configure your LLM provider. See LLM Provider Configuration for detailed per-provider instructions.
Example for OpenAI:
LLM_PROVIDER=openai
LLM_API_KEY=sk-your-api-key-here
LLM_BASE_URL=https://api.openai.com/v1
LLM_CHAT_MODEL=gpt-3.5-turbo
LLM_EMBEDDING_MODEL=text-embedding-3-small

Example for Ollama:
LLM_PROVIDER=ollama
LLM_BASE_URL=http://localhost:11434/v1
LLM_CHAT_MODEL=qwen2.5:7b
LLM_EMBEDDING_MODEL=nomic-embed-text
# LLM_API_KEY not needed for Ollama

# Standard (attached)
docker compose up --build
# Detached (background)
docker compose up -d --build

Once containers are running:
- Frontend UI: http://localhost:3000
- Backend API: http://localhost:5000
- Health Check: http://localhost:5000/api/health
# Health check
curl http://localhost:5000/api/health
# View running containers
docker compose ps

View logs:
# All services
docker compose logs -f
# Backend only
docker compose logs -f backend
# Frontend only
docker compose logs -f frontend

Stop the stack:

docker compose down

For developers who want to run services locally without Docker:
Backend (Python / Flask)
cd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your LLM provider settings
# Start backend
python api.py

Backend will run on http://localhost:5000
Frontend (Node / Vite)
cd frontend
# Install dependencies
npm install
# Start frontend
npm run dev

Frontend will run on http://localhost:3000
Note: The frontend Vite proxy automatically forwards /api/* requests to http://localhost:5000, so no additional configuration is needed for local development.
ClinIQ/
├── backend/ # Backend Flask Application
│ ├── api.py # Main Flask REST API server
│ │ # - 7 API endpoints
│ │ # - Background document processing
│ │ # - SSE streaming support
│ │ # - Health checks and status
│ │
│ ├── config.py # Multi-provider LLM configuration
│ │ # - LLM_PROVIDER selection
│ │ # - API key management
│ │ # - Base URL configuration
│ │ # - Model selection
│ │ # - Generation parameters
│ │
│ ├── utils/ # Core backend utilities
│ │ ├── __init__.py
│ │ │
│ │ ├── constants.py # Model configuration constants
│ │ │
│ │ ├── document_processor.py # Document processing
│ │ │ # - PDF extraction (PyPDF2)
│ │ │ # - DOCX extraction (python-docx)
│ │ │ # - Semantic chunking (tiktoken)
│ │ │ # - Embedding creation
│ │ │
│ │ ├── rag_pipeline.py # RAG pipeline implementation
│ │ │ # - Query rewriting
│ │ │ # - Context retrieval & citations
│ │ │ # - Answer generation (streaming)
│ │ │ # - Thinking/answer parsing
│ │ │
│ │ └── vector_store.py # Search & storage
│ │ # - ChromaDB operations
│ │ # - Dense search (semantic)
│ │ # - Sparse search (BM25)
│ │ # - Hybrid search (RRF fusion)
│ │ # - Reranking (cosine similarity)
│ │
│ ├── .env.example # Environment variable template
│ │ # - Multi-provider configuration
│ │ # - All supported variables
│ │
│ ├── requirements.txt # Python dependencies
│ └── Dockerfile # Backend container configuration
│
├── frontend/ # React + Vite Frontend Application
│ ├── src/
│ │ ├── components/
│ │ │ ├── DocumentUpload.jsx # File upload with drag-and-drop
│ │ │ │ # - Multi-file support
│ │ │ │ # - Progress tracking
│ │ │ │ # - File validation
│ │ │ │
│ │ │ ├── ChatInterface.jsx # Chat UI
│ │ │ │ # - Message display
│ │ │ │ # - Real-time streaming
│ │ │ │ # - Thinking process display
│ │ │ │ # - Citation rendering
│ │ │ │
│ │ │ └── layout/
│ │ │ ├── Header.jsx # App header with logo
│ │ │ ├── Footer.jsx # Footer with tech info
│ │ │ └── Layout.jsx # Main layout wrapper
│ │ │
│ │ ├── pages/
│ │ │ ├── Home.jsx # Landing page
│ │ │ └── Chat.jsx # Main chat page
│ │ │ # - State management
│ │ │ # - Document status polling
│ │ │ # - Upload handling
│ │ │
│ │ └── services/
│ │ └── api.js # API service layer
│ │ # - uploadDocument()
│ │ # - queryDocuments() with SSE
│ │ # - getStatus()
│ │ # - clearDocuments()
│ │
│ ├── package.json # npm dependencies
│ ├── vite.config.js # Vite configuration (proxy)
│ ├── tailwind.config.js # TailwindCSS configuration
│ └── Dockerfile # Frontend container configuration
│
├── docker-compose.yml # Service orchestration
│ # - Frontend service (port 3000)
│ # - Backend service (port 5000)
│ # - Volume mounts (data, uploads)
│
├── .chromadb/ # ChromaDB persistent storage (gitignored)
│ └── [vector database files] # - Document embeddings
│ # - Metadata & indexes
│
├── uploads/ # Uploaded document files (gitignored)
│ └── [user-uploaded files] # - PDF, DOCX, TXT files
│
├── Docs/ # Project documentation
│ ├── DOCKER_SETUP.md
│ ├── PROJECT_DOCUMENTATION.md
│ ├── QUICKSTART.md
│ └── assets/
│
├── README.md # Project documentation (this file)
├── CONTRIBUTING.md # Contribution guidelines
├── TROUBLESHOOTING.md # Troubleshooting guide
├── SECURITY.md # Security policy
├── LICENSE.md # MIT License
└── DISCLAIMER.md # Usage disclaimer
1. Open the Application
   - Navigate to http://localhost:3000
2. Upload Clinical Documents
   - Click "Upload Document" or drag-and-drop files
   - Supported formats: PDF, DOCX, TXT
   - Multiple files can be uploaded
   - Wait for processing to complete (status shows "processed")
3. Ask Questions
   - Type your clinical question in the chat input
   - Examples:
     - "What are the contraindications for this medication?"
     - "What are the recommended dosage guidelines?"
     - "What side effects should I monitor?"
     - "What are the drug interactions?"
     - "What is the mechanism of action?"
4. Review Answers
   - Read the AI-generated answer with context
   - Review the thinking process (if enabled)
   - Check source citations linking to specific documents
   - Citations include document name and chunk information
5. Manage Documents
   - View the current document count in the status area
   - Clear all documents using the "Clear Documents" button
   - Re-upload documents as needed for new analysis
Hybrid Search
- Combines semantic search (meaning-based) with keyword search (BM25)
- Uses Reciprocal Rank Fusion to merge results
- Provides more comprehensive retrieval than either method alone
- Best for complex queries with specific terms
- Configurable via UI toggle or environment variable
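For reference, the keyword side of hybrid search is plain Okapi BM25 scoring, which a self-contained sketch captures. Production code would typically lean on a library such as rank-bm25 rather than this illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score tokenized documents against query terms with Okapi BM25.

    docs is a list of token lists; k1 and b are the usual defaults.
    Returns one score per document, in input order.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    df = Counter()  # document frequency per term
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Rare, exact terms (drug names, dosages) get high IDF weight, which is why BM25 complements the embedding-based dense search on clinical vocabulary.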
Reranking
- Applies cosine similarity to reorder retrieved chunks
- Prioritizes chunks most relevant to the query
- Improves answer quality by focusing on best context
- Slight performance overhead but better accuracy
- Configurable via UI toggle or environment variable
Thinking Mode
- Shows AI's reasoning process before the answer
- Useful for understanding how the AI reached conclusions
- Helps verify evidence-based reasoning
- Can be toggled on/off in configuration
Conversation History
- Previous queries and answers are maintained in session
- Context from prior conversation used for query rewriting
- Enables follow-up questions and clarifications
- Cleared when page refreshes or documents are cleared
- Document Quality
  - Upload well-formatted documents with clear text
  - Avoid scanned images without OCR
  - Use PDF or DOCX for best extraction results
- Query Formulation
  - Be specific and detailed in your questions
  - Include relevant clinical terms
  - Reference specific conditions or medications when applicable
- Answer Verification
  - Always check source citations
  - Verify answers against original documents
  - Consult healthcare professionals for critical decisions
- Performance
  - Process documents before starting queries
  - Enable hybrid search for comprehensive results
  - Use reranking for higher quality (with a slight latency trade-off)
The table below compares inference performance across different providers, deployment modes, and hardware profiles using a standardized clinical Q&A workload (averaged over 3 runs).
| Provider | Model | Deployment | Context Window | Avg Input Tokens | Avg Output Tokens | Avg Tokens / Request | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/s) | Hardware |
|---|---|---|---|---|---|---|---|---|---|---|
| vLLM | meta-llama/Llama-3.2-3B-Instruct + BAAI/bge-base-en-v1.5 | Local | 8.1K | 1,223.80 | 361.73 | 1,585.53 | 28,553 | 62,149 | 0.033 | Apple Silicon Metal (MacBook Pro M4) |
| Intel OPEA EI | meta-llama/Llama-3.2-3B-Instruct + BAAI/bge-base-en-v1.5 | CPU (Xeon) | 8.1K | 1,195.80 | 141.27 | 1,337.07 | 4,389.48 | 11,188.87 | 0.183 | CPU only |
| Cloud LLM | gpt-4o-mini + text-embedding-3-small | API (Cloud) | 128K | 1,173.80 | 88.33 | 1,262.13 | 2,846.13 | 4,200.51 | 0.359 | N/A |
Notes:
- All benchmarks use the same ClinIQ RAG pipeline with hybrid search. Token counts may vary slightly per run due to non-deterministic model output and query complexity.
- vLLM on Apple Silicon uses Metal (MPS) GPU acceleration — running it inside Docker would fall back to CPU-only inference.
- Intel OPEA Enterprise Inference runs on Intel Xeon CPUs without GPU acceleration.
A 3-billion-parameter open-weight instruction-tuned model from Meta (September 2024 release), optimized for on-prem and edge deployment.
| Attribute | Details |
|---|---|
| Parameters | 3.2B total |
| Architecture | Transformer with Grouped Query Attention (GQA) |
| Context Window | 8,192 tokens (8K) native |
| Reasoning Mode | Standard instruction-following |
| Tool / Function Calling | Supported via structured prompts |
| Structured Output | JSON-structured responses supported |
| Multilingual | English-focused with multilingual capabilities |
| Benchmarks | MMLU: 63.4%, GSM8K: 75.7%, HumanEval: 58.5% |
| Quantization Formats | GGUF, AWQ (int4), GPTQ (int4), MLX |
| Inference Runtimes | Ollama, vLLM, llama.cpp, LMStudio, TGI (Text Generation Inference) |
| Fine-Tuning | Full fine-tuning and adapter-based (LoRA); community adapters available |
| License | Llama 3.2 Community License (permits commercial use with conditions) |
| Deployment | Local, on-prem, air-gapped, cloud — full data sovereignty |
OpenAI's cost-efficient multimodal model, accessible exclusively via cloud API.
| Attribute | Details |
|---|---|
| Parameters | Not publicly disclosed |
| Architecture | Multimodal Transformer (text + image input, text output) |
| Context Window | 128,000 tokens input / 16,384 tokens max output |
| Reasoning Mode | Standard inference (no explicit chain-of-thought toggle) |
| Tool / Function Calling | Supported; parallel function calling |
| Structured Output | JSON mode and strict JSON schema adherence supported |
| Multilingual | Broad multilingual support |
| Benchmarks | MMLU: ~87%, strong HumanEval and MBPP scores |
| Pricing | $0.15 / 1M input tokens, $0.60 / 1M output tokens (Batch API: 50% discount) |
| Fine-Tuning | Supervised fine-tuning via OpenAI API |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service. No self-hosted or on-prem option |
| Knowledge Cutoff | October 2023 |
| Capability | Meta-Llama-3.2-3B-Instruct | GPT-4o-mini |
|---|---|---|
| Clinical Q&A with RAG | Yes | Yes |
| Function / tool calling | Yes | Yes |
| JSON structured output | Yes | Yes |
| On-prem / air-gapped deployment | Yes | No |
| Data sovereignty | Full (weights run locally) | No (data sent to cloud API) |
| Open weights | Yes (Llama 3.2 Community License) | No (proprietary) |
| Custom fine-tuning | Full fine-tuning + LoRA adapters | Supervised fine-tuning (API only) |
| Quantization for edge devices | GGUF / AWQ / GPTQ / MLX | N/A |
| Multimodal (image input) | No | Yes |
| Native context window | 8K | 128K |
Both models support clinical Q&A with RAG, function calling, and JSON-structured output. However, only Meta-Llama-3.2-3B offers open weights, data sovereignty, and local deployment flexibility — making it suitable for air-gapped, regulated, or cost-sensitive clinical environments. GPT-4o-mini offers lower latency and higher throughput via OpenAI's cloud infrastructure, with added multimodal capabilities and larger context window.
ClinIQ supports five LLM providers via environment configuration in backend/.env. All providers are configured via the same set of variables — switching requires only updating the .env file.
Best for: High-quality embeddings and chat responses
LLM_PROVIDER=openai
LLM_API_KEY=sk-your-api-key-here
LLM_BASE_URL=https://api.openai.com/v1
LLM_CHAT_MODEL=gpt-3.5-turbo
LLM_EMBEDDING_MODEL=text-embedding-3-small

- Get API Key: https://platform.openai.com/account/api-keys
- Recommended Models:
  - Chat: `gpt-3.5-turbo`, `gpt-4`, `gpt-4-turbo`, `gpt-4o`
  - Embeddings: `text-embedding-3-small`, `text-embedding-3-large`
- Pricing: Pay-per-use (check OpenAI Pricing)
Best for: Fast inference with competitive pricing
LLM_PROVIDER=groq
LLM_API_KEY=gsk_your-groq-api-key
LLM_BASE_URL=https://api.groq.com/openai/v1
LLM_CHAT_MODEL=llama-3.2-90b-text-preview
LLM_EMBEDDING_MODEL=text-embedding-3-small # Falls back to OpenAI

- Get API Key: https://console.groq.com/
- Recommended Models:
  - `llama-3.2-90b-text-preview`
  - `llama-3.1-70b-versatile`
  - `mixtral-8x7b-32768`
- Note: Groq doesn't provide embeddings; falls back to OpenAI for embeddings
Best for: Private, local deployment with no API costs
LLM_PROVIDER=ollama
LLM_BASE_URL=http://localhost:11434/v1
LLM_CHAT_MODEL=qwen2.5:7b
LLM_EMBEDDING_MODEL=nomic-embed-text
# LLM_API_KEY not required for Ollama

Setup:
- Install Ollama: https://ollama.com/download
- Pull models:
# Chat models
ollama pull qwen2.5:7b
ollama pull llama3.1:8b
ollama pull llama3.2:3b
ollama pull mistral:7b
# Embedding model
ollama pull nomic-embed-text
- Verify Ollama is running:
curl http://localhost:11434/api/tags
Recommended Models:
- Chat: `qwen2.5:7b`, `llama3.1:8b`, `llama3.2:3b`, `mistral:7b`
- Embeddings: `nomic-embed-text`
Note: Run Ollama natively on the host (not in Docker) for best GPU acceleration
Best for: Access to multiple models through single API
LLM_PROVIDER=openrouter
LLM_API_KEY=sk-or-v1-your-openrouter-key
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_CHAT_MODEL=anthropic/claude-3.5-sonnet
LLM_EMBEDDING_MODEL=text-embedding-3-small # Falls back to OpenAI

- Get API Key: https://openrouter.ai/keys
- Recommended Models:
  - `anthropic/claude-3.5-sonnet`
  - `google/gemini-pro-1.5`
  - `meta-llama/llama-3.1-70b-instruct`
- Note: OpenRouter doesn't provide embeddings; falls back to OpenAI for embeddings
Best for: Enterprise deployments with custom endpoints
LLM_PROVIDER=custom
LLM_API_KEY=your-custom-api-key
LLM_BASE_URL=https://your-custom-endpoint.com/v1
LLM_CHAT_MODEL=your-model-name
LLM_EMBEDDING_MODEL=your-embedding-model-name

Any enterprise gateway that exposes an OpenAI-compatible /v1/chat/completions and /v1/embeddings endpoint works without code changes.
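As a sketch of what "OpenAI-compatible" means on the wire, the following builds a `/v1/chat/completions` request against an arbitrary base URL using only the standard library. The URL, key, and model are placeholders; the backend itself uses the OpenAI Python SDK rather than raw HTTP.

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages):
    """Construct a POST request in the OpenAI chat-completions wire format.

    Any gateway that accepts this shape at {base_url}/chat/completions
    is compatible. Values here are illustrative placeholders.
    """
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` would return the familiar JSON body with a `choices` array, which is the contract the pluggable LLM layer relies on.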
1. Edit `backend/.env` with the new provider's values
2. Restart the application: `docker compose restart backend`
No rebuild is needed — all settings are injected at runtime via environment variables.
All variables are defined in backend/.env (copied from backend/.env.example). The backend reads them at startup via the config.py module.
| Variable | Description | Default | Type |
|---|---|---|---|
| `LLM_PROVIDER` | Provider selection: `openai`, `groq`, `ollama`, `openrouter`, `custom` | `openai` | string |
| `LLM_API_KEY` | API key for the selected provider (not needed for Ollama) | - | string |
| `LLM_BASE_URL` | Base URL of the LLM API endpoint | `https://api.openai.com/v1` | string |
| Variable | Description | Default | Type |
|---|---|---|---|
| `LLM_CHAT_MODEL` | Model for chat completions | `gpt-3.5-turbo` | string |
| `LLM_EMBEDDING_MODEL` | Model for creating embeddings | `text-embedding-3-small` | string |
| Variable | Description | Default | Type |
|---|---|---|---|
| `TEMPERATURE` | Sampling temperature; lower = more deterministic output (0.0–1.0) | `0.7` | float |
| `MAX_TOKENS` | Maximum tokens in the generated answer | `1000` | integer |
| `MAX_RETRIES` | Maximum retry attempts on API failures | `3` | integer |
| `REQUEST_TIMEOUT` | API request timeout in seconds | `300` | integer |
| Variable | Description | Default | Type |
|---|---|---|---|
| `VERIFY_SSL` | SSL certificate verification; set `false` only for development | `true` | boolean |
| Variable | Description | Default | Type |
|---|---|---|---|
| `FLASK_ENV` | Flask environment mode | `development` | string |
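A minimal sketch of how these variables might be read at startup — illustrative only; the actual structure of `config.py` may differ.

```python
import os

def load_llm_settings():
    """Read the documented environment variables with their documented defaults.

    In the real backend, python-dotenv loads backend/.env before this runs.
    """
    return {
        "provider": os.getenv("LLM_PROVIDER", "openai"),
        "api_key": os.getenv("LLM_API_KEY", ""),  # empty is fine for Ollama
        "base_url": os.getenv("LLM_BASE_URL", "https://api.openai.com/v1"),
        "chat_model": os.getenv("LLM_CHAT_MODEL", "gpt-3.5-turbo"),
        "embedding_model": os.getenv("LLM_EMBEDDING_MODEL", "text-embedding-3-small"),
        "temperature": float(os.getenv("TEMPERATURE", "0.7")),
        "max_tokens": int(os.getenv("MAX_TOKENS", "1000")),
    }
```

Because everything is resolved from the environment at startup, switching providers is a restart rather than a rebuild.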
Example .env file:
# backend/.env
# ============================================================================
# LLM Provider Configuration
# ============================================================================
# Provider Selection
# Options: openai, groq, ollama, openrouter, custom
LLM_PROVIDER=openai
# API Key (not required for Ollama)
LLM_API_KEY=sk-your-api-key-here
# Base URL for LLM API
LLM_BASE_URL=https://api.openai.com/v1
# ============================================================================
# Model Configuration
# ============================================================================
# Chat Model (for generating answers)
LLM_CHAT_MODEL=gpt-3.5-turbo
# Embedding Model (for creating vector representations)
LLM_EMBEDDING_MODEL=text-embedding-3-small
# ============================================================================
# Generation Parameters
# ============================================================================
# Temperature: Controls randomness in responses (0.0 - 1.0)
TEMPERATURE=0.7
# Maximum Tokens: Maximum length of generated responses
MAX_TOKENS=1000
# Maximum Retry Attempts: Number of retries on API failures
MAX_RETRIES=3
# Request Timeout: API request timeout in seconds
REQUEST_TIMEOUT=300
# ============================================================================
# Security Configuration
# ============================================================================
# SSL Verification (use 'true' in production)
VERIFY_SSL=true
# ============================================================================
# Flask Configuration
# ============================================================================
# Flask Environment: development or production
FLASK_ENV=development

For complete examples of all provider configurations, see backend/.env.example.
- Framework: Flask (Python web framework with WSGI)
- LLM Integration:
- OpenAI Python SDK (multi-provider compatible)
- Configurable via environment variables
- Supports OpenAI, Groq, Ollama, OpenRouter, Custom APIs
- Vector Database: ChromaDB (persistent local storage)
- Document Processing:
- PyPDF2 (PDF text extraction)
- python-docx (DOCX text extraction)
- tiktoken (token counting and chunking)
- Search Algorithms:
- Dense vector search (cosine similarity)
- BM25 sparse search (keyword matching)
- Reciprocal Rank Fusion (RRF)
- Cosine similarity reranking
- API Features:
- Flask-CORS (cross-origin resource sharing)
- Server-Sent Events (SSE) for streaming
- Background task processing
- Utilities:
- NumPy (numerical operations)
- python-dotenv (environment variable management)
- Framework: React 18 with JavaScript
- Build Tool: Vite (fast bundler and dev server)
- Styling: Tailwind CSS + PostCSS
- UI Components:
- Custom design system
- Lucide React icons
- Drag-and-drop file upload
- State Management: React hooks (useState, useEffect, useRef)
- API Communication:
- Fetch API for REST calls
- EventSource for Server-Sent Events (SSE)
- Proxy configuration via Vite
- Markdown & Code:
- Syntax highlighting for citations
- Real-time streaming text display
- Containerization: Docker + Docker Compose
- Volumes:
  - ChromaDB persistence (`.chromadb/`)
  - File uploads storage (`uploads/`)
- Networking: Docker bridge network
- Health Checks: Backend health monitoring
- RAG (Retrieval-Augmented Generation):
- Document chunking with semantic boundaries
- Vector embeddings for semantic search
- Context-aware answer generation
- Hybrid Search:
- Dense retrieval (embeddings + cosine similarity)
- Sparse retrieval (BM25 keyword matching)
- Reciprocal Rank Fusion (RRF) algorithm
- Reranking: Cosine similarity for context prioritization
- Prompt Engineering:
- Evidence-based reasoning prompts
- Citation formatting instructions
- Thinking process elicitation
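An illustrative way to assemble such a prompt is shown below. The wording is hypothetical, not the actual template in `rag_pipeline.py`; it only demonstrates the evidence-grounding and numbered-citation pattern described above.

```python
def build_system_prompt(context_chunks):
    """Assemble an evidence-grounded system prompt from retrieved chunks.

    Each chunk is numbered so the model can cite it as [n]; the
    instruction text is an illustrative stand-in for ClinIQ's own.
    """
    sources = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(context_chunks)
    )
    return (
        "You are a clinical assistant. Answer ONLY from the numbered "
        "sources below and cite each claim as [n]. If the sources do "
        "not contain the answer, say so explicitly.\n\n"
        f"Sources:\n{sources}"
    )
```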
For comprehensive troubleshooting guidance, common issues, and solutions, refer to:
Troubleshooting Guide - TROUBLESHOOTING.md
Issue: "No documents found" error
# Upload documents first and wait for processing to complete
# Check backend logs
docker compose logs backend --tail 50

- Ensure documents were uploaded successfully
- Wait for background processing to complete
- Verify ChromaDB is accessible
Issue: LLM API errors
# Test API key and connectivity
curl -X POST http://localhost:5000/api/status
# Check backend logs for error details
docker compose logs backend --tail 50

- Verify API key is correct in `backend/.env`
- Ensure API key has sufficient credits/quota
- Check network connectivity to LLM provider
- Verify `LLM_BASE_URL` is correct for your provider
Issue: Ollama connection refused
# Confirm Ollama is running on the host
curl http://localhost:11434/api/tags
# If not running, start it
ollama serve

- Ensure Ollama is running natively on the host (not in Docker)
- Verify `LLM_BASE_URL=http://localhost:11434/v1` in `backend/.env`
- Check that required models are pulled (`ollama list`)
Issue: Empty or poor quality answers
- Enable hybrid search for better retrieval
- Enable reranking for improved context selection
- Verify documents uploaded contain relevant information
- Try adjusting `TEMPERATURE` in `backend/.env`
- Check that embeddings were created successfully
Issue: Slow responses
- Disable reranking if speed is critical
- Use faster LLM models (e.g., `gpt-3.5-turbo` vs `gpt-4`)
- For Ollama, ensure GPU acceleration is enabled
- Reduce number of retrieved chunks (modify code)
Enable verbose logging for deeper inspection:
# View real-time container logs
docker compose logs -f backend
# Check specific errors
docker compose logs backend | grep ERROR
# View all backend activity
docker compose logs backend --tail 200

Clear data and restart:
# Stop services
docker compose down
# Clear all data
rm -rf .chromadb uploads
mkdir .chromadb uploads
# Restart fresh
docker compose up --build

This project is licensed under the terms specified in the LICENSE.md file.
ClinIQ is provided as-is for research, educational, and informational purposes only. This tool is NOT intended for clinical diagnosis, treatment decisions, or patient care.
Important Warnings:
- Not Medical Advice: Answers generated by ClinIQ do not constitute medical advice, diagnosis, or treatment recommendations
- Always Verify: Healthcare professionals must verify all AI-generated information against authoritative clinical sources
- Human Review Required: All outputs must be reviewed by qualified medical professionals before any clinical application
- No Liability: The developers assume no liability for any decisions made based on ClinIQ outputs
- Data Privacy: Ensure compliance with HIPAA and other healthcare data regulations when uploading documents
- Experimental Technology: RAG and LLM technologies may produce inaccurate, incomplete, or hallucinated information
- Not FDA Approved: This software has not been evaluated or approved by the FDA or any regulatory agency
Best Practices:
- Only upload de-identified or appropriately authorized clinical documents
- Consult qualified healthcare professionals for all medical decisions
- Validate all information against peer-reviewed medical literature
- Conduct thorough testing in non-production environments before any real-world use
- Implement additional safety checks and human oversight for any clinical applications
- Maintain audit trails and version control for clinical decision support systems
For full disclaimer details, see DISCLAIMER.md
