DeepNeedle is a production-ready Retrieval-Augmented Generation (RAG) system that combines dense vector search, keyword-based BM25 search, and cross-encoder reranking to deliver high-quality, cited answers from your documents.
## Features

- **Hybrid Retrieval**: combines semantic search (BGE embeddings), keyword search (BM25), and cross-encoder reranking
- **Multiple LLM Backends**: Groq API (fast, cloud-hosted) or local Mistral-7B (privacy-preserving)
- **Citation Tracking**: automatic source attribution with provenance
- **Production Ready**: Docker Compose setup with Qdrant, Meilisearch, a FastAPI backend, and a Next.js frontend
- **GPU Accelerated**: NVIDIA GPU support for local inference
- **Comprehensive Documentation**: extensively documented codebase with inline comments
## Architecture

```
┌─────────────┐
│    Query    │
└──────┬──────┘
       │
       ▼
┌──────────────────────────────────────┐
│       Hybrid Retrieval Agent         │
│  ├─ Dense Search (Qdrant + BGE)      │
│  ├─ BM25 Search (Meilisearch)        │
│  └─ Cross-Encoder Reranking          │
└──────┬───────────────────────────────┘
       │
       ▼
┌──────────────────────────────────────┐
│           Synthesis Agent            │
│  └─ LLM (Groq or Local Mistral)      │
└──────┬───────────────────────────────┘
       │
       ▼
┌─────────────┐
│   Answer    │
│ + Citations │
└─────────────┘
```
## Table of Contents

- Prerequisites
- Quick Start
- Project Setup
- Data Ingestion
- Running with Docker Compose
- Local Development
- API Documentation
- Configuration
- Troubleshooting
## Prerequisites

- Docker and Docker Compose (v2.0+)
- Python 3.10+ (for local development or data ingestion)
- 8GB+ RAM minimum (16GB+ recommended)
- Groq API key (free tier available at console.groq.com)

For local LLM inference (optional):

- NVIDIA GPU with 6GB+ VRAM (for local Mistral-7B)
- NVIDIA Docker runtime for GPU support in containers
## Quick Start

Get up and running in three steps:

```bash
# 1. Clone the repository
git clone <your-repo-url>
cd DeepNeedle
# 2. Set up environment variables
cp .env.example .env
# Edit .env and add your GROQ_API_KEY (optional but recommended)
# 3. Start all services with Docker Compose
docker-compose up -d
```

Access the application:
- Frontend: http://localhost:3000
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
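If you prefer a scripted check, the sketch below (assuming `pip install requests` and the default ports above) confirms each service responds:

```python
# Minimal readiness check; adjust the URLs if you changed the port
# mappings in docker-compose.yml.
import requests

services = {
    "API": "http://localhost:8000/health",
    "Qdrant": "http://localhost:6333/collections",
    "Meilisearch": "http://localhost:7700/health",
}

for name, url in services.items():
    try:
        r = requests.get(url, timeout=10)
        print(f"{name}: HTTP {r.status_code}")
    except requests.ConnectionError:
        print(f"{name}: not reachable at {url}")
```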
## Project Setup

### Clone the Repository

```bash
git clone <your-repo-url>
cd DeepNeedle
```

### Configure Environment Variables

Create your environment file:

```bash
cp .env.example .env
```

Edit `.env` and configure:

```bash
# ===========================================
# REQUIRED: Groq API Configuration
# ===========================================
# Get your free API key from: https://console.groq.com
USE_GROQ=true
GROQ_API_KEY=your_groq_api_key_here
GROQ_MODEL=llama-3.3-70b-versatile
# ===========================================
# Database Configuration (defaults work for Docker)
# ===========================================
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION=openreview
MEILI_URL=http://localhost:7700
MEILI_KEY=devkey_for_development_only_not_secure
MEILI_INDEX=openreview
# ===========================================
# Model Configuration
# ===========================================
EMBED_MODEL=BAAI/bge-large-en-v1.5
RERANK_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
# ===========================================
# Optional: Local LLM (if USE_GROQ=false)
# ===========================================
# LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.3
```

Important notes:
- **Groq API key**: sign up at console.groq.com for a free API key.
- **Database URLs**: use `localhost` for local development and the container names (`rag-qdrant`, `rag-meili`) when running inside Docker.
- **Collection/index names**: the default is `openreview`; change it if you use a different dataset.
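To catch the most common misconfiguration early (Groq enabled but no key set), here is a small sanity-check sketch; this helper is not part of the repo and assumes `pip install python-dotenv`:

```python
# Hypothetical helper (not part of DeepNeedle): flag an inconsistent
# LLM configuration before starting the stack.
from dotenv import dotenv_values

cfg = dotenv_values(".env")  # parses KEY=VALUE pairs without exporting them

use_groq = cfg.get("USE_GROQ", "true").lower() == "true"
if use_groq and not cfg.get("GROQ_API_KEY"):
    print("USE_GROQ=true but GROQ_API_KEY is empty - /ask requests will fail.")
elif not use_groq and not cfg.get("LLM_MODEL"):
    print("USE_GROQ=false but LLM_MODEL is not set.")
else:
    print("LLM configuration looks consistent.")
```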
### Set Up the Python Environment

Needed for local development or running the ingestion scripts:

```bash
# Create a virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
```

## Data Ingestion

DeepNeedle requires pre-computed embeddings and metadata to be loaded into the vector stores.
You need two files (see the sketch after this list for producing them):

- `metadata.parquet`: a pandas DataFrame with the columns:
  - `chunk_index`: integer index of the chunk
  - `pdf_path`: source document path/identifier
  - `title`: document title
  - `text`: chunk text content
- `embeddings.npy`: a NumPy array with shape `(n_chunks, embedding_dim)`:
  - the row count must match `metadata.parquet`
  - typical dimensions are 768 (BGE-base) or 1024 (BGE-large)
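For reference, here is a sketch that writes files with this exact schema (assuming `pip install pandas pyarrow numpy`). It uses random vectors purely to illustrate the shapes; real embeddings should come from the model configured in `EMBED_MODEL`:

```python
# Illustrative only: build metadata.parquet / embeddings.npy in the
# expected format. Replace the random vectors with real BGE embeddings.
import numpy as np
import pandas as pd

chunks = ["First chunk of text...", "Second chunk of text..."]

meta = pd.DataFrame({
    "chunk_index": range(len(chunks)),
    "pdf_path": ["papers/example.pdf"] * len(chunks),
    "title": ["Example Paper"] * len(chunks),
    "text": chunks,
})
meta.to_parquet("data/openreview_metadata.parquet")

# One embedding row per metadata row; 1024 dims matches BAAI/bge-large-en-v1.5.
embeddings = np.random.rand(len(chunks), 1024).astype(np.float32)
np.save("data/openreview_embeddings.npy", embeddings)
```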
Place your data files in the `data/` directory:

```
DeepNeedle/
└── data/
    ├── openreview_metadata.parquet
    └── openreview_embeddings.npy
```

### Ingest Inside Docker

Start the services first:

```bash
docker-compose up -d qdrant meili
```

Wait for the services to become healthy (~10 seconds), then run the ingestion:

```bash
# Run ingestion inside the API container
docker-compose exec api python pipelines/load_precomputed_embeddings.py \
--metadata ./data/openreview_metadata.parquet \
--embeddings ./data/openreview_embeddings.npy \
--collection openreview \
    --meili-index openreview
```

### Ingest Locally

```bash
# Make sure Qdrant and Meilisearch are running
docker-compose up -d qdrant meili
# Activate virtual environment
source venv/bin/activate # or venv\Scripts\activate on Windows
# Run ingestion script
python pipelines/load_precomputed_embeddings.py \
--metadata ./data/openreview_metadata.parquet \
--embeddings ./data/openreview_embeddings.npy \
--collection openreview \
    --meili-index openreview
```

### Ingestion Options

```bash
# Test with a limited number of chunks
python pipelines/load_precomputed_embeddings.py \
--metadata ./data/metadata.parquet \
--embeddings ./data/embeddings.npy \
--limit 100
# Custom batch size (default: 100)
python pipelines/load_precomputed_embeddings.py \
--metadata ./data/metadata.parquet \
--embeddings ./data/embeddings.npy \
--batch-size 500
# Use different collection/index names
python pipelines/load_precomputed_embeddings.py \
--metadata ./data/metadata.parquet \
--embeddings ./data/embeddings.npy \
--collection my_collection \
    --meili-index my_index
```

### Verify the Ingestion

Check that the data was loaded successfully:

```bash
# Check Qdrant (vector database)
curl http://localhost:6333/collections/openreview
# Check Meilisearch (search engine)
curl http://localhost:7700/indexes/openreview/stats \
-H "Authorization: Bearer devkey_for_development_only_not_secure"# Start all services (detached mode)
## Running with Docker Compose

### Start Services

```bash
# Start all services (detached mode)
docker-compose up -d
# View logs
docker-compose logs -f
# View logs for specific service
docker-compose logs -f api
```

### Service Overview

| Service | Port | Description |
|---|---|---|
| Qdrant | 6333, 6334 | Vector database for semantic search |
| Meilisearch | 7700 | Search engine for BM25 keyword search |
| API | 8000 | FastAPI backend with RAG endpoints |
| Frontend | 3000 | Next.js web interface |
### Manage Services

```bash
# Stop all services
docker-compose down
# Stop and remove volumes (β οΈ deletes all data)
docker-compose down -v
# Rebuild specific service
docker-compose build api
# Restart specific service
docker-compose restart api
# View service status
docker-compose ps
# Execute command in running container
docker-compose exec api bash
```

### GPU Support

The API service is configured for NVIDIA GPU support:

```yaml
# In docker-compose.yml
runtime: nvidia
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```

Prerequisites:
- NVIDIA Docker runtime installed
- GPU drivers installed on host
Verify GPU access:

```bash
docker-compose exec api nvidia-smi
```

### Data Persistence

Data is persisted in named volumes:

```bash
# View volumes
docker volume ls | grep deepneedle
# Backup Qdrant data
docker run --rm -v deepneedle_qdrant_data:/data -v $(pwd):/backup \
ubuntu tar czf /backup/qdrant_backup.tar.gz /data
# Restore Qdrant data
docker run --rm -v deepneedle_qdrant_data:/data -v $(pwd):/backup \
  ubuntu tar xzf /backup/qdrant_backup.tar.gz -C /
```

## Local Development

Run the databases in Docker while developing the API and frontend on your host.

### Start the Databases

```bash
# Start only Qdrant and Meilisearch
docker-compose up -d qdrant meili
```

### Run the API

```bash
# Activate the virtual environment
source venv/bin/activate # Windows: venv\Scripts\activate
# Run API with hot-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

### Run the Frontend

```bash
cd frontend
# Install dependencies
npm install
# Start development server
npm run dev
```

## API Documentation

Once the API is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
### Health Check

`GET /health`

```bash
curl http://localhost:8000/health
```

### List Documents

`GET /documents?limit=100&offset=0`

```bash
curl http://localhost:8000/documents
```

### Hybrid Retrieval

`POST /retrieve`

Request body:

```json
{
"query": "What is machine learning?",
"k": 12,
"rerank": true,
"weights": {
"dense": 0.5,
"bm25": 0.3,
"rerank": 0.2
}
}
```

```bash
# Example
curl -X POST http://localhost:8000/retrieve \
-H "Content-Type: application/json" \
-d '{
"query": "What is machine learning?",
"k": 5,
"rerank": true
  }'
```
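The same request from Python, overriding the default fusion weights (a sketch using `requests`; the response is printed as-is since its exact schema is defined by the API):

```python
# Call /retrieve with custom hybrid weights.
import requests

payload = {
    "query": "What is machine learning?",
    "k": 5,
    "rerank": True,
    "weights": {"dense": 0.6, "bm25": 0.2, "rerank": 0.2},
}
resp = requests.post("http://localhost:8000/retrieve", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```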
### Ask (Answer with Citations)

`POST /ask`

Request body:

```json
{
"query": "What is machine learning?",
"k": 12,
"rerank": true
}
```

```bash
# Example
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{
"query": "Explain transformer architecture",
"k": 10
  }'
```
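And the equivalent `/ask` call from Python (again a sketch; the answer and citation fields are whatever the API returns):

```python
# Ask an end-to-end question; needs ingested data and a configured LLM backend.
import requests

resp = requests.post(
    "http://localhost:8000/ask",
    json={"query": "Explain transformer architecture", "k": 10},
    timeout=180,  # synthesis can be slow, especially with a local LLM
)
resp.raise_for_status()
print(resp.json())
```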
## Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `QDRANT_URL` | `http://localhost:6333` | Qdrant vector database URL |
| `QDRANT_COLLECTION` | `dochay` | Collection name in Qdrant |
| `MEILI_URL` | `http://localhost:7700` | Meilisearch URL |
| `MEILI_KEY` | `devkey_for_development_only_not_secure` | Meilisearch API key |
| `MEILI_INDEX` | `dochay` | Index name in Meilisearch |
| `EMBED_MODEL` | `BAAI/bge-base-en-v1.5` | Embedding model from Hugging Face |
| `RERANK_MODEL` | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Reranking model |
| `USE_GROQ` | `true` | Use Groq API (`true`) or local LLM (`false`) |
| `GROQ_API_KEY` | (empty) | Your Groq API key |
| `GROQ_MODEL` | `llama-3.3-70b-versatile` | Groq model to use |
| `LLM_MODEL` | `mistralai/Mistral-7B-Instruct-v0.3` | Local LLM model (if `USE_GROQ=false`) |
| `LOG_LEVEL` | `INFO` | Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`) |
### Hybrid Retrieval Weights

Default hybrid retrieval weights:

```python
weights = {
    "dense": 0.5,   # 50% - semantic similarity
    "bm25": 0.3,    # 30% - keyword matching
    "rerank": 0.2,  # 20% - cross-encoder quality
}
```

Customize the weights per API request or change the defaults in `agents/retriever.py`, as sketched below.
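A minimal sketch of the weighted fusion, assuming each per-chunk score has already been normalized to [0, 1]; the project's actual combination logic may normalize differently:

```python
# Weighted late fusion of normalized retrieval scores.
def fuse_scores(dense: float, bm25: float, rerank: float,
                weights: dict) -> float:
    return (weights["dense"] * dense
            + weights["bm25"] * bm25
            + weights["rerank"] * rerank)

weights = {"dense": 0.5, "bm25": 0.3, "rerank": 0.2}
# 0.5*0.82 + 0.3*0.40 + 0.2*0.91 = 0.712
print(fuse_scores(dense=0.82, bm25=0.40, rerank=0.91, weights=weights))
```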
### Model Options

Embedding models (Hugging Face):

- `BAAI/bge-base-en-v1.5` (768-dim, faster)
- `BAAI/bge-large-en-v1.5` (1024-dim, better quality)

Reranking models:

- `cross-encoder/ms-marco-MiniLM-L-6-v2` (balanced)
- `cross-encoder/ms-marco-TinyBERT-L-2-v2` (faster)

LLM options:

- Groq API: fast, high-quality, free tier available
  - `llama-3.3-70b-versatile`
  - `llama-3.1-70b-versatile`
  - `mixtral-8x7b-32768`
- Local: privacy-preserving, no API costs
  - `mistralai/Mistral-7B-Instruct-v0.3`
## Troubleshooting

### Services Fail to Start

```bash
# Check service logs
docker-compose logs
# Check specific service
docker-compose logs qdrant
docker-compose logs meili
# Restart services
docker-compose restart
```

### GPU Not Detected

```bash
# Verify NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
# Check docker-compose.yml has runtime: nvidia
# Install NVIDIA Container Toolkit if needed
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
```

### Ingestion Fails

```bash
# Check file paths and formats
# Metadata must be .parquet with required columns
# Embeddings must be .npy with matching row count
# Test with limited chunks
python pipelines/load_precomputed_embeddings.py \
    --metadata ./data/metadata.parquet \
    --embeddings ./data/embeddings.npy \
    --limit 10
# Check Qdrant is accessible
curl http://localhost:6333/collections
# Check Meilisearch is accessible
curl http://localhost:7700/health
```

### API Returns Errors

```bash
# Check API logs
docker-compose logs api
# Common causes:
# - Missing GROQ_API_KEY (if USE_GROQ=true)
# - Empty database (run ingestion first)
# - Model download failed (check internet connection)
# Verify environment variables
docker-compose exec api env | grep GROQ
```

### Out-of-Memory Errors

```bash
# Reduce embedding model size
EMBED_MODEL=BAAI/bge-small-en-v1.5
# Disable reranking temporarily
# In API request: "rerank": false
# Use Groq instead of local LLM
USE_GROQ=true
```

### Slow Responses

```bash
# Use Groq API for faster inference
USE_GROQ=true
# Reduce retrieval count
# In API request: "k": 5
# Check GPU is being used
docker-compose exec api nvidia-smi
# Reduce batch size in ingestion
--batch-size 50
```

## Additional Documentation

- DOCUMENTATION_GUIDE.md: Code documentation format and standards
- DOCUMENTATION_STATUS.md: Documentation coverage tracking
- API Docs: http://localhost:8000/docs (when running)
All core modules are comprehensively documented with:
- Module-level overviews
- Function docstrings with examples
- Inline comments explaining implementation
- Architecture diagrams and rationale