DeepNeedle is a production-ready Retrieval-Augmented Generation (RAG) system that combines dense vector search, keyword-based BM25 search, and cross-encoder reranking to deliver high-quality, cited answers from your documents.
## Features

- **Hybrid Retrieval**: combines semantic search (BGE embeddings), keyword search (BM25), and cross-encoder reranking
- **Multiple LLM Backends**: Groq API (fast, cloud-hosted) or local Mistral-7B (privacy-preserving)
- **Citation Tracking**: automatic source attribution with provenance
- **Production Ready**: Docker Compose setup with Qdrant, Meilisearch, a FastAPI backend, and a Next.js frontend
- **GPU Accelerated**: NVIDIA GPU support for local inference
- **Comprehensive Documentation**: extensively documented codebase with inline comments
## Architecture

```
┌─────────────┐
│    Query    │
└──────┬──────┘
       │
       ▼
┌──────────────────────────────────────┐
│       Hybrid Retrieval Agent         │
│  ├─ Dense Search (Qdrant + BGE)      │
│  ├─ BM25 Search (Meilisearch)        │
│  └─ Cross-Encoder Reranking          │
└──────┬───────────────────────────────┘
       │
       ▼
┌──────────────────────────────────────┐
│           Synthesis Agent            │
│  └─ LLM (Groq or Local Mistral)      │
└──────┬───────────────────────────────┘
       │
       ▼
┌─────────────┐
│   Answer    │
│ + Citations │
└─────────────┘
```
## Table of Contents

- Prerequisites
- Quick Start
- Project Setup
- Data Ingestion
- Running with Docker Compose
- Local Development
- API Documentation
- Configuration
- Troubleshooting
## Prerequisites

- Docker and Docker Compose (v2.0+)
- Python 3.10+ (for local development or data ingestion)
- 8GB+ RAM minimum (16GB+ recommended)
- Groq API key (free tier available at console.groq.com)

For local LLM inference (optional):

- NVIDIA GPU with 6GB+ VRAM (for local Mistral-7B)
- NVIDIA Docker runtime for GPU support in containers
## Quick Start

Get up and running in three steps:

```bash
# 1. Clone the repository
git clone <your-repo-url>
cd DeepNeedle
# 2. Set up environment variables
cp .env.example .env
# Edit .env and add your GROQ_API_KEY (optional but recommended)
# 3. Start all services with Docker Compose
docker-compose up -d
```

Access the application:
- Frontend: http://localhost:3000
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
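If you prefer a scripted check, the sketch below (assuming `pip install requests` and the default ports above) confirms each service responds:

```python
# Minimal readiness check; adjust the URLs if you changed the port
# mappings in docker-compose.yml.
import requests

services = {
    "API": "http://localhost:8000/health",
    "Qdrant": "http://localhost:6333/collections",
    "Meilisearch": "http://localhost:7700/health",
}

for name, url in services.items():
    try:
        r = requests.get(url, timeout=10)
        print(f"{name}: HTTP {r.status_code}")
    except requests.ConnectionError:
        print(f"{name}: not reachable at {url}")
```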
## Project Setup

### Clone the Repository

```bash
git clone <your-repo-url>
cd DeepNeedle
```

### Configure Environment Variables

Create your environment file:

```bash
cp .env.example .env
```

Edit `.env` and configure:

```bash
# ===========================================
# REQUIRED: Groq API Configuration
# ===========================================
# Get your free API key from: https://console.groq.com
USE_GROQ=true
GROQ_API_KEY=your_groq_api_key_here
GROQ_MODEL=llama-3.3-70b-versatile
# ===========================================
# Database Configuration (defaults work for Docker)
# ===========================================
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION=openreview
MEILI_URL=http://localhost:7700
MEILI_KEY=devkey_for_development_only_not_secure
MEILI_INDEX=openreview
# ===========================================
# Model Configuration
# ===========================================
EMBED_MODEL=BAAI/bge-large-en-v1.5
RERANK_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
# ===========================================
# Optional: Local LLM (if USE_GROQ=false)
# ===========================================
# LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.3
```

Important notes:
- **Groq API key**: sign up at console.groq.com for a free API key.
- **Database URLs**: use `localhost` for local development and the container names (`rag-qdrant`, `rag-meili`) when running inside Docker.
- **Collection/index names**: the default is `openreview`; change it if you use a different dataset.
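To catch the most common misconfiguration early (Groq enabled but no key set), here is a small sanity-check sketch; this helper is not part of the repo and assumes `pip install python-dotenv`:

```python
# Hypothetical helper (not part of DeepNeedle): flag an inconsistent
# LLM configuration before starting the stack.
from dotenv import dotenv_values

cfg = dotenv_values(".env")  # parses KEY=VALUE pairs without exporting them

use_groq = cfg.get("USE_GROQ", "true").lower() == "true"
if use_groq and not cfg.get("GROQ_API_KEY"):
    print("USE_GROQ=true but GROQ_API_KEY is empty - /ask requests will fail.")
elif not use_groq and not cfg.get("LLM_MODEL"):
    print("USE_GROQ=false but LLM_MODEL is not set.")
else:
    print("LLM configuration looks consistent.")
```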
### Set Up the Python Environment

Needed for local development or running the ingestion scripts:

```bash
# Create a virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
```

## Data Ingestion

DeepNeedle requires pre-computed embeddings and metadata to be loaded into the vector stores.
You need two files (see the sketch after this list for producing them):

- `metadata.parquet`: a pandas DataFrame with the columns:
  - `chunk_index`: integer index of the chunk
  - `pdf_path`: source document path/identifier
  - `title`: document title
  - `text`: chunk text content
- `embeddings.npy`: a NumPy array with shape `(n_chunks, embedding_dim)`:
  - the row count must match `metadata.parquet`
  - typical dimensions are 768 (BGE-base) or 1024 (BGE-large)
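For reference, here is a sketch that writes files with this exact schema (assuming `pip install pandas pyarrow numpy`). It uses random vectors purely to illustrate the shapes; real embeddings should come from the model configured in `EMBED_MODEL`:

```python
# Illustrative only: build metadata.parquet / embeddings.npy in the
# expected format. Replace the random vectors with real BGE embeddings.
import numpy as np
import pandas as pd

chunks = ["First chunk of text...", "Second chunk of text..."]

meta = pd.DataFrame({
    "chunk_index": range(len(chunks)),
    "pdf_path": ["papers/example.pdf"] * len(chunks),
    "title": ["Example Paper"] * len(chunks),
    "text": chunks,
})
meta.to_parquet("data/openreview_metadata.parquet")

# One embedding row per metadata row; 1024 dims matches BAAI/bge-large-en-v1.5.
embeddings = np.random.rand(len(chunks), 1024).astype(np.float32)
np.save("data/openreview_embeddings.npy", embeddings)
```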
Place your data files in the `data/` directory:

```
DeepNeedle/
└── data/
    ├── openreview_metadata.parquet
    └── openreview_embeddings.npy
```

### Ingest Inside Docker

Start the services first:

```bash
docker-compose up -d qdrant meili
```

Wait for the services to become healthy (~10 seconds), then run the ingestion:

```bash
# Run ingestion inside the API container
docker-compose exec api python pipelines/load_precomputed_embeddings.py \
--metadata ./data/openreview_metadata.parquet \
--embeddings ./data/openreview_embeddings.npy \
--collection openreview \
    --meili-index openreview
```

### Ingest Locally

```bash
# Make sure Qdrant and Meilisearch are running
docker-compose up -d qdrant meili
# Activate virtual environment
source venv/bin/activate # or venv\Scripts\activate on Windows
# Run ingestion script
python pipelines/load_precomputed_embeddings.py \
--metadata ./data/openreview_metadata.parquet \
--embeddings ./data/openreview_embeddings.npy \
--collection openreview \
    --meili-index openreview
```

### Ingestion Options

```bash
# Test with a limited number of chunks
python pipelines/load_precomputed_embeddings.py \
--metadata ./data/metadata.parquet \
--embeddings ./data/embeddings.npy \
--limit 100
# Custom batch size (default: 100)
python pipelines/load_precomputed_embeddings.py \
--metadata ./data/metadata.parquet \
--embeddings ./data/embeddings.npy \
--batch-size 500
# Use different collection/index names
python pipelines/load_precomputed_embeddings.py \
--metadata ./data/metadata.parquet \
--embeddings ./data/embeddings.npy \
--collection my_collection \
    --meili-index my_index
```

### Verify the Ingestion

Check that the data was loaded successfully:

```bash
# Check Qdrant (vector database)
curl http://localhost:6333/collections/openreview
# Check Meilisearch (search engine)
curl http://localhost:7700/indexes/openreview/stats \
-H "Authorization: Bearer devkey_for_development_only_not_secure"# Start all services (detached mode)
## Running with Docker Compose

### Start Services

```bash
# Start all services (detached mode)
docker-compose up -d
# View logs
docker-compose logs -f
# View logs for specific service
docker-compose logs -f api
```

### Service Overview

| Service | Port | Description |
|---|---|---|
| Qdrant | 6333, 6334 | Vector database for semantic search |
| Meilisearch | 7700 | Search engine for BM25 keyword search |
| API | 8000 | FastAPI backend with RAG endpoints |
| Frontend | 3000 | Next.js web interface |
### Manage Services

```bash
# Stop all services
docker-compose down
# Stop and remove volumes (β οΈ deletes all data)
docker-compose down -v
# Rebuild specific service
docker-compose build api
# Restart specific service
docker-compose restart api
# View service status
docker-compose ps
# Execute command in running container
docker-compose exec api bash
```

### GPU Support

The API service is configured for NVIDIA GPU support:

```yaml
# In docker-compose.yml
runtime: nvidia
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```

Prerequisites:
- NVIDIA Docker runtime installed
- GPU drivers installed on host
Verify GPU access:

```bash
docker-compose exec api nvidia-smi
```

### Data Persistence

Data is persisted in named volumes:

```bash
# View volumes
docker volume ls | grep deepneedle
# Backup Qdrant data
docker run --rm -v deepneedle_qdrant_data:/data -v $(pwd):/backup \
ubuntu tar czf /backup/qdrant_backup.tar.gz /data
# Restore Qdrant data
docker run --rm -v deepneedle_qdrant_data:/data -v $(pwd):/backup \
  ubuntu tar xzf /backup/qdrant_backup.tar.gz -C /
```

## Local Development

Run the databases in Docker while developing the API and frontend on your host.

### Start the Databases

```bash
# Start only Qdrant and Meilisearch
docker-compose up -d qdrant meili
```

### Run the API

```bash
# Activate the virtual environment
source venv/bin/activate # Windows: venv\Scripts\activate
# Run API with hot-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

### Run the Frontend

```bash
cd frontend
# Install dependencies
npm install
# Start development server
npm run dev
```

## API Documentation

Once the API is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
### Health Check

`GET /health`

```bash
curl http://localhost:8000/health
```

### List Documents

`GET /documents?limit=100&offset=0`

```bash
curl http://localhost:8000/documents
```

### Hybrid Retrieval

`POST /retrieve`

Request body:

```json
{
"query": "What is machine learning?",
"k": 12,
"rerank": true,
"weights": {
"dense": 0.5,
"bm25": 0.3,
"rerank": 0.2
}
}
```

```bash
# Example
curl -X POST http://localhost:8000/retrieve \
-H "Content-Type: application/json" \
-d '{
"query": "What is machine learning?",
"k": 5,
"rerank": true
  }'
```
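The same request from Python, overriding the default fusion weights (a sketch using `requests`; the response is printed as-is since its exact schema is defined by the API):

```python
# Call /retrieve with custom hybrid weights.
import requests

payload = {
    "query": "What is machine learning?",
    "k": 5,
    "rerank": True,
    "weights": {"dense": 0.6, "bm25": 0.2, "rerank": 0.2},
}
resp = requests.post("http://localhost:8000/retrieve", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```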
### Ask (Answer with Citations)

`POST /ask`

Request body:

```json
{
"query": "What is machine learning?",
"k": 12,
"rerank": true
}
```

```bash
# Example
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{
"query": "Explain transformer architecture",
"k": 10
  }'
```
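And the equivalent `/ask` call from Python (again a sketch; the answer and citation fields are whatever the API returns):

```python
# Ask an end-to-end question; needs ingested data and a configured LLM backend.
import requests

resp = requests.post(
    "http://localhost:8000/ask",
    json={"query": "Explain transformer architecture", "k": 10},
    timeout=180,  # synthesis can be slow, especially with a local LLM
)
resp.raise_for_status()
print(resp.json())
```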
## Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `QDRANT_URL` | `http://localhost:6333` | Qdrant vector database URL |
| `QDRANT_COLLECTION` | `dochay` | Collection name in Qdrant |
| `MEILI_URL` | `http://localhost:7700` | Meilisearch URL |
| `MEILI_KEY` | `devkey_for_development_only_not_secure` | Meilisearch API key |
| `MEILI_INDEX` | `dochay` | Index name in Meilisearch |
| `EMBED_MODEL` | `BAAI/bge-base-en-v1.5` | Embedding model from Hugging Face |
| `RERANK_MODEL` | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Reranking model |
| `USE_GROQ` | `true` | Use Groq API (`true`) or local LLM (`false`) |
| `GROQ_API_KEY` | (empty) | Your Groq API key |
| `GROQ_MODEL` | `llama-3.3-70b-versatile` | Groq model to use |
| `LLM_MODEL` | `mistralai/Mistral-7B-Instruct-v0.3` | Local LLM model (if `USE_GROQ=false`) |
| `LOG_LEVEL` | `INFO` | Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`) |
### Hybrid Retrieval Weights

Default hybrid retrieval weights:

```python
weights = {
    "dense": 0.5,   # 50% - semantic similarity
    "bm25": 0.3,    # 30% - keyword matching
    "rerank": 0.2,  # 20% - cross-encoder quality
}
```

Customize the weights per API request or change the defaults in `agents/retriever.py`, as sketched below.
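A minimal sketch of the weighted fusion, assuming each per-chunk score has already been normalized to [0, 1]; the project's actual combination logic may normalize differently:

```python
# Weighted late fusion of normalized retrieval scores.
def fuse_scores(dense: float, bm25: float, rerank: float,
                weights: dict) -> float:
    return (weights["dense"] * dense
            + weights["bm25"] * bm25
            + weights["rerank"] * rerank)

weights = {"dense": 0.5, "bm25": 0.3, "rerank": 0.2}
# 0.5*0.82 + 0.3*0.40 + 0.2*0.91 = 0.712
print(fuse_scores(dense=0.82, bm25=0.40, rerank=0.91, weights=weights))
```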
### Model Options

Embedding models (Hugging Face):

- `BAAI/bge-base-en-v1.5` (768-dim, faster)
- `BAAI/bge-large-en-v1.5` (1024-dim, better quality)

Reranking models:

- `cross-encoder/ms-marco-MiniLM-L-6-v2` (balanced)
- `cross-encoder/ms-marco-TinyBERT-L-2-v2` (faster)

LLM options:

- Groq API: fast, high-quality, free tier available
  - `llama-3.3-70b-versatile`
  - `llama-3.1-70b-versatile`
  - `mixtral-8x7b-32768`
- Local: privacy-preserving, no API costs
  - `mistralai/Mistral-7B-Instruct-v0.3`
## Troubleshooting

### Services Fail to Start

```bash
# Check service logs
docker-compose logs
# Check specific service
docker-compose logs qdrant
docker-compose logs meili
# Restart services
docker-compose restart
```

### GPU Not Detected

```bash
# Verify NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
# Check docker-compose.yml has runtime: nvidia
# Install NVIDIA Container Toolkit if needed
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
```

### Ingestion Fails

```bash
# Check file paths and formats
# Metadata must be .parquet with required columns
# Embeddings must be .npy with matching row count
# Test with limited chunks
python pipelines/load_precomputed_embeddings.py \
    --metadata ./data/metadata.parquet \
    --embeddings ./data/embeddings.npy \
    --limit 10
# Check Qdrant is accessible
curl http://localhost:6333/collections
# Check Meilisearch is accessible
curl http://localhost:7700/health
```

### API Returns Errors

```bash
# Check API logs
docker-compose logs api
# Common causes:
# - Missing GROQ_API_KEY (if USE_GROQ=true)
# - Empty database (run ingestion first)
# - Model download failed (check internet connection)
# Verify environment variables
docker-compose exec api env | grep GROQ
```

### Out-of-Memory Errors

```bash
# Reduce embedding model size
EMBED_MODEL=BAAI/bge-small-en-v1.5
# Disable reranking temporarily
# In API request: "rerank": false
# Use Groq instead of local LLM
USE_GROQ=true
```

### Slow Responses

```bash
# Use Groq API for faster inference
USE_GROQ=true
# Reduce retrieval count
# In API request: "k": 5
# Check GPU is being used
docker-compose exec api nvidia-smi
# Reduce batch size in ingestion
--batch-size 50
```

## Additional Documentation

- DOCUMENTATION_GUIDE.md: Code documentation format and standards
- DOCUMENTATION_STATUS.md: Documentation coverage tracking
- API Docs: http://localhost:8000/docs (when running)
All core modules are comprehensively documented with:
- Module-level overviews
- Function docstrings with examples
- Inline comments explaining implementation
- Architecture diagrams and rationale