DeepNeedle - Hybrid RAG System

DeepNeedle is a production-ready Retrieval-Augmented Generation (RAG) system that combines dense vector search, keyword-based BM25 search, and cross-encoder reranking to deliver high-quality, cited answers from your documents.

🌟 Key Features

  • Hybrid Retrieval: Combines semantic search (BGE embeddings) + keyword search (BM25) + cross-encoder reranking
  • Multiple LLM Backends: Groq API (fast, cloud) or Local Mistral-7B (privacy-preserving)
  • Citation Tracking: Automatic source attribution with provenance
  • Production Ready: Docker Compose setup with Qdrant, Meilisearch, FastAPI backend, and Next.js frontend
  • GPU Accelerated: NVIDIA GPU support for local inference
  • Comprehensive Documentation: Extensively documented codebase with inline comments

πŸ—οΈ Architecture

┌─────────────┐
│   Query     │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────┐
│  Hybrid Retrieval Agent             │
│  ├─ Dense Search (Qdrant + BGE)     │
│  ├─ BM25 Search (Meilisearch)       │
│  └─ Cross-Encoder Reranking         │
└──────┬──────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────┐
│  Synthesis Agent                    │
│  └─ LLM (Groq or Local Mistral)     │
└──────┬──────────────────────────────┘
       │
       ▼
┌─────────────┐
│  Answer     │
│  + Citations│
└─────────────┘

📋 Table of Contents

  • 🔧 Prerequisites
  • 🚀 Quick Start
  • 📦 Project Setup
  • 📊 Data Ingestion
  • 🐳 Running with Docker Compose
  • 💻 Local Development
  • 📚 API Documentation
  • ⚙️ Configuration
  • 🔍 Troubleshooting
  • 📖 Documentation

🔧 Prerequisites

Required

  • Docker and Docker Compose (v2.0+)
  • Python 3.10+ (for local development or data ingestion)
  • NVIDIA GPU (optional, for local LLM inference)
    • NVIDIA Docker runtime for GPU support in containers
  • 8GB+ RAM minimum (16GB+ recommended)

Optional

  • NVIDIA GPU with 6GB+ VRAM (for local Mistral-7B)
  • Groq API Key (free tier available at console.groq.com)

🚀 Quick Start

Get up and running in 3 steps:

# 1. Clone the repository
git clone <your-repo-url>
cd DeepNeedle

# 2. Set up environment variables
cp .env.example .env
# Edit .env and add your GROQ_API_KEY (optional but recommended)

# 3. Start all services with Docker Compose
docker-compose up -d

Access the application:

  • Frontend: http://localhost:3000
  • API: http://localhost:8000
  • API docs (Swagger UI): http://localhost:8000/docs

📦 Project Setup

1. Clone and Navigate

git clone <your-repo-url>
cd DeepNeedle

2. Environment Configuration

Create your environment file:

cp .env.example .env

Edit .env and configure:

# ===========================================
# REQUIRED: Groq API Configuration
# ===========================================
# Get your free API key from: https://console.groq.com
USE_GROQ=true
GROQ_API_KEY=your_groq_api_key_here
GROQ_MODEL=llama-3.3-70b-versatile

# ===========================================
# Database Configuration (defaults work for Docker)
# ===========================================
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION=openreview

MEILI_URL=http://localhost:7700
MEILI_KEY=devkey_for_development_only_not_secure
MEILI_INDEX=openreview

# ===========================================
# Model Configuration
# ===========================================
EMBED_MODEL=BAAI/bge-large-en-v1.5
RERANK_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2

# ===========================================
# Optional: Local LLM (if USE_GROQ=false)
# ===========================================
# LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.3

Important Notes:

  • Groq API Key: Sign up at console.groq.com for a free API key
  • Database URLs: Use localhost for local development, and the container names (rag-qdrant, rag-meili) when running inside Docker
  • Collection/Index Names: The default is openreview; change it if you use a different dataset

3. Install Python Dependencies (for local development/ingestion)

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

📊 Data Ingestion

DeepNeedle requires pre-computed embeddings and metadata to be loaded into Qdrant and Meilisearch before it can answer queries.

Expected Data Format

You need two files:

  1. metadata.parquet: Pandas DataFrame with columns:

    • chunk_index: Integer index of the chunk
    • pdf_path: Source document path/identifier
    • title: Document title
    • text: Chunk text content
  2. embeddings.npy: NumPy array with shape (n_chunks, embedding_dim)

    • Must match the row count of metadata.parquet
    • Typical dimension: 768 (BGE-base) or 1024 (BGE-large)
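
For reference, here is a minimal sketch of producing both files with pandas, numpy, and sentence-transformers. The sample records and output paths are illustrative, not part of the repository:

import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

# Illustrative chunks; in practice these come from your own documents.
records = [
    {"chunk_index": 0, "pdf_path": "papers/a.pdf", "title": "Paper A", "text": "First chunk..."},
    {"chunk_index": 1, "pdf_path": "papers/a.pdf", "title": "Paper A", "text": "Second chunk..."},
]
df = pd.DataFrame(records)

# Use the same model configured in EMBED_MODEL so dimensions match at query time.
model = SentenceTransformer("BAAI/bge-large-en-v1.5")  # 1024-dim
embeddings = model.encode(df["text"].tolist(), normalize_embeddings=True)

# One embedding per metadata row.
assert embeddings.shape[0] == len(df)

df.to_parquet("data/openreview_metadata.parquet", index=False)
np.save("data/openreview_embeddings.npy", embeddings)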

Prepare Your Data

Place your data files in the data/ directory:

DeepNeedle/
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ openreview_metadata.parquet
β”‚   └── openreview_embeddings.npy

Running the Ingestion Pipeline

Option 1: Using Docker Compose (Recommended)

Start the services first:

docker-compose up -d qdrant meili

Wait for services to be healthy (~10 seconds), then run ingestion:

# Run ingestion inside the API container
docker-compose exec api python pipelines/load_precomputed_embeddings.py \
  --metadata ./data/openreview_metadata.parquet \
  --embeddings ./data/openreview_embeddings.npy \
  --collection openreview \
  --meili-index openreview

Option 2: Local Python (if services running locally)

# Make sure Qdrant and Meilisearch are running
docker-compose up -d qdrant meili

# Activate virtual environment
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Run ingestion script
python pipelines/load_precomputed_embeddings.py \
  --metadata ./data/openreview_metadata.parquet \
  --embeddings ./data/openreview_embeddings.npy \
  --collection openreview \
  --meili-index openreview

Ingestion Options

# Test with limited chunks
python pipelines/load_precomputed_embeddings.py \
  --metadata ./data/metadata.parquet \
  --embeddings ./data/embeddings.npy \
  --limit 100

# Custom batch size (default: 100)
python pipelines/load_precomputed_embeddings.py \
  --metadata ./data/metadata.parquet \
  --embeddings ./data/embeddings.npy \
  --batch-size 500

# Use different collection/index names
python pipelines/load_precomputed_embeddings.py \
  --metadata ./data/metadata.parquet \
  --embeddings ./data/embeddings.npy \
  --collection my_collection \
  --meili-index my_index

Verify Ingestion

Check that data was loaded successfully:

# Check Qdrant (vector database)
curl http://localhost:6333/collections/openreview

# Check Meilisearch (search engine)
curl http://localhost:7700/indexes/openreview/stats \
  -H "Authorization: Bearer devkey_for_development_only_not_secure"

🐳 Running with Docker Compose

Start All Services

# Start all services (detached mode)
docker-compose up -d

# View logs
docker-compose logs -f

# View logs for specific service
docker-compose logs -f api

Service Overview

Service     | Port       | Description
------------|------------|---------------------------------------
Qdrant      | 6333, 6334 | Vector database for semantic search
Meilisearch | 7700       | Search engine for BM25 keyword search
API         | 8000       | FastAPI backend with RAG endpoints
Frontend    | 3000       | Next.js web interface

Service Management

# Stop all services
docker-compose down

# Stop and remove volumes (⚠️ deletes all data)
docker-compose down -v

# Rebuild specific service
docker-compose build api

# Restart specific service
docker-compose restart api

# View service status
docker-compose ps

# Execute command in running container
docker-compose exec api bash

GPU Support

The API service is configured for NVIDIA GPU support:

# In docker-compose.yml
runtime: nvidia
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]

Prerequisites:

  • NVIDIA Docker runtime installed
  • GPU drivers installed on host

Verify GPU Access:

docker-compose exec api nvidia-smi

Volumes and Persistence

Data is persisted in named volumes:

# View volumes
docker volume ls | grep deepneedle

# Backup Qdrant data
docker run --rm -v deepneedle_qdrant_data:/data -v $(pwd):/backup \
  ubuntu tar czf /backup/qdrant_backup.tar.gz /data

# Restore Qdrant data
docker run --rm -v deepneedle_qdrant_data:/data -v $(pwd):/backup \
  ubuntu tar xzf /backup/qdrant_backup.tar.gz -C /

💻 Local Development

Running Services Separately

1. Start Infrastructure

# Start only Qdrant and Meilisearch
docker-compose up -d qdrant meili

2. Run API Locally

# Activate virtual environment
source venv/bin/activate  # Windows: venv\Scripts\activate

# Run API with hot-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

3. Run Frontend Locally

cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

📚 API Documentation

Interactive API Docs

Once the API is running, visit:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Key Endpoints

Health Check

GET /health

curl http://localhost:8000/health

List Documents

GET /documents?limit=100&offset=0

curl http://localhost:8000/documents

Hybrid Retrieval (Search)

POST /retrieve
{
  "query": "What is machine learning?",
  "k": 12,
  "rerank": true,
  "weights": {
    "dense": 0.5,
    "bm25": 0.3,
    "rerank": 0.2
  }
}

# Example
curl -X POST http://localhost:8000/retrieve \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is machine learning?",
    "k": 5,
    "rerank": true
  }'

Full RAG (Ask with Answer Generation)

POST /ask
{
  "query": "What is machine learning?",
  "k": 12,
  "rerank": true
}

# Example
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain transformer architecture",
    "k": 10
  }'
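
The same call from Python with the requests library (the exact response schema isn't reproduced here, so this sketch just prints the returned JSON, which carries the answer and its citations):

import requests

resp = requests.post(
    "http://localhost:8000/ask",
    json={"query": "Explain transformer architecture", "k": 10, "rerank": True},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # answer text plus source citations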

βš™οΈ Configuration

Environment Variables Reference

Variable          | Default                                | Description
------------------|----------------------------------------|--------------------------------------------
QDRANT_URL        | http://localhost:6333                  | Qdrant vector database URL
QDRANT_COLLECTION | dochay                                 | Collection name in Qdrant
MEILI_URL         | http://localhost:7700                  | Meilisearch URL
MEILI_KEY         | devkey_for_development_only_not_secure | Meilisearch API key
MEILI_INDEX       | dochay                                 | Index name in Meilisearch
EMBED_MODEL       | BAAI/bge-base-en-v1.5                  | Embedding model from Hugging Face
RERANK_MODEL      | cross-encoder/ms-marco-MiniLM-L-6-v2   | Reranking model
USE_GROQ          | true                                   | Use Groq API (true) or local LLM (false)
GROQ_API_KEY      | (empty)                                | Your Groq API key
GROQ_MODEL        | llama-3.3-70b-versatile                | Groq model to use
LLM_MODEL         | mistralai/Mistral-7B-Instruct-v0.3     | Local LLM model (if USE_GROQ=false)
LOG_LEVEL         | INFO                                   | Logging level (DEBUG, INFO, WARNING, ERROR)

Retrieval Weights

Default hybrid retrieval weights:

weights = {
  "dense": 0.5,   # 50% - Semantic similarity
  "bm25": 0.3,    # 30% - Keyword matching
  "rerank": 0.2   # 20% - Cross-encoder quality
}

Customize in API requests or modify in agents/retriever.py.
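
Conceptually, the fusion works like the sketch below (a hypothetical helper, not the actual agents/retriever.py code): each source's scores are min-max normalized so the weights are comparable, then summed per chunk.

from typing import Dict, List, Tuple

def fuse_scores(dense: Dict[str, float], bm25: Dict[str, float],
                rerank: Dict[str, float], weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Combine per-chunk scores from each retriever into one ranking."""
    def normalize(scores: Dict[str, float]) -> Dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        return {cid: (v - lo) / span for cid, v in scores.items()}

    dense_n, bm25_n, rerank_n = normalize(dense), normalize(bm25), normalize(rerank)
    fused = {
        cid: weights["dense"] * dense_n.get(cid, 0.0)
             + weights["bm25"] * bm25_n.get(cid, 0.0)
             + weights["rerank"] * rerank_n.get(cid, 0.0)
        for cid in set(dense) | set(bm25) | set(rerank)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)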

Model Selection

Embedding Models (Hugging Face):

  • BAAI/bge-base-en-v1.5 (768-dim, faster)
  • BAAI/bge-large-en-v1.5 (1024-dim, better quality)

Reranking Models:

  • cross-encoder/ms-marco-MiniLM-L-6-v2 (balanced)
  • cross-encoder/ms-marco-TinyBERT-L-2-v2 (faster)

LLM Options:

  • Groq API: Fast, high-quality, free tier available
    • llama-3.3-70b-versatile
    • llama-3.1-70b-versatile
    • mixtral-8x7b-32768
  • Local: Privacy-preserving, no API costs
    • mistralai/Mistral-7B-Instruct-v0.3
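
To sanity-check the retrieval models locally, a short sentence-transformers sketch (model names from the lists above; queries and passages are illustrative):

from sentence_transformers import SentenceTransformer, CrossEncoder

# Embedding model: maps text to a dense vector (1024-dim for bge-large).
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
print(embedder.encode("What is machine learning?", normalize_embeddings=True).shape)

# Cross-encoder: scores (query, passage) pairs; higher means more relevant.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
print(reranker.predict([
    ("What is machine learning?", "Machine learning is a subfield of AI."),
    ("What is machine learning?", "The weather today is sunny."),
]))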

πŸ” Troubleshooting

Common Issues

1. Services Won't Start

# Check service logs
docker-compose logs

# Check specific service
docker-compose logs qdrant
docker-compose logs meili

# Restart services
docker-compose restart

2. GPU Not Detected

# Verify NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Check docker-compose.yml has runtime: nvidia

# Install NVIDIA Container Toolkit if needed
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

3. Ingestion Fails

# Check file paths and formats
# Metadata must be .parquet with required columns
# Embeddings must be .npy with matching row count

# Test with limited chunks
python pipelines/load_precomputed_embeddings.py --limit 10

# Check Qdrant is accessible
curl http://localhost:6333/collections

# Check Meilisearch is accessible
curl http://localhost:7700/health

4. API Returns 500 Errors

# Check API logs
docker-compose logs api

# Common causes:
# - Missing GROQ_API_KEY (if USE_GROQ=true)
# - Empty database (run ingestion first)
# - Model download failed (check internet connection)

# Verify environment variables
docker-compose exec api env | grep GROQ

5. Out of Memory

# Reduce embedding model size
EMBED_MODEL=BAAI/bge-small-en-v1.5

# Disable reranking temporarily
# In API request: "rerank": false

# Use Groq instead of local LLM
USE_GROQ=true

6. Slow Inference

# Use Groq API for faster inference
USE_GROQ=true

# Reduce retrieval count
# In API request: "k": 5

# Check GPU is being used
docker-compose exec api nvidia-smi

# Reduce batch size in ingestion
--batch-size 50

📖 Documentation

Code Documentation

All core modules are comprehensively documented with:

  • Module-level overviews
  • Function docstrings with examples
  • Inline comments explaining implementation
  • Architecture diagrams and rationale
