Multi-RAG Agent

A scalable Retrieval-Augmented Generation (RAG) system that can index and query multiple documents with memory-efficient processing.

Features

📄 Index multiple PDFs with memory-efficient processing
🔎 Vector database integration (Weaviate) for semantic search
🧠 Advanced document chunking with configurable sizes and overlap
🤖 Google's Generative AI (Gemini) integration
🚀 Memory-efficient processing designed for large documents
🛠️ Configurable page range processing for partial document indexing
📊 Detailed logging and progress tracking

Requirements

Python 3.10+
Docker and Docker Compose (for Weaviate vector database)
Conda (recommended for environment management)

Quick Start

1. Environment Setup

# Create conda environment
conda create -n arag_env python=3.10
conda activate arag_env

# Install dependencies
pip install -r requirements.txt

# Install the local package in development mode
pip install -e .

2. Start Weaviate (Vector Database)

# Start Weaviate using Docker Compose
docker-compose up -d weaviate

3. Configure API Keys

Create or edit a .env file in the root directory with the following:

# Google AI API Key
GOOGLE_API_KEY=your_google_api_key_here

# Vector Database Configuration
VECTOR_DB_BASE_URL=http://localhost:8081
VECTOR_DB_TYPE=weaviate
WEAVIATE_CLASS_NAME=Document

# Logging Configuration
LOG_LEVEL=INFO

You can obtain a Google AI API key from the Google AI Studio.

4. Indexing Documents

Memory-Efficient Indexing (Recommended)

For large documents, index with memory-efficient settings and specific page ranges:

# For PowerShell
conda activate arag_env; python index_pdfs.py --path "data/your_document.pdf" --chunk-size 150 --overlap 25 --batch-size 2 --start-page 1 --end-page 20

Index All Documents in a Directory

# For PowerShell
conda activate arag_env; python index_pdfs.py --path "data/" --chunk-size 150 --overlap 25 --batch-size 2

5. Query the System

# For PowerShell
conda activate arag_env; python run.py "Your question about the indexed documents?"

Troubleshooting

Memory Issues During Indexing

If you encounter Out of Memory (OOM) errors:

Reduce chunk-size (e.g., 150 chars)
Reduce batch-size (e.g., 2)
Process smaller page ranges with --start-page and --end-page
Increase system swap space

Docker Issues

If Weaviate doesn't start properly:

# Check Weaviate container logs
docker logs weaviate

# Restart Weaviate
docker-compose restart weaviate

Project Structure

.
├── arag/               # Core RAG system modules
│   ├── agents/         # Agent implementations
│   ├── core/           # Core functionality
│   │   ├── ai_client.py      # AI models integration
│   │   ├── memory.py         # Memory management
│   │   ├── orchestrator.py   # Query processing orchestration
│   │   └── vector_db.py      # Vector database client
│   └── utils/          # Utilities
├── data/               # Store your PDF documents here
├── logs/               # Log files
├── output/             # Query output and memory files
├── docker-compose.yaml # Docker configuration
├── Dockerfile          # For containerization
├── index_pdfs.py       # PDF indexing script
├── run.py              # Main query script
└── run_indexing.bat    # Windows batch script for indexing

Advanced Configuration

Chunking Settings

chunk-size: Size of text chunks (smaller = more memory-efficient)
overlap: Overlap between chunks to maintain context
batch-size: Number of chunks to process at once (smaller = more memory-efficient)

Page Range Processing

Process specific pages to:

Test indexing before committing to full document
Resume interrupted indexing
Update specific sections of documents

# Process pages 5-15 only
python index_pdfs.py --path "data/document.pdf" --start-page 5 --end-page 15

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
multi_rag_agent-main		multi_rag_agent-main
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multi-RAG Agent

Features

Requirements

Quick Start

1. Environment Setup

2. Start Weaviate (Vector Database)

3. Configure API Keys

4. Indexing Documents

Memory-Efficient Indexing (Recommended)

Index All Documents in a Directory

5. Query the System

Troubleshooting

Memory Issues During Indexing

Docker Issues

Project Structure

Advanced Configuration

Chunking Settings

Page Range Processing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Multi-RAG Agent

Features

Requirements

Quick Start

1. Environment Setup

2. Start Weaviate (Vector Database)

3. Configure API Keys

4. Indexing Documents

Memory-Efficient Indexing (Recommended)

Index All Documents in a Directory

5. Query the System

Troubleshooting

Memory Issues During Indexing

Docker Issues

Project Structure

Advanced Configuration

Chunking Settings

Page Range Processing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages