Complete OpenAI-compatible RAG API for seamless OpenWebUI integration
This project provides a fully compatible OpenAI-style API that connects your document knowledge base to OpenWebUI, enabling document-based conversational AI with intelligent retrieval and source citations.
- OpenAI Compatible: Drop-in replacement for OpenAI API in OpenWebUI
- Streaming Support: Real-time response streaming
- Source Citations: Automatic document source references
- Multi-format Documents: PDF, DOCX, TXT, MD, CSV, JSON, and more
- Intelligent Chunking: Advanced text processing for better context
- Document Summarization: Auto-generated summaries for enhanced retrieval
- Health Monitoring: Built-in health checks and API monitoring
- Docker Support: Easy containerized deployment
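Because the server speaks the standard OpenAI protocol, any OpenAI client can call it directly. A minimal sketch using the official `openai` Python package, assuming the API is running locally on port 5500 with the llama3.2:1b model (as configured later in this README):

```python
# Minimal sketch: call the RAG API through the standard OpenAI client.
# Assumes the server from this README is running on http://localhost:5500.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5500/v1", api_key="not-required")

# Streaming chat completion; answers are grounded in the ingested documents.
stream = client.chat.completions.create(
    model="llama3.2:1b",
    messages=[{"role": "user", "content": "What do my documents say about onboarding?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```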
- Python 3.11+
- Ollama installed and running
- Required Ollama models:
  ollama pull llama3.2:1b
  ollama pull nomic-embed-text
# Setup virtual environment and dependencies
./manage_rag.sh setup
# Add your documents to the documents/ folder
mkdir -p documents
cp /path/to/your/documents/* documents/
# Ingest documents into vector database
./manage_rag.sh ingest
# Start the API server
./manage_rag.sh start
To run the API together with OpenWebUI via Docker Compose, use a docker-compose.yml like the following:
version: '3.8'
services:
openwebui:
image: ghcr.io/open-webui/open-webui:main
container_name: openwebui
ports:
- "3000:8080"
environment:
- OPENAI_API_BASE_URLS=http://rag-api:5500/v1
- OPENAI_API_KEYS=not-required
depends_on:
- rag-api
networks:
- rag-network
rag-api:
build: .
container_name: rag-api
ports:
- "5500:5500"
volumes:
- ./vector_db:/app/vector_db
- ./documents:/app/documents
networks:
- rag-network
networks:
rag-network:
    driver: bridge
# Check if everything is working
./manage_rag.sh test
# Or test manually
curl -s http://localhost:5500/health | jq
curl -s http://localhost:5500/v1/models | jq
To connect OpenWebUI to the API:
- Open OpenWebUI in your browser
- Go to Settings → Connections → OpenAI API
- Add a new API connection:
  - API Base URL: http://localhost:5500/v1
  - API Key: not-required
  - Model: llama3.2:1b
- Overview
- New Features
- Installation
- Usage
- Configuration
- Processing Reports
- Architecture
- Advanced Features
- Troubleshooting
- Contributing
- License
The rag_api project has been completely redesigned with advanced document processing capabilities. It now supports ANY document type using intelligent detection, provides automatic summarization, and includes comprehensive metadata enhancement for optimal RAG performance.
- Universal Document Support: Automatically detects and processes any document
- Intelligent Chunking: Optimized text splitting with context-aware separators
- Document Summarization: Automatic summarization using Ollama models with metadata enhancement
- Comprehensive Metadata: Rich document metadata including file info, content statistics, and processing timestamps
- Deduplication: Content-based hashing to prevent duplicate processing
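To make the summarization and deduplication features above concrete, here is an illustrative sketch only (not the project's actual code), assuming the `ollama` Python package, the default llama3.2:1b model, and a SHA-256 content hash:

```python
# Illustrative sketch - the project's summarization/dedup code may differ.
import hashlib

import ollama


def summarize(text: str, model: str = "llama3.2:1b") -> str:
    """Generate a short summary of a document with a local Ollama model."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": f"Summarize this document in 3 sentences:\n\n{text}"}],
    )
    return response["message"]["content"]


def content_hash(text: str) -> str:
    """Content-based hash used to detect documents that were already processed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```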
- PDFs: Native PDF processing with text extraction
- Microsoft Office: Word (.docx/.doc), Excel (.xlsx/.xls), PowerPoint (.pptx/.ppt)
- Text Formats: Plain text, Markdown, HTML, XML, RTF
- Data Formats: CSV, JSON, JSONL
- Code Files: Python, JavaScript, TypeScript, Java, C++, CSS, SQL, YAML, etc.
- Email: EML, MSG files
- Auto-Detection: Uses MIME type detection for unknown extensions
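As an illustration of MIME-based auto-detection, the `python-magic` bindings for libmagic (installed as a system dependency below) can classify a file regardless of its extension; the file path here is just an example:

```python
# Illustrative: detect a file's MIME type with python-magic (libmagic bindings),
# so files with unknown extensions can still be routed to the right loader.
import magic

mime_type = magic.from_file("documents/report.pdf", mime=True)
print(mime_type)  # e.g. "application/pdf"
```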
- ChromaDB Integration: High-performance vector storage with cosine similarity
- Ollama Embeddings: Local embedding generation with configurable models
- Database Management: Initialize, update, and clear operations
- Processing Reports: Detailed metrics and performance analysis
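A rough sketch of how vector storage and local embeddings fit together, assuming the `chromadb` and `ollama` packages; the collection name, embedding model, and cosine metric follow the defaults mentioned in this README, but the project's actual wiring may differ:

```python
# Rough sketch (assumptions noted above): store an Ollama embedding in a
# ChromaDB collection configured for cosine similarity.
import chromadb
import ollama

client = chromadb.PersistentClient(path="./vector_db")
collection = client.get_or_create_collection(
    name="my_documents",
    metadata={"hnsw:space": "cosine"},  # cosine similarity
)

text = "Example chunk of an ingested document."
embedding = ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

collection.add(ids=["chunk-0"], documents=[text], embeddings=[embedding])
print(collection.count())
```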
- Clone the Repository
  git clone https://github.com/FlorentB974/rag_api.git
  cd rag_api
- Set Up Python Virtual Environment
  python -m venv rag_env
  source rag_env/bin/activate  # On Windows: rag_env\Scripts\activate
- Install Dependencies
  pip install -r requirements.txt
- Install System Dependencies (for libmagic)
  # macOS
  brew install libmagic
  # Ubuntu/Debian
  sudo apt-get install libmagic1
  # Windows: included with python-magic-bin (already in requirements)
# Initialize new database with intelligent document processing
python vector_db.py --source /path/to/documents --db vector_db --init
# Add documents to existing database
python vector_db.py --source /path/to/documents --db vector_db
# Process with custom settings and summarization
python vector_db.py --source /path/to/documents --db vector_db --init \
--chunk-size 1500 --chunk-overlap 300 \
--summarize-model llama3.2:3b
# Disable summarization for faster processing
python vector_db.py --source /path/to/documents --db vector_db --init --no-summary
# Generate processing report
python vector_db.py --source /path/to/documents --db vector_db --init \
  --report processing_report.json
The new system provides extensive configuration options:
python vector_db.py --help
Options include:
- --chunk-size: Size of text chunks (default: 1024)
- --chunk-overlap: Overlap between chunks (default: 200)
- --summarize-model: Ollama model for summarization (default: llama3.2:1b)
- --no-summary: Disable document summarization
- --report: Path to save processing report
You can also drive processing directly from Python:
from rag_utils import load_and_process_documents, DocumentProcessor
# Process documents with custom settings
documents, metrics = load_and_process_documents(
source_path="/path/to/documents",
summarize_model="llama3.2:1b",
chunk_size=1024,
chunk_overlap=200,
enable_summarization=True
)
print(f"Processed {len(documents)} chunks from {metrics.successful_docs} documents")Run the query script to test the setup:
python query.pyCreate a .env file in the project root (use .env.example as template):
# Vector Database Configuration
EMBED_MODEL_NAME=sentence-transformers/all-MiniLM-L6-v2
COLLECTION_NAME=my_documents
# Document Processing Configuration
SUMMARIZE_MODEL=llama3.2:1b
# Legacy Configuration (still supported)
VECTOR_DB_PATH=./vector_db
OLLAMA_MODEL=mistral
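A minimal sketch of reading these variables at startup, assuming `python-dotenv` is available (the project's actual config loading may differ):

```python
# Minimal sketch: load the .env values above at startup (assumes python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root
embed_model = os.getenv("EMBED_MODEL_NAME", "sentence-transformers/all-MiniLM-L6-v2")
collection_name = os.getenv("COLLECTION_NAME", "my_documents")
summarize_model = os.getenv("SUMMARIZE_MODEL", "llama3.2:1b")
print(embed_model, collection_name, summarize_model)
```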
Ensure you have Ollama installed and the required models pulled:
# Install Ollama (if not already installed)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull embedding model
ollama pull sentence-transformers/all-MiniLM-L6-v2
# Pull summarization model
ollama pull llama3.2:1b
# Pull query model
ollama pull mistral
The new system generates detailed processing reports with comprehensive metrics:
{
"processing_timestamp": "2025-09-09T12:00:00",
"metrics": {
"total_documents": 50,
"successful_documents": 48,
"failed_documents": 2,
"total_chunks": 1247,
"processing_time_seconds": 45.67,
"success_rate": 0.96,
"average_chunks_per_doc": 25.98
},
"configuration": {
"chunk_size": 1024,
"chunk_overlap": 200,
"summarization_enabled": true,
"summarize_model": "llama3.2:1b"
}
}
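Reports are plain JSON, so they are easy to inspect programmatically; a small example using the file name from the --report command shown earlier:

```python
# Load and summarize a saved processing report (keys match the example above).
import json

with open("processing_report.json") as f:
    report = json.load(f)

metrics = report["metrics"]
print(f"{metrics['successful_documents']}/{metrics['total_documents']} documents processed, "
      f"success rate {metrics['success_rate']:.0%}")
```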
Documents flow through the following processing pipeline:
- File Detection: Uses libmagic for MIME type detection
- Loader Selection: Chooses optimal loader based on file type
- Content Extraction: Extracts text and metadata
- Intelligent Chunking: Context-aware text splitting
- Summarization: Generates concise summaries using Ollama
- Metadata Enhancement: Adds comprehensive metadata
- Vector Storage: Stores in ChromaDB with embeddings
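The project's exact splitter isn't shown in this README; as an illustration of context-aware splitting with the default chunk settings, LangChain's RecursiveCharacterTextSplitter behaves the way the pipeline above describes (this is an assumption, not necessarily the library the project uses):

```python
# Illustration of context-aware chunking: paragraphs are preferred as split
# points, then lines, then sentences, then words.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1024,    # default --chunk-size
    chunk_overlap=200,  # default --chunk-overlap
    separators=["\n\n", "\n", ". ", " ", ""],
)
with open("documents/example.txt") as f:
    chunks = splitter.split_text(f.read())
print(f"{len(chunks)} chunks")
```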
Key components:
- rag_utils.py: NEW - Core document processing and utility functions
- vector_db.py: UPDATED - Vector database management with advanced features
- query.py: Query interface for RAG operations
- librechat_endpoint/: API endpoint for LibreChat integration
from rag_utils import DocumentProcessor
# Create processor with custom settings
processor = DocumentProcessor(
summarize_model="llama3.2:3b",
chunk_size=2048,
chunk_overlap=400,
enable_summarization=True
)
# Process single document
documents = processor.process_single_document(Path("document.pdf"))
# Batch process with metrics
documents, metrics = processor.process_documents("/path/to/docs")
Each processed document chunk now includes:
- File Information: Size, type, timestamps, MIME type
- Content Statistics: Word count, character count, content hash
- Processing Info: Chunk index, total chunks, processing timestamp
- Summarization: AI-generated summary (if enabled)
- Deduplication: Content hash for duplicate detection
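Purely for illustration, a chunk's enriched metadata could look roughly like this; the field names are hypothetical and may not match the project's actual keys:

```python
# Hypothetical example of per-chunk metadata (field names are illustrative only).
chunk_metadata = {
    "file_name": "report.pdf",
    "file_size_bytes": 482113,
    "mime_type": "application/pdf",
    "word_count": 356,
    "char_count": 2187,
    "content_hash": "<sha256 of the chunk text>",  # used for deduplication
    "chunk_index": 3,
    "total_chunks": 12,
    "processed_at": "2025-09-09T12:00:00",
    "summary": "AI-generated summary of the source document (if enabled).",
}
```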
Common issues and fixes:
- Missing libmagic: Install system dependencies
  # macOS
  brew install libmagic
  # Ubuntu/Debian
  sudo apt-get install libmagic1
- Ollama Connection: Ensure Ollama is running
  ollama serve
- Memory Issues: Reduce chunk size or disable summarization
  python vector_db.py --source docs --db vector_db --init --chunk-size 512 --no-summary
- Import Errors: Ensure all dependencies are installed
  pip install -r requirements.txt
For Large Document Collections:
- Use --no-summary for faster processing
- Increase --chunk-size to reduce total chunks
- Use lighter embedding models
For Better Retrieval Quality:
- Enable summarization with better models
- Use smaller chunk sizes (512-1024)
- Increase chunk overlap (200-400)
Contributions are welcome! Please read our Contributing Guidelines for details on how to submit pull requests, report issues, or suggest improvements.
The new architecture makes it easy to:
- Add support for new document types
- Customize processing pipelines
- Integrate additional AI models
- Extend metadata extraction
This project is licensed under the MIT License. See the LICENSE file for details.
If you encounter issues or have questions, please file an issue on the GitHub Issues page.
For the new advanced features, check the processing reports and logs for detailed debugging information.