Complete OpenAI-compatible RAG API for seamless OpenWebUI integration
This project provides a fully compatible OpenAI-style API that connects your document knowledge base to OpenWebUI, enabling document-based conversational AI with intelligent retrieval and source citations.
- OpenAI Compatible: Drop-in replacement for OpenAI API in OpenWebUI
- Streaming Support: Real-time response streaming
- Source Citations: Automatic document source references
- Multi-format Documents: PDF, DOCX, TXT, MD, CSV, JSON, and more
- Intelligent Chunking: Advanced text processing for better context
- Document Summarization: Auto-generated summaries for enhanced retrieval
- Health Monitoring: Built-in health checks and API monitoring
- Docker Support: Easy containerized deployment
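Because the server speaks the standard OpenAI protocol, any OpenAI client can call it directly. A minimal sketch using the official `openai` Python package, assuming the API is running locally on port 5500 with the llama3.2:1b model (as configured later in this README):

```python
# Minimal sketch: call the RAG API through the standard OpenAI client.
# Assumes the server from this README is running on http://localhost:5500.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5500/v1", api_key="not-required")

# Streaming chat completion; answers are grounded in the ingested documents.
stream = client.chat.completions.create(
    model="llama3.2:1b",
    messages=[{"role": "user", "content": "What do my documents say about onboarding?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```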
- Python 3.11+
- Ollama installed and running
- Required Ollama models:
  ollama pull llama3.2:1b
  ollama pull nomic-embed-text
# Setup virtual environment and dependencies
./manage_rag.sh setup
# Add your documents to the documents/ folder
mkdir -p documents
cp /path/to/your/documents/* documents/
# Ingest documents into vector database
./manage_rag.sh ingest
# Start the API server
./manage_rag.sh start
To run the API together with OpenWebUI via Docker Compose, use a docker-compose.yml like the following:
version: '3.8'
services:
openwebui:
image: ghcr.io/open-webui/open-webui:main
container_name: openwebui
ports:
- "3000:8080"
environment:
- OPENAI_API_BASE_URLS=http://rag-api:5500/v1
- OPENAI_API_KEYS=not-required
depends_on:
- rag-api
networks:
- rag-network
rag-api:
build: .
container_name: rag-api
ports:
- "5500:5500"
volumes:
- ./vector_db:/app/vector_db
- ./documents:/app/documents
networks:
- rag-network
networks:
rag-network:
    driver: bridge
# Check if everything is working
./manage_rag.sh test
# Or test manually
curl -s http://localhost:5500/health | jq
curl -s http://localhost:5500/v1/models | jq
To connect OpenWebUI to the API:
- Open OpenWebUI in your browser
- Go to Settings → Connections → OpenAI API
- Add a new API connection:
  - API Base URL: http://localhost:5500/v1
  - API Key: not-required
  - Model: llama3.2:1b
- Overview
- New Features
- Installation
- Usage
- Configuration
- Processing Reports
- Architecture
- Advanced Features
- Troubleshooting
- Contributing
- License
The rag_api project has been completely redesigned with advanced document processing capabilities. It now supports ANY document type using intelligent detection, provides automatic summarization, and includes comprehensive metadata enhancement for optimal RAG performance.
- Universal Document Support: Automatically detects and processes any document
- Intelligent Chunking: Optimized text splitting with context-aware separators
- Document Summarization: Automatic summarization using Ollama models with metadata enhancement
- Comprehensive Metadata: Rich document metadata including file info, content statistics, and processing timestamps
- Deduplication: Content-based hashing to prevent duplicate processing
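To make the summarization and deduplication features above concrete, here is an illustrative sketch only (not the project's actual code), assuming the `ollama` Python package, the default llama3.2:1b model, and a SHA-256 content hash:

```python
# Illustrative sketch - the project's summarization/dedup code may differ.
import hashlib

import ollama


def summarize(text: str, model: str = "llama3.2:1b") -> str:
    """Generate a short summary of a document with a local Ollama model."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": f"Summarize this document in 3 sentences:\n\n{text}"}],
    )
    return response["message"]["content"]


def content_hash(text: str) -> str:
    """Content-based hash used to detect documents that were already processed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```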
- PDFs: Native PDF processing with text extraction
- Microsoft Office: Word (.docx/.doc), Excel (.xlsx/.xls), PowerPoint (.pptx/.ppt)
- Text Formats: Plain text, Markdown, HTML, XML, RTF
- Data Formats: CSV, JSON, JSONL
- Code Files: Python, JavaScript, TypeScript, Java, C++, CSS, SQL, YAML, etc.
- Email: EML, MSG files
- Auto-Detection: Uses MIME type detection for unknown extensions
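As an illustration of MIME-based auto-detection, the `python-magic` bindings for libmagic (installed as a system dependency below) can classify a file regardless of its extension; the file path here is just an example:

```python
# Illustrative: detect a file's MIME type with python-magic (libmagic bindings),
# so files with unknown extensions can still be routed to the right loader.
import magic

mime_type = magic.from_file("documents/report.pdf", mime=True)
print(mime_type)  # e.g. "application/pdf"
```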
- ChromaDB Integration: High-performance vector storage with cosine similarity
- Ollama Embeddings: Local embedding generation with configurable models
- Database Management: Initialize, update, and clear operations
- Processing Reports: Detailed metrics and performance analysis
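A rough sketch of how vector storage and local embeddings fit together, assuming the `chromadb` and `ollama` packages; the collection name, embedding model, and cosine metric follow the defaults mentioned in this README, but the project's actual wiring may differ:

```python
# Rough sketch (assumptions noted above): store an Ollama embedding in a
# ChromaDB collection configured for cosine similarity.
import chromadb
import ollama

client = chromadb.PersistentClient(path="./vector_db")
collection = client.get_or_create_collection(
    name="my_documents",
    metadata={"hnsw:space": "cosine"},  # cosine similarity
)

text = "Example chunk of an ingested document."
embedding = ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

collection.add(ids=["chunk-0"], documents=[text], embeddings=[embedding])
print(collection.count())
```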
- Clone the Repository
  git clone https://github.com/FlorentB974/rag_api.git
  cd rag_api
- Set Up Python Virtual Environment
  python -m venv rag_env
  source rag_env/bin/activate  # On Windows: rag_env\Scripts\activate
- Install Dependencies
  pip install -r requirements.txt
- Install System Dependencies (for libmagic)
  # macOS
  brew install libmagic
  # Ubuntu/Debian
  sudo apt-get install libmagic1
  # Windows: included with python-magic-bin (already in requirements)
# Initialize new database with intelligent document processing
python vector_db.py --source /path/to/documents --db vector_db --init
# Add documents to existing database
python vector_db.py --source /path/to/documents --db vector_db
# Process with custom settings and summarization
python vector_db.py --source /path/to/documents --db vector_db --init \
--chunk-size 1500 --chunk-overlap 300 \
--summarize-model llama3.2:3b
# Disable summarization for faster processing
python vector_db.py --source /path/to/documents --db vector_db --init --no-summary
# Generate processing report
python vector_db.py --source /path/to/documents --db vector_db --init \
  --report processing_report.json
The new system provides extensive configuration options:
python vector_db.py --help
Options include:
- --chunk-size: Size of text chunks (default: 1024)
- --chunk-overlap: Overlap between chunks (default: 200)
- --summarize-model: Ollama model for summarization (default: llama3.2:1b)
- --no-summary: Disable document summarization
- --report: Path to save processing report
You can also drive processing directly from Python:
from rag_utils import load_and_process_documents, DocumentProcessor
# Process documents with custom settings
documents, metrics = load_and_process_documents(
source_path="/path/to/documents",
summarize_model="llama3.2:1b",
chunk_size=1024,
chunk_overlap=200,
enable_summarization=True
)
print(f"Processed {len(documents)} chunks from {metrics.successful_docs} documents")Run the query script to test the setup:
python query.pyCreate a .env file in the project root (use .env.example as template):
# Vector Database Configuration
EMBED_MODEL_NAME=sentence-transformers/all-MiniLM-L6-v2
COLLECTION_NAME=my_documents
# Document Processing Configuration
SUMMARIZE_MODEL=llama3.2:1b
# Legacy Configuration (still supported)
VECTOR_DB_PATH=./vector_db
OLLAMA_MODEL=mistral
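A minimal sketch of reading these variables at startup, assuming `python-dotenv` is available (the project's actual config loading may differ):

```python
# Minimal sketch: load the .env values above at startup (assumes python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root
embed_model = os.getenv("EMBED_MODEL_NAME", "sentence-transformers/all-MiniLM-L6-v2")
collection_name = os.getenv("COLLECTION_NAME", "my_documents")
summarize_model = os.getenv("SUMMARIZE_MODEL", "llama3.2:1b")
print(embed_model, collection_name, summarize_model)
```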
Ensure you have Ollama installed and the required models pulled:
# Install Ollama (if not already installed)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull embedding model
ollama pull sentence-transformers/all-MiniLM-L6-v2
# Pull summarization model
ollama pull llama3.2:1b
# Pull query model
ollama pull mistral
The new system generates detailed processing reports with comprehensive metrics:
{
"processing_timestamp": "2025-09-09T12:00:00",
"metrics": {
"total_documents": 50,
"successful_documents": 48,
"failed_documents": 2,
"total_chunks": 1247,
"processing_time_seconds": 45.67,
"success_rate": 0.96,
"average_chunks_per_doc": 25.98
},
"configuration": {
"chunk_size": 1024,
"chunk_overlap": 200,
"summarization_enabled": true,
"summarize_model": "llama3.2:1b"
}
}
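Reports are plain JSON, so they are easy to inspect programmatically; a small example using the file name from the --report command shown earlier:

```python
# Load and summarize a saved processing report (keys match the example above).
import json

with open("processing_report.json") as f:
    report = json.load(f)

metrics = report["metrics"]
print(f"{metrics['successful_documents']}/{metrics['total_documents']} documents processed, "
      f"success rate {metrics['success_rate']:.0%}")
```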
Documents flow through the following processing pipeline:
- File Detection: Uses libmagic for MIME type detection
- Loader Selection: Chooses optimal loader based on file type
- Content Extraction: Extracts text and metadata
- Intelligent Chunking: Context-aware text splitting
- Summarization: Generates concise summaries using Ollama
- Metadata Enhancement: Adds comprehensive metadata
- Vector Storage: Stores in ChromaDB with embeddings
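The project's exact splitter isn't shown in this README; as an illustration of context-aware splitting with the default chunk settings, LangChain's RecursiveCharacterTextSplitter behaves the way the pipeline above describes (this is an assumption, not necessarily the library the project uses):

```python
# Illustration of context-aware chunking: paragraphs are preferred as split
# points, then lines, then sentences, then words.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1024,    # default --chunk-size
    chunk_overlap=200,  # default --chunk-overlap
    separators=["\n\n", "\n", ". ", " ", ""],
)
with open("documents/example.txt") as f:
    chunks = splitter.split_text(f.read())
print(f"{len(chunks)} chunks")
```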
Key components:
- rag_utils.py: NEW - Core document processing and utility functions
- vector_db.py: UPDATED - Vector database management with advanced features
- query.py: Query interface for RAG operations
- librechat_endpoint/: API endpoint for LibreChat integration
from rag_utils import DocumentProcessor
# Create processor with custom settings
processor = DocumentProcessor(
summarize_model="llama3.2:3b",
chunk_size=2048,
chunk_overlap=400,
enable_summarization=True
)
# Process single document
documents = processor.process_single_document(Path("document.pdf"))
# Batch process with metrics
documents, metrics = processor.process_documents("/path/to/docs")
Each processed document chunk now includes:
- File Information: Size, type, timestamps, MIME type
- Content Statistics: Word count, character count, content hash
- Processing Info: Chunk index, total chunks, processing timestamp
- Summarization: AI-generated summary (if enabled)
- Deduplication: Content hash for duplicate detection
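Purely for illustration, a chunk's enriched metadata could look roughly like this; the field names are hypothetical and may not match the project's actual keys:

```python
# Hypothetical example of per-chunk metadata (field names are illustrative only).
chunk_metadata = {
    "file_name": "report.pdf",
    "file_size_bytes": 482113,
    "mime_type": "application/pdf",
    "word_count": 356,
    "char_count": 2187,
    "content_hash": "<sha256 of the chunk text>",  # used for deduplication
    "chunk_index": 3,
    "total_chunks": 12,
    "processed_at": "2025-09-09T12:00:00",
    "summary": "AI-generated summary of the source document (if enabled).",
}
```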
Common issues and fixes:
- Missing libmagic: Install system dependencies
  # macOS
  brew install libmagic
  # Ubuntu/Debian
  sudo apt-get install libmagic1
- Ollama Connection: Ensure Ollama is running
  ollama serve
- Memory Issues: Reduce chunk size or disable summarization
  python vector_db.py --source docs --db vector_db --init --chunk-size 512 --no-summary
- Import Errors: Ensure all dependencies are installed
  pip install -r requirements.txt
For Large Document Collections:
- Use --no-summary for faster processing
- Increase --chunk-size to reduce total chunks
- Use lighter embedding models
For Better Retrieval Quality:
- Enable summarization with better models
- Use smaller chunk sizes (512-1024)
- Increase chunk overlap (200-400)
Contributions are welcome! Please read our Contributing Guidelines for details on how to submit pull requests, report issues, or suggest improvements.
The new architecture makes it easy to:
- Add support for new document types
- Customize processing pipelines
- Integrate additional AI models
- Extend metadata extraction
This project is licensed under the MIT License. See the LICENSE file for details.
If you encounter issues or have questions, please file an issue on the GitHub Issues page.
For the new advanced features, check the processing reports and logs for detailed debugging information.