
Vector Bot


Vector Bot is a fully local Retrieval-Augmented Generation (RAG) pipeline using LlamaIndex with Ollama. Ask natural language questions about your documents, with everything running offline on your computer.

📖 Documentation

Complete documentation is available in the docs/ directory.

Features

  • 100% Local: No cloud APIs, no telemetry, fully offline after installation
  • Multi-Environment: Development, production, and Docker configurations
  • Executable Distribution: Single-file deployment with no Python required
  • Document Support: PDF, Markdown, text, JSON, CSV files
  • Persistent Storage: Indexes are saved to disk for fast subsequent queries
  • Clean CLI: Simple command-line interface with doctor, ingest, and query commands
  • CI/CD Pipeline: Automated testing, building, and PyPI publishing
  • Comprehensive Testing: 324 passing tests with 99% code coverage
  • Security Scanning: Automated vulnerability detection and CodeQL analysis

Prerequisites

  1. Python 3.10+ installed (see the version check after this list)

  2. Ollama installed and running

    # Check Ollama version
    ollama --version
    
    # Start Ollama server (if not running)
    ollama serve
    
    # List installed models
    ollama list
  3. At least one chat model installed in Ollama:

    # If you don't have any models, install one:
    ollama pull llama3.1
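
To verify the Python requirement from step 1:

# Confirm Python 3.10 or newer is on your PATH
python --version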

Quick Start

Choose one of the following install methods, then follow the same steps to verify, index, and query your documents:

Node.js Package (npm) — Recommended for most users

# Install globally
npm install -g @joshuaramirez/vector-bot

# Run commands
vector-bot doctor
vector-bot ingest
vector-bot query "What is this document about?"

# Or run without installing
npx @joshuaramirez/vector-bot --help

Python Package (pip)

# Install from PyPI
pip install vector-bot

Download Executable (manual)

Prebuilt executables, when published, can be downloaded and run directly; you can also build one yourself (see Building Executable below).

After installing with any method above:

  1. Install Ollama and a chat model:

    # Install Ollama from https://ollama.ai
    ollama pull llama3.1
    ollama pull nomic-embed-text
  2. Verify setup:

    vector-bot doctor
  3. Add documents and index:

    mkdir docs
    cp your-files.pdf docs/
    vector-bot ingest
  4. Ask questions:

    vector-bot query "What is this document about?"

See User Guide for complete instructions.

From Source

# Clone the repository and enter the project directory
git clone https://github.com/JoshuaRamirez/VectorBot.git vector-bot
cd vector-bot

# Copy environment template
cp .env.example .env

# Edit .env and set OLLAMA_CHAT_MODEL to one of your installed models
# For example: OLLAMA_CHAT_MODEL=llama3.1

# Install the package
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"

Example Workflow

# 1. Check system status
vector-bot doctor

# 2. Add your documents
mkdir docs
cp *.pdf docs/
cp *.md docs/

# 3. Index documents
vector-bot ingest

# 4. Ask questions
vector-bot query "What are the main topics covered?"
vector-bot query "Summarize the key findings" --show-sources
vector-bot query "What does the document say about security?" --k 8

# 5. Use different environments
vector-bot --env production ingest
vector-bot --env development query "How do I deploy this?"

Windows Instructions

For Windows users without make, use these Python commands directly:

# Install
pip install -e .

# Check health
python -m rag.cli doctor

# Ingest documents
python -m rag.cli ingest

# Query
python -m rag.cli query "Your question here"

# Run smoke test
python scripts/rag_smoke.py

# Run unit tests
pytest tests/ -v

Configuration

Edit .env or set environment variables:

  • DOCS_DIR: Directory containing documents (default: ./docs)
  • INDEX_DIR: Directory for storing index (default: ./index_storage)
  • OLLAMA_BASE_URL: Ollama server URL (default: http://localhost:11434)
  • OLLAMA_CHAT_MODEL: Chat model to use (no default; auto-detected from your installed models)
  • OLLAMA_EMBED_MODEL: Embedding model (default: nomic-embed-text)
  • SIMILARITY_TOP_K: Number of similar chunks to retrieve (default: 4)
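
For reference, a minimal .env using the defaults above; the chat model line is only an example, since there is no default and the model must already be installed:

# Sample .env
DOCS_DIR=./docs
INDEX_DIR=./index_storage
OLLAMA_BASE_URL=http://localhost:11434
# No default; set to a chat model you have installed
OLLAMA_CHAT_MODEL=llama3.1
OLLAMA_EMBED_MODEL=nomic-embed-text
SIMILARITY_TOP_K=4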

Project Structure

.
├── src/rag/          # Main package (vector-bot)
│   ├── cli.py        # CLI interface
│   ├── config.py     # Configuration management
│   ├── ingest.py     # Document ingestion
│   ├── query.py      # Query engine
│   └── ollama_check.py # Ollama health checks
├── scripts/          # Utility scripts
│   └── rag_smoke.py  # Smoke test
├── tests/            # Unit tests
├── docs/             # Your documents go here
└── index_storage/    # Persisted vector index

Troubleshooting

Ollama Server Not Running

# Start Ollama
ollama serve

# Check if it's running (should show version info)
curl http://localhost:11434/api/version

No Models Found

# List available models
ollama list

# Pull a model (not required if you already have one)
ollama pull llama3.1

Port Conflicts

If Ollama is running on a different port, update .env:

OLLAMA_BASE_URL=http://localhost:YOUR_PORT

Large Files Skipped

Files over 20MB are automatically skipped during ingestion. Split large documents or adjust the limit in the code if needed.
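
To spot files that will be skipped before ingesting (on a Unix-like shell):

# List files in docs/ larger than 20MB
find docs -type f -size +20M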

Missing Embedding Model

The default embedding model is nomic-embed-text. If not installed:

ollama pull nomic-embed-text

# Or use an alternative like mxbai-embed-large
ollama pull mxbai-embed-large
# Then update .env: OLLAMA_EMBED_MODEL=mxbai-embed-large

Note: if you switch embedding models after an index exists, re-run vector-bot ingest so the stored vectors match the new model.

Important Notes

  • No Auto-Pull for Chat Models: This tool will NEVER automatically download chat models. It only uses models you've already installed.
  • Idempotent Ingestion: Re-running ingest is safe and won't duplicate data.
  • Fully Offline: After pip install, no internet connection is required.
  • Local Only: All operations use localhost - no external API calls.

Multi-Environment Support

The application supports different deployment environments:

# Development (default)
vector-bot doctor

# Production deployment
vector-bot --env production doctor

# Docker deployment  
vector-bot --env docker doctor

# Show current configuration
vector-bot --config-info --env production

See Deployment Guide for detailed multi-environment setup.

Building Executable

To create a standalone executable:

# Build executable (all platforms)
make build-exe

# Or manually
python build_executable.py

This creates a single executable file in dist/vector-bot (or dist/vector-bot.exe on Windows) that includes all dependencies and configuration files. The target system only needs:

  • Ollama installed and running
  • No Python installation required
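
Once built, run the executable directly in place of the vector-bot command:

# Linux/macOS
./dist/vector-bot doctor

# Windows
dist\vector-bot.exe doctor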

Development

# Run all tests
make test
pytest tests/ -v

# Run unit tests only
pytest tests/unit/ -v

# Run with coverage
pytest tests/ --cov=src/rag --cov-report=html

# Type checking
mypy src/

# Linting
ruff check src/

# Run security checks
safety check
bandit -r src/

# Use the test runner
python run_tests.py

# Clean generated files (including build artifacts)
make clean

Testing

The project includes comprehensive testing:

  • 114 unit tests with a 99.1% pass rate
  • 99% code coverage across all modules
  • 100% mocked external dependencies - tests run offline
  • Professional test structure following best practices
  • CI-ready - all tests run in under 20 seconds

See the Testing Documentation for details.

License

MIT
