Tichy

A self-contained, privacy-focused Retrieval-Augmented Generation (RAG) system in Go. All data stays local; nothing is sent to external LLM providers.

Requirements

  • Go 1.24.4+
  • Docker and Docker Compose
  • NVIDIA GPU with CUDA support (required for llama.cpp inference with default docker-compose.yml)
    • For CPU-only inference, use the ghcr.io/ggerganov/llama.cpp:server image and remove runtime: nvidia and the NVIDIA environment variables from the llm and embeddings services (see the sketch after this list)
  • GGUF Models:
    • Main LLM model (e.g., Gemma 3 12B)
    • Embedding model (e.g., nomic-embed-text v1.5)
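
As a minimal sketch of the CPU-only variant (service names, flags, and mounts must match what your actual docker-compose.yml uses), the llm service might look like:

services:
  llm:
    # CPU-only: no runtime: nvidia, no NVIDIA_* environment variables
    image: ghcr.io/ggerganov/llama.cpp:server
    ports:
      - "8080:8080"
    volumes:
      - ~/models/llama:/models
    command: -m /models/google_gemma-3-12b-it-Q8_0.gguf --host 0.0.0.0 --port 8080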

Quick Start

1. Prepare Models

Place your GGUF models in a directory of your choice (e.g., ~/models/llama/):

mkdir -p ~/models/llama
# Copy your models to:
# ~/models/llama/google_gemma-3-12b-it-Q8_0.gguf
# ~/models/llama/nomic-embed-text-v1.5.Q8_0.gguf

Update the volume paths in docker-compose.yml if using a different location.
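
For instance, with models under /data/models instead, the host side of each model volume entry would change along these lines (the container path is whatever the compose file already uses):

volumes:
  - /data/models:/models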

2. Start Services

Start PostgreSQL, LLM server, and embeddings server:

docker compose up -d

Verify services are running:

docker compose ps

3. Configure Environment

Copy and configure the environment file:

cp examples/insurellm/.env .env
# Edit .env if needed to adjust URLs, ports, or chunk sizes

4. Build and Run

Build the application:

make build
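
make build presumably wraps the standard Go toolchain; assuming the main package sits at the repository root (not verified here), the manual equivalent would be:

go build -o tichy .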

Or use Docker to run commands without building locally:

docker compose run --rm tichy db up
docker compose run --rm tichy ingest --source /mnt/cwd/examples/insurellm/knowledge-base/ --mode text
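
(The /mnt/cwd prefix assumes the compose file mounts your current working directory into the tichy container at that path; adjust it if your mount differs.)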

Initialize the database:

./tichy db up

Ingest documents:

./tichy ingest --source ./examples/insurellm/knowledge-base/ --mode text

5. Start Chatting

Start an interactive chat session:

./tichy chat

Or with markdown rendering:

./tichy chat --markdown

Usage Examples

Ingest Documents

./tichy ingest --source ./path/to/documents/ --mode text

Interactive Chat

./tichy chat
> When was InsureLLM founded?

Generate Tests

./tichy tests generate --num 20 --output tests.json

Evaluate RAG Performance

./tichy tests evaluate --input tests.json

Services

  • PostgreSQL + pgvector: Vector database (port 5432)
  • LLM Server: llama.cpp inference server (port 8080)
  • Embeddings Server: llama.cpp embeddings server (port 8081)
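
Assuming the default ports above, the two llama.cpp servers expose a /health endpoint you can probe, and pg_isready can check PostgreSQL:

curl http://localhost:8080/health
curl http://localhost:8081/health
pg_isready -h localhost -p 5432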

Configuration

Key environment variables in .env:

  • DATABASE_URL: PostgreSQL connection string
  • LLM_SERVER_URL: LLM inference endpoint
  • EMBEDDING_SERVER_URL: Embeddings endpoint
  • SYSTEM_PROMPT_TEMPLATE: Path to system prompt template
  • CHUNK_SIZE: Document chunk size (default: 500)
  • CHUNK_OVERLAP: Chunk overlap (default: 100)
  • TOP_K: Number of results to retrieve (default: 10)
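
The .env copied from examples/insurellm/ in step 3 is the authoritative starting point; purely as an illustration (the connection string, credentials, and template path below are made up), a configuration could look like:

DATABASE_URL=postgres://postgres:postgres@localhost:5432/tichy
LLM_SERVER_URL=http://localhost:8080
EMBEDDING_SERVER_URL=http://localhost:8081
SYSTEM_PROMPT_TEMPLATE=./prompts/system.tmpl
CHUNK_SIZE=500
CHUNK_OVERLAP=100
TOP_K=10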

Acknowledgments

The example insurance knowledge base in examples/insurellm/ is derived from the dataset provided by the LLM Engineering course.

License

BSD 3-Clause. See LICENSE for details.
