A self-contained, privacy-focused RAG (Retrieval-Augmented Generation) system in Go. All data stays local - nothing is sent to external LLM providers.
- Go 1.24.4+
- Docker and Docker Compose
- NVIDIA GPU with CUDA support (required for llama.cpp inference with default docker-compose.yml)
- For CPU-only inference, use the `ghcr.io/ggerganov/llama.cpp:server` image and remove the `runtime: nvidia` and NVIDIA environment variables from the `llm` and `embeddings` services
- GGUF Models:
- Main LLM model (e.g., Gemma 3 12B)
- Embedding model (e.g., nomic-embed-text v1.5)
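For reference, the CPU-only variant of the `llm` service mentioned above might look like the following sketch. The volume path and llama.cpp flags are illustrative assumptions; adapt them to the actual `docker-compose.yml` in this repository:

```yaml
# CPU-only sketch: no "runtime: nvidia" and no NVIDIA_* environment variables
llm:
  image: ghcr.io/ggerganov/llama.cpp:server
  volumes:
    - ~/models/llama:/models          # host model directory (assumption)
  command: ["-m", "/models/google_gemma-3-12b-it-Q8_0.gguf",
            "--host", "0.0.0.0", "--port", "8080"]
  ports:
    - "8080:8080"
```

The `embeddings` service would change the same way, pointing at the embedding model and port 8081.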
Place your GGUF models in a directory of your choice (e.g., ~/models/llama/):
```shell
mkdir -p ~/models/llama
# Copy your models to:
# ~/models/llama/google_gemma-3-12b-it-Q8_0.gguf
# ~/models/llama/nomic-embed-text-v1.5.Q8_0.gguf
```

Update the volume paths in `docker-compose.yml` if using a different location.
Start PostgreSQL, LLM server, and embeddings server:
```shell
docker compose up -d
```

Verify services are running:

```shell
docker compose ps
```

Copy and configure the environment file:

```shell
cp examples/insurellm/.env .env
# Edit .env if needed to adjust URLs, ports, or chunk sizes
```

Build the application:

```shell
make build
```

Or use Docker to run commands without building locally:

```shell
docker compose run --rm tichy db up
docker compose run --rm tichy ingest --source /mnt/cwd/examples/insurellm/knowledge-base/ --mode text
```

Initialize the database:

```shell
./tichy db up
```

Ingest documents:

```shell
./tichy ingest --source ./examples/insurellm/knowledge-base/ --mode text
```

Start an interactive chat session:

```shell
./tichy chat
```

Or with markdown rendering:

```shell
./tichy chat --markdown
```

To ingest your own documents:

```shell
./tichy ingest --source ./path/to/documents/ --mode text
```

Ask questions in the chat:

```shell
./tichy chat
> When was InsureLLM founded?
```

Generate and evaluate test questions:

```shell
./tichy tests generate --num 20 --output tests.json
./tichy tests evaluate --input tests.json
```

The stack consists of three services:

- PostgreSQL + pgvector: Vector database (port 5432)
- LLM Server: llama.cpp inference server (port 8080)
- Embeddings Server: llama.cpp embeddings server (port 8081)
Key environment variables in `.env`:

- `DATABASE_URL`: PostgreSQL connection string
- `LLM_SERVER_URL`: LLM inference endpoint
- `EMBEDDING_SERVER_URL`: Embeddings endpoint
- `SYSTEM_PROMPT_TEMPLATE`: Path to system prompt template
- `CHUNK_SIZE`: Document chunk size (default: 500)
- `CHUNK_OVERLAP`: Chunk overlap (default: 100)
- `TOP_K`: Number of results to retrieve (default: 10)
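`CHUNK_SIZE` and `CHUNK_OVERLAP` interact: each chunk holds at most `CHUNK_SIZE` units of text, and consecutive chunks share `CHUNK_OVERLAP` units so that facts spanning a boundary survive in at least one chunk. A Go sketch of this scheme (illustrative only; tichy's actual splitter may measure tokens rather than characters or respect sentence boundaries):

```go
package main

import "fmt"

// chunk splits text into pieces of at most size runes,
// where consecutive pieces share overlap runes.
func chunk(text string, size, overlap int) []string {
	runes := []rune(text)
	step := size - overlap // how far the window advances each iteration
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	// Tiny demo with size=10, overlap=4; the defaults above are 500/100.
	for i, c := range chunk("abcdefghijklmnopqrs", 10, 4) {
		fmt.Printf("%d: %q\n", i, c)
	}
}
```

With the defaults, each 500-unit chunk repeats the last 100 units of its predecessor, so the effective stride through the document is 400 units.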
The example insurance knowledge base in examples/insurellm/ is derived from the dataset provided by the LLM Engineering course.
BSD 3-Clause - see LICENSE for details.