mirsaidl/go-rag-api

RAG System (Go + Qdrant + Novita)

Retrieval-Augmented Generation stack written in Go. It ingests markdown notes into Qdrant, uses Novita-hosted OpenAI-compatible APIs for embeddings and LLM completions, and serves both REST + SSE endpoints plus a lightweight browser UI.

Highlights

  • End-to-end Go implementation (API, ingestion CLI, services) with Gin.
  • Streaming /chat/stream endpoint built on Server-Sent Events.
  • Qdrant vector store bootstrap + upserts with cosine similarity.
  • Markdown-aware ingestion with recursive character splitting (LangChainGo) and deduplicated chunk IDs.
  • Minimal frontend (frontend/) for manual testing and demoing the RAG loop.
  • Dockerfile + Compose stack for production-style deployment.
  • Swagger docs (/swagger/index.html) generated via swag init.
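Qdrant computes the cosine similarity itself once the collection is created with the Cosine distance; purely for intuition, here is a stdlib-only sketch of the metric it applies to query and passage vectors:

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors:
// dot(a, b) / (|a| * |b|). Qdrant evaluates this server-side; this
// function only illustrates the formula.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Printf("%.2f\n", cosine([]float64{1, 0}, []float64{1, 0})) // identical direction → 1.00
	fmt.Printf("%.2f\n", cosine([]float64{1, 0}, []float64{0, 1})) // orthogonal → 0.00
}
```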

Architecture

Markdown → Ingestion CLI → Embeddings (Novita) → Qdrant
                                                        ↘
Browser / API client → Gin API → Retrieval → Context → Novita LLM → Answer (streamed or blocking)

Repository Layout

backend/            HTTP API, config, services
cmd/ingest/         CLI entrypoint for bulk ingestion
frontend/           Vanilla JS UI served at /ui/
Dockerfile          Production image (API + UI)
docker-compose.yml  API + Qdrant dev/prod stack

Quick Start

  1. Prerequisites
    • Go 1.24+
    • Docker (for Qdrant or containerized runs)
    • Novita account + API key (or compatible OpenAI API)
  2. Clone + install deps
    git clone https://github.com/mirsaidl/go-rag-api && cd go-rag-api
    go mod download
  3. Configure environment: create .env (copy .env.example) with:
    APP_NAME=RAG System
    APP_PORT=8080
    NOVITA_API_KEY=your_key
    NOVITA_EMBEDDING_MODEL=baai/bge-m3
    NOVITA_LLM_MODEL=openai/gpt-oss-120b
    NOVITA_BASE_URL=https://api.novita.ai/openai/v1
    QDRANT_HOST=localhost
    QDRANT_PORT=6334
    COLLECTION_NAME=rag_collection
    VECTOR_SIZE=1024
    TOP_K=10
  4. Run Qdrant locally
    docker run -d --name qdrant \
      -p 6333:6333 -p 6334:6334 \
      -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
      qdrant/qdrant
  5. Start the API
    go run ./backend
  6. Open the UI
    • Browser: http://localhost:8080/ui/
    • Swagger: http://localhost:8080/swagger/index.html
    • Health check: curl http://localhost:8080/health

Environment Variables

| Key | Description | Default |
| --- | --- | --- |
| APP_NAME / APP_PORT | Branding + port for Gin server | RAG System / 8080 |
| NOVITA_API_KEY | API key for embeddings + chat | required |
| NOVITA_EMBEDDING_MODEL | Novita embedding model id | baai/bge-m3 |
| NOVITA_LLM_MODEL | Novita LLM id used for chat | openai/gpt-oss-120b |
| NOVITA_BASE_URL | OpenAI-compatible base URL | https://api.novita.ai/openai/v1 |
| QDRANT_HOST / QDRANT_PORT | Qdrant endpoint the API hits | localhost / 6334 |
| COLLECTION_NAME | Qdrant collection | rag_collection |
| VECTOR_SIZE | Vector dimensionality; must match model | 1024 |
| TOP_K | Retrieval depth per query | 10 |
| DEBUG | Toggle verbose logs in config | true |

Ingestion CLI (cmd/ingest)

Recursive markdown ingestor that chunks files, calls Novita for embeddings, and upserts straight into Qdrant.

go run ./cmd/ingest \
  --input ./data \
  --chunk-size 800 \
  --chunk-overlap 200 \
  --batch-size 16 \
  --timeout 5m

Flags:

| Flag | Purpose | Default |
| --- | --- | --- |
| --input | Directory with .md/.markdown/.mdx files | ./docs |
| --chunk-size | Characters per chunk | 2000 |
| --chunk-overlap | Characters of overlap | 200 |
| --batch-size | Qdrant upsert batch size | 16 |
| --timeout | Hard stop for ingestion run | 5m |
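The CLI's real splitter (LangChainGo's recursive character splitter) is separator-aware; the fixed-window sketch below is not equivalent, but it shows how --chunk-size and --chunk-overlap interact, i.e. each chunk advances by size minus overlap:

```go
package main

import "fmt"

// chunk splits text into windows of at most size runes, each overlapping
// the previous by overlap runes. A simplified stand-in for the
// separator-aware recursive splitter the CLI actually uses.
func chunk(text string, size, overlap int) []string {
	runes := []rune(text)
	step := size - overlap
	var out []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		out = append(out, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return out
}

func main() {
	// size 4, overlap 2 → each window shares its last 2 runes with the next.
	for _, c := range chunk("abcdefghij", 4, 2) {
		fmt.Println(c)
	}
}
```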

Under the hood:

  • Uses textsplitter.NewRecursiveCharacter for separator-aware chunking.
  • Deduplicates on a SHA-256 derived chunk id (source|index|text).
  • Automatically creates the target collection (cosine distance) if missing.
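The dedup key described above can be sketched as follows; the source|index|text field order comes from the bullet, while the function name itself is illustrative:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkID derives a deterministic id from the chunk's source path, its
// index within that file, and the chunk text, so re-ingesting identical
// content upserts the same Qdrant point instead of duplicating it.
func chunkID(source string, index int, text string) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s|%d|%s", source, index, text)))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(chunkID("docs/notes.md", 0, "hello world"))
}
```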

API Surface

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Liveness probe with app metadata. |
| POST | /embed | Returns raw embedding vector for any text. |
| POST | /retrieve | Retrieves TOP_K passages from Qdrant (formatted string payload). |
| POST | /chat | Blocking RAG call: retrieval + Novita answer. |
| POST | /chat/stream | SSE stream: emits context, token, and final events. |

Swagger annotations live inline in backend/main.go. Regenerate docs after changing handlers:

~/go/bin/swag init --parseDependency --parseInternal

The Swagger UI is then available at /swagger/index.html.

Frontend UI

Served from /ui/ by the same Gin app:

  • Ask button hits /chat.
  • Ask & Stream uses the SSE endpoint and renders token deltas.
  • Side pane shows the retrieved context verbatim, helping debug relevance.

Docker & Compose

Single image

docker build -t rag-system:prod .
docker run -d --name rag-api \
  --env-file .env \
  -p 8080:8080 \
  rag-system:prod

Ensure QDRANT_HOST resolves from inside the container (e.g. host.docker.internal, proxied hostname, or Compose service name).

docker-compose

docker-compose.yml wires the API + Qdrant with persistent storage:

export QDRANT_STORAGE_DIR=/absolute/path/to/qdrant_storage
docker compose up -d

  • API → http://localhost:8080
  • Qdrant REST UI → http://localhost:6333
  • Stop + clean (containers only): docker compose down
  • Data remains in ${QDRANT_STORAGE_DIR}

Development Workflow

  • Run API locally: go run ./backend
  • Lint/Test: go test ./... (add packages as the project grows)
  • Swagger refresh: swag init --parseDependency --parseInternal
  • Hot reload: rely on gin or your editor’s debugger; no special scripts included.

Troubleshooting

  • NOVITA_API_KEY is not configured: ensure .env is loaded; backend/config uses godotenv.
  • embedding provider returned 4xx/5xx: check model ids, quota, or base URL.
  • Qdrant connection failures: confirm ports 6333/6334 are open and the collection vector size matches the embedding model’s dimension.
  • Empty context in answers: ingestion may not have run or embeddings exceeded rate limits.
