A Retrieval-Augmented Generation (RAG) stack written in Go. It ingests markdown notes into Qdrant, uses Novita-hosted, OpenAI-compatible APIs for embeddings and LLM completions, and serves REST + SSE endpoints plus a lightweight browser UI.
- End-to-end Go implementation (API, ingestion CLI, services) with Gin.
- Streaming `/chat/stream` endpoint built on Server-Sent Events.
- Qdrant vector store bootstrap + upserts with cosine similarity.
- Markdown-aware ingestion with recursive character splitting (LangChainGo) and deduplicated chunk IDs.
- Minimal frontend (`frontend/`) for manual testing and demoing the RAG loop.
- Dockerfile + Compose stack for production-style deployment.
- Swagger docs (`/swagger/index.html`) generated via `swag init`.
```
Markdown → Ingestion CLI → Embeddings (Novita) → Qdrant
                                                    ↘
Browser / API client → Gin API → Retrieval → Context → Novita LLM → Answer (streamed or blocking)
```
```
backend/            HTTP API, config, services
cmd/ingest/         CLI entrypoint for bulk ingestion
frontend/           Vanilla JS UI served at /ui/
Dockerfile          Production image (API + UI)
docker-compose.yml  API + Qdrant dev/prod stack
```
- Prerequisites
  - Go 1.24+
  - Docker (for Qdrant or containerized runs)
  - Novita account + API key (or any OpenAI-compatible API)
- Clone + install deps

  ```sh
  git clone https://github.com/mirsaidl/go-rag-api && cd go-rag-api
  go mod download
  ```

- Configure environment

  Create `.env` (copy `.env.example`) with:

  ```sh
  APP_NAME=RAG System
  APP_PORT=8080
  NOVITA_API_KEY=your_key
  NOVITA_EMBEDDING_MODEL=baai/bge-m3
  NOVITA_LLM_MODEL=openai/gpt-oss-120b
  NOVITA_BASE_URL=https://api.novita.ai/openai/v1
  QDRANT_HOST=localhost
  QDRANT_PORT=6334
  COLLECTION_NAME=rag_collection
  VECTOR_SIZE=1024
  TOP_K=10
  ```

- Run Qdrant locally

  ```sh
  docker run -d --name qdrant \
    -p 6333:6333 -p 6334:6334 \
    -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
    qdrant/qdrant
  ```

- Start the API

  ```sh
  go run ./backend
  ```

- Open the UI
  - Browser: `http://localhost:8080/ui/`
  - Swagger: `http://localhost:8080/swagger/index.html`
  - Health check: `curl http://localhost:8080/health`
| Key | Description | Default |
|---|---|---|
| `APP_NAME` / `APP_PORT` | Branding + port for the Gin server | `RAG System` / `8080` |
| `NOVITA_API_KEY` | API key for embeddings + chat | required |
| `NOVITA_EMBEDDING_MODEL` | Novita embedding model id | `baai/bge-m3` |
| `NOVITA_LLM_MODEL` | Novita LLM id used for chat | `openai/gpt-oss-120b` |
| `NOVITA_BASE_URL` | OpenAI-compatible base URL | `https://api.novita.ai/openai/v1` |
| `QDRANT_HOST` / `QDRANT_PORT` | Qdrant endpoint the API hits | `localhost` / `6334` |
| `COLLECTION_NAME` | Qdrant collection | `rag_collection` |
| `VECTOR_SIZE` | Vector dimensionality; must match the embedding model | `1024` |
| `TOP_K` | Retrieval depth per query | `10` |
| `DEBUG` | Toggle verbose logs in config | `true` |
Recursive markdown ingestor that chunks files, calls Novita for embeddings, and upserts straight into Qdrant.
```sh
go run ./cmd/ingest \
  --input ./data \
  --chunk-size 800 \
  --chunk-overlap 200 \
  --batch-size 16 \
  --timeout 5m
```

Flags:
| Flag | Purpose | Default |
|---|---|---|
| `--input` | Directory with `.md`/`.markdown`/`.mdx` files | `./docs` |
| `--chunk-size` | Characters per chunk | `2000` |
| `--chunk-overlap` | Characters of overlap | `200` |
| `--batch-size` | Qdrant upsert batch size | `16` |
| `--timeout` | Hard stop for the ingestion run | `5m` |
Under the hood:

- Uses `textsplitter.NewRecursiveCharacter` for separator-aware chunking.
- Deduplicates on a SHA-256-derived chunk id (`source|index|text`).
- Automatically creates the target collection (cosine distance) if missing.
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Liveness probe with app metadata. |
| POST | `/embed` | Returns the raw embedding vector for any text. |
| POST | `/retrieve` | Retrieves `TOP_K` passages from Qdrant (formatted string payload). |
| POST | `/chat` | Blocking RAG call: retrieval + Novita answer. |
| POST | `/chat/stream` | SSE stream: emits `context`, `token`, and `final` events. |
Swagger annotations live inline in `backend/main.go`. Regenerate docs after changing handlers:

```sh
~/go/bin/swag init --parseDependency --parseInternal
```

The generated docs are served at `/swagger/index.html`.
Served from `/ui/` by the same Gin app:

- Ask button hits `/chat`.
- Ask & Stream uses the SSE endpoint and renders token deltas.
- A side pane shows the retrieved context verbatim, helping debug relevance.
```sh
docker build -t rag-system:prod .
docker run -d --name rag-api \
  --env-file .env \
  -p 8080:8080 \
  rag-system:prod
```

Ensure `QDRANT_HOST` resolves from inside the container (e.g. `host.docker.internal`, a proxied hostname, or the Compose service name).
`docker-compose.yml` wires the API + Qdrant with persistent storage:

```sh
export QDRANT_STORAGE_DIR=/absolute/path/to/qdrant_storage
docker compose up -d
```

- API → `http://localhost:8080`
- Qdrant REST UI → `http://localhost:6333`
- Stop + clean (containers only): `docker compose down`
- Data remains in `${QDRANT_STORAGE_DIR}`
- Run API locally: `go run ./backend`
- Lint/Test: `go test ./...` (add packages as the project grows)
- Swagger refresh: `swag init --parseDependency --parseInternal`
- Hot reload: rely on `gin` or your editor's debugger; no special scripts are included.
- `NOVITA_API_KEY is not configured`: ensure `.env` is loaded; `backend/config` uses `godotenv`.
- `embedding provider returned 4xx/5xx`: check model ids, quota, or the base URL.
- Qdrant connection failures: confirm ports `6333`/`6334` are open and the collection vector size matches the embedding model's dimension.
- Empty context in answers: ingestion may not have run, or embeddings exceeded rate limits.