A Retrieval-Augmented Generation (RAG) stack written in Go. It ingests markdown notes into Qdrant, uses Novita-hosted, OpenAI-compatible APIs for embeddings and LLM completions, and serves REST + SSE endpoints plus a lightweight browser UI.
- End-to-end Go implementation (API, ingestion CLI, services) with Gin.
- Streaming `/chat/stream` endpoint built on Server-Sent Events.
- Qdrant vector store bootstrap + upserts with cosine similarity.
- Markdown-aware ingestion with recursive character splitting (LangChainGo) and deduplicated chunk IDs.
- Minimal frontend (`frontend/`) for manual testing and demoing the RAG loop.
- Dockerfile + Compose stack for production-style deployment.
- Swagger docs (`/swagger/index.html`) generated via `swag init`.
```
Markdown → Ingestion CLI → Embeddings (Novita) → Qdrant
                                                    ↘
Browser / API client → Gin API → Retrieval → Context → Novita LLM → Answer (streamed or blocking)
```
```
backend/            HTTP API, config, services
cmd/ingest/         CLI entrypoint for bulk ingestion
frontend/           Vanilla JS UI served at /ui/
Dockerfile          Production image (API + UI)
docker-compose.yml  API + Qdrant dev/prod stack
```
- Prerequisites
  - Go 1.24+
  - Docker (for Qdrant or containerized runs)
  - Novita account + API key (or any OpenAI-compatible API)
- Clone + install deps

  ```sh
  git clone https://github.com/mirsaidl/go-rag-api && cd go-rag-api
  go mod download
  ```

- Configure environment

  Create `.env` (copy `.env.example`) with:

  ```sh
  APP_NAME=RAG System
  APP_PORT=8080
  NOVITA_API_KEY=your_key
  NOVITA_EMBEDDING_MODEL=baai/bge-m3
  NOVITA_LLM_MODEL=openai/gpt-oss-120b
  NOVITA_BASE_URL=https://api.novita.ai/openai/v1
  QDRANT_HOST=localhost
  QDRANT_PORT=6334
  COLLECTION_NAME=rag_collection
  VECTOR_SIZE=1024
  TOP_K=10
  ```

- Run Qdrant locally

  ```sh
  docker run -d --name qdrant \
    -p 6333:6333 -p 6334:6334 \
    -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
    qdrant/qdrant
  ```

- Start the API

  ```sh
  go run ./backend
  ```

- Open the UI
  - Browser: `http://localhost:8080/ui/`
  - Swagger: `http://localhost:8080/swagger/index.html`
  - Health check: `curl http://localhost:8080/health`
| Key | Description | Default |
|---|---|---|
| `APP_NAME` / `APP_PORT` | Branding + port for the Gin server | `RAG System` / `8080` |
| `NOVITA_API_KEY` | API key for embeddings + chat | required |
| `NOVITA_EMBEDDING_MODEL` | Novita embedding model id | `baai/bge-m3` |
| `NOVITA_LLM_MODEL` | Novita LLM id used for chat | `openai/gpt-oss-120b` |
| `NOVITA_BASE_URL` | OpenAI-compatible base URL | `https://api.novita.ai/openai/v1` |
| `QDRANT_HOST` / `QDRANT_PORT` | Qdrant endpoint the API hits | `localhost` / `6334` |
| `COLLECTION_NAME` | Qdrant collection | `rag_collection` |
| `VECTOR_SIZE` | Vector dimensionality; must match the embedding model | `1024` |
| `TOP_K` | Retrieval depth per query | `10` |
| `DEBUG` | Toggle verbose logs in config | `true` |
Recursive markdown ingestor that chunks files, calls Novita for embeddings, and upserts straight into Qdrant.
```sh
go run ./cmd/ingest \
  --input ./data \
  --chunk-size 800 \
  --chunk-overlap 200 \
  --batch-size 16 \
  --timeout 5m
```

Flags:
| Flag | Purpose | Default |
|---|---|---|
| `--input` | Directory with `.md`/`.markdown`/`.mdx` files | `./docs` |
| `--chunk-size` | Characters per chunk | `2000` |
| `--chunk-overlap` | Characters of overlap | `200` |
| `--batch-size` | Qdrant upsert batch size | `16` |
| `--timeout` | Hard stop for the ingestion run | `5m` |
Under the hood:

- Uses `textsplitter.NewRecursiveCharacter` for separator-aware chunking.
- Deduplicates on a SHA-256-derived chunk id (`source|index|text`).
- Automatically creates the target collection (cosine distance) if missing.
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Liveness probe with app metadata. |
| POST | `/embed` | Returns the raw embedding vector for any text. |
| POST | `/retrieve` | Retrieves `TOP_K` passages from Qdrant (formatted string payload). |
| POST | `/chat` | Blocking RAG call: retrieval + Novita answer. |
| POST | `/chat/stream` | SSE stream: emits `context`, `token`, and `final` events. |
Swagger annotations live inline in `backend/main.go`. Regenerate docs after changing handlers:

```sh
~/go/bin/swag init --parseDependency --parseInternal
```

The generated docs are served at `/swagger/index.html`.
Served from `/ui/` by the same Gin app:

- Ask button hits `/chat`.
- Ask & Stream uses the SSE endpoint and renders token deltas.
- A side pane shows the retrieved context verbatim, helping debug relevance.
```sh
docker build -t rag-system:prod .
docker run -d --name rag-api \
  --env-file .env \
  -p 8080:8080 \
  rag-system:prod
```

Ensure `QDRANT_HOST` resolves from inside the container (e.g. `host.docker.internal`, a proxied hostname, or the Compose service name).
`docker-compose.yml` wires the API + Qdrant with persistent storage:

```sh
export QDRANT_STORAGE_DIR=/absolute/path/to/qdrant_storage
docker compose up -d
```

- API → `http://localhost:8080`
- Qdrant REST UI → `http://localhost:6333`
- Stop + clean (containers only): `docker compose down`
- Data remains in `${QDRANT_STORAGE_DIR}`
- Run API locally: `go run ./backend`
- Lint/Test: `go test ./...` (add packages as the project grows)
- Swagger refresh: `swag init --parseDependency --parseInternal`
- Hot reload: rely on `gin` or your editor's debugger; no special scripts are included.
- `NOVITA_API_KEY is not configured`: ensure `.env` is loaded; `backend/config` uses `godotenv`.
- `embedding provider returned 4xx/5xx`: check model ids, quota, or the base URL.
- Qdrant connection failures: confirm ports `6333`/`6334` are open and the collection vector size matches the embedding model's dimension.
- Empty context in answers: ingestion may not have run, or embeddings exceeded rate limits.