A CLI-based Retrieval-Augmented Generation (RAG) tool for Markdown documents. Index a directory of Markdown files into a local DuckDB database, then ask natural-language questions answered by a local LLM via an OpenAI-compatible API (e.g., LM Studio).
Mixed Japanese and English documents are fully supported, with NFKC normalization and mixed-script token estimation.
- Index: recursively scans a directory for `*.md` files, chunks them, embeds with a local model, and stores vectors in DuckDB.
- Ask: embeds your question, finds the most relevant chunks, expands context with adjacent chunks, and streams an LLM-generated answer.
- Serve: starts a local HTTP API server (`POST /api/ask` SSE, `GET /api/status`) with an embedded Web UI that renders Markdown answers in the browser.
- Incremental updates: re-indexing only re-processes files whose content has changed (SHA-256 hash check).
- Japanese support: Unicode NFKC normalization, Japanese sentence boundaries for chunking, mixed CJK/ASCII token estimation.
- Context window expansion: each vector-search hit is expanded by ±N adjacent chunks for continuity; overlapping spans are deduplicated automatically.
| Tool | Version | Purpose |
|---|---|---|
| Go | 1.26+ | Build toolchain |
| make | any | Build automation |
| cc (clang/gcc) | any | CGo (required by go-duckdb) |
| LM Studio | any | Local LLM inference server |
See docs/setup.md for full setup instructions.
```bash
git clone <repo-url> lite-rag
cd lite-rag

# Copy and configure
mkdir -p ~/.config/lite-rag
cp config.example.toml ~/.config/lite-rag/config.toml
$EDITOR ~/.config/lite-rag/config.toml   # set api.base_url, api.api_key, models.*

# Install git hooks and linter
make setup

# Build
make build

# Index a directory of Markdown files
./dist/lite-rag index --dir /path/to/docs

# Index a single file
./dist/lite-rag index --file /path/to/doc.md

# Ask a question (CLI)
./dist/lite-rag ask "How do I configure the retry policy?"

# Or start the Web UI server
./dist/lite-rag serve   # opens at http://127.0.0.1:8080
```

Copy `config.example.toml` to `~/.config/lite-rag/config.toml` and adjust:
```toml
[api]
base_url = "http://localhost:1234/v1"   # LM Studio default
api_key = "lm-studio"                   # arbitrary; LM Studio does not validate it

[models]
embedding = "nomic-ai/nomic-embed-text-v1.5-GGUF"
chat = "openai/gpt-oss-20b"

[database]
path = "./lite-rag.db"

[retrieval]
top_k = 5              # vector search hits
context_window = 1     # adjacent chunks to expand around each hit
chunk_size = 512
chunk_overlap = 64
query_rewrite = false  # enable LLM-assisted query rewriting (improves recall, +~2 s/query)

[server]
addr = "127.0.0.1:8080"   # listen address for `serve`
log_level = "info"        # info | debug | warn | error
```

Environment variables override file settings:
| Variable | Overrides |
|---|---|
| `LITE_RAG_API_BASE_URL` | `api.base_url` |
| `LITE_RAG_API_KEY` | `api.api_key` |
| `LITE_RAG_EMBEDDING_MODEL` | `models.embedding` |
| `LITE_RAG_CHAT_MODEL` | `models.chat` |
| `LITE_RAG_DB_PATH` | `database.path` |
```
lite-rag [--config <path>] [--db <path>] <command>

Commands:
  index --dir <directory>   Index all *.md files under a directory
        --file <file>       Index a single file (any extension)
  ask <question>            Answer a question using the indexed documents
  serve                     Start the HTTP API server with embedded Web UI
  docs                      Manage indexed documents
  reindex                   Re-embed documents after changing the embedding model
  version                   Print version information

Global flags:
  --config <path>   Config file path (default: ~/.config/lite-rag/config.toml)
  --db <path>       Database file path (overrides config database.path)
```
```bash
# Index all *.md files under a directory (recursive)
./dist/lite-rag index --dir ./docs

# Index a single file (any extension)
./dist/lite-rag index --file ./docs/notes.md
./dist/lite-rag index --file ./release-notes.txt
```

- `--dir`: walks the directory recursively; processes only `*.md` files.
- `--file`: indexes the specified file directly, regardless of extension.
- Skips files whose SHA-256 hash matches the stored value (no re-embedding).
- With `--dir`, per-file errors are logged and do not abort the overall run; `--file` errors are returned immediately.
```bash
# Re-embed all documents after changing models.embedding in the config
./dist/lite-rag reindex

# Target a specific database
./dist/lite-rag --db ./project.db reindex
```

- Finds all documents whose stored `embedding_model` differs from the current config.
- Re-embeds them using the chunk text already stored in the database; the source files do not need to exist on disk.
- Updates only the embedding vectors and `embedding_model`; chunk content and file hashes are unchanged.
```bash
./dist/lite-rag ask "What is the default chunk size?"
./dist/lite-rag --config /etc/lite-rag.toml ask "Installation steps?"
./dist/lite-rag --db ./project-b.db ask "What is the default chunk size?"

# JSON output (answer + sources as a single JSON object)
./dist/lite-rag ask --json "What is the default chunk size?"
```

- Embeds the question and searches DuckDB for the top-K most similar chunks.
- Expands each hit by ±`context_window` adjacent chunks.
- Streams the LLM answer to stdout.
- Set `query_rewrite = true` to enable multilingual query rewriting: the LLM rewrites the query into Japanese and English declarative statements, and three parallel vector searches (original + JA + EN) are merged. This improves recall across multilingual document collections (higher scores in ~88% of queries) at the cost of ~2 s extra latency per query.
- `--json`: buffers the full answer and outputs a single JSON object. Progress messages are suppressed; stdout contains only valid JSON.
```json
{
  "answer": "The default chunk size is 512 tokens.",
  "sources": [
    {"file_path": "docs/README.md", "heading_path": "Configuration", "score": 0.872}
  ]
}
```

```bash
./dist/lite-rag serve                       # listen on 127.0.0.1:8080 (default)
./dist/lite-rag serve --addr 0.0.0.0:9090
```

Starts a local HTTP server. Open http://127.0.0.1:8080 in a browser.
| Endpoint | Method | Description |
|---|---|---|
| `/api/ask` | `POST` | SSE-streamed answer (`{"query":"..."}`) |
| `/api/status` | `GET` | Health check and version |
| `/` | `GET` | Embedded Web UI |
Manage documents stored in the index database.
```bash
# List all indexed documents (text table)
./dist/lite-rag docs list

# List as JSON (machine-readable)
./dist/lite-rag docs list --json

# Show reconstructed content of a document by ID
./dist/lite-rag docs show <document-id>

# Delete a document and all its chunks
./dist/lite-rag docs delete <document-id>
```

`<document-id>` is the 64-character SHA-256 hex shown in `docs list`.
```bash
# Current platform
make build

# All darwin platforms (macOS host)
make cross-build-darwin

# Linux platforms (requires podman or docker, or run on a Linux host)
make cross-build-linux
```

Binaries are placed in `dist/`. See docs/setup.md for cross-compilation details.
```bash
make test    # run all tests
make vet     # go vet
make lint    # golangci-lint
make check   # full quality gate: vet + lint + test + build
```

Git hooks installed by `make setup` run `make check` automatically before each commit and push.
See docs/design/architecture.md for the full design.
```
index command
└─ Indexer: walk → normalize → chunk → embed → DuckDB

ask command
└─ Retriever: embed query → SimilarChunks → AdjacentChunks → LLM Chat
```
MIT
Japanese documentation: docs/ja/README.md