Local-first legal RAG over EU directives, with Article-level citations and source graph traversal.
LEX lets you ask natural-language questions over dense EU legislation and get grounded answers backed by specific Articles, Recitals, and citation links.
uv run lex ask "What is the purpose of the European Electronic Communications Code?"

A: The European Electronic Communications Code establishes a harmonised
framework for the regulation of electronic communications networks,
electronic communications services, associated facilities, and associated
services in the Union. It sets out tasks for national regulatory authorities
and other competent authorities, and creates procedures to ensure consistent
application of the regulatory framework throughout the internal market.
Sources: Art. 1(1) · Art. 1(2) · Art. 3(1)
The chat interface answers legal questions with source-backed citations. The citation panel and graph expose where the answer came from and how the cited parts of the directive relate to one another.
EU directives are cross-referenced documents. Definitions, obligations, exceptions, and operative scope are often spread across Articles, Recitals, and referenced instruments.
LEX treats legal answering as a retrieval problem first and a generation problem second: retrieve the relevant legal text, rerank it, traverse citation links, then generate an answer grounded in the selected sources.
- Answers questions over EU directives
- Grounds answers in Article-level and Recital-level citations
- Uses hybrid dense/BM25 retrieval, reranking, HyDE, and citation graph traversal
- Runs locally with quantised models, Qdrant, Redis, FastAPI, and Chainlit
- Exposes one typed engine through CLI, API, UI, worker, and tests
query
│
├─ classify (definition / procedural / negative / general)
│
├─ HyDE expansion via LLM [skipped for definition + negative]
│ "What would a directive say to answer this?"
│
├─ BGE-M3 dense embed (1024-dim CLS)
├─ BM25 sparse (IDF-weighted, same tokenizer vocabulary)
│
├─ Qdrant hybrid search, RRF fusion → top-20 candidates
│
├─ BGE-reranker-v2-m3 cross-encoder → top-5
│ + chunk-type boost (article 1.0 · recital 0.85)
│ + one-hop citation graph traversal
│
├─ RAG prompt → LLM → streaming answer
│
└─ two-stage retrieval: extract article refs from draft
→ fetch missing articles → regenerate if context enriched
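The fusion step merges the dense and BM25 rankings with reciprocal rank fusion. In LEX this happens inside Qdrant's hybrid query; the standalone sketch below only illustrates the scoring, with the chunk IDs and the conventional `k = 60` as assumptions.

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 20) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)[:top_n]

fused = rrf_fuse([
    ["art_3", "art_1", "rec_5"],   # dense ranking (illustrative chunk IDs)
    ["art_3", "art_7", "art_1"],   # BM25 ranking
])
# art_3 ranks first: it is top-ranked in both lists, which RRF rewards.
```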
Every entry point, including CLI, API, UI, tests, and workers, builds a typed
Command and submits it to a central Engine. Handlers are pure functions
with explicit typed I/O. The entire system contract fits in one file
(commands.py). Adding a new operation means adding one handler file and one
routing case.
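In sketch form (the field and attribute names here are illustrative, not the actual contract):

```python
from dataclasses import dataclass

# commands.py-style contract: every operation is a frozen, typed command.
@dataclass(frozen=True)
class SearchCmd:
    query: str
    top_k: int = 5

@dataclass(frozen=True)
class SearchResult:
    chunks: list[str]

def handle_search(cmd: SearchCmd, deps) -> SearchResult:
    # Pure handler: every effect is reached through the injected deps object.
    return SearchResult(chunks=deps.retriever.search(cmd.query, cmd.top_k))

class Engine:
    def __init__(self, deps) -> None:
        self.deps = deps

    def submit(self, cmd):
        match cmd:  # one routing case per command variant
            case SearchCmd():
                return handle_search(cmd, self.deps)
            case _:
                raise ValueError(f"unhandled command: {cmd!r}")
```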
For a detailed account of non-trivial decisions, see
ENGINEERING_REPORT.md.
┌─────────┐
IngestCmd │ │
───────────►│ Queued │ (status stored in Redis)
│ │
└────┬────┘
│ worker picks up
┌────▼────┐
│Fetching │──── network error ────┐
└────┬────┘ │
│ XML received │
┌────▼────┐ │
│ Parsing │──── malformed XML ────┤
└────┬────┘ │
│ nodes extracted │
┌────▼────┐ │
│Chunking │ │
└────┬────┘ │
│ chunks created │
┌────▼──────┐ │
│ Embedding │ │
└────┬──────┘ │
│ vectors computed │
┌────▼────┐ │
│ Writing │──── Qdrant error ─────┤
└────┬────┘ │
│ stored ┌───▼────┐
┌────▼────┐ │ Failed │
│ Done │ └────────┘
└─────────┘
(pub/sub notification)
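In sketch form, the worker's status writes and the final notification could look like this (the key and channel names are assumptions, not the actual schema):

```python
import redis

r = redis.Redis()

def set_status(job_id: str, status: str) -> None:
    # Each transition in the diagram above is one status write.
    r.hset(f"ingest:{job_id}", mapping={"status": status})

def finish(job_id: str, status: str) -> None:
    set_status(job_id, status)                        # "done" or "failed"
    r.publish("ingest-events", f"{job_id}:{status}")  # pub/sub notification

for stage in ("queued", "fetching", "parsing", "chunking", "embedding", "writing"):
    set_status("32018L1972", stage)
finish("32018L1972", "done")
```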
| Requirement | Notes |
|---|---|
| Python 3.12+ | Managed by uv, no manual install needed |
| uv | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Docker + Compose v2 | Qdrant vector DB and Redis job queue |
| NVIDIA GPU, 8 GB VRAM | Recommended. CPU fallback available. |
| NVIDIA Container Toolkit | Linux GPU + Docker production path only |
Docker Compose v2 ships with Docker Desktop. On Ubuntu with docker.io:
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -SL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64 \
-o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose

git clone https://github.com/spartacoos/LEX.git && cd LEX
# Pick ONE extra matching your platform:
uv sync --extra llm-mlx # macOS Apple Silicon
uv sync --extra llm-llamacpp # Linux + NVIDIA GPU
uv sync --extra llm-llamacpp-cpu # Linux CPU-only / WSL
# Linux/WSL: expose venv CUDA libraries, once per shell
source scripts/env.sh

uv run lex config # interactive wizard, writes .env
uv run lex config --list # show available profiles
uv run lex config --profile qwen35-9b-gpu --judge gemma4-31b-gpu

Profiles live in profiles/*.yaml. Each encodes a hardware/model combination.
Adding a new model means adding a new YAML file (see the sketch after the table).
| Profile | Model | VRAM | Notes |
|---|---|---|---|
| `gemma4-e2b-cpu` | Gemma 4 E2B Q4 | CPU | Works everywhere |
| `gemma4-e2b-gpu` | Gemma 4 E2B Q4 | 4 GB | Fastest GPU option |
| `gemma4-e4b-gpu` | Gemma 4 E4B Q4 | 8 GB | Recommended default |
| `gemma4-e4b-mlx` | Gemma 4 E4B 4bit | Metal | Recommended on Mac |
| `gemma4-31b-gpu` | Gemma 4 31B Q4 | 24 GB | Best quality local |
| `qwen35-9b-gpu` | Qwen 3.5 9B Q4 | 8 GB | Strong reasoning |
| `remote-openai` | Any remote model | N/A | OpenAI / Together / Groq |
Individual settings can be overridden without touching the profile:
LEX_LLM__N_GPU_LAYERS=10 uv run lex serve-llm # partial GPU offload
LEX_LLM__CTX_SIZE=4096 uv run lex serve-llm # reduce if OOM
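The double-underscore names follow pydantic-settings' nested env var convention, which a sketch makes concrete (the field defaults here are placeholders; config.py defines the real ones):

```python
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

class LLMSettings(BaseModel):
    n_gpu_layers: int = -1   # placeholder defaults
    ctx_size: int = 8192

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="LEX_", env_nested_delimiter="__")
    llm: LLMSettings = LLMSettings()

# LEX_LLM__CTX_SIZE=4096  →  Settings().llm.ctx_size == 4096
```

Three long-running processes, three terminals: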
# Terminal 1: embedding + reranker models, loaded once and served continuously
uv run lex serve-models
uv run lex serve-models --reranker-device cuda # if VRAM is free
# Terminal 2: LLM
uv run lex serve-llm
uv run lex serve-llm --profile qwen35-9b-gpu # override profile
uv run lex serve-llm --gpu-layers 0 # force CPU
# Terminal 3: services, ingestion, and UI
docker compose up -d
uv run lex services # verify all three are healthy
uv run lex ingest 32018L1972 # ingest the EECC, about 6 min on CPU
uv run lex ui # http://localhost:8100

lex config        [--profile P] [--judge P] [--list]
lex serve-models  [--embedding-device D] [--reranker-device D]
lex serve-llm     [--profile P] [--port N] [--gpu-layers N] [--ctx-size N]
lex services      health-check all dependencies
lex smoke         verify config, print sample commands
lex ingest        <CELEX_ID> [--language en] [--source cellar|local]
lex search        "<query>" [--top-k N] [--celex-id X] [--article N]
lex ask           "<question>" [--celex-id X] [--article N]
lex serve         [--host H] [--port N]   FastAPI server
lex worker        Redis ingestion worker
lex ui            [--port N]              Chainlit chat UI
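lex serve exposes the same engine over HTTP with SSE streaming. A client sketch, assuming a hypothetical /ask route and port (check api.py and `lex serve` for the real ones):

```python
import httpx

# Hypothetical route, port, and payload shape.
with httpx.stream(
    "POST", "http://localhost:8000/ask",
    json={"question": "What is the scope of Article 3?"}, timeout=None,
) as resp:
    for line in resp.iter_lines():
        if line.startswith("data: "):            # SSE frames: "data: <chunk>"
            print(line.removeprefix("data: "), end="", flush=True)
```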
# Fast: one model for both RAG and judging
uv run pytest -m eval -v
# Recommended: dedicated judge model
# Terminal 1: RAG LLM on port 8080, already running
uv run lex serve-llm --profile gemma4-31b-gpu --port 8081 # terminal 2
export LEX_EVAL_JUDGE__BASE_URL=http://localhost:8081/v1
export LEX_EVAL_JUDGE__MODEL=gemma-4-31b-it
uv run pytest -m eval -v # terminal 3
# Remote judge, fastest
export LEX_EVAL_JUDGE__BASE_URL=https://api.openai.com/v1
export LEX_EVAL_JUDGE__MODEL=gpt-4o-mini
export OPENAI_API_KEY=sk-...
uv run pytest -m eval -v

Reports are written to tests/reports/eval-YYYYMMDD-HHMMSS.{csv,md}.
The table below shows two recent eval runs. The targets are aspirational for a quantised local model. A 31B or API-backed model is expected to improve the structured citation and faithfulness metrics.
| Metric | Target | Qwen 3.5 9B (29 q) | Gemma 4 E4B (24 q) | Readout |
|---|---|---|---|---|
| context_precision | > 0.80 | 0.574 | 0.827 | Gemma retrieves more tightly |
| context_recall | > 0.80 | 0.897 | 0.792 | Qwen retrieves more broadly |
| faithfulness | > 0.90 | 0.802 | 0.814 | below target |
| answer_relevancy | > 0.85 | 0.902 | 0.977 | strong |
| citation_correctness | > 0.90 | 0.514 | 0.505 | main bottleneck |
These results are strong for a local 9B-class setup. Qwen 3.5 9B reaches high context recall and answer relevancy, which means the retrieval pipeline is usually finding the right legal material and the generated answer usually addresses the question asked. Gemma 4 E4B trades some recall for higher precision, retrieving a tighter context set.
The main weakness is citation completeness, not retrieval availability. The expected answers for some cross-reference questions require citing many Articles at once. The relevant content is often present in the retrieved context, but smaller models struggle to extract and format the full expected set of references reliably.
The hardest category is broad cross-reference enumeration, for example questions like "Which articles concern end-user rights?" where the expected answer spans a whole Title. This is where larger models and deeper citation graph traversal should help most.
For a quantised model running on consumer hardware, the current profile is already useful: high recall, strong answer relevancy, explicit citations, and a path to improve faithfulness and citation correctness by changing model profile rather than changing the core architecture.
uvx ty check src/lex/ # zero errors expected

Configuration lives in ty.toml. The type checker catches undefined names,
missing imports, and incorrect argument types across the pipeline before
runtime. This is the class of error that caused RetrieveFilter and
_chunk_uuid to surface as runtime failures during development.
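For instance, both calls below belong to that class and are flagged by ty before anything runs (the signature is contrived, not the real one):

```python
def _chunk_uuid(celex_id: str, index: int) -> str:
    ...

_chunk_uuid("32018L1972")          # ty: missing required argument `index`
_chunk_uuid(32018, index="zero")   # ty: argument types do not match the parameters
```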
- Add a variant to `Command` in `commands.py`
- Write a handler `handle_<name>(cmd, deps) -> Result`
- Add a routing case in `engine.py`
- Add a CLI command in `cli.py` that builds the command and calls `engine.submit`
Existing handlers do not need to be modified.
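For a hypothetical ExportCmd, those four steps compress to:

```python
from dataclasses import dataclass

# 1. commands.py: one new Command variant (and its Result)
@dataclass(frozen=True)
class ExportCmd:
    celex_id: str

@dataclass(frozen=True)
class ExportResult:
    path: str

# 2. a new handler file: pure function, explicit typed I/O
def handle_export(cmd: ExportCmd, deps) -> ExportResult:
    return ExportResult(path=deps.exporter.write(cmd.celex_id))

# 3. engine.py: one new routing case
#        case ExportCmd():
#            return handle_export(cmd, self.deps)
# 4. cli.py: a command that builds ExportCmd(...) and calls engine.submit(cmd)
```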
profiles/ Hardware/model profiles (YAML)
src/lex/
commands.py System contract: all Command + Result types
engine.py Dispatcher
config.py Settings (pydantic-settings, profile-aware)
profile.py YAML profile loader
sources.py EUR-Lex CELLAR fetcher + local file adapter
ingestion.py Parse → chunk → embed → write
retrieval.py Hybrid search + rerank + citation graph
generation.py HyDE + RAG prompt + two-stage retrieval
model_server.py Unix-socket model daemon
worker.py Redis ingestion queue consumer
api.py FastAPI + SSE streaming
cli.py typer CLI
ui.py Chainlit chat UI + Plotly citation graph
tracing.py Langfuse shim (no-op when unconfigured)
tests/
test_lex.py Smoke tests + parametrised eval
conftest.py CSV/Markdown report writer
gold_standard.json 30 curated Q/A pairs
ENGINEERING_REPORT.md Design decisions, failure modes, limitations
Corpus boundaries. The EECC references other instruments, including the
BEREC Regulation, ePrivacy Directive, and Directive 2014/61/EU, for key
definitions and obligations. Questions whose answers live in those instruments
receive partial answers or grounded refusals. Each additional instrument is a
single lex ingest <CELEX_ID> away.
Annexes not ingested. The EECC's technical annexes live in separate CELLAR
DOC files. The current fetcher retrieves the main act only (DOC_2).
Temporal validity. LEX indexes a specific version of a directive. Amendments are not tracked yet.
HyDE quality is model-dependent. The query expansion step that bridges vocabulary gaps, for example "SMP" to "significant market power", works well with Qwen 3.5 9B but degrades with smaller models. A validity guard catches and discards bad expansions, falling back to the original query.
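Such a guard can be a few lines; a sketch with illustrative heuristics (not LEX's actual checks):

```python
def guarded_hyde(query: str, expansion: str) -> str:
    """Fall back to the original query when the expansion looks degenerate."""
    bad = (
        not expansion.strip()
        or len(expansion) > 8 * len(query) + 500                    # runaway generation
        or expansion.lower().startswith(("i cannot", "as an ai"))   # refusal text
    )
    return query if bad else expansion
```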
Reranker latency on CPU. With an 8 GB GPU fully occupied by the LLM, the reranker runs on CPU, about 5 seconds per query. Moving to a larger GPU or a smaller LLM resolves this.
Planned improvements, in priority order:
- Annex ingestion: fetch all DOC files, not just
DOC_2 - Corpus expansion: BEREC Regulation, ePrivacy, Directive 2014/61/EU
- Full directive graph UI: persistent sidebar showing the complete article cross-reference network
- Amendment tracking: fetch consolidated versions by date
libcudart.so.12: cannot open shared object file
Run source scripts/env.sh before any uv run command on Linux.
LLM endpoint unreachable
Run uv run lex serve-llm in a separate terminal first.
Model server not running
Run uv run lex serve-models in a separate terminal. Without it, each CLI
command cold-starts the embedding models, adding about 9 seconds of overhead.
CUDA out of memory on serve-models
The GPU is fully occupied by the LLM. Run uv run lex serve-models without
--reranker-device cuda. CPU is the safe default.
Evaluation LLM outputs invalid JSON
The judge model is too small. Use a 13B+ model or a remote API.
docker compose not found on Ubuntu with docker.io
See the Docker Compose v2 install snippet in Prerequisites.
hf: command not found
Run uv pip install "huggingface_hub[cli]" (quoted so the brackets survive the shell).

llama-cpp-python fails to install
Try uv sync --extra llm-llamacpp-cpu first to verify the environment, then
switch to uv sync --extra llm-llamacpp.
MIT License. See LICENSE.


