GraphRAG Citation Explorer
Trace every AI answer back to its source through the knowledge graph.
Screenshots • Quick Start • Architecture • Features • API • Corpus • Deploy
TraceGraph is a full-stack GraphRAG application that demonstrates how graph-based retrieval-augmented generation can outperform traditional vector RAG on knowledge-intensive tasks. It pairs an interactive knowledge graph visualization with real-time citation tracing, so users can see exactly why the AI system produced a given answer.
Why GraphRAG? Standard RAG retrieves document chunks via vector similarity: it finds text that looks like the question. This breaks down when answers require connecting information across multiple documents or providing auditable citation trails. GraphRAG adds a structured knowledge graph layer of entities, relationships, and community hierarchies that enables multi-hop reasoning, citation grounding, and traceability. These are requirements mandated by the EU AI Act for high-risk AI systems.
What TraceGraph Does
More features
| Feature | Details |
|---|---|
| Resizable Panels | Drag-to-resize graph, AI response, and citation trail — comfortable reading at any viewport size |
| Demo Mode | Frontend works without backend — ships with sample graph data (24 entities, 29 relations) |
| Entity Types | Concepts, technologies, organizations, regulations, persons, documents — each with distinct color |
| Real-time API | FastAPI backend with async LightRAG, Swagger docs at /docs |
| Docker Ready | Single docker compose up for full stack |
| OpenAI Compatible | Any OpenAI-compatible API for LLM and embeddings (GPT-4o-mini, Llama via Ollama, etc.) |
graph TB
subgraph Frontend ["Frontend — Next.js 16 / React 19"]
UI[Page Layout] --> QP[Query Panel]
UI --> GV["Graph Viewer<br/><sub>react-force-graph-2d</sub>"]
UI --> AP[Answer Panel]
UI --> CT[Citation Trail]
end
subgraph Backend ["Backend — FastAPI / Python"]
API[REST API] --> GE[GraphRAG Engine]
GE --> LR[LightRAG v1.4]
LR --> EE[Entity Extraction]
LR --> CD[Community Detection]
LR --> HR[Hybrid Retrieval]
end
subgraph Storage ["Persistence"]
GM[(GraphML)]
VDB[(Vector DB)]
KV[(KV Stores)]
end
subgraph External ["External"]
OAI[OpenAI API]
end
Frontend -- "HTTP/JSON :8000" --> Backend
LR --> GM & VDB & KV
EE & HR -- "API calls" --> OAI
style Frontend fill:#1e1b4b,stroke:#6366f1,color:#e2e8f0
style Backend fill:#022c22,stroke:#10b981,color:#e2e8f0
style Storage fill:#1c1917,stroke:#78716c,color:#e2e8f0
style External fill:#172554,stroke:#3b82f6,color:#e2e8f0
Data flow — ingestion + query
sequenceDiagram
participant U as User
participant FE as Frontend
participant API as FastAPI
participant LR as LightRAG
participant KG as Knowledge Graph
participant LLM as OpenAI
rect rgb(30, 27, 75)
Note over U,LLM: Document Ingestion
U->>API: POST /ingest-corpus
API->>LR: ainsert(document)
LR->>LLM: Extract entities & relations
LLM-->>LR: Entities + Relations
LR->>KG: Build graph (NetworkX)
LR->>LR: Detect communities (Leiden)
KG-->>API: Graph persisted
end
rect rgb(2, 44, 34)
Note over U,LLM: Query with Citation
U->>FE: Enter question
FE->>API: POST /query {mode: "hybrid"}
API->>LR: aquery(text, mode=hybrid)
LR->>KG: Graph traversal (local)
LR->>KG: Community summaries (global)
LR->>LLM: Generate grounded answer
LLM-->>LR: Response
LR-->>API: Answer + graph context
API->>API: Extract citation chains
API-->>FE: {answer, citations, graph}
FE-->>U: Visual answer + citation trail
end
| Layer | Technology | Why |
|---|---|---|
| Frontend | Next.js 16, React 19, TypeScript | Latest App Router, RSC-ready, strict types |
| Styling | Tailwind CSS 4 | CSS variable design system, dark theme |
| Graph Viz | react-force-graph-2d | Canvas-based, handles 200+ nodes at 60fps |
| Backend | FastAPI, Python 3.10+ | Async-first, auto-generated OpenAPI docs |
| GraphRAG | LightRAG 1.4 (HKUDS) | Proven OSS GraphRAG with hybrid retrieval |
| LLM | GPT-4o-mini (configurable) | Entity extraction + answer generation |
| Embeddings | text-embedding-3-small | 1536-dim vectors for semantic search |
| Graph Store | NetworkX + GraphML | In-memory graph with file persistence |
| Vector Store | Nano Vector DB | Lightweight cosine similarity search |
| Panels | react-resizable-panels v4 | Drag-to-resize graph/sidebar/citations |
| Containerization | Docker Compose | Single-command full stack deployment |
| Hosting | Vercel + Render | Frontend CDN + Backend Docker container |
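
The vector store row above mentions cosine similarity search, which is also the only retrieval step the naive mode relies on. The following is a minimal pure-Python sketch of that idea, not Nano Vector DB's actual API. The chunk ids and the toy 3-dimensional vectors are hypothetical stand-ins for real 1536-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=2):
    """Return the ids of the k chunks most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]

# Hypothetical toy embeddings (assumption: real ones come from the embedding model).
CHUNKS = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}

print(top_k([1.0, 0.0, 0.0], CHUNKS))
```

Ranking by similarity alone is what makes this approach a useful baseline: it has no notion of how the retrieved chunks relate to each other.
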
| Requirement | Version |
|---|---|
| Python | 3.10+ |
| Node.js | 20+ |
| OpenAI API Key | Required |
# 1. Clone
git clone https://github.com/soneeee22000/tracegraph.git
cd tracegraph
# 2. Backend
cd backend
cp .env.example .env # Add your OpenAI API key
pip install -r requirements.txt
# 3. Ingest corpus (extracts ~177 entities, ~2 min, ~$0.15)
python -c "
import asyncio
from app.graphrag import engine
async def ingest():
    await engine.initialize()
    results = await engine.ingest_corpus('./corpus')
    print(f'Ingested {len(results)} documents')
asyncio.run(ingest())
"
# 4. Start backend
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000
# 5. Frontend (new terminal)
cd ../frontend
npm install
npm run dev

Demo Mode: The frontend works without a backend. It ships with sample graph data so you can explore the UI instantly.
Docker (alternative)
cp backend/.env.example backend/.env
# Edit backend/.env with your OpenAI API key
docker compose up

Interactive Swagger docs are available at http://localhost:8000/docs
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/health` | Service health + graph statistics |
| `GET` | `/graph` | Full knowledge graph (177 nodes, 124 edges) |
| `GET` | `/docs` | OpenAPI / Swagger UI |
| `POST` | `/query` | Query with citation tracing |
| `POST` | `/compare` | Naive RAG vs GraphRAG side-by-side |
| `POST` | `/ingest` | Ingest a single document |
| `POST` | `/ingest-corpus` | Batch ingest all corpus/*.txt |

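
As a rough illustration of calling these endpoints from Python, the sketch below builds requests with the standard library only. The helper name `build_request` is hypothetical, and the `/query` payload fields (`query`, `mode`) are taken from the curl example in this README. Nothing is sent until `urlopen` is called, so the snippet runs without a backend.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local backend address from the quick start

def build_request(method, path, payload=None):
    """Build (but do not send) a request for one of the endpoints above."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)

health_req = build_request("GET", "/health")
query_req = build_request(
    "POST",
    "/query",
    {"query": "How does GraphRAG reduce hallucinations?", "mode": "hybrid"},
)

print(query_req.get_method(), query_req.full_url)

# To send a request once the backend is running:
# with urllib.request.urlopen(query_req) as resp:
#     body = json.loads(resp.read())
```
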
Example request + response
Request:
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"query": "How does GraphRAG reduce hallucinations?", "mode": "hybrid"}'

Response structure:
{
"answer": "GraphRAG reduces hallucinations by...",
"mode": "hybrid",
"citations": [
{
"source_document": "01_graphrag_overview.txt",
"chunk_text": "Graph-Based Retrieval Augmented Generation...",
"entity_chain": ["GraphRAG", "Knowledge Graph", "Structured Grounding"],
"relevance_score": 1.0
}
],
"graph": { "nodes": [...], "edges": [...] },
"entity_count": 177,
"relationship_count": 124
}

| Mode | Strategy | Best For |
|---|---|---|
| `hybrid` | Graph traversal + vector search | General questions (recommended) |
| `local` | Entity neighborhood traversal | Specific topic deep-dives |
| `global` | Community summary search | Broad thematic overviews |
| `naive` | Vector similarity only | Baseline comparison |

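
Whichever mode is used, the response carries a `citations` list in the structure shown earlier. One way to turn it into a readable trail is sketched below. The function name `format_citation_trail` is hypothetical, and the sample dictionary copies the fields from the example response rather than real backend output.

```python
def format_citation_trail(response):
    """Render each citation as 'source: entity -> entity (score x.xx)'."""
    lines = []
    for citation in response.get("citations", []):
        chain = " -> ".join(citation.get("entity_chain", []))
        score = citation.get("relevance_score", 0.0)
        lines.append(f"{citation['source_document']}: {chain} (score {score:.2f})")
    return lines

# Sample mirroring the example response in this README (not live data).
sample = {
    "answer": "GraphRAG reduces hallucinations by...",
    "mode": "hybrid",
    "citations": [
        {
            "source_document": "01_graphrag_overview.txt",
            "chunk_text": "Graph-Based Retrieval Augmented Generation...",
            "entity_chain": ["GraphRAG", "Knowledge Graph", "Structured Grounding"],
            "relevance_score": 1.0,
        }
    ],
}

for line in format_citation_trail(sample):
    print(line)
```
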
12 curated documents spanning healthcare AI and GraphRAG infrastructure — strategically chosen to demonstrate cross-document reasoning in regulated domains.
| # | Document | Domain | Key Entities |
|---|---|---|---|
| 01 | GraphRAG Overview | AI Infrastructure | GraphRAG, Leiden Algorithm, Multi-hop Reasoning |
| 02 | Knowledge Graphs in Healthcare | Healthcare AI | UMLS, SNOMED CT, Clinical Decision Support |
| 03 | Vaccine Safety Monitoring | Pharmacovigilance | VAERS, Brighton Collaboration, AEFI |
| 04 | Entity Extraction & NLP | Information Extraction | NER, Relation Extraction, Entity Resolution |
| 05 | Citation-Grounded AI | AI Safety | FActScore, Citation Recall, Faithfulness |
| 06 | Graph Databases for AI | Database Technology | Neo4j, LightRAG, FalkorDB |
| 07 | LLM Hallucination in Enterprise | Enterprise AI | Deloitte Survey, Confidence Scoring |
| 08 | Hybrid Retrieval Architectures | Information Retrieval | RRF, Cross-encoder Re-ranking |
| 09 | Community Detection | Graph Algorithms | Leiden, Louvain, Modularity |
| 10 | EU AI Act Compliance | AI Regulation | Article 13, Article 14, Traceability |
| 11 | Clinical Trials Analysis | Drug Development | ClinicalTrials.gov, Pistoia Alliance |
| 12 | AI Safety & Grounding | Trustworthy AI | HITL, Formal Verification |
After ingestion: ~177 entities across 6 types, ~124 relationships
graph LR
subgraph trad ["Traditional RAG"]
direction LR
D1["Docs"] --> C1["Chunk"] --> E1["Embed"] --> V1["Vector DB"]
Q1["Query"] --> E1b["Embed"] --> V1
V1 -->|"Top-K"| L1["LLM"] --> A1["Answer"]
end
subgraph graphrag ["GraphRAG"]
direction LR
D2["Docs"] --> EX["Extract Entities"] --> KG["Knowledge Graph"]
D2 --> C2["Chunk"] --> E2["Embed"] --> V2["Vector DB"]
Q2["Query"] --> HS["Hybrid Search"]
KG --> HS
V2 --> HS
HS --> L2["LLM"] --> A2["Answer + Citations"]
end
style trad fill:#1c1917,stroke:#78716c,color:#a8a29e
style graphrag fill:#1e1b4b,stroke:#6366f1,color:#c7d2fe
The key insight: GraphRAG doesn't just find text that looks similar — it traverses a structured knowledge graph to discover related information across documents, then grounds every claim in verifiable entity chains.
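
To make the traversal idea concrete, here is a small sketch that finds an entity chain between two nodes with a breadth-first search. It is an illustration only, not how LightRAG or TraceGraph builds citation chains. The entity names come from the corpus table, while the edges between them are invented for the example.

```python
from collections import deque

# Toy graph (assumption: these edges are illustrative, not extracted data).
EDGES = [
    ("GraphRAG", "Knowledge Graph"),
    ("Knowledge Graph", "Leiden Algorithm"),
    ("Knowledge Graph", "Clinical Decision Support"),
    ("Clinical Decision Support", "SNOMED CT"),
]

def build_adjacency(edges):
    """Build an undirected adjacency list from (a, b) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency

def entity_chain(adjacency, start, goal):
    """Shortest chain of entities from start to goal, or None if unreachable."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in adjacency[path[-1]]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

graph = build_adjacency(EDGES)
print(entity_chain(graph, "GraphRAG", "SNOMED CT"))
# ['GraphRAG', 'Knowledge Graph', 'Clinical Decision Support', 'SNOMED CT']
```

A vector search would only surface chunks that resemble the question, whereas the chain above links entities that may never appear in the same document.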
tracegraph/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI routes + CORS
│ │ ├── graphrag.py # LightRAG engine wrapper
│ │ ├── models.py # Pydantic schemas
│ │ └── config.py # Env-based settings
│ ├── corpus/ # 12 source documents (.txt)
│ ├── graph_store/ # Pre-ingested: GraphML + vector DBs
│ ├── requirements.txt
│ └── Dockerfile
├── frontend/
│ ├── src/
│ │ ├── app/
│ │ │ ├── page.tsx # Landing page (13 sections)
│ │ │ └── explorer/
│ │ │ └── page.tsx # Graph explorer (resizable panels)
│ │ ├── components/
│ │ │ ├── landing/ # 13 landing page sections
│ │ │ ├── ui/ # shadcn/ui components
│ │ │ ├── graph-viewer.tsx # Force-directed graph
│ │ │ ├── query-panel.tsx # Search modes + compare toggle
│ │ │ ├── answer-panel.tsx # AI response + comparison view
│ │ │ └── citation-trail.tsx # Source provenance viewer
│ │ ├── lib/ # API client, colors, sample data
│ │ └── types/ # TypeScript interfaces
│ ├── public/ # Favicons (SVG, PNG, ICO)
│ ├── package.json
│ └── Dockerfile
├── docs/
│ ├── LANDING-PAGE.md # Landing page design specification
│ └── assets/ # Logo SVG
├── docker-compose.yml
├── render.yaml # Render blueprint
├── LICENSE
└── README.md
Environment variables
| Variable | Description | Default |
|---|---|---|
| `LLM_MODEL` | OpenAI model for completions | gpt-4o-mini |
| `LLM_API_KEY` | OpenAI API key | required |
| `LLM_API_BASE` | API base URL | https://api.openai.com/v1 |
| `EMBEDDING_MODEL` | Embedding model | text-embedding-3-small |
| `EMBEDDING_API_KEY` | Embedding API key | required |
| `EMBEDDING_API_BASE` | Embedding API base URL | https://api.openai.com/v1 |
| `WORKING_DIR` | Graph storage path | ./graph_store |
| `CORPUS_DIR` | Corpus path | ./corpus |
| `CORS_ORIGINS` | Allowed origins | http://localhost:3000 |
| `NEXT_PUBLIC_API_URL` | Backend URL (frontend) | http://localhost:8000 |

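
The backend variables in this table could be loaded along the following lines. This is a sketch using only the standard library, not the contents of `config.py`. The `Settings` class and `load_settings` function are hypothetical, and treating `CORS_ORIGINS` as a comma-separated list is an assumption.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    llm_model: str
    llm_api_key: str
    llm_api_base: str
    embedding_model: str
    embedding_api_key: str
    embedding_api_base: str
    working_dir: str
    corpus_dir: str
    cors_origins: list

def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ), applying table defaults."""
    env = os.environ if env is None else env
    missing = [name for name in ("LLM_API_KEY", "EMBEDDING_API_KEY") if not env.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))
    # Assumption: multiple origins are separated by commas.
    origins = env.get("CORS_ORIGINS", "http://localhost:3000").split(",")
    return Settings(
        llm_model=env.get("LLM_MODEL", "gpt-4o-mini"),
        llm_api_key=env["LLM_API_KEY"],
        llm_api_base=env.get("LLM_API_BASE", "https://api.openai.com/v1"),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        embedding_api_key=env["EMBEDDING_API_KEY"],
        embedding_api_base=env.get("EMBEDDING_API_BASE", "https://api.openai.com/v1"),
        working_dir=env.get("WORKING_DIR", "./graph_store"),
        corpus_dir=env.get("CORPUS_DIR", "./corpus"),
        cors_origins=[o.strip() for o in origins if o.strip()],
    )

# Placeholder keys for illustration only.
settings = load_settings({"LLM_API_KEY": "sk-example", "EMBEDDING_API_KEY": "sk-example"})
print(settings.llm_model, settings.cors_origins)
```

Failing fast on the two required keys surfaces a misconfigured `.env` at startup instead of at the first API call.
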
| Component | URL | Platform |
|---|---|---|
| Landing Page | tracegraph.vercel.app | Vercel (Hobby) |
| Graph Explorer | tracegraph.vercel.app/explorer | Vercel (Hobby) |
| Backend API | tracegraph-ls2t.onrender.com | Render (Starter) |
| API Docs | tracegraph-ls2t.onrender.com/docs | Swagger UI |
| Platform | Command | Notes |
|---|---|---|
| Docker | `docker compose up -d` | Full stack, self-hosted |
| Vercel | `cd frontend && npx vercel --prod` | Set NEXT_PUBLIC_API_URL env var |
| Render | Connect repo, set root to `backend` | Docker runtime, set env vars |

| Metric | Value |
|---|---|
| Corpus | 12 documents, ~15,000 words |
| Entities extracted | 177 |
| Relationships extracted | 124 |
| Ingestion time | ~2-3 minutes |
| Ingestion cost | ~$0.15 (OpenAI) |
| Query latency | 3-8 seconds (hybrid) |
| Frontend build | < 4 seconds |
| Graph rendering | 60fps @ 177 nodes |
| Lighthouse score | 95+ (performance) |
| Check | Status |
|---|---|
| API keys in `.env` (gitignored) | Passed |
| CORS restricted to allowed origins | Passed |
| Input validation (Pydantic) | Passed |
| No raw SQL / injection vectors | Passed |
| No secrets in git history | Passed |
| Dependencies auditable | Passed |
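
To illustrate what the input validation and CORS checks amount to, here is a standard-library stand-in. The project itself uses Pydantic, so this is not its actual schema. The `QueryRequest` fields follow the curl example, the `hybrid` default is an assumption based on the recommended mode, and `is_origin_allowed` is a hypothetical helper.

```python
from dataclasses import dataclass

ALLOWED_MODES = {"hybrid", "local", "global", "naive"}

@dataclass
class QueryRequest:
    query: str
    mode: str = "hybrid"  # assumption: default to the recommended mode

    def __post_init__(self):
        # Reject empty or non-string queries before they reach the engine.
        if not isinstance(self.query, str) or not self.query.strip():
            raise ValueError("query must be a non-empty string")
        if self.mode not in ALLOWED_MODES:
            raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")

def is_origin_allowed(origin, allowed_origins):
    """Exact-match allow-list check, the same idea as restricting CORS origins."""
    return origin in allowed_origins

request = QueryRequest("How does GraphRAG reduce hallucinations?")
print(request.mode)
print(is_origin_allowed("http://localhost:3000", ["http://localhost:3000"]))
```
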
MIT — Pyae Sone, 2026
Built with LightRAG • Next.js 16 • FastAPI • react-force-graph

