TraceGraph

GraphRAG Citation Explorer
Trace every AI answer back to its source through the knowledge graph.


Screenshots •  Quick Start •  Architecture •  Features •  API •  Corpus •  Deploy



About

TraceGraph is a full-stack GraphRAG application that demonstrates how graph-based retrieval-augmented generation (RAG) outperforms traditional vector RAG on knowledge-intensive tasks. It pairs an interactive knowledge graph visualization with real-time citation tracing, so users can see exactly why an AI system produced a given answer.

Why GraphRAG?

Standard RAG retrieves document chunks by vector similarity: it finds text that looks like the question. This breaks down when an answer requires connecting information across multiple documents, or when it must carry an auditable citation trail.

GraphRAG adds a structured knowledge graph layer of entities, relationships, and community hierarchies. That layer enables multi-hop reasoning, citation grounding, and traceability. Traceability and transparency are among the obligations the EU AI Act places on high-risk AI systems.

What TraceGraph Does

  • Extracts 177 entities and 124 relationships from a 12-document corpus
  • Visualizes the knowledge graph as an interactive force-directed network
  • Queries with 4 search modes: hybrid, local, global, and naive (baseline)
  • Shows citation trails linking every answer to source documents and entity chains
  • Compares RAG vs GraphRAG side-by-side on the same query

Screenshots

Knowledge Graph

Interactive force-directed visualization of 177 entities extracted from a healthcare + AI corpus. Color-coded by type. Click any node to highlight its neighborhood.

Query + Citation Trail

Hybrid search result with AI response and 10 traced citations, each linked to source documents and entity chains.

Features




Interactive Knowledge Graph
Force-directed 2D visualization with zoom, drag, and click-to-highlight. 200+ entities at 60fps.




4 Search Modes
Hybrid (graph+vector), local (entities), global (communities), naive (vector-only baseline).




Citation Trail
Every answer traces back through entity chains to source documents with relevance scores.




RAG vs GraphRAG
Side-by-side comparison showing how graph structure improves answer quality over vector search.

More features
| Feature | Details |
|---|---|
| Resizable Panels | Drag-to-resize graph, AI response, and citation trail for comfortable reading at any viewport size |
| Demo Mode | Frontend works without backend; ships with sample graph data (24 entities, 29 relations) |
| Entity Types | Concepts, technologies, organizations, regulations, persons, documents, each with a distinct color |
| Real-time API | FastAPI backend with async LightRAG, Swagger docs at `/docs` |
| Docker Ready | Single `docker compose up` for full stack |
| OpenAI Compatible | Any OpenAI-compatible API for LLM and embeddings (GPT-4o-mini, Llama via Ollama, etc.) |

Architecture

graph TB
    subgraph Frontend ["Frontend — Next.js 16 / React 19"]
        UI[Page Layout] --> QP[Query Panel]
        UI --> GV["Graph Viewer<br/><sub>react-force-graph-2d</sub>"]
        UI --> AP[Answer Panel]
        UI --> CT[Citation Trail]
    end

    subgraph Backend ["Backend — FastAPI / Python"]
        API[REST API] --> GE[GraphRAG Engine]
        GE --> LR[LightRAG v1.4]
        LR --> EE[Entity Extraction]
        LR --> CD[Community Detection]
        LR --> HR[Hybrid Retrieval]
    end

    subgraph Storage ["Persistence"]
        GM[(GraphML)]
        VDB[(Vector DB)]
        KV[(KV Stores)]
    end

    subgraph External ["External"]
        OAI[OpenAI API]
    end

    Frontend -- "HTTP/JSON :8000" --> Backend
    LR --> GM & VDB & KV
    EE & HR -- "API calls" --> OAI

    style Frontend fill:#1e1b4b,stroke:#6366f1,color:#e2e8f0
    style Backend fill:#022c22,stroke:#10b981,color:#e2e8f0
    style Storage fill:#1c1917,stroke:#78716c,color:#e2e8f0
    style External fill:#172554,stroke:#3b82f6,color:#e2e8f0
Data flow — ingestion + query
sequenceDiagram
    participant U as User
    participant FE as Frontend
    participant API as FastAPI
    participant LR as LightRAG
    participant KG as Knowledge Graph
    participant LLM as OpenAI

    rect rgb(30, 27, 75)
    Note over U,LLM: Document Ingestion
    U->>API: POST /ingest-corpus
    API->>LR: ainsert(document)
    LR->>LLM: Extract entities & relations
    LLM-->>LR: Entities + Relations
    LR->>KG: Build graph (NetworkX)
    LR->>LR: Detect communities (Leiden)
    KG-->>API: Graph persisted
    end

    rect rgb(2, 44, 34)
    Note over U,LLM: Query with Citation
    U->>FE: Enter question
    FE->>API: POST /query {mode: "hybrid"}
    API->>LR: aquery(text, mode=hybrid)
    LR->>KG: Graph traversal (local)
    LR->>KG: Community summaries (global)
    LR->>LLM: Generate grounded answer
    LLM-->>LR: Response
    LR-->>API: Answer + graph context
    API->>API: Extract citation chains
    API-->>FE: {answer, citations, graph}
    FE-->>U: Visual answer + citation trail
    end

Tech Stack

| Layer | Technology | Why |
|---|---|---|
| Frontend | Next.js 16, React 19, TypeScript | Latest App Router, RSC-ready, strict types |
| Styling | Tailwind CSS 4 | CSS variable design system, dark theme |
| Graph Viz | react-force-graph-2d | Canvas-based, handles 200+ nodes at 60fps |
| Backend | FastAPI, Python 3.10+ | Async-first, auto-generated OpenAPI docs |
| GraphRAG | LightRAG 1.4 (HKUDS) | Proven OSS GraphRAG with hybrid retrieval |
| LLM | GPT-4o-mini (configurable) | Entity extraction + answer generation |
| Embeddings | text-embedding-3-small | 1536-dim vectors for semantic search |
| Graph Store | NetworkX + GraphML | In-memory graph with file persistence |
| Vector Store | Nano Vector DB | Lightweight cosine similarity search |
| Panels | react-resizable-panels v4 | Drag-to-resize graph/sidebar/citations |
| Containerization | Docker Compose | Single-command full stack deployment |
| Hosting | Vercel + Render | Frontend CDN + Backend Docker container |
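Because the graph is persisted as GraphML, it can be inspected without starting the backend. The following is a minimal sketch using only the Python standard library; the helper name `graphml_stats` and the inline sample are illustrative assumptions, not code or data from this repository.

```python
import xml.etree.ElementTree as ET

# Standard GraphML XML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def graphml_stats(xml_text: str) -> tuple[int, int]:
    """Return (node_count, edge_count) for a GraphML document."""
    root = ET.fromstring(xml_text)
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    return len(nodes), len(edges)


# Tiny hand-written sample (hypothetical, not taken from graph_store/).
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="GraphRAG"/>
    <node id="Knowledge Graph"/>
    <node id="Leiden Algorithm"/>
    <edge source="GraphRAG" target="Knowledge Graph"/>
    <edge source="GraphRAG" target="Leiden Algorithm"/>
  </graph>
</graphml>
"""

print(graphml_stats(SAMPLE))  # (3, 2)
```

To check a real store, read the `.graphml` file that ingestion writes under `WORKING_DIR` and pass its text to the same function; the exact file name depends on the LightRAG version.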

Quick Start

Prerequisites

| Requirement | Version |
|---|---|
| Python | 3.10+ |
| Node.js | 20+ |
| OpenAI API Key | Required |

Setup

# 1. Clone
git clone https://github.com/soneeee22000/tracegraph.git
cd tracegraph

# 2. Backend
cd backend
cp .env.example .env          # Add your OpenAI API key
pip install -r requirements.txt

# 3. Ingest corpus (extracts ~177 entities, ~2-3 min, ~$0.15)
python -c "
import asyncio
from app.graphrag import engine

async def ingest():
    await engine.initialize()
    results = await engine.ingest_corpus('./corpus')
    print(f'Ingested {len(results)} documents')

asyncio.run(ingest())
"

# 4. Start backend
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000

# 5. Frontend (new terminal)
cd ../frontend
npm install
npm run dev

Open http://localhost:3000

Demo Mode: The frontend works without a backend — ships with sample graph data so you can explore the UI instantly.

Docker (alternative)
cp backend/.env.example backend/.env
# Edit backend/.env with your OpenAI API key
docker compose up

API Reference

Interactive Swagger docs available at http://localhost:8000/docs

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Service health + graph statistics |
| GET | `/graph` | Full knowledge graph (177 nodes, 124 edges) |
| GET | `/docs` | OpenAPI / Swagger UI |
| POST | `/query` | Query with citation tracing |
| POST | `/compare` | Naive RAG vs GraphRAG side-by-side |
| POST | `/ingest` | Ingest a single document |
| POST | `/ingest-corpus` | Batch ingest all `corpus/*.txt` |
Example request + response

Request:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How does GraphRAG reduce hallucinations?", "mode": "hybrid"}'

Response structure:

{
  "answer": "GraphRAG reduces hallucinations by...",
  "mode": "hybrid",
  "citations": [
    {
      "source_document": "01_graphrag_overview.txt",
      "chunk_text": "Graph-Based Retrieval Augmented Generation...",
      "entity_chain": ["GraphRAG", "Knowledge Graph", "Structured Grounding"],
      "relevance_score": 1.0
    }
  ],
  "graph": { "nodes": [...], "edges": [...] },
  "entity_count": 177,
  "relationship_count": 124
}
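The same request can be issued from Python. This is a hedged sketch using only the standard library: the request and response fields follow the example above, while `API_URL` and the helper names `build_query_request`, `summarize_citations`, and `ask` are illustrative assumptions rather than part of the project.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # default backend address used in this README


def build_query_request(question: str, mode: str = "hybrid") -> urllib.request.Request:
    """Build the POST /query request shown in the curl example."""
    body = json.dumps({"query": question, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_citations(response: dict) -> list[str]:
    """Return one line per citation, highest relevance score first."""
    ranked = sorted(
        response.get("citations", []),
        key=lambda c: c.get("relevance_score", 0.0),
        reverse=True,
    )
    return [
        f'{c["source_document"]}: {" -> ".join(c["entity_chain"])}'
        for c in ranked
    ]


def ask(question: str, mode: str = "hybrid") -> dict:
    """Send the query to a running backend and decode the JSON response."""
    with urllib.request.urlopen(build_query_request(question, mode), timeout=60) as resp:
        return json.load(resp)
```

With the backend running, `summarize_citations(ask("How does GraphRAG reduce hallucinations?"))` would list each source document next to its entity chain. Nothing in the block touches the network until `ask` is called.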

Search Modes

| Mode | Strategy | Best For |
|---|---|---|
| `hybrid` | Graph traversal + vector search | General questions (recommended) |
| `local` | Entity neighborhood traversal | Specific topic deep-dives |
| `global` | Community summary search | Broad thematic overviews |
| `naive` | Vector similarity only | Baseline comparison |
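The `naive` baseline comes down to ranking chunks by cosine similarity against the query embedding. The sketch below illustrates that idea with toy 3-dimensional vectors; it is not LightRAG's or Nano Vector DB's implementation, and the names `cosine`, `top_k`, and the chunk data are hypothetical. Real embeddings from text-embedding-3-small have 1536 dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k chunks most similar to the query vector."""
    scored = [(cosine(query_vec, vec), name) for name, vec in chunks.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy embeddings (hypothetical values chosen for illustration).
CHUNKS = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
}

print(top_k([1.0, 0.0, 0.0], CHUNKS))  # ['chunk_a', 'chunk_b']
```

The ranking only sees how close each chunk is to the question. A chunk that is relevant through an intermediate entity but worded differently, such as `chunk_c` here, is never retrieved, which is the gap the graph-based modes aim to close.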

Corpus

12 curated documents spanning healthcare AI and GraphRAG infrastructure — strategically chosen to demonstrate cross-document reasoning in regulated domains.

| # | Document | Domain | Key Entities |
|---|---|---|---|
| 01 | GraphRAG Overview | AI Infrastructure | GraphRAG, Leiden Algorithm, Multi-hop Reasoning |
| 02 | Knowledge Graphs in Healthcare | Healthcare AI | UMLS, SNOMED CT, Clinical Decision Support |
| 03 | Vaccine Safety Monitoring | Pharmacovigilance | VAERS, Brighton Collaboration, AEFI |
| 04 | Entity Extraction & NLP | Information Extraction | NER, Relation Extraction, Entity Resolution |
| 05 | Citation-Grounded AI | AI Safety | FActScore, Citation Recall, Faithfulness |
| 06 | Graph Databases for AI | Database Technology | Neo4j, LightRAG, FalkorDB |
| 07 | LLM Hallucination in Enterprise | Enterprise AI | Deloitte Survey, Confidence Scoring |
| 08 | Hybrid Retrieval Architectures | Information Retrieval | RRF, Cross-encoder Re-ranking |
| 09 | Community Detection | Graph Algorithms | Leiden, Louvain, Modularity |
| 10 | EU AI Act Compliance | AI Regulation | Article 13, Article 14, Traceability |
| 11 | Clinical Trials Analysis | Drug Development | ClinicalTrials.gov, Pistoia Alliance |
| 12 | AI Safety & Grounding | Trustworthy AI | HITL, Formal Verification |

After ingestion: ~177 entities across 6 types, ~124 relationships


How GraphRAG Differs from RAG

graph LR
    subgraph trad ["Traditional RAG"]
        direction LR
        D1["Docs"] --> C1["Chunk"] --> E1["Embed"] --> V1["Vector DB"]
        Q1["Query"] --> E1b["Embed"] --> V1
        V1 -->|"Top-K"| L1["LLM"] --> A1["Answer"]
    end

    subgraph graphrag ["GraphRAG"]
        direction LR
        D2["Docs"] --> EX["Extract Entities"] --> KG["Knowledge Graph"]
        D2 --> C2["Chunk"] --> E2["Embed"] --> V2["Vector DB"]
        Q2["Query"] --> HS["Hybrid Search"]
        KG --> HS
        V2 --> HS
        HS --> L2["LLM"] --> A2["Answer + Citations"]
    end

    style trad fill:#1c1917,stroke:#78716c,color:#a8a29e
    style graphrag fill:#1e1b4b,stroke:#6366f1,color:#c7d2fe

The key insight: GraphRAG doesn't just find text that looks similar — it traverses a structured knowledge graph to discover related information across documents, then grounds every claim in verifiable entity chains.


Project Structure

tracegraph/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI routes + CORS
│   │   ├── graphrag.py          # LightRAG engine wrapper
│   │   ├── models.py            # Pydantic schemas
│   │   └── config.py            # Env-based settings
│   ├── corpus/                  # 12 source documents (.txt)
│   ├── graph_store/             # Pre-ingested: GraphML + vector DBs
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   ├── src/
│   │   ├── app/
│   │   │   ├── page.tsx         # Landing page (13 sections)
│   │   │   └── explorer/
│   │   │       └── page.tsx     # Graph explorer (resizable panels)
│   │   ├── components/
│   │   │   ├── landing/         # 13 landing page sections
│   │   │   ├── ui/              # shadcn/ui components
│   │   │   ├── graph-viewer.tsx # Force-directed graph
│   │   │   ├── query-panel.tsx  # Search modes + compare toggle
│   │   │   ├── answer-panel.tsx # AI response + comparison view
│   │   │   └── citation-trail.tsx # Source provenance viewer
│   │   ├── lib/                 # API client, colors, sample data
│   │   └── types/               # TypeScript interfaces
│   ├── public/                  # Favicons (SVG, PNG, ICO)
│   ├── package.json
│   └── Dockerfile
├── docs/
│   ├── LANDING-PAGE.md          # Landing page design specification
│   └── assets/                  # Logo SVG
├── docker-compose.yml
├── render.yaml                  # Render blueprint
├── LICENSE
└── README.md

Configuration

Environment variables
| Variable | Description | Default |
|---|---|---|
| `LLM_MODEL` | OpenAI model for completions | `gpt-4o-mini` |
| `LLM_API_KEY` | OpenAI API key | required |
| `LLM_API_BASE` | API base URL | `https://api.openai.com/v1` |
| `EMBEDDING_MODEL` | Embedding model | `text-embedding-3-small` |
| `EMBEDDING_API_KEY` | Embedding API key | required |
| `EMBEDDING_API_BASE` | Embedding API base URL | `https://api.openai.com/v1` |
| `WORKING_DIR` | Graph storage path | `./graph_store` |
| `CORPUS_DIR` | Corpus path | `./corpus` |
| `CORS_ORIGINS` | Allowed origins | `http://localhost:3000` |
| `NEXT_PUBLIC_API_URL` | Backend URL (frontend) | `http://localhost:8000` |

Deployment

Live Demo

| Component | URL | Platform |
|---|---|---|
| Landing Page | tracegraph.vercel.app | Vercel (Hobby) |
| Graph Explorer | tracegraph.vercel.app/explorer | Vercel (Hobby) |
| Backend API | tracegraph-ls2t.onrender.com | Render (Starter) |
| API Docs | tracegraph-ls2t.onrender.com/docs | Swagger UI |

Self-hosting

| Platform | Command | Notes |
|---|---|---|
| Docker | `docker compose up -d` | Full stack, self-hosted |
| Vercel | `cd frontend && npx vercel --prod` | Set `NEXT_PUBLIC_API_URL` env var |
| Render | Connect repo, set root to `backend` | Docker runtime, set env vars |

Performance

| Metric | Value |
|---|---|
| Corpus | 12 documents, ~15,000 words |
| Entities extracted | 177 |
| Relationships extracted | 124 |
| Ingestion time | ~2-3 minutes |
| Ingestion cost | ~$0.15 (OpenAI) |
| Query latency | 3-8 seconds (hybrid) |
| Frontend build | < 4 seconds |
| Graph rendering | 60fps @ 177 nodes |
| Lighthouse score | 95+ (performance) |

Security

| Check | Status |
|---|---|
| API keys in `.env` (gitignored) | Passed |
| CORS restricted to allowed origins | Passed |
| Input validation (Pydantic) | Passed |
| No raw SQL / injection vectors | Passed |
| No secrets in git history | Passed |
| Dependencies auditable | Passed |

License

MIT — Pyae Sone, 2026



Built with LightRAG, Next.js 16, FastAPI, and react-force-graph
