Graxon

The first open-source hybrid RAG system built to reduce hallucinations through a persistent Knowledge Graph layer.

Graxon combines dense vector search, sparse retrieval, and a structured Knowledge Graph to deliver accurate, traceable, and context-aware answers — at scale, across multiple organizations, projects, and documents.

Overview

Traditional RAG systems rely purely on vector similarity, which can retrieve plausible but semantically incorrect chunks. Graxon solves this by layering a persistent Knowledge Graph on top of hybrid vector retrieval — connecting chunks through typed, weighted edges that capture real semantic relationships.

Key properties:

Multi-Tenant — isolated workspaces per organization
Multi-Project — scoped retrieval within projects
Multi-Document — fine-grained document management
Hybrid Retrieval — dense vectors + sparse BM25 + graph traversal
Hallucination Reduction — graph-grounded answers anchored to structured knowledge

Videos

UI

https://drive.google.com/file/d/1Luv6NVNh1e1VJPmGp_eXtB1fy4n7NVey/view?usp=drive_link

Graph DB

https://drive.google.com/file/d/1fi_lNDTBxRy3jGuS0spnw0dNXraPAbwf/view?usp=sharing

Architecture

Orgs
 └── Projects
      └── Documents
           └── Chunks

Each document is chunked, processed through a multi-agent pipeline, stored in a vector database and a knowledge graph, and wired with semantic edges for retrieval.

Infrastructure

Component	Role
Qdrant	Vector database — dense + sparse embeddings
Neo4j	Graph database — chunk nodes, semantic edges
PostgreSQL	Primary relational database
PgBouncer	PostgreSQL connection pooler
MinIO	Object storage — raw document files
RabbitMQ	Message broker — async pipeline orchestration
Redis	In-memory cache — sessions, queues, fast lookups

Data Model

Hierarchy

Organization
  └── Project
        └── Document
              └── Chunk

Graph Node: `Chunk`

Each chunk is stored as a node in Neo4j and a vector in Qdrant. Nodes are connected by typed, weighted edges:

Edge Type	Description
`PREV` / `NEXT`	Sequential order within the document
`HAS_TAG`	LLM-extracted semantic tags
`HAS_KEYWORD`	TF-IDF significant keywords
`HAS_PHRASE`	Shared n-gram phrases
`HAS_ENTITY`	Named entities (NER)
`HAS_CONCEPT`	Extracted noun phrases / concepts
`HAS_ACRONYM`	Acronym-to-definition links
`VECTOR_SIMILAR`	High cosine similarity between chunk embeddings

All edges carry a weight for ranked graph traversal during retrieval.

Ingestion Pipeline

Graxon uses LangGraph to orchestrate a parallel multi-agent pipeline at ingestion time.

Document
   │
   ▼
Chunking
   │
   ▼
LangGraph Pipeline
   ├── LLM Agent          → tags, inter-chunk relations
   ├── Embedding Agent    → dense vectors (OpenAI / Gemini / Voyage)
   ├── Sparse Agent       → sparse vectors via FastEmbed (BM25 / Qdrant sparse)
   └── Lexical Engine     → entities, concepts, keywords, phrases, acronyms
   │
   ├── Vector Store Agent
   │     └── Qdrant ← dense + sparse embeddings
   │
   ├── Graph DB Agent
   │     └── Neo4j ← chunk nodes
   │                 ├── PREV / NEXT edges
   │                 ├── HAS_TAG, HAS_KEYWORD, HAS_PHRASE ...
   │                 └── (all with weights)
   │
   └── Vector Similarity Sync
         └── Top-K similar chunks from Qdrant
               └── Neo4j ← VECTOR_SIMILAR edges (with cosine weight)

Agents

LLM Agent: Sends chunks to an LLM to extract semantic tags and inter-chunk relationships. The results become typed edges in the knowledge graph.

Embedding Agent: Generates dense vector embeddings using pluggable providers (OpenAI, Google Gemini, or Voyage AI). The vectors are stored in Qdrant for approximate nearest-neighbor (ANN) search.

Sparse Embedding Agent: Generates sparse vectors via FastEmbed for BM25-style retrieval using Qdrant's sparse vector support. This adds lexical precision alongside semantic recall.

Lexical Engine: SpaCy-powered linguistic analysis that extracts structured signals from each chunk. See Lexical Engine below.

Vector Similarity Sync

After all chunks are stored in Qdrant, Graxon runs a post-ingestion pass: for each chunk, it fetches the top-K most similar chunks by embedding cosine similarity and writes VECTOR_SIMILAR edges into Neo4j with the similarity score as the edge weight. This bridges the vector and graph layers.
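The following is a minimal sketch of this pass. It assumes each chunk's vector is available as a Python list. In Graxon the neighbours come from Qdrant, so the brute-force cosine loop here only stands in for that top-K search. The `chunk_id` node property in the Cypher is an assumed name.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity dot(a, b) / (|a| * |b|); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_similar_edges(
    vectors: Dict[str, List[float]], top_k: int = 5, min_score: float = 0.0
) -> List[Tuple[str, str, float]]:
    """Return (source, target, weight) edges from each chunk to its top-K neighbours."""
    edges = []
    for src, src_vec in vectors.items():
        # Brute-force scoring stands in for a Qdrant top-K similarity search.
        scored = [(dst, cosine(src_vec, vec)) for dst, vec in vectors.items() if dst != src]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        edges.extend((src, dst, round(s, 4)) for dst, s in scored[:top_k] if s >= min_score)
    return edges


def as_rows(edges: List[Tuple[str, str, float]]) -> List[dict]:
    """Shape edges as the list of maps the Cypher below expects for $edges."""
    return [{"source": s, "target": t, "weight": w} for s, t, w in edges]


# One way to persist the edges: MERGE makes re-running the sync idempotent.
WRITE_VECTOR_SIMILAR = """
UNWIND $edges AS e
MATCH (a:Chunk {chunk_id: e.source}), (b:Chunk {chunk_id: e.target})
MERGE (a)-[r:VECTOR_SIMILAR]->(b)
SET r.weight = e.weight
"""

print(vector_similar_edges({"c1": [1.0, 0.0], "c2": [0.9, 0.1], "c3": [0.0, 1.0]}, top_k=1))
```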

Lexical Engine

Graxon uses SpaCy as its lexical engine to extract structured linguistic signals from chunks, which are then converted into graph edges.

Entity Extraction (NER)

Detects shared named entities — people, organizations, products, and technologies — across chunks. Creates strong semantic links between chunks discussing the same real-world subject, improving graph-based retrieval accuracy.
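The shared-entity links can be sketched with an inverted index that maps each entity to the chunks mentioning it. This sketch assumes entities were already extracted per chunk. With spaCy, that would typically be the texts of `nlp(text).ents`. Weighting an edge by the number of shared entities is an illustrative choice, not Graxon's documented formula.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shared_entity_edges(chunk_entities: Dict[str, Set[str]]) -> List[Tuple[str, str, int]]:
    """Return (chunk_a, chunk_b, shared_entity_count) for every pair sharing an entity."""
    # Inverted index: normalised entity text -> ids of chunks mentioning it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for chunk_id, entities in chunk_entities.items():
        for ent in entities:
            index[ent.strip().lower()].add(chunk_id)

    # Each entity contributes +1 to every pair of chunks that mention it.
    shared: Dict[Tuple[str, str], int] = defaultdict(int)
    for chunk_ids in index.values():
        for a, b in combinations(sorted(chunk_ids), 2):
            shared[(a, b)] += 1

    # Strongest links first; ties broken by chunk id for stable output.
    return sorted(((a, b, n) for (a, b), n in shared.items()), key=lambda e: (-e[2], e[0], e[1]))


print(shared_entity_edges({
    "c1": {"Neo4j", "Qdrant"},
    "c2": {"neo4j", "OpenAI"},
    "c3": {"Qdrant", "Neo4j"},
}))
```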

Concept Extraction (Noun Phrases)

Extracts meaningful noun phrases and technical concepts shared between chunks. Connects semantically related ideas even when exact keywords differ, improving topic grouping and contextual understanding.

TF-IDF Keyword Linking

Uses TF-IDF scoring to detect rare but informative keywords appearing across multiple chunks, while filtering common noise words. Highlights statistically important terms that strengthen semantic relationships.
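Here is a sketch of TF-IDF keyword linking. It assumes simple regex tokenization, a tiny illustrative stop-word list, and a hypothetical `min_score` threshold. Terms that appear in every chunk get an IDF of zero and drop out. That is one way the filtering of common noise words can work.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Illustrative stop-word list; a real pipeline would use a fuller one (spaCy ships one).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text: str) -> List[str]:
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def keyword_edges(chunks: Dict[str, str], min_score: float = 0.05) -> List[Tuple[str, str, str, float]]:
    """Link chunk pairs sharing a keyword whose TF-IDF clears min_score in both chunks."""
    tokens = {cid: tokenize(text) for cid, text in chunks.items()}
    n_chunks = len(tokens)
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))

    # term -> {chunk_id: tf-idf}; idf = log(N / df) is 0 for terms in every chunk (noise).
    significant: Dict[str, Dict[str, float]] = defaultdict(dict)
    for cid, toks in tokens.items():
        for term, count in Counter(toks).items():
            score = (count / len(toks)) * math.log(n_chunks / doc_freq[term])
            if score >= min_score:
                significant[term][cid] = score

    edges = []
    for term in sorted(significant):
        holders = significant[term]
        for a, b in combinations(sorted(holders), 2):
            # Weight by the weaker of the two chunks' scores for this keyword.
            edges.append((a, b, term, round(min(holders[a], holders[b]), 3)))
    return edges
```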

Phrase Bridge Detection

Detects exact shared n-gram phrases between chunks to capture repeated terminology and strong lexical overlap. Especially useful for technical, scientific, and domain-specific documents where repeated phrases carry important meaning.
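Below is a minimal phrase-bridge detector. It assumes word trigrams (`n = 3`) and lower-cased, punctuation-free tokens. The actual n-gram size and normalization Graxon uses are not stated here.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """All word n-grams of a chunk, lower-cased with punctuation stripped."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrase_bridges(chunks: Dict[str, str], n: int = 3) -> List[Tuple[str, str, List[str]]]:
    """Return (chunk_a, chunk_b, shared_phrases) for pairs sharing at least one n-gram."""
    grams = {cid: ngrams(text, n) for cid, text in chunks.items()}
    bridges = []
    for a, b in combinations(sorted(grams), 2):
        shared = grams[a] & grams[b]
        if shared:
            bridges.append((a, b, sorted(" ".join(g) for g in shared)))
    return bridges
```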

Acronym Resolution

Detects acronym definitions and links them to later acronym usage throughout the document. Improves long-document comprehension by connecting abbreviated references back to their original meaning.
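This is a simplified sketch of acronym resolution. It assumes definitions follow the pattern "Long Form Words (LFW)" and validates them by initial letters only. Real text needs a more forgiving matcher that handles skipped words such as "of" and hyphenation. Treat this as an illustration of the linking step only.

```python
import re
from typing import Dict, List, Tuple

DEFINITION = re.compile(r"\(([A-Z]{2,})\)")  # an acronym in parentheses, e.g. "(RAG)"


def find_definitions(text: str) -> Dict[str, str]:
    """Map acronym -> long form when the N words before "(ACRONYM)" have matching initials."""
    found = {}
    for match in DEFINITION.finditer(text):
        acronym = match.group(1)
        words = re.findall(r"[A-Za-z]+", text[: match.start()])
        candidate = words[-len(acronym):]
        initials = "".join(w[0] for w in candidate).upper()
        if len(candidate) == len(acronym) and initials == acronym:
            found[acronym] = " ".join(candidate)
    return found


def acronym_links(chunks: Dict[str, str]) -> List[Tuple[str, str, str, str]]:
    """Return (using_chunk, defining_chunk, acronym, long_form) for later usages."""
    order = list(chunks)  # dict order is assumed to be document order
    definitions: Dict[str, Tuple[str, str]] = {}
    for cid in order:
        for acronym, long_form in find_definitions(chunks[cid]).items():
            definitions.setdefault(acronym, (cid, long_form))  # first definition wins

    links = []
    for acronym, (def_cid, long_form) in definitions.items():
        for cid in order[order.index(def_cid) + 1:]:
            if re.search(rf"\b{re.escape(acronym)}\b", chunks[cid]):
                links.append((cid, def_cid, acronym, long_form))
    return links
```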

Edge Construction

Converts all detected lexical relationships — entities, concepts, keywords, phrases, and acronyms — into typed, weighted graph edges connecting related chunks in Neo4j.

Edge Deduplication

Removes duplicate or weaker relationships while preserving the strongest semantic connections. Keeps the graph cleaner and more efficient to traverse during retrieval and ranking.
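Deduplication can be sketched as keeping the maximum weight per chunk pair and edge type. This sketch treats lexical edges as undirected while `PREV` / `NEXT` keep their direction, which is an assumption.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str, float]  # (source, target, edge_type, weight)
DIRECTED = {"PREV", "NEXT"}  # sequential edges keep their direction


def deduplicate_edges(edges: List[Edge]) -> List[Edge]:
    """Keep only the strongest edge per (pair, type); lexical edges are treated as undirected."""
    best: Dict[Tuple[str, str, str], float] = {}
    for src, dst, rel, weight in edges:
        key = (src, dst, rel) if rel in DIRECTED else (min(src, dst), max(src, dst), rel)
        if key not in best or weight > best[key]:
            best[key] = weight
    return [(a, b, rel, w) for (a, b, rel), w in sorted(best.items())]
```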

Resilient Ingestion & Checkpointing

Most RAG pipelines work great in a notebook. At enterprise scale, they fall apart.

Rate limits spike. Workers crash. If your ingestion fails at page 800 of a 1,000-page document, an all-or-nothing architecture forces a full restart — burning engineering time and duplicate LLM tokens.

Graxon is built with an infrastructure-first mindset to make ingestion deterministic, fault-tolerant, and resumable by design.

Zero-Loss Macro & Micro Checkpointing

Graxon treats ingestion like a transaction log, decoupling graph state and persisting checkpoints across two layers:

Layer	Store	Role
Micro checkpoints	Redis	Hot in-memory state tracking per chunk
Macro checkpoints	MinIO (S3)	Cold artifact backups per document/batch

If an API provider or worker node crashes mid-ingestion, Graxon recovers and resumes from the exact chunk where it left off, with no full restart and no wasted tokens.
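Here is a minimal sketch of chunk-level resume, with a plain dict standing in for Redis. The key format and class name are hypothetical. A chunk is checkpointed only after it fully succeeds, so a crash re-runs at most the chunk that was in flight.

```python
from typing import Callable, Dict, List, Set


class MicroCheckpoint:
    """Per-document set of finished chunk indices.

    A dict stands in for Redis here; with redis-py the same calls would roughly be
    r.sadd(key, i) and r.sismember(key, i). The key format is hypothetical.
    """

    def __init__(self, store: Dict[str, Set[int]]):
        self.store = store

    def _key(self, doc_id: str) -> str:
        return f"graxon:ckpt:{doc_id}"

    def is_done(self, doc_id: str, index: int) -> bool:
        return index in self.store.get(self._key(doc_id), set())

    def mark_done(self, doc_id: str, index: int) -> None:
        self.store.setdefault(self._key(doc_id), set()).add(index)


def ingest(doc_id: str, chunks: List[str], process: Callable[[str], None], ckpt: MicroCheckpoint) -> List[int]:
    """Process chunks in order, skipping those already checkpointed; returns indices run now."""
    ran = []
    for i, chunk in enumerate(chunks):
        if ckpt.is_done(doc_id, i):
            continue  # finished in an earlier attempt: no repeated LLM/embedding calls
        process(chunk)  # may raise; the checkpoint below is then never written
        ckpt.mark_done(doc_id, i)
        ran.append(i)
    return ran
```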

Ironclad Idempotency & Atomicity

Resuming a failed pipeline usually introduces duplicate vectors or corrupted graph linkages. Graxon guarantees a zero-duplicate footprint by design:

Qdrant

Deterministic uuid5 hashing seeded by structured chunk_ids
Ensures every vector upsert is fully idempotent — re-ingesting a chunk overwrites cleanly, never duplicates
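Deterministic point IDs can be derived with Python's standard `uuid.uuid5`. The namespace and the `chunk_id` format below are hypothetical. Any fixed namespace works as long as it never changes.

```python
import uuid

# Fixed namespace for chunk ids (hypothetical; it only has to stay constant forever).
CHUNK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "graxon:chunk")


def point_id(chunk_id: str) -> str:
    """Deterministic Qdrant point id: the same chunk_id always yields the same UUID."""
    return str(uuid.uuid5(CHUNK_NAMESPACE, chunk_id))


# Hypothetical structured chunk_id: org / project / document / chunk position.
print(point_id("org-1/pro-7/doc-42/chunk-0003"))
```

Qdrant accepts UUID strings as point IDs. Upserting `PointStruct(id=point_id(cid), vector=...)` a second time through qdrant-client's `upsert` therefore overwrites the same point instead of adding a new one.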

Neo4j

Single-transaction bulk uploads via optimized Cypher UNWIND clauses
Strict ON CREATE / ON MATCH state isolation preserves temporal metadata and guarantees ACID consistency
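The following sketches a single-statement batch upsert. The `Chunk` properties (`chunk_id`, `text`, `created_at`, `updated_at`) are assumed names. `driver` is expected to be a neo4j Python driver (5.x) instance, whose `execute_query` runs the statement in one managed transaction.

```python
from typing import Iterable, List, Tuple

# One round-trip for the whole batch. MERGE matches on chunk_id, so re-running it
# never duplicates nodes; created_at is written only on creation (ON CREATE) and
# survives later re-ingestion, which only touches updated_at (ON MATCH).
UPSERT_CHUNKS = """
UNWIND $rows AS row
MERGE (c:Chunk {chunk_id: row.chunk_id})
ON CREATE SET c.created_at = timestamp(), c.text = row.text
ON MATCH SET c.updated_at = timestamp(), c.text = row.text
"""


def build_rows(chunks: Iterable[Tuple[str, str]]) -> List[dict]:
    """Shape (chunk_id, text) pairs as the list of maps bound to $rows."""
    return [{"chunk_id": cid, "text": text} for cid, text in chunks]


def upsert_chunks(driver, chunks: Iterable[Tuple[str, str]]) -> None:
    # With the neo4j Python driver (5.x), keyword arguments to execute_query
    # become query parameters, so rows= binds $rows.
    driver.execute_query(UPSERT_CHUNKS, rows=build_rows(chunks))
```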

Multi-Engine Parallel Fan-Out

Rather than sequential processing, Graxon uses LangGraph to run a parallelized scatter-gather pipeline across all storage layers simultaneously:

Dense vectors → Qdrant (deep semantic similarity)
Sparse vectors → FastEmbed / BM25 → Qdrant (lexical exact matching)
Knowledge Graph → Neo4j (chunk nodes, entity tags, structural edges)
Lexical Analysis → SpaCy (entities, concepts, keywords, phrases, acronyms)

All four engines process each chunk concurrently — maximizing throughput and minimizing ingestion latency at scale.
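The scatter-gather shape can be illustrated with `asyncio.gather`. Graxon itself uses LangGraph for this. The stub coroutines below are hypothetical placeholders for the four engines; only the concurrency pattern matters here.

```python
import asyncio
from typing import Dict, List, Set


# Stub engines: each sleep stands in for I/O (embedding API, Qdrant, Neo4j, spaCy).
async def embed_dense(chunk: str) -> List[float]:
    await asyncio.sleep(0.01)
    return [float(len(chunk))]


async def embed_sparse(chunk: str) -> Dict[str, int]:
    await asyncio.sleep(0.01)
    return {tok: chunk.split().count(tok) for tok in set(chunk.split())}


async def write_graph(chunk: str) -> bool:
    await asyncio.sleep(0.01)
    return True


async def analyse_lexical(chunk: str) -> Set[str]:
    await asyncio.sleep(0.01)
    return {w for w in chunk.split() if w[:1].isupper()}  # crude entity guess


async def fan_out(chunk: str) -> Dict[str, object]:
    """Scatter one chunk to all engines at once, then gather every result."""
    dense, sparse, graph_ok, lexical = await asyncio.gather(
        embed_dense(chunk), embed_sparse(chunk), write_graph(chunk), analyse_lexical(chunk)
    )
    return {"dense": dense, "sparse": sparse, "graph": graph_ok, "lexical": lexical}


print(asyncio.run(fan_out("Graxon links Qdrant and Neo4j")))
```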

Query Pipeline

Graxon uses a LangGraph-orchestrated query pipeline with 3 query types and 2 depth levels, giving fine-grained control over retrieval quality vs. speed.

Flow Overview

__start__
    │
    ▼
supervisor_agent
    │
    ▼
query_expansion_agent
    ├── embedding_agent          (dense embedding of expanded query)
    └── sparse_embedding_agent   (sparse / BM25 embedding)
    │
    ▼
vector_database_agent            (hybrid retrieval from Qdrant)
    │
    ├── [expert] ──► expert_query_agent ──► expert_query_reranker_agent ──► expert_query_answer_agent
    ├── [quick]  ──► quick_query_agent  ──► quick_query_reranker_agent  ──► quick_query_answer_agent
    └── [smart]  ──► smart_query_agent  ──► smart_query_reranker_agent  ──► smart_query_answer_agent
    │
    ▼
__end__  (answer + metadata / sources)

Every query — regardless of mode or depth — begins with:

Query expansion via LLM
Dense + sparse embedding of the expanded query
Hybrid retrieval from Qdrant (dense + BM25 vectors)
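How the dense and sparse result lists are merged is not spelled out above. Reciprocal Rank Fusion (RRF), which Qdrant's Query API supports natively, is one common option. The sketch below assumes each retriever returns chunk ids in rank order and uses the conventional `k = 60`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rrf(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank) for every chunk it returns."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


dense_hits = ["c1", "c2", "c3"]   # ranked ids from the dense vector search
sparse_hits = ["c3", "c1", "c4"]  # ranked ids from the sparse / BM25 search
print([cid for cid, _ in rrf([dense_hits, sparse_hits])])
```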

Query Types & Depth

Quick

Lightweight retrieval with immediate document context.

Depth	Retrieval
Standard	Vector DB chunks + `PREV` / `NEXT` neighbors from Neo4j
Advanced	Same as Standard

Smart

Adds graph-based semantic expansion on top of Quick.

Depth	Retrieval
Standard	Quick (Standard) + `VECTOR_SIMILAR` chunks from Neo4j for each retrieved chunk
Advanced	Smart (Standard) + `PREV` / `NEXT` neighbors for each `VECTOR_SIMILAR` chunk
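The Quick and Smart expansion rules can be sketched over in-memory adjacency maps instead of live Neo4j queries. This sketch interprets "each retrieved chunk" as each vector DB hit. The `mode` and `depth` strings are hypothetical parameter values.

```python
from typing import Dict, Iterable, List


def expand(
    hits: List[str],
    sequential: Dict[str, List[str]],  # chunk -> its PREV / NEXT neighbours
    similar: Dict[str, List[str]],     # chunk -> its VECTOR_SIMILAR neighbours
    mode: str = "quick",
    depth: str = "standard",
) -> List[str]:
    """Grow the vector-DB hits into a context set following the Quick / Smart tables."""
    selected: List[str] = []

    def add(ids: Iterable[str]) -> None:
        for cid in ids:
            if cid not in selected:  # keep first-seen order, no duplicates
                selected.append(cid)

    add(hits)
    for cid in hits:  # Quick, both depths
        add(sequential.get(cid, []))
    if mode == "smart":
        sims = [s for cid in hits for s in similar.get(cid, [])]
        add(sims)  # Smart Standard
        if depth == "advanced":
            for s in sims:  # Smart Advanced
                add(sequential.get(s, []))
    return selected
```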

Expert

Full hybrid retrieval combining vector, graph, and lexical signals with a unified chunk scoring system.

Depth	Retrieval
Standard	Smart (Advanced) + embedding comparison of query against Tags, Keywords, Concepts, Entities, Phrases, Acronyms filtered by `EQ_GTE_LANE_WEIGHT_THRESHOLD`
Advanced	Expert (Standard) + Lexical Engine picks best lanes and top `EQ_MAX_LANE_ENTITY` matches per lane

Expert Chunk Scoring

Every chunk_id accumulates a score across all retrieval signals:

Signal	Score
Present in Vector DB results	`++`
Present as `PREV` / `NEXT` or `VECTOR_SIMILAR` neighbor	`++`
Matched via query–tag / keyword / concept embedding	`++`
Matched via Lexical Engine lane (Expert Advanced only)	`++`

The top `EQ_MAX_CHUNKS` chunks by final score are forwarded to the answer agent.
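Here is a sketch of the scoring step. It reads each `++` as +1 per signal, which is an assumption; the actual increments may be weighted. The signal names are hypothetical, and `max_chunks` plays the role of `EQ_MAX_CHUNKS`.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def expert_rank(signals: Dict[str, Iterable[str]], max_chunks: int) -> List[Tuple[str, int]]:
    """Add +1 per signal a chunk appears in, then keep the top max_chunks (EQ_MAX_CHUNKS)."""
    score: Counter = Counter()
    for chunk_ids in signals.values():
        score.update(set(chunk_ids))  # set(): a signal counts at most once per chunk
    # Highest score first; ties broken by chunk id so the cut-off is deterministic.
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))[:max_chunks]
```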

Reranking & Answer Generation

After retrieval, every mode runs:

Reranker agent — reranks the retrieved chunk set, selects Top-K
Answer agent — passes expanded query + context window to LLM
Response — returns the answer with full metadata and sources

Configuration

Variable	Description
`EQ_GTE_LANE_WEIGHT_THRESHOLD`	Minimum similarity score for tag / keyword / concept lane matching
`EQ_MAX_CHUNKS`	Maximum chunks selected after expert scoring
`EQ_MAX_LANE_ENTITY`	Top entities picked per lane by the Lexical Engine (Expert Advanced only)

Multipart Upload & Resume

Graxon's zero-loss checkpointing extends all the way to the browser.

Most upload implementations are fire-and-forget — if the connection drops at 95%, you start over. Graxon's upload system is session-aware and part-level resumable on both the frontend and backend, mirroring the same checkpoint philosophy as the ingestion pipeline.

bucket: {org_id}
key: pro*{project_id}/doc*{document_id}/{filename}

Each organization gets its own bucket. Each document gets its own isolated path within the project scope.

Upload Flow

Browser
│
├── 1. Check local session (documentId, uploadId, key, completedParts)
│
├── 2. POST /multipart/init  (if no session)
│         └── Backend: auto-creates org bucket if missing
│                       creates S3 multipart upload
│                       returns uploadId + key
│                       session persisted on frontend
│
├── 3. For each part:
│         ├── Skip if already in completedParts  ──► advance progress
│         ├── GET presigned URL from backend (1hr expiry)
│         └── PUT chunk directly to MinIO  (no data through app server)
│
├── 4. POST /multipart/complete
│         └── Backend: sorts parts by PartNumber (S3 requirement)
│                       calls S3 complete_multipart_upload
│                       registers document via DocumentService
│                       triggers ingestion pipeline
│
└── 5. Session deleted on success  /  preserved on failure for retry

Backend: `MinioUploadClient`

`multipart_upload_init`: Ensures the org bucket exists (creating it if not), registers a new multipart upload with MinIO, and returns the upload_id and key to the frontend.

`get_multipart_presigned_url`: Generates a presigned upload_part URL per part with a 1-hour expiry. The browser uploads directly to MinIO, so no file data passes through the application server.

`complete_multipart_upload`: Sorts the completed parts by PartNumber (required by S3/MinIO) and finalizes the multipart upload.
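The following sketches the three backend calls with a boto3-style S3 client pointed at MinIO. The function names mirror the documented methods, but the signatures are hypothetical. The boto3 operations used (`list_buckets`, `create_bucket`, `create_multipart_upload`, `generate_presigned_url`, `complete_multipart_upload`) are standard S3 client calls.

```python
from typing import Dict, List

# These helpers take any boto3-style S3 client, for example one created with
# boto3.client("s3", endpoint_url="http://localhost:9000", ...) for a local MinIO.


def multipart_init(s3, bucket: str, key: str) -> str:
    """Create the org bucket if missing, then start a multipart upload; returns UploadId."""
    existing = {b["Name"] for b in s3.list_buckets().get("Buckets", [])}
    if bucket not in existing:
        s3.create_bucket(Bucket=bucket)
    return s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]


def presigned_part_url(s3, bucket: str, key: str, upload_id: str, part_number: int) -> str:
    """URL the browser can PUT one part to directly; valid for one hour."""
    return s3.generate_presigned_url(
        "upload_part",
        Params={"Bucket": bucket, "Key": key, "UploadId": upload_id, "PartNumber": part_number},
        ExpiresIn=3600,
    )


def complete_upload(s3, bucket: str, key: str, upload_id: str, parts: List[Dict]) -> Dict:
    """Finalize the upload; S3/MinIO require parts in ascending PartNumber order."""
    ordered = sorted(parts, key=lambda p: p["PartNumber"])
    return s3.complete_multipart_upload(
        Bucket=bucket,
        Key=key,
        UploadId=upload_id,
        MultipartUpload={"Parts": [{"PartNumber": p["PartNumber"], "ETag": p["ETag"]} for p in ordered]},
    )
```

S3 needs each part's ETag to complete the upload. The ETag is returned in the response headers of the browser's PUT, so the completed-parts list has to carry it.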

Frontend: Session-Aware Resume

Session Persistence: Before every upload, the frontend checks for an existing session (documentId, uploadId, key, and all completedParts). If a previous attempt was interrupted, the session is still there.

Part-Level Resume: The file is split into fixed-size parts. For each part, the frontend checks whether it was already uploaded; if so, it skips that part and advances the progress bar. Only missing parts are re-uploaded.

Local Part Tracking: Completed parts are tracked in a local array during the session to avoid stale store reads. They are also persisted to the store for cross-session recovery.
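The resume loop is expressed below in Python for illustration; the frontend itself runs in the browser. The `session` dict mirrors the documented fields. `upload_part` is a hypothetical callback that PUTs bytes to a presigned URL and returns the ETag. The 5 MiB part size is an assumption based on the S3 minimum for non-final parts.

```python
from typing import Callable, Dict, List, Tuple

# S3/MinIO require every part except the last to be at least 5 MiB;
# the part size the frontend actually uses is not stated, so this is an assumption.
PART_SIZE = 5 * 1024 * 1024


def plan_parts(size: int, part_size: int = PART_SIZE) -> List[Tuple[int, int, int]]:
    """(part_number, start, end) byte ranges; part numbers start at 1."""
    return [(n, start, min(start + part_size, size))
            for n, start in enumerate(range(0, size, part_size), start=1)]


def resume_upload(data: bytes, session: Dict, upload_part: Callable[[int, bytes], str],
                  part_size: int = PART_SIZE) -> List[Dict]:
    """Upload only the parts missing from session["completedParts"]."""
    completed = list(session["completedParts"])  # local array: no stale store reads
    done = {p["PartNumber"] for p in completed}
    for number, start, end in plan_parts(len(data), part_size):
        if number in done:
            continue  # uploaded in an earlier attempt: just advance progress
        etag = upload_part(number, data[start:end])  # PUT to the presigned URL
        completed.append({"PartNumber": number, "ETag": etag})
        session["completedParts"] = list(completed)  # persist for cross-session recovery
    return completed
```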

Failure & Retry Behavior

Scenario	Behavior
Connection drops mid-upload	Session preserved, retry resumes from last completed part
Browser tab closed	Session persisted, upload resumes on next open
Part upload fails	Error surfaced, session intact, retry skips completed parts
Upload completes successfully	Session deleted, document handed to `DocumentService` for ingestion

Getting Started

Clone the repository

```
git clone https://github.com/Graxon-rag/graxon.git
cd graxon
```

Execution Choices

1. Local Development (Native)

Best if you prefer to run the app directly on your host machine without containerization.

Create a .env from .env.example
```
cp .env.example .env
```

Create a virtual environment and activate it
```
python -m venv .venv
source .venv/bin/activate
```

Install all dependencies
```
pip install -r requirements.txt
```
Start all the engines, databases, and stores
```
docker compose up -d
```
Run the database migrations
```
alembic upgrade head
```
Run the server
```
chmod +x dev.sh
./dev.sh
```

2. Docker Development (With Container HMR)

Best for keeping your host machine clean while maintaining instant hot-reloading (Hot Module Replacement) as you change your code.

Create a .env.docker from .env.docker.example
```
cp .env.docker.example .env.docker
```

Build the image
```
docker compose -f docker-compose-dev.yaml build
```

Run the container
```
docker compose -f docker-compose-dev.yaml up -d
```

The server will be accessible at http://localhost:8888

To view live container logs:
```
docker compose -f docker-compose-dev.yaml logs -f
```

3. Docker Production

Runs the prebuilt images from Docker Hub.

Create a .env.docker from .env.docker.example
```
cp .env.docker.example .env.docker
```

Run the container
```
docker compose -f docker-compose-prod.yaml up -d
```

The server will be accessible at http://localhost:8888

To view live container logs:
```
docker compose -f docker-compose-prod.yaml logs -f
```

Stopping Containers

If you are running either of the Docker variations, you can spin down the environment using:

For Development:
```
docker compose -f docker-compose-dev.yaml down
```
For Production:
```
docker compose -f docker-compose-prod.yaml down
```

Swagger

Once the server is running, the Swagger UI is available at:

http://localhost:8888/docs

Username: admin
Password: admin

Docker Hub

https://hub.docker.com/u/graxon

Adding New Migrations

When you make changes to SQLAlchemy models, generate a new migration:
```
alembic revision --autogenerate -m "your_migration_description"
```
Then apply it:
```
alembic upgrade head
```

Only modify tables listed in GRAXON_OWNED_TABLES inside migrations/env.py. Do not add tables owned by other services.
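One common way to enforce this rule is an Alembic `include_object` hook that limits autogenerate to owned tables. Whether Graxon's `migrations/env.py` does exactly this is an assumption, and the set contents below are illustrative.

```python
# Illustrative contents only; the real set lives in migrations/env.py.
GRAXON_OWNED_TABLES = {"seed_tracker"}


def include_object(obj, name, type_, reflected, compare_to):
    """Alembic autogenerate filter: only consider tables Graxon owns."""
    if type_ == "table":
        return name in GRAXON_OWNED_TABLES
    # Columns, indexes and constraints follow the table they belong to.
    table = getattr(obj, "table", None)
    return table is None or table.name in GRAXON_OWNED_TABLES


# In env.py this would be wired in via:
# context.configure(connection=connection, target_metadata=target_metadata,
#                   include_object=include_object)
```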

Rolling Back Migrations

Roll back the last migration:
```
alembic downgrade -1
```
Roll back to a specific revision:
```
alembic downgrade <revision_id>
```

Checking Migration Status
```
alembic current   # shows current revision
alembic history   # shows full migration history
```

Seeding

Seeding runs automatically on first startup — no manual step needed.

It inserts:

Default organization (dev)
LLM models (OpenAI, Claude, Gemini, DeepSeek)
Embedding models (OpenAI, Voyage, Gemini)
Reranker models
Sparse text models
Default Neo4j organization node

If you need to re-seed (e.g. after wiping the database), clear the seed_tracker table:
```
DELETE FROM seed_tracker;
```
Then restart the server.

SpaCy

Download the English SpaCy model:
```
python -m spacy download en_core_web_sm
```
