First Open-Source Hybrid RAG to eliminate hallucinations through a persistent Knowledge Graph layer.
Graxon combines dense vector search, sparse retrieval, and a structured Knowledge Graph to deliver accurate, traceable, and context-aware answers — at scale, across multiple organizations, projects, and documents.
- Overview
- Images
- Videos
- Architecture
- Infrastructure
- Data Model
- Ingestion Pipeline
- Lexical Engine
- Resilient Ingestion & Checkpointing
- Query Pipeline
- Multipart Upload & Resume
- Getting Started
- Execution Choices
- Swagger
- Docker Hub
Traditional RAG systems rely purely on vector similarity, which can retrieve plausible but semantically incorrect chunks. Graxon solves this by layering a persistent Knowledge Graph on top of hybrid vector retrieval — connecting chunks through typed, weighted edges that capture real semantic relationships.
Key properties:
- Multi-Tenant — isolated workspaces per organization
- Multi-Project — scoped retrieval within projects
- Multi-Document — fine-grained document management
- Hybrid Retrieval — dense vectors + sparse BM25 + graph traversal
- Hallucination Reduction — graph-grounded answers anchored to structured knowledge
https://drive.google.com/file/d/1Luv6NVNh1e1VJPmGp_eXtB1fy4n7NVey/view?usp=drive_link
https://drive.google.com/file/d/1fi_lNDTBxRy3jGuS0spnw0dNXraPAbwf/view?usp=sharing
Orgs
└── Projects
    └── Documents
        └── Chunks
Each document is chunked, processed through a multi-agent pipeline, stored in a vector database and a knowledge graph, and wired with semantic edges for retrieval.
| Component | Role |
|---|---|
| Qdrant | Vector database — dense + sparse embeddings |
| Neo4j | Graph database — chunk nodes, semantic edges |
| PostgreSQL | Primary relational database |
| PgBouncer | PostgreSQL connection pooler |
| MinIO | Object storage — raw document files |
| RabbitMQ | Message broker — async pipeline orchestration |
| Redis | In-memory cache — sessions, queues, fast lookups |
Organization
└── Project
    └── Document
        └── Chunk
Each chunk is stored as a node in Neo4j and a vector in Qdrant. Nodes are connected by typed, weighted edges:
| Edge Type | Description |
|---|---|
| PREV / NEXT | Sequential order within the document |
| HAS_TAG | LLM-extracted semantic tags |
| HAS_KEYWORD | TF-IDF significant keywords |
| HAS_PHRASE | Shared n-gram phrases |
| HAS_ENTITY | Named entities (NER) |
| HAS_CONCEPT | Extracted noun phrases / concepts |
| HAS_ACRONYM | Acronym-to-definition links |
| VECTOR_SIMILAR | High cosine similarity between chunk embeddings |
All edges carry a weight for ranked graph traversal during retrieval.
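As a rough illustration of the edge model described above, the sketch below represents typed, weighted edges in plain Python and ranks a chunk's outgoing edges by weight. The `Edge` dataclass, its field names, and `ranked_neighbors` are hypothetical names chosen for this example; they are not taken from the Graxon codebase, and the real edges live in Neo4j.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(str, Enum):
    # Edge types listed in the table above.
    PREV = "PREV"
    NEXT = "NEXT"
    HAS_TAG = "HAS_TAG"
    HAS_KEYWORD = "HAS_KEYWORD"
    HAS_PHRASE = "HAS_PHRASE"
    HAS_ENTITY = "HAS_ENTITY"
    HAS_CONCEPT = "HAS_CONCEPT"
    HAS_ACRONYM = "HAS_ACRONYM"
    VECTOR_SIMILAR = "VECTOR_SIMILAR"


@dataclass(frozen=True)
class Edge:
    source: str      # id of the chunk the edge starts from (hypothetical field)
    target: str      # id of the node the edge points to (hypothetical field)
    type: EdgeType
    weight: float    # higher means a stronger relationship


def ranked_neighbors(edges, source, limit=5):
    """Return the strongest edges leaving `source`, highest weight first."""
    outgoing = [edge for edge in edges if edge.source == source]
    return sorted(outgoing, key=lambda edge: edge.weight, reverse=True)[:limit]
```

Ranking by weight in this way is one plausible reading of "ranked graph traversal"; the actual traversal logic may differ.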
Graxon uses LangGraph to orchestrate a parallel multi-agent pipeline at ingestion time.
Document
│
▼
Chunking
│
▼
LangGraph Pipeline
├── LLM Agent → tags, inter-chunk relations
├── Embedding Agent → dense vectors (OpenAI / Gemini / Voyage)
├── Sparse Agent → sparse vectors via FastEmbed (BM25 / Qdrant sparse)
└── Lexical Engine → entities, concepts, keywords, phrases, acronyms
│
├── Vector Store Agent
│ └── Qdrant ← dense + sparse embeddings
│
├── Graph DB Agent
│ └── Neo4j ← chunk nodes
│ ├── PREV / NEXT edges
│ ├── HAS_TAG, HAS_KEYWORD, HAS_PHRASE ...
│ └── (all with weights)
│
└── Vector Similarity Sync
└── Top-K similar chunks from Qdrant
└── Neo4j ← VECTOR_SIMILAR edges (with cosine weight)
LLM Agent
Sends chunks to an LLM to extract semantic tags and inter-chunk relationships. Results become typed edges in the knowledge graph.
Embedding Agent
Generates dense vector embeddings using pluggable providers (OpenAI, Google Gemini, or Voyage AI). Vectors are stored in Qdrant for ANN search.
Sparse Embedding Agent
Generates sparse vectors via FastEmbed for BM25-style retrieval and Qdrant's sparse vector support. Enables lexical precision alongside semantic recall.
Lexical Engine
SpaCy-powered linguistic analysis that extracts structured signals from each chunk. See Lexical Engine below.
After all chunks are stored in Qdrant, Graxon runs a post-ingestion pass: for each chunk, it fetches the top-K most similar chunks by embedding cosine similarity and writes VECTOR_SIMILAR edges into Neo4j with the similarity score as the edge weight. This bridges the vector and graph layers.
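The post-ingestion pass can be pictured with the following minimal sketch. It assumes embeddings are available as plain Python lists and uses a brute-force cosine comparison as a stand-in for the top-K query that Graxon sends to Qdrant. The function names and the `(source, target, score)` tuple layout are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_similar_edges(embeddings, top_k=2, min_score=0.0):
    """Build VECTOR_SIMILAR edge records from a dict of chunk_id -> vector.

    Returns (source, target, score) tuples; the score doubles as the edge weight.
    """
    edges = []
    for source, vector in embeddings.items():
        scored = [
            (cosine(vector, other), target)
            for target, other in embeddings.items()
            if target != source
        ]
        scored.sort(reverse=True)  # most similar first
        for score, target in scored[:top_k]:
            if score >= min_score:
                edges.append((source, target, score))
    return edges
```

Each returned tuple would then be written to Neo4j as one VECTOR_SIMILAR edge. The brute-force loop is quadratic in the number of chunks, which is why a vector database handles this step in practice.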
Graxon uses SpaCy as its lexical engine to extract structured linguistic signals from chunks, which are then converted into graph edges.
Detects shared named entities — people, organizations, products, and technologies — across chunks. Creates strong semantic links between chunks discussing the same real-world subject, improving graph-based retrieval accuracy.
Extracts meaningful noun phrases and technical concepts shared between chunks. Connects semantically related ideas even when exact keywords differ, improving topic grouping and contextual understanding.
Uses TF-IDF scoring to detect rare but informative keywords appearing across multiple chunks, while filtering common noise words. Highlights statistically important terms that strengthen semantic relationships.
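A small sketch of the keyword idea follows, using only the standard library. The stop-word list, the smoothed IDF formula, and the `min_chunks` parameter are assumptions made for this example; the source does not specify which TF-IDF variant or thresholds Graxon uses.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def keyword_scores(chunks, min_chunks=2):
    """Score keywords shared by at least `min_chunks` chunks.

    chunks: dict of chunk_id -> text.
    Returns: dict of term -> {chunk_id: tf-idf score}.
    """
    tokens = {chunk_id: tokenize(text) for chunk_id, text in chunks.items()}
    total_chunks = len(tokens)
    document_frequency = Counter()
    for chunk_tokens in tokens.values():
        document_frequency.update(set(chunk_tokens))

    scores = {}
    for chunk_id, chunk_tokens in tokens.items():
        term_counts = Counter(chunk_tokens)
        length = len(chunk_tokens) or 1
        for term, count in term_counts.items():
            if document_frequency[term] < min_chunks:
                continue  # a term seen in one chunk cannot link chunks
            # Smoothed IDF: rarer terms receive a larger multiplier.
            idf = math.log((1 + total_chunks) / (1 + document_frequency[term])) + 1.0
            scores.setdefault(term, {})[chunk_id] = (count / length) * idf
    return scores
```

Terms that survive the filter could become HAS_KEYWORD links between the chunks that share them, with the score used as the edge weight.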
Detects exact shared n-gram phrases between chunks to capture repeated terminology and strong lexical overlap. Especially useful for technical, scientific, and domain-specific documents where repeated phrases carry important meaning.
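Shared-phrase detection can be approximated as a set intersection over word n-grams, as in the sketch below. The n-gram length of 3 and the function names are assumptions for illustration only.

```python
import re
from itertools import combinations


def ngrams(text, n=3):
    """Return the set of word n-grams in `text` (lowercased)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_phrases(chunks, n=3):
    """Map each pair of chunk ids to the n-grams they have in common.

    chunks: dict of chunk_id -> text. Pairs with no overlap are omitted.
    """
    grams = {chunk_id: ngrams(text, n) for chunk_id, text in chunks.items()}
    shared = {}
    for first, second in combinations(sorted(grams), 2):
        common = grams[first] & grams[second]
        if common:
            shared[(first, second)] = common
    return shared
```

Each non-empty intersection is a candidate HAS_PHRASE relationship between the two chunks.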
Detects acronym definitions and links them to later acronym usage throughout the document. Improves long-document comprehension by connecting abbreviated references back to their original meaning.
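The acronym step could look roughly like this sketch, which assumes definitions are written in the common "Long Form (LF)" style and that chunks are processed in document order. The regular expression and the initial-letter check are simplifications chosen for this example, not the rules Graxon applies.

```python
import re

# Capitalized words followed by an upper-case acronym in parentheses.
ACRONYM_DEF = re.compile(r"((?:[A-Z][\w-]*\s+){1,6})\(([A-Z]{2,6})\)")


def find_definitions(text):
    """Return {acronym: expansion} for definitions whose initials match."""
    found = {}
    for match in ACRONYM_DEF.finditer(text):
        words = match.group(1).split()
        acronym = match.group(2)
        candidate = words[-len(acronym):]
        if "".join(word[0] for word in candidate).upper() == acronym:
            found[acronym] = " ".join(candidate)
    return found


def acronym_edges(chunks):
    """Link later uses of an acronym back to the chunk that defined it.

    chunks: list of (chunk_id, text) in document order.
    Returns (using_chunk, defining_chunk, acronym) tuples.
    """
    definitions = {}  # acronym -> (defining chunk id, expansion)
    edges = []
    for chunk_id, text in chunks:
        for acronym, (defining_chunk, _expansion) in definitions.items():
            if re.search(rf"\b{acronym}\b", text):
                edges.append((chunk_id, defining_chunk, acronym))
        for acronym, expansion in find_definitions(text).items():
            definitions.setdefault(acronym, (chunk_id, expansion))
    return edges
```

Only the first definition of each acronym is kept here, and uses that appear before the definition are ignored.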
Converts all detected lexical relationships — entities, concepts, keywords, phrases, and acronyms — into typed, weighted graph edges connecting related chunks in Neo4j.
Removes duplicate or weaker relationships while preserving the strongest semantic connections. Keeps the graph cleaner and more efficient to traverse during retrieval and ranking.
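One simple way to express this pruning step is to keep only the highest-weight edge per chunk pair and edge type. The sketch below assumes edges are `(source, target, edge_type, weight)` tuples and treats a pair as undirected; both choices are assumptions for illustration.

```python
def prune_edges(edges):
    """Keep the strongest edge for each (chunk pair, edge type).

    edges: iterable of (source, target, edge_type, weight) tuples.
    The pair is treated as undirected, so (a, b) and (b, a) compete.
    """
    best = {}
    for source, target, edge_type, weight in edges:
        key = (tuple(sorted((source, target))), edge_type)
        if key not in best or weight > best[key][3]:
            best[key] = (source, target, edge_type, weight)
    return list(best.values())
```

A weight threshold could be added on top of this to drop weak relationships entirely, rather than only collapsing duplicates.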
Most RAG pipelines work great in a notebook. At enterprise scale, they fall apart.
Rate limits spike. Workers crash. If your ingestion fails at page 800 of a 1,000-page document, an all-or-nothing architecture forces a full restart — burning engineering time and duplicate LLM tokens.
Graxon is built with an infrastructure-first mindset to make ingestion deterministic, fault-tolerant, and resumable by design.
Graxon treats ingestion like a transaction log, decoupling graph state and persisting checkpoints across two layers:
| Layer | Store | Role |
|---|---|---|
| Micro checkpoints | Redis | Hot in-memory state tracking per chunk |
| Macro checkpoints | MinIO (S3) | Cold artifact backups per document/batch |
If an API provider or worker node crashes mid-ingestion, Graxon hot-boots and resumes from the exact chunk it left off — no full restart, no wasted tokens.
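The resume behavior can be sketched as a loop that records each finished chunk and skips recorded chunks on the next run. In this example an in-memory class stands in for the Redis micro-checkpoint layer; `CheckpointStore`, `ingest`, and their signatures are hypothetical.

```python
class CheckpointStore:
    """In-memory stand-in for the Redis micro-checkpoint layer."""

    def __init__(self):
        self._done = {}

    def mark_done(self, document_id, chunk_index):
        self._done.setdefault(document_id, set()).add(chunk_index)

    def is_done(self, document_id, chunk_index):
        return chunk_index in self._done.get(document_id, set())


def ingest(document_id, chunks, process, store):
    """Process every chunk that has no checkpoint yet.

    A chunk is checkpointed only after `process` returns, so a crash
    leaves it unmarked and the next run picks it up again.
    Returns the chunk indexes handled in this run.
    """
    handled = []
    for index, chunk in enumerate(chunks):
        if store.is_done(document_id, index):
            continue  # finished in an earlier run
        process(chunk)  # may raise, e.g. on a provider rate limit
        store.mark_done(document_id, index)
        handled.append(index)
    return handled
```

If `process` raises on the third of four chunks, calling `ingest` again with the same store handles only the third and fourth chunks.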
Resuming a failed pipeline usually introduces duplicate vectors or corrupted graph linkages. Graxon guarantees a zero-duplicate footprint by design:
Qdrant
- Deterministic uuid5 hashing seeded by structured chunk_ids
- Ensures every vector upsert is fully idempotent: re-ingesting a chunk overwrites cleanly, never duplicates
Neo4j
- Single-transaction bulk uploads via optimized Cypher UNWIND clauses
- Strict ON CREATE / ON MATCH state isolation preserves temporal metadata and guarantees ACID consistency
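The deterministic-ID idea for Qdrant can be illustrated with Python's standard `uuid.uuid5`, which always maps the same namespace and name to the same UUID. The namespace value and the `org:project:document:index` layout of the chunk id below are placeholders invented for this sketch; the source does not state how Graxon structures its chunk_ids.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/graxon")


def point_id(org_id, project_id, document_id, chunk_index):
    """Derive a stable point id from a structured chunk id (assumed layout)."""
    chunk_id = f"{org_id}:{project_id}:{document_id}:{chunk_index}"
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, chunk_id))
```

Because the id depends only on the chunk's coordinates, upserting the same chunk after a resume targets the same point and overwrites it instead of creating a second one.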
Rather than sequential processing, Graxon uses LangGraph to run a parallelized scatter-gather pipeline across all storage layers simultaneously:
- Dense vectors → Qdrant (deep semantic similarity)
- Sparse vectors → FastEmbed / BM25 → Qdrant (lexical exact matching)
- Knowledge Graph → Neo4j (chunk nodes, entity tags, structural edges)
- Lexical Analysis → SpaCy (natural textual topology)
All four engines process each chunk concurrently — maximizing throughput and minimizing ingestion latency at scale.
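The scatter-gather shape can be sketched with `asyncio.gather`, used here as a stand-in for the LangGraph orchestration. The four agent functions are placeholders that return dummy values; their names and return shapes are assumptions, and real agents would call embedding providers, FastEmbed, Neo4j, and SpaCy.

```python
import asyncio


async def dense_agent(chunk):
    await asyncio.sleep(0)  # placeholder for a call to an embedding provider
    return {"dense": [float(len(chunk))]}


async def sparse_agent(chunk):
    await asyncio.sleep(0)  # placeholder for sparse vector generation
    return {"sparse": {"token_count": len(chunk.split())}}


async def graph_agent(chunk):
    await asyncio.sleep(0)  # placeholder for building chunk node data
    return {"graph": {"node": chunk[:20]}}


async def lexical_agent(chunk):
    await asyncio.sleep(0)  # placeholder for lexical analysis
    return {"lexical": sorted(set(chunk.lower().split()))}


async def process_chunk(chunk):
    """Scatter the chunk to all agents, then gather and merge their results."""
    results = await asyncio.gather(
        dense_agent(chunk),
        sparse_agent(chunk),
        graph_agent(chunk),
        lexical_agent(chunk),
    )
    merged = {}
    for part in results:
        merged.update(part)
    return merged
```

Running `asyncio.run(process_chunk("some chunk text"))` returns one dictionary holding the output of all four agents.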
Graxon uses a LangGraph-orchestrated query pipeline with 3 query types and 2 depth levels, giving fine-grained control over retrieval quality vs. speed.
__start__
│
▼
supervisor_agent
│
▼
query_expansion_agent
├── embedding_agent (dense embedding of expanded query)
└── sparse_embedding_agent (sparse / BM25 embedding)
│
▼
vector_database_agent (hybrid retrieval from Qdrant)
│
├── [expert] ──► expert_query_agent ──► expert_query_reranker_agent ──► expert_query_answer_agent
├── [quick] ──► quick_query_agent ──► quick_query_reranker_agent ──► quick_query_answer_agent
└── [smart] ──► smart_query_agent ──► smart_query_reranker_agent ──► smart_query_answer_agent
│
▼
__end__ (answer + metadata / sources)
Every query — regardless of mode or depth — begins with:
- Query expansion via LLM
- Dense + sparse embedding of the expanded query
- Hybrid retrieval from Qdrant (dense + BM25 vectors)
Quick
Lightweight retrieval with immediate document context.
| Depth | Retrieval |
|---|---|
| Standard | Vector DB chunks + PREV / NEXT neighbors from Neo4j |
| Advanced | Same as Standard |
Smart
Adds graph-based semantic expansion on top of Quick.
| Depth | Retrieval |
|---|---|
| Standard | Quick (Standard) + VECTOR_SIMILAR chunks from Neo4j for each retrieved chunk |
| Advanced | Smart (Standard) + PREV / NEXT neighbors for each VECTOR_SIMILAR chunk |
Expert
Full hybrid retrieval combining vector, graph, and lexical signals with a unified chunk scoring system.
| Depth | Retrieval |
|---|---|
| Standard | Smart (Advanced) + embedding comparison of query against Tags, Keywords, Concepts, Entities, Phrases, Acronyms filtered by EQ_GTE_LANE_WEIGHT_THRESHOLD |
| Advanced | Expert (Standard) + Lexical Engine picks best lanes and top EQ_MAX_LANE_ENTITY matches per lane |
Every chunk_id accumulates a score across all retrieval signals:
| Signal | Score |
|---|---|
| Present in Vector DB results | ++ |
| Present as PREV / NEXT or VECTOR_SIMILAR neighbor | ++ |
| Matched via query–tag / keyword / concept embedding | ++ |
| Matched via Lexical Engine lane (Expert Advanced only) | ++ |
Top EQ_MAX_CHUNKS chunks by final score are forwarded to the answer agent.
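A minimal sketch of the accumulation step is shown below. It assumes every signal adds exactly one point per chunk, which is one reading of the "++" notation; the real increments may be weighted. The signal names and the `score_chunks` function are hypothetical.

```python
from collections import Counter


def score_chunks(signals, max_chunks):
    """Accumulate one point per signal for each chunk and keep the top ones.

    signals: dict of signal name -> iterable of chunk ids that matched.
    max_chunks: plays the role of EQ_MAX_CHUNKS.
    Returns (chunk_id, score) pairs, highest score first, ties by chunk id.
    """
    scores = Counter()
    for chunk_ids in signals.values():
        scores.update(set(chunk_ids))  # a signal counts once per chunk
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max_chunks]
```

For example, a chunk found by vector search, reached as a VECTOR_SIMILAR neighbor, and matched through a tag embedding would score 3 and outrank a chunk found by vector search alone.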
After retrieval, every mode runs:
- Reranker agent — reranks the retrieved chunk set, selects Top-K
- Answer agent — passes expanded query + context window to LLM
- Response — returns the answer with full metadata and sources
| Variable | Description |
|---|---|
| EQ_GTE_LANE_WEIGHT_THRESHOLD | Minimum similarity score for tag / keyword / concept lane matching |
| EQ_MAX_CHUNKS | Maximum chunks selected after expert scoring |
| EQ_MAX_LANE_ENTITY | Top entities picked per lane by the Lexical Engine (Expert Advanced only) |
Graxon's zero-loss checkpointing extends all the way to the browser.
Most upload implementations are fire-and-forget — if the connection drops at 95%, you start over. Graxon's upload system is session-aware and part-level resumable on both the frontend and backend, mirroring the same checkpoint philosophy as the ingestion pipeline.
- bucket: {org_id}
- key: pro_{project_id}/doc_{document_id}/{filename}
Each organization gets its own bucket. Each document gets its own isolated path within the project scope.
Browser
│
├── 1. Check local session (documentId, uploadId, key, completedParts)
│
├── 2. POST /multipart/init (if no session)
│ └── Backend: auto-creates org bucket if missing
│ creates S3 multipart upload
│ returns uploadId + key
│ session persisted on frontend
│
├── 3. For each part:
│ ├── Skip if already in completedParts ──► advance progress
│ ├── GET presigned URL from backend (1hr expiry)
│ └── PUT chunk directly to MinIO (no data through app server)
│
├── 4. POST /multipart/complete
│ └── Backend: sorts parts by PartNumber (S3 requirement)
│ calls S3 complete_multipart_upload
│ registers document via DocumentService
│ triggers ingestion pipeline
│
└── 5. Session deleted on success / preserved on failure for retry
multipart_upload_init
Ensures the org bucket exists (creates it if not), registers a new multipart upload with MinIO, and returns the upload_id and key to the frontend.
get_multipart_presigned_url
Generates a presigned upload_part URL per part with a 1-hour expiry. The browser uploads directly to MinIO — no file data passes through the application server.
complete_multipart_upload
Sorts completed parts by PartNumber (required by S3/MinIO), finalizes the multipart upload, registers the document via DocumentService, and triggers the ingestion pipeline.
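The sorting step can be sketched as a small helper that builds the `MultipartUpload` structure accepted by boto3's `complete_multipart_upload`. Whether Graxon uses boto3 is an assumption, and `build_completion_payload` is a hypothetical name.

```python
def build_completion_payload(parts):
    """Order uploaded parts by PartNumber for the completion request.

    parts: list of dicts with "PartNumber" (int) and "ETag" (str), in any order.
    Returns a dict suitable for the MultipartUpload argument.
    """
    ordered = sorted(parts, key=lambda part: part["PartNumber"])
    return {
        "Parts": [
            {"PartNumber": part["PartNumber"], "ETag": part["ETag"]}
            for part in ordered
        ]
    }
```

With boto3, the result would be passed as `MultipartUpload=build_completion_payload(parts)` together with `Bucket`, `Key`, and `UploadId`. Parts often arrive out of order because the browser may retry or resume them, so the sort is done on the server side.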
Session Persistence
Before every upload, the frontend checks for an existing session — documentId, uploadId, key, and all completedParts. If a previous attempt was interrupted, the session is still there.
Part-Level Resume
The file is split into fixed-size chunks. For each part, the frontend checks whether it was already uploaded. If so, it skips it and advances the progress bar. Only missing parts are re-uploaded.
Local Part Tracking
Completed parts are tracked in a local array during the session to avoid stale store reads. Parts are also persisted to the store for cross-session recovery.
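The part-level resume logic amounts to computing byte ranges for every part and skipping the ones already recorded. The sketch below shows that calculation in Python for consistency with the rest of the project, although the real logic runs in the browser. `plan_parts` and its parameters are hypothetical.

```python
import math


def plan_parts(file_size, part_size, completed_parts):
    """List the parts that still need uploading.

    Part numbers start at 1, as in S3 multipart uploads.
    Returns (part_number, start_byte, end_byte) with end_byte exclusive.
    """
    total_parts = math.ceil(file_size / part_size)
    done = set(completed_parts)
    pending = []
    for number in range(1, total_parts + 1):
        if number in done:
            continue  # already uploaded in an earlier attempt
        start = (number - 1) * part_size
        end = min(start + part_size, file_size)
        pending.append((number, start, end))
    return pending
```

For a 25-byte file split into 10-byte parts with part 1 already completed, the plan contains parts 2 and 3 only, and the final part is shorter than the others.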
| Scenario | Behavior |
|---|---|
| Connection drops mid-upload | Session preserved, retry resumes from last completed part |
| Browser tab closed | Session persisted, upload resumes on next open |
| Part upload fails | Error surfaced, session intact, retry skips completed parts |
| Upload completes successfully | Session deleted, document handed to DocumentService for ingestion |
git clone https://github.com/Graxon-rag/graxon.git
cd graxon
Best if you prefer to run the app directly on your host machine without containerization.
- Create a .env from .env.example
  cp .env.example .env
- Create a virtual environment and activate it
  python -m venv .venv
  source .venv/bin/activate
- Install all dependencies
  pip install -r requirements.txt
- Start all the engines, databases, and stores
  docker compose up -d
- Run the migration
  alembic upgrade head
- Run the server
  chmod +x dev.sh
  ./dev.sh
Best for keeping your host machine clean while maintaining instant hot-reloading (Hot Module Replacement) as you change your code.
- Create a .env.docker from .env.docker.example
  cp .env.docker.example .env.docker
- Build the image
  docker compose -f docker-compose-dev.yaml build
- Run the container
  docker compose -f docker-compose-dev.yaml up -d
The server will be accessible at http://localhost:8888
Follow the logs:
docker compose -f docker-compose-dev.yaml logs -f
Run the prebuilt images from Docker Hub.
- Create a .env.docker from .env.docker.example
  cp .env.docker.example .env.docker
- Run the container
  docker compose -f docker-compose-prod.yaml up -d
The server will be accessible at http://localhost:8888
Follow the logs:
docker compose -f docker-compose-prod.yaml logs -f
If you are running either of the Docker variations, you can spin down the environments using:
- docker compose -f docker-compose-dev.yaml down
- docker compose -f docker-compose-prod.yaml down
After starting the server, you can browse the Swagger API documentation with these credentials:
- Username: admin
- Password: admin
https://hub.docker.com/u/graxon
When you make changes to SQLAlchemy models, generate a new migration:
alembic revision --autogenerate -m "your_migration_description"
Then apply it:
alembic upgrade head
Only modify tables listed in GRAXON_OWNED_TABLES inside migrations/env.py. Do not add tables owned by other services.
Roll back the last migration:
alembic downgrade -1
Roll back to a specific revision:
alembic downgrade <revision_id>
alembic current # shows current revision
alembic history # shows full migration history
Seeding runs automatically on first startup; no manual step is needed.
It inserts:
- Default organization (dev)
- LLM models (OpenAI, Claude, Gemini, DeepSeek)
- Embedding models (OpenAI, Voyage, Gemini)
- Reranker models
- Sparse text models
- Default Neo4j organization node
If you need to re-seed (e.g. after wiping the database), delete the seed_tracker table row:
DELETE FROM seed_tracker;
Then restart the server.
Download the SpaCy English model used by the Lexical Engine:
spacy download en_core_web_sm





