HyperRAG

HyperRAG: Reasoning N-ary Facts over Hypergraphs for Retrieval Augmented Generation

Accepted at The Web Conference (WWW) 2026

Overview

HyperRAG addresses a fundamental limitation of conventional RAG systems: the inability to capture N-ary facts — relationships that involve more than two entities simultaneously (e.g., "Person A received Award B from Organization C in Year D").

Instead of decomposing such facts into binary edges (losing relational context), HyperRAG encodes them as hyperedges in a hypergraph, where a single edge can connect any number of nodes. This structure enables:

Faithful representation of complex, multi-entity facts without information loss
Structured multi-hop reasoning by traversing hyperedges across the graph
Precise retrieval via a trained MLP-based retriever (HyperRetriever) that scores candidate hyperedges given a query
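The Python sketch below makes the difference concrete. It stores the example fact once as a hyperedge, counts how many edges a pairwise decomposition would need instead, and performs one traversal hop through a shared entity. The `Hyperedge` class, the role names, and the `adjacent` helper are hypothetical illustrations, not HyperRAG's internal data structures.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hyperedge:
    """Hypothetical N-ary fact: one relation over any number of (role, entity) pairs."""
    relation: str
    arguments: tuple  # ((role, entity), ...)

    @property
    def entities(self):
        return {entity for _, entity in self.arguments}


award = Hyperedge("received_award", (
    ("recipient", "Person A"), ("award", "Award B"),
    ("presenter", "Organization C"), ("year", "Year D"),
))

# Binary decomposition: each pair of entities becomes its own edge, so nothing
# records that all four entities belong to one and the same event.
binary_edges = [(a, b) for (_, a), (_, b) in combinations(award.arguments, 2)]


def adjacent(hyperedges, edge):
    """One traversal hop: the other hyperedges that share an entity with `edge`."""
    return [h for h in hyperedges if h != edge and h.entities & edge.entities]


located = Hyperedge("located_in", (("organization", "Organization C"), ("city", "City E")))
print(len(binary_edges))                               # 6 pairwise edges for a 4-entity fact
print(adjacent([award, located], award) == [located])  # True: linked via Organization C
```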

The system consists of two core modules:

Module	Role	Key Mechanism
HyperMemory	Memory-Guided Beam Retriever	Leverages the LLM’s parametric memory to guide beam search over n-ary facts without extra training.
HyperRetriever	Learnable Relational Retriever	Uses a trained MLP to fuse structural and semantic signals for adaptive, query-aware chain extraction.
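The beam-search idea behind HyperMemory can be sketched as follows. The sketch assumes that hyperedges are given as sets of entity ids and that a scoring function rates partial chains. In HyperMemory that guidance comes from the LLM's parametric memory. Here, `score_fn` is a hypothetical placeholder (a toy entity-overlap score in the example), and `beam_search`, `beam_width`, and `max_hops` are illustrative names rather than the module's API.

```python
import heapq
from typing import Callable

Chain = tuple[str, ...]  # a reasoning chain: ordered hyperedge ids


def beam_search(hyperedges: dict[str, frozenset[str]], seeds: list[str],
                score_fn: Callable[[Chain], float],
                beam_width: int = 2, max_hops: int = 3) -> list[Chain]:
    """Grow chains of hyperedges hop by hop, keeping the `beam_width` best.

    `score_fn` is a stand-in for the LLM-derived guidance used by HyperMemory;
    here it is any function that rates a partial chain.
    """
    beam: list[Chain] = [(seed,) for seed in seeds]
    for _ in range(max_hops - 1):
        candidates = [
            chain + (eid,)
            for chain in beam
            for eid, ents in hyperedges.items()
            # extend only through unused hyperedges that share an entity
            if eid not in chain and ents & hyperedges[chain[-1]]
        ]
        if not candidates:
            break
        beam = heapq.nlargest(beam_width, candidates, key=score_fn)
    return sorted(beam, key=score_fn, reverse=True)


# Toy example: hyperedge e1 connects A and B, e2 connects B and C, and so on.
edges = {"e1": frozenset({"A", "B"}), "e2": frozenset({"B", "C"}),
         "e3": frozenset({"B", "D"}), "e4": frozenset({"C", "E"})}
target = {"C", "E"}  # toy query entities


def toy_score(chain: Chain) -> float:
    covered = frozenset().union(*(edges[e] for e in chain))
    return len(covered & target)


print(beam_search(edges, ["e1"], toy_score, beam_width=1))  # [('e1', 'e2', 'e4')]
```

The beam keeps the search cost linear in `beam_width` per hop instead of enumerating every path. In the toy run, the beam follows e2 because e2 covers a query entity, and it finishes at e4.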

The codebase builds on HyperGraphRAG.

Installation

conda create -n hyperrag python=3.11.13
conda activate hyperrag
pip install -r requirements.txt

Configuration

Create a config.json file in the project root with your API credentials:

{
    "openai_api_key": "YOUR_OPENAI_API_KEY"
}
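The README specifies only the contents of config.json, not how the scripts read it. The following minimal sketch assumes that the file is parsed with the standard json module from the project root. `load_config` is a hypothetical helper that fails early if the placeholder key was left unchanged.

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Hypothetical helper: read config.json and check that the API key is set."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    key = config.get("openai_api_key", "")
    if not key or key == "YOUR_OPENAI_API_KEY":
        raise ValueError(f"Set a real 'openai_api_key' in {path}")
    return config
```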

Datasets

WikiTopics (Closed Domain)

WikiTopics is a closed-domain multi-hop QA dataset organized into 11 topic domains:

Domain	Key	Domain	Key
Art	`art`	Infrastructure	`infra`
Award	`award`	Location	`loc`
Education	`edu`	Organization	`org`
Health	`health`	People	`people`
Science	`sci`	Sport	`sport`
Taxonomy	`tax`
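When scripting runs across several domains, it can help to keep the keys from the table in one place. The small sketch below does this. `WIKITOPICS_DOMAINS` and `resolve_domain` are hypothetical names and are not part of the repository.

```python
# Domain keys copied from the WikiTopics table (key -> domain name).
WIKITOPICS_DOMAINS = {
    "art": "Art", "award": "Award", "edu": "Education", "health": "Health",
    "sci": "Science", "tax": "Taxonomy", "infra": "Infrastructure",
    "loc": "Location", "org": "Organization", "people": "People", "sport": "Sport",
}


def resolve_domain(name):
    """Accept either a key ('edu') or a full domain name ('Education') and return the key."""
    lowered = name.strip().lower()
    if lowered in WIKITOPICS_DOMAINS:
        return lowered
    for key, full in WIKITOPICS_DOMAINS.items():
        if full.lower() == lowered:
            return key
    raise KeyError(f"Unknown WikiTopics domain: {name!r}")
```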

Each domain provides both a Knowledge Graph (KG) version and a Natural Language (NLG) version. The main method uses the NLG version; the KG version is used for ablation studies.

Version	Download
Full Dataset — KG	🔗 WikiTopics KG
Full Dataset — NLG	🔗 WikiTopics NLG
Sampled Dataset (1%)	`dataset/wikitopics_test_sampled`

After downloading the full WikiTopics dataset, place it in the dataset/ folder:

dataset/
├── open_domain_dataset/
├── open_domain_splitted/
├── wikitopics_test_sampled/
└── WikiTopicsQE_NLG/          <-- place the full WikiTopics dataset here
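A quick way to confirm the layout before running any step is to check for the four folders. The hypothetical helper below assumes that it is run from the project root. It only inspects the folder names shown in the tree, not their contents.

```python
from pathlib import Path

# Folder names taken from the dataset/ layout shown in the tree.
EXPECTED_DIRS = (
    "open_domain_dataset",
    "open_domain_splitted",
    "wikitopics_test_sampled",
    "WikiTopicsQE_NLG",
)


def missing_dataset_dirs(root="dataset"):
    """Return the expected sub-folders of `root` that do not exist yet."""
    base = Path(root)
    return [name for name in EXPECTED_DIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    print("dataset/ looks complete" if not missing else f"missing: {missing}")
```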

Open Domain

Split	Path
Full dataset	`dataset/open_domain_dataset`
Pre-split for training/testing	`dataset/open_domain_splitted`

Open domain includes: 2WikiMultiHopQA, HotpotQA, and MuSiQue.

Project Structure

.
├── dataset/
│   ├── open_domain_dataset/
│   ├── open_domain_splitted/
│   ├── wikitopics_test_sampled/
│   └── WikiTopicsQE_NLG/
├── evaluate/
│   ├── qa_eval_EM_F1.py              # Open domain evaluation
│   └── qa_eval_MRR_HIT.py            # Closed domain evaluation
├── HyperMemory/                      # Graph construction + memory-based QA (WikiTopics)
├── HyperMemory_open/                 # Memory-based QA (open domain)
├── HyperMemory_token/                # Token efficiency variant
├── HyperRetriever/                   # MLP retriever QA (WikiTopics)
├── HyperRetriever_open/              # MLP retriever QA (open domain)
├── HyperRetriever_token/             # Token efficiency variant
├── HyperRetriever_token_kg/          # Token efficiency variant (KG input)
└── results/                          # Auto-generated inference outputs

Pipeline: WikiTopics (Closed Domain)

Replace {DOMAIN} with one of the 11 domain keys (e.g., art, award, edu, ...).

Step 1 — Build the Hypergraph

This step constructs the hypergraph from the WikiTopics NLG corpus. The hypergraph encodes N-ary facts as hyperedges and is shared by both HyperMemory and HyperRetriever. Outputs are written to an expr/ directory in the project root.

cd HyperMemory
python wikitopics_construct.py {DOMAIN}

Output: expr/{DOMAIN}/ — contains the hypergraph structure, node/edge embeddings, and associated index files used in all downstream steps.

Step 2 — HyperMemory Inference

Run question answering directly over the constructed hypergraph using the memory-based approach. No additional training is required.

# From HyperMemory/
python wikitopics_query.py {DOMAIN}

Output: results/HyperMemory/{DOMAIN}_output.jsonl

Step 3 — HyperRetriever Training

HyperRetriever improves retrieval precision by training a lightweight MLP on top of hypergraph embeddings. Complete both sub-steps before running inference.
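The README states that HyperRetriever fuses structural and semantic signals with an MLP, but it does not list the exact inputs or architecture. The sketch below makes several illustrative assumptions:

- one semantic feature: cosine similarity between the query and hyperedge embeddings;
- two structural features: hop distance and degree;
- a two-layer network with hand-set toy weights.

It shows only how a trained MLP would score and rank candidates. It does not cover the training done in Step 3b.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mlp_score(features, w1, b1, w2, b2):
    """Two-layer MLP: ReLU hidden layer, sigmoid output in (0, 1)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


def rank_hyperedges(query_emb, candidates, params):
    """Return (edge_id, score) pairs sorted best first.

    `candidates` maps edge id -> (embedding, hop_distance, degree). These
    structural features are illustrative assumptions, not HyperRetriever's.
    """
    scored = []
    for edge_id, (emb, hops, degree) in candidates.items():
        features = [cosine(query_emb, emb),   # semantic signal
                    1.0 / (1.0 + hops),       # structural: fewer hops from the query entities
                    math.log1p(degree)]       # structural: connectivity of the hyperedge
        scored.append((edge_id, mlp_score(features, *params)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy weights: 3 input features -> 2 hidden units -> 1 score.
params = ([[2.0, 1.0, 0.0], [0.0, 0.0, 0.5]], [0.0, 0.0], [1.5, 0.5], -1.0)
candidates = {"e1": ([1.0, 0.0], 0, 3), "e2": ([0.0, 1.0], 2, 3)}
print(rank_hyperedges([1.0, 0.0], candidates, params))  # e1 ranks first
```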

3a. Prepare Training Data

cd ../HyperRetriever    # from HyperMemory/; run cd HyperRetriever instead if you start from the project root
python retrieve/prepare.py {DOMAIN}

3b. Train the MLP

python retrieve/train.py {DOMAIN}

Output: A trained MLP checkpoint saved under expr/ for the specified domain.

Step 4 — HyperRetriever Inference

Run question answering using the trained retriever.

# From HyperRetriever/
python wikitopics_query.py {DOMAIN}

Output: results/HyperRetriever/{DOMAIN}_output.jsonl
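Steps 1 through 4 can be chained for a single domain. The minimal sketch below assumes that it is launched from the project root and that each script must run from inside its module folder, as the cd commands above indicate. `closed_domain_commands` and `run_pipeline` are hypothetical helpers. With the default `dry_run=True`, the sketch only prints the commands.

```python
import subprocess
import sys


def closed_domain_commands(domain):
    """(working directory, argv) pairs for Steps 1-4, in order."""
    py = sys.executable
    return [
        ("HyperMemory", [py, "wikitopics_construct.py", domain]),   # Step 1
        ("HyperMemory", [py, "wikitopics_query.py", domain]),       # Step 2
        ("HyperRetriever", [py, "retrieve/prepare.py", domain]),    # Step 3a
        ("HyperRetriever", [py, "retrieve/train.py", domain]),      # Step 3b
        ("HyperRetriever", [py, "wikitopics_query.py", domain]),    # Step 4
    ]


def run_pipeline(domain, dry_run=True):
    for cwd, argv in closed_domain_commands(domain):
        if dry_run:
            print(f"[{cwd}] {' '.join(argv)}")
        else:
            # check=True stops the chain as soon as one step exits non-zero
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_pipeline(sys.argv[1] if len(sys.argv) > 1 else "art")
```

With `dry_run=False`, `subprocess.run(..., check=True)` raises `CalledProcessError` at the first failing step. As a result, later steps never run on incomplete outputs.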

Pipeline: Open Domain

The open domain pipeline follows the same logic as WikiTopics. Use the modules ending in _open (HyperMemory_open/ and HyperRetriever_open/) in place of HyperMemory/ and HyperRetriever/.

Evaluation

All inference scripts automatically write results to results/{MODULE}/, with filenames ending in _output.jsonl.
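The README does not document the fields inside each JSONL record, so the following sketch only parses the files. `load_outputs` is a hypothetical helper that assumes it runs from the project root. It returns one list of decoded records per output file.

```python
import json
from pathlib import Path


def load_outputs(module, results_dir="results"):
    """Load every *_output.jsonl under results/{module}/, keyed by file stem."""
    outputs = {}
    for path in sorted(Path(results_dir, module).glob("*_output.jsonl")):
        with path.open(encoding="utf-8") as f:
            # one JSON object per non-empty line
            outputs[path.stem] = [json.loads(line) for line in f if line.strip()]
    return outputs


if __name__ == "__main__":
    for name, records in load_outputs("HyperMemory").items():
        print(name, len(records))
```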

Closed Domain — WikiTopics

Use MRR (Mean Reciprocal Rank) and Hit Rate to evaluate answer ranking quality:

python evaluate/qa_eval_MRR_HIT.py --model {OUTPUT_FOLDER} {DATASET}

Example:

python evaluate/qa_eval_MRR_HIT.py --model HyperMemory art
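For reference, the standard definitions are as follows:

- The reciprocal rank of a question is 1/r, where r is the position of the first correct answer in the ranked predictions, or 0 if none is correct.
- MRR averages the reciprocal rank over all questions.
- Hit@k is the fraction of questions with a correct answer among the top k predictions.

The sketch below implements these textbook formulas. It assumes that each prediction is a ranked list of answer strings and each gold entry is a set of accepted answers. qa_eval_MRR_HIT.py may differ in answer normalization, in the k it reports, or in its input format.

```python
def reciprocal_rank(ranked, gold):
    """1/rank of the first prediction found in `gold`, or 0.0 if none is."""
    for rank, answer in enumerate(ranked, start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0


def mrr_and_hits(predictions, golds, k=1):
    """Mean reciprocal rank and Hit@k over parallel lists of questions."""
    n = len(predictions)
    mrr = sum(reciprocal_rank(p, g) for p, g in zip(predictions, golds)) / n
    hits = sum(any(a in g for a in p[:k]) for p, g in zip(predictions, golds)) / n
    return mrr, hits


predictions = [["Paris", "Lyon"], ["Bonn", "Berlin", "Munich"], ["Rome"]]
golds = [{"Paris"}, {"Berlin"}, {"Madrid"}]
print(mrr_and_hits(predictions, golds, k=1))  # reciprocal ranks 1, 1/2, 0 -> MRR 0.5
```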

Open Domain — 2Wiki / HotpotQA / MuSiQue

Use Exact Match (EM) and F1 Score to evaluate answer extraction quality:

python evaluate/qa_eval_EM_F1.py --model_name {OUTPUT_FOLDER} {DATASET}
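Exact Match and token-level F1 are usually computed SQuAD-style. The sketch below follows that common recipe: lowercase the text, strip punctuation and the articles a/an/the, then compare tokens. The README does not say whether qa_eval_EM_F1.py normalizes identically or takes the maximum over several gold answers. Treat this as a reference for the metrics, not a copy of the script.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))


def f1(prediction, gold):
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # multiset intersection counts each shared token at most min(count) times
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower"))    # 1.0
print(f1("Eiffel Tower in Paris", "the Eiffel Tower"))    # precision 0.5, recall 1.0
```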

Metric Summary

Dataset Type	Metric	Script
Closed domain (WikiTopics)	MRR, Hit Rate	`qa_eval_MRR_HIT.py`
Open domain (2Wiki, HotpotQA, MuSiQue)	Exact Match, F1	`qa_eval_EM_F1.py`

License

This project is licensed under the MIT License — see the LICENSE file for details.

Citation

If you use HyperRAG in your research, please cite:

@inproceedings{lien2026hyperrag,
    title={HyperRAG: Reasoning N-ary Facts over Hypergraphs for Retrieval Augmented Generation},
    author={Wen-Sheng Lien and Yu-Kai Chan and Hao-Lung Hsiao and Bo-Kai Ruan and Meng-Fen Chiang and Chien-An Chen and Yi-Ren Yeh and Hong-Han Shuai},
    booktitle={The Web Conference (WWW)},
    year={2026}
}

TODO

arXiv Paper link
Installation Instructions
Integrate token counter function into modules
