
iSathyam31/PaperGraph


🔬 PaperGraph

GraphRAG over arXiv — explore, ingest, and query scientific research through a structured knowledge graph.

PaperGraph is a GraphRAG (retrieval-augmented generation over a knowledge graph) system. It converts research papers into a knowledge graph of entities and relationships, so that questions spanning several papers, authors, or methods can be answered by traversing explicit links rather than relying on vector similarity alone.


🏗️ System Architecture

PaperGraph follows a two-stage architecture: Automated Ingestion and Hybrid Retrieval.

Stage 1 — Ingestion Pipeline

  +-------------+       +-------------------+       +-----------------------+
  |  arXiv API  | ----> | Python Ingestion  | ----> | LLM Extraction (GPT)  |
  +-------------+       +---------+---------+       +-----------+-----------+
                                  |                             |
                                  v                             v
                        +---------+---------+       +-----------+-----------+
                        |  Neo4j AuraDB     | <---- |   Knowledge Graph     |
                        +---------+---------+       +-----------------------+
                                  |
                                  v
                        +---------+---------+
                        |  Vector Index     |
                        +-------------------+
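
The first hop of the ingestion pipeline (arXiv API to Python) can be sketched as below. This is a minimal illustration, not the repository's actual code: it assumes the public arXiv Atom endpoint is used, and the function names `build_query_url` and `parse_feed` are hypothetical. Network access is left to the caller so the parsing step can be exercised offline.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Public arXiv query endpoint; it returns an Atom feed.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(category: str, max_results: int) -> str:
    """Build a query URL for one arXiv category (hypothetical helper)."""
    params = {
        "search_query": f"cat:{category}",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def _text(node: ET.Element, path: str) -> str:
    """Return the element text with whitespace collapsed, or '' if missing."""
    return " ".join(node.findtext(path, default="", namespaces=ATOM).split())


def parse_feed(atom_xml: str) -> list[dict]:
    """Turn an Atom feed string into a list of paper dictionaries."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        papers.append({
            "id": _text(entry, "atom:id"),
            "title": _text(entry, "atom:title"),
            "abstract": _text(entry, "atom:summary"),
            "authors": [_text(a, "atom:name")
                        for a in entry.findall("atom:author", ATOM)],
        })
    return papers
```

Each dictionary carries the abstract that the LLM extraction step would consume; how PaperGraph actually structures this record is not documented here.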

Stage 2 — Retrieval Pipeline (Hybrid GraphRAG)

      [ User Question ]
              |
              v
     +-----------------+
     |  Query Router   |
     +--------+--------+
              |
      /-------+-------\
      |               |
  [ GLOBAL ]      [ LOCAL ]
      |               |
  [ Cypher ]    [ Hybrid Search ]
      |               |
  [ Stats  ]    [ Graph Expand ]
      |               |
      \-------+-------/
              |
              v
     +-----------------+
     |  GPT-4o Answer  |
     +--------+--------+
              |
              v
         [ UI App ]
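
The routing step in the diagram can be illustrated with a small sketch. The README does not say how the Query Router decides between the two branches, so the keyword heuristic below is purely an assumption for illustration (a real router might use an LLM classifier instead); `route` and `GLOBAL_PATTERNS` are hypothetical names.

```python
import re

# Assumed cues for corpus-wide (aggregate) questions. These patterns are
# illustrative only and are not taken from the PaperGraph source.
GLOBAL_PATTERNS = [
    r"\bhow many\b",
    r"\bmost\b",
    r"\btop \d+\b",
    r"\bcount\b",
    r"\boverall\b",
    r"\btrends?\b",
]


def route(question: str) -> str:
    """Return 'GLOBAL' for aggregate questions, otherwise 'LOCAL'.

    GLOBAL questions would go to Cypher statistics queries; LOCAL questions
    would go to hybrid search followed by graph expansion.
    """
    q = question.lower()
    if any(re.search(pattern, q) for pattern in GLOBAL_PATTERNS):
        return "GLOBAL"
    return "LOCAL"


if __name__ == "__main__":
    print(route("How many papers propose a new attention method?"))
    print(route("What does the paper on sparse attention propose?"))
```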

📂 Project Structure

The project is split into a Python (FastAPI) backend and a Next.js (React) frontend.

papergraph/
├── backend/                # FastAPI Application
│   ├── main.py             # API Entry Point
│   ├── ingestion/          # arXiv fetching & processing
│   ├── graph/              # Neo4j connections & LLM extraction
│   └── retrieval/          # Hybrid GraphRAG logic
├── frontend/               # Next.js Application
│   ├── src/app/            # Pages & UI Components
│   ├── package.json        # Node.js dependencies
│   └── public/             # Static Assets
├── .env                    # Environment variables (Azure, Neo4j)
├── requirements.txt        # Python dependencies
└── README.md

---

## 🚀 Getting Started

### 1. Backend Setup
1. Create a `.env` file (see `.env.example`).
2. Install dependencies: `pip install -r requirements.txt`
3. Start the API: `uvicorn backend.main:app --reload`
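
Before starting the API, it can help to check that the `.env` values are actually present in the environment. The sketch below is an assumption-heavy illustration: the variable names `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` are guesses based on common Neo4j conventions, so use the names from `.env.example` instead, and `missing_settings` is a hypothetical helper.

```python
import os

# Assumed variable names; replace with the ones listed in .env.example.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_settings(env=None) -> list[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```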

### 2. Frontend Setup
1. Navigate to the `frontend/` directory.
2. Install dependencies: `npm install`
3. Start the UI: `npm run dev`

### 3. Data Ingestion
The ingestion script takes a limit on the number of papers, so the graph can start small and grow.
*   **Current State:** **50 documents** have been ingested as a baseline.
*   **Scalability:** To ingest more (for example **500 or 5,000** documents), raise the `--max` parameter in the ingestion script.
```bash
python -m backend.ingestion.pipeline --max 100 --category cs.AI
```
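
For readers curious how the two flags in the command above might be handled, here is a minimal `argparse` sketch. It is not the repository's implementation: the default values and the `build_parser` name are assumptions chosen to match the baseline of 50 documents and the `cs.AI` example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the README command."""
    parser = argparse.ArgumentParser(prog="backend.ingestion.pipeline")
    parser.add_argument(
        "--max",
        dest="max_papers",
        type=int,
        default=50,  # assumed default, mirroring the 50-document baseline
        help="maximum number of papers to ingest",
    )
    parser.add_argument(
        "--category",
        default="cs.AI",  # assumed default
        help="arXiv category to fetch, e.g. cs.AI",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Would ingest up to {args.max_papers} papers from {args.category}")
```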

💡 Key Considerations

  • Knowledge Density: GraphRAG performs best when papers are within the same domain (e.g., cs.AI), allowing for rich link extraction between authors and methods.
  • Cost Efficiency: Entity extraction uses one LLM call per abstract. Ingesting 50 documents costs ~$0.10 with GPT-4o.
  • Neo4j Aura: Using the free tier of Neo4j Aura provides plenty of space for ~1,000 research papers and their associated entities. Note: Free tier instances may expire after 14 days of inactivity.
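
The cost figure above can be turned into a rough estimate for larger runs. The sketch below simply scales the quoted ~$0.10 per 50 documents linearly (about $0.002 per abstract); it assumes abstracts of similar length and unchanged model pricing, both of which are assumptions rather than guarantees. `estimate_cost` is a hypothetical helper.

```python
# Baseline quoted in this README: ~$0.10 for 50 documents with GPT-4o.
BASELINE_COST_USD = 0.10
BASELINE_DOCS = 50


def estimate_cost(num_docs: int) -> float:
    """Linearly extrapolate extraction cost, with one LLM call per abstract."""
    per_doc = BASELINE_COST_USD / BASELINE_DOCS
    return round(per_doc * num_docs, 2)


if __name__ == "__main__":
    for n in (50, 500, 5000):
        print(f"{n} documents: about ${estimate_cost(n):.2f}")
```

Under these assumptions, 500 documents would cost roughly $1 and 5,000 roughly $10; actual costs depend on token counts and current pricing.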

📊 Graph Schema

| Node | Description | Relationships |
|---|---|---|
| Paper | Research articles | AUTHORED_BY, CITES, PROPOSES |
| Author | Researchers | AFFILIATED_WITH |
| Method | Algorithms/models | USED_BY, PROPOSED_IN |
| Institution | Universities/labs | AFFILIATES |
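
To make the schema concrete, the sketch below builds a parameterized Cypher statement that upserts one paper, one author, and one institution with the relationships listed above. The property names (`id`, `title`, `name`), the relationship directions, and the `paper_author_upsert` helper are assumptions for illustration; the repository's actual queries may differ.

```python
def paper_author_upsert(paper_id: str, title: str, author: str,
                        institution: str) -> tuple[str, dict]:
    """Return a Cypher query and parameters for one paper/author/institution.

    MERGE is used so that re-ingesting the same paper does not create
    duplicate nodes or relationships.
    """
    query = (
        "MERGE (p:Paper {id: $paper_id}) "
        "SET p.title = $title "
        "MERGE (a:Author {name: $author}) "
        "MERGE (i:Institution {name: $institution}) "
        "MERGE (p)-[:AUTHORED_BY]->(a) "
        "MERGE (a)-[:AFFILIATED_WITH]->(i)"
    )
    params = {
        "paper_id": paper_id,
        "title": title,
        "author": author,
        "institution": institution,
    }
    return query, params


if __name__ == "__main__":
    query, params = paper_author_upsert(
        "0000.00000", "An Example Paper", "Ada Example", "Example University"
    )
    print(query)
    print(params)
```

With the official `neo4j` Python driver, the returned pair could be passed to `session.run(query, params)`.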

Built with Neo4j Aura · OpenAI GPT-4o · FastAPI · Next.js


🔗 Demo Video: You can see the application walkthrough here
