PAPERFlow

An AI-native paper workspace for search, library management, parsing, analysis, and writing.

Python FastAPI Vue MySQL Redis MinerU


Overview

PAPERFlow is a research workflow system built around a simple pipeline:

  • search papers from external sources
  • organize them into reusable libraries
  • download and parse PDFs into Markdown
  • analyze papers into structured outputs
  • prepare content for downstream writing and knowledge workflows

The current implementation already includes a working frontend, backend APIs, asynchronous task execution, paper library management, arXiv search, MinerU-based parsing, Markdown preview, and LLM-powered analysis.

Highlights

  • Paper Search: Search arXiv papers with direct search or LLM-assisted query rewriting.

  • Paper Libraries: Manage multiple libraries while reusing the same global paper/file assets.

  • Async Task Pipeline: Parse, analyze, and translate papers through background tasks backed by Redis workers.

  • MinerU Integration: Parse downloaded PDFs into Markdown and asset folders for preview.

  • Structured Analysis: Convert papers into structured research summaries instead of only plain-text notes.

  • LLM Configuration Center: Configure providers, models, and module bindings from the UI.

Product Snapshot

Search -> Add to Library -> Download PDF -> Parse to Markdown -> Preview -> Analyze -> Reuse in Writing

Architecture

flowchart LR
    UI[Vue 3 Frontend] --> API[FastAPI Backend]
    API --> DB[(MySQL)]
    API --> Redis[(Redis Queue)]
    Worker[Background Worker] --> Redis
    Worker --> DB
    Worker --> Workspace[workspace/ files]
    API --> AGENTFlow[AGENTFlow]
    AGENTFlow --> LLM[LLM Providers]
    AGENTFlow --> MinerU[MinerU Parser]
    API --> Arxiv[arXiv Connector]

Repository Structure

PAPERFlow/
+-- src/paperflow/                 # FastAPI app, services, models, workers
+-- frontend/                      # Vue 3 + Vite frontend
+-- doc/                           # Design docs
+-- workspace/                     # Downloaded PDFs, parsed Markdown, generated assets
+-- docker-compose.yml             # MySQL + Redis + backend + worker + frontend
+-- Dockerfile                     # Backend image
+-- pyproject.toml                 # Python dependencies
+-- README.md

Core Modules

1. Paper Search

  • arXiv search integration via agentflow.connectors.arxiv_connector
  • advanced search and query planning
  • optional LLM-based query rewriting
  • add search results directly into a paper library

2. Paper Libraries

  • multiple libraries
  • shared global paper records
  • library-level tags and notes
  • remove-from-library without deleting shared files
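
The separation between library membership and shared paper records can be illustrated with a small in-memory model. This is only a sketch of the idea: the `Paper`, `Library`, and `Catalog` names are hypothetical and do not reflect PAPERFlow's actual SQLAlchemy models.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """A global paper record, shared by every library that references it."""
    paper_id: int
    title: str


@dataclass
class Library:
    """A library stores membership and library-level notes, not the papers themselves."""
    name: str
    paper_ids: set[int] = field(default_factory=set)
    notes: dict[int, str] = field(default_factory=dict)


class Catalog:
    """Hypothetical container holding the global papers and all libraries."""

    def __init__(self) -> None:
        self.papers: dict[int, Paper] = {}
        self.libraries: dict[str, Library] = {}

    def add_to_library(self, library_name: str, paper: Paper) -> None:
        # Reuse the existing global record if the paper is already known.
        self.papers.setdefault(paper.paper_id, paper)
        library = self.libraries.setdefault(library_name, Library(library_name))
        library.paper_ids.add(paper.paper_id)

    def remove_from_library(self, library_name: str, paper_id: int) -> None:
        # Only the membership and library-level note go away;
        # the shared paper record stays available to other libraries.
        library = self.libraries[library_name]
        library.paper_ids.discard(paper_id)
        library.notes.pop(paper_id, None)
```

Removing a paper from one library in this model leaves it untouched in any other library that references the same `paper_id`.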

3. Parsing and Preview

  • download PDF into local workspace
  • parse PDF through MinerU
  • keep Markdown assets available for preview
  • render formulas and images in the frontend
  • collapsible preview by heading hierarchy

4. Paper Analysis

  • background analysis task
  • structured result output
  • overview, findings, methods, datasets, metrics, limitations, and reliability fields
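
To make the structured output concrete, here is a minimal sketch of a container for the fields listed above. The field names come from the list; the field types, the `PaperAnalysis` class, and the `parse_analysis` helper are assumptions for illustration, not the schema PAPERFlow actually uses.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class PaperAnalysis:
    """Hypothetical shape of a structured analysis result (types are assumed)."""
    overview: str = ""
    findings: list[str] = field(default_factory=list)
    methods: list[str] = field(default_factory=list)
    datasets: list[str] = field(default_factory=list)
    metrics: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)
    reliability: str = ""


def parse_analysis(raw_json: str) -> PaperAnalysis:
    """Build a PaperAnalysis from a JSON object, ignoring unknown keys."""
    data = json.loads(raw_json)
    known = {f.name for f in fields(PaperAnalysis)}
    return PaperAnalysis(**{key: value for key, value in data.items() if key in known})
```

Keeping each field separate, rather than storing one block of free text, is what lets downstream writing steps pull out, say, only the datasets or only the limitations.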

5. LLM Center

  • provider/model configuration
  • encrypted API key storage
  • module binding and fallback resolution
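
Module binding with fallback can be pictured as a two-step lookup. The sketch below assumes that a module without a usable binding falls back to a single default configuration; that fallback order, along with the `LLMConfig` and `resolve_llm` names, is hypothetical and may differ from what PAPERFlow does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical provider/model pair."""
    provider: str
    model: str


def resolve_llm(
    module_name: str,
    bindings: dict[str, str],
    configs: dict[str, LLMConfig],
    default_config: Optional[str] = None,
) -> LLMConfig:
    """Return the config bound to a module, or the default if no binding applies."""
    bound_name = bindings.get(module_name)
    if bound_name in configs:
        return configs[bound_name]
    # Assumed fallback: use the default configuration when the module
    # has no binding or the binding points at a missing config.
    if default_config in configs:
        return configs[default_config]
    raise LookupError(f"no LLM configuration available for module {module_name!r}")
```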

Tech Stack

Backend

  • FastAPI
  • SQLAlchemy
  • MySQL
  • Redis
  • Uvicorn
  • AGENTFlow 0.1.1

Frontend

  • Vue 3
  • Vite
  • marked
  • marked-katex-extension
  • katex
  • lucide-vue-next

Requirements

  • Python 3.11+
  • Node.js 18+
  • MySQL 8+
  • Redis 7+
  • uv for Python environment management

Installation

1. Clone the repository

git clone https://github.com/buua436/PAPERFlow.git
cd PAPERFlow

2. Create environment and install backend dependencies

uv sync

This project uses buua-agentflow[llms,mineru-pipeline]==0.1.1 from TestPyPI.
The source is already configured in pyproject.toml.

3. Install frontend dependencies

cd frontend
npm install
cd ..

Environment Configuration

Copy .env.example to .env and adjust the values for your local environment.

cp .env.example .env

Important settings:

PAPERFLOW_DATABASE_URL=mysql+pymysql://root:password@127.0.0.1:3306/paperflow?charset=utf8mb4
PAPERFLOW_API_KEY_SECRET=change-me-for-production
PAPERFLOW_CORS_ALLOW_ORIGINS=http://127.0.0.1:5173,http://localhost:5173
PAPERFLOW_REDIS_URL=redis://127.0.0.1:6379/0
PAPERFLOW_USE_REDIS_WORKER=true
PAPERFLOW_USE_AGENTFLOW_MINERU=true

Quick Start

Option A: Full stack with Docker Compose

docker compose up --build

Services:

  • Frontend: http://127.0.0.1:5173
  • Backend API: http://127.0.0.1:8000
  • OpenAPI: http://127.0.0.1:8000/docs
  • MySQL: 127.0.0.1:3306
  • Redis: 127.0.0.1:6379

Option B: Local development

Start infrastructure:

docker compose up -d mysql redis

Start backend:

uv run uvicorn paperflow.main:app --app-dir src --reload

Start worker:

uv run python -m paperflow.worker

Start frontend:

cd frontend
npm run dev

Default Docker Credentials

database: paperflow
user: paperflow
password: paperflow123
root password: password

API Endpoints

General

  • GET /api/health
  • GET /docs

LLM Admin

  • GET /api/admin/llm-configs
  • POST /api/admin/llm-configs
  • GET /api/admin/llm-bindings
  • POST /api/admin/llm-bindings
  • GET /api/admin/llm-resolve/{module_name}

Paper Search

  • POST /api/papers/search

Libraries

  • GET /api/libraries
  • POST /api/libraries
  • GET /api/libraries/{library_id}/papers
  • POST /api/libraries/{library_id}/papers
  • GET /api/libraries/{library_id}/papers/{paper_id}

Tasks

  • POST /api/libraries/{library_id}/papers/{paper_id}/parse
  • POST /api/libraries/{library_id}/papers/{paper_id}/analyze
  • POST /api/libraries/{library_id}/papers/{paper_id}/translate
  • GET /api/libraries/tasks
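
The task endpoints share one URL pattern, so a client can build them from a library ID, a paper ID, and an action. The sketch below uses only the standard library. The helper names are hypothetical, and the empty JSON body is an assumption: the request schema is not documented here, so check the OpenAPI page at `/docs` before relying on it.

```python
from urllib.request import Request

# The three task actions listed above.
TASK_ACTIONS = ("parse", "analyze", "translate")


def task_url(base_url: str, library_id: int, paper_id: int, action: str) -> str:
    """Build the URL for a parse, analyze, or translate task."""
    if action not in TASK_ACTIONS:
        raise ValueError(f"unknown task action: {action!r}")
    root = base_url.rstrip("/")
    return f"{root}/api/libraries/{library_id}/papers/{paper_id}/{action}"


def build_task_request(base_url: str, library_id: int, paper_id: int, action: str) -> Request:
    """Prepare (but do not send) a POST request for the given task."""
    return Request(
        task_url(base_url, library_id, paper_id, action),
        data=b"{}",  # assumed: an empty JSON object as the request body
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the backend running locally, the prepared request could be sent with `urllib.request.urlopen(build_task_request("http://127.0.0.1:8000", 1, 42, "parse"))`.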

Workspace Layout

Generated files are stored under workspace/.

Typical layout:

workspace/
+-- downloads/papers/{paper_id}/source.pdf
+-- parses/papers/{paper_id}/
   +-- source.md
   +-- translation.zh-CN.md
   +-- mineru/
      +-- source/auto/images/...
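
The layout above maps directly onto a few path computations. The following sketch derives the expected locations for a given paper; `paper_paths` is a hypothetical helper written for illustration, and the form of `paper_id` (integer or string) is an assumption.

```python
from pathlib import Path


def paper_paths(paper_id, workspace: Path = Path("workspace")) -> dict[str, Path]:
    """Return the expected file locations for one paper under the workspace."""
    pid = str(paper_id)
    parse_dir = workspace / "parses" / "papers" / pid
    return {
        "pdf": workspace / "downloads" / "papers" / pid / "source.pdf",
        "markdown": parse_dir / "source.md",
        "translation": parse_dir / "translation.zh-CN.md",
        "images": parse_dir / "mineru" / "source" / "auto" / "images",
    }
```

For example, `paper_paths(42)["markdown"]` points at `workspace/parses/papers/42/source.md`.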

Current Status

Implemented

  • frontend multi-page workspace UI
  • arXiv search
  • library CRUD
  • add/remove paper from library
  • Redis-backed task dispatch
  • MinerU parsing pipeline integration
  • Markdown preview with formulas and images
  • structured paper analysis
  • task monitor page under settings

In Progress

  • richer task progress feedback
  • stronger paper translation workflow
  • writing module end-to-end integration
  • more robust parsing/analysis retry controls

Development Notes

  • The backend uses Redis for async task dispatch and a dedicated worker process.
  • MinerU parsing is enabled through AGENTFlow integration.
  • Library membership and shared paper files are separated by design.
  • A paper can belong to multiple libraries, but files are reused globally.
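
The dispatch pattern in the first note can be sketched without a running Redis server. The example below uses an in-memory `deque` as a stand-in for a Redis list; the payload format, handler names, and `run_once` loop body are assumptions for illustration and are not taken from `paperflow.worker`.

```python
import json
from collections import deque
from typing import Callable, Optional


def enqueue(queue: deque, task_type: str, paper_id: int) -> None:
    """Producer side: the API pushes a JSON task description onto the queue."""
    queue.appendleft(json.dumps({"type": task_type, "paper_id": paper_id}))


def run_once(queue: deque, handlers: dict[str, Callable[[int], str]]) -> Optional[str]:
    """Consumer side: the worker pops the oldest task and dispatches it by type."""
    if not queue:
        return None
    task = json.loads(queue.pop())
    handler = handlers[task["type"]]
    return handler(task["paper_id"])


# Hypothetical handlers; real ones would call MinerU or an LLM.
HANDLERS = {
    "parse": lambda paper_id: f"parsed {paper_id}",
    "analyze": lambda paper_id: f"analyzed {paper_id}",
}
```

Pushing on the left and popping on the right gives first-in, first-out order, which is the same shape as pairing `LPUSH` with `BRPOP` on a Redis list.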

Roadmap

  • stronger writing workspace
  • richer library note system
  • export and citation workflows
  • better task progress and retry UI
  • improved translation workflow
  • collaborative workflows

Documentation

License

This repository does not yet include an open-source license file.

Acknowledgements
