PaperCut — ArXiv Agent

Cut through the paper. PaperCut analyses ML research papers, extracts a structured research contract, generates a PyTorch implementation scaffold and CUDA optimisation blueprint, then lets you explore the paper through a streaming chat interface.

Features

Scope validation — Gemini Flash quickly checks whether a paper is an ML research paper before spending tokens on deeper analysis
Structured extraction — Gemini Pro extracts a ResearchContract (model type, architecture, training recipe, datasets, etc.)
Code generation — Node 2 generates a PyTorch scaffold; Node 3 produces a CUDA/hardware blueprint
Streaming pipeline — live progress updates via Server-Sent Events as each node completes
Streaming chat — RAG-style Q&A about the paper with token-level streaming
GCS persistence — PDFs and generated artefacts are stored in Google Cloud Storage; sessions survive server restarts
VS Code-style code viewer — file tree + syntax-highlighted code panel with line-wrap and font-size controls, resizable
ArXiv URL support — paste an /abs/ URL and PaperCut converts it to the PDF automatically
Global drag-and-drop — drop a PDF anywhere on the page to start a new analysis
Session management — rename, delete, and search across sessions; inline double-click rename in sidebar
Toast notifications — non-blocking success/error/info toasts throughout the UI
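The ArXiv URL feature maps an /abs/ page to its PDF. Below is a minimal sketch of that conversion, assuming the standard arxiv.org URL shapes. The function name arxiv_abs_to_pdf and the regex are hypothetical and are not taken from PaperCut's source.

```python
import re

# Hypothetical helper: an illustration of /abs/ -> /pdf/ conversion,
# not PaperCut's actual implementation.
_ABS_RE = re.compile(
    r"^https?://(?:www\.)?arxiv\.org/abs/(?P<id>[^?#]+?)/?(?:[?#].*)?$"
)


def arxiv_abs_to_pdf(url: str) -> str:
    """Map an arXiv /abs/ URL to its /pdf/ URL; return other URLs unchanged."""
    match = _ABS_RE.match(url.strip())
    if match is None:
        return url
    # Keeps version suffixes (v7) and old-style IDs (hep-th/9901001) intact,
    # and drops any query string or fragment.
    return f"https://arxiv.org/pdf/{match.group('id')}"
```

For example, arxiv_abs_to_pdf("https://arxiv.org/abs/1706.03762v7") gives "https://arxiv.org/pdf/1706.03762v7", while a URL that is already a PDF link passes through untouched.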

Pipeline

Node 0  Gemini 2.5 Flash       →  scope validate  →  PASS / FAIL
              ↓ PASS only
Node 1  Gemini 2.5 Pro         →  extract ResearchContract JSON
              ↓
Node 2  Qwen2.5-Coder 32B      →  PyTorch scaffold (model.py, train.py, dataset.py, config.yaml)
              ↓
Node 3  Nemotron Super 49B     →  annotated CUDA C++ stubs + bottleneck analysis

Node	Model	Role
Node 0	Gemini 2.5 Flash (Google)	Scope validator — strict PASS/FAIL for ML papers
Node 1	Gemini 2.5 Pro (Google)	Architecture ingestor — extracts typed JSON blueprint
Node 2	Qwen2.5-Coder 32B (NVIDIA NIM)	PyTorch scaffold generator
Node 3	Nemotron Super 49B (NVIDIA NIM)	CUDA stub + bottleneck analysis generator

The graph is compiled with LangGraph in backend/graph.py. Node 2 and Node 3 call NVIDIA's hosted NIM API (OpenAI-compatible) via the nat/ package. FastAPI routes in backend/routes/ stream the full pipeline via SSE.
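The control flow above (Node 0 gates everything; Nodes 1 to 3 run in sequence only on PASS) can be sketched in plain Python. In LangGraph this gate would typically be expressed with add_conditional_edges from Node 0 to either Node 1 or END. The sketch below is not the code in backend/graph.py. run_pipeline and the stub-node convention are illustrative assumptions.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
NodeFn = Callable[[State], State]


def run_pipeline(state: State, node0: NodeFn, node1: NodeFn,
                 node2: NodeFn, node3: NodeFn) -> State:
    """Run Node 0, then Nodes 1-3 only if Node 0 set scope_valid to True."""
    # Each node returns a partial update that is merged into the shared state,
    # mirroring how LangGraph applies a node's return value to graph state.
    state = {**state, **node0(state)}
    if not state.get("scope_valid"):
        return state  # FAIL: stop before any Gemini Pro or NIM calls
    for node in (node1, node2, node3):
        state = {**state, **node(state)}
    return state
```

Passing stub functions for the four nodes makes it easy to check that a FAIL never reaches the expensive nodes.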

Directory layout

ArXiv_Agent/
├── backend/
│   ├── main.py                  # FastAPI app, CORS, lifespan
│   ├── config.py                # Env vars, model strings
│   ├── graph.py                 # LangGraph pipeline
│   ├── app_state.py             # In-memory session store + artefact upload helpers
│   ├── session_store.py         # GCS session persistence
│   ├── nodes/
│   │   ├── node0_validator.py   # Gemini Flash scope check
│   │   ├── node1_ingestor.py    # Gemini Pro extraction
│   │   ├── node2_client.py      # Bridge → nodes/node2_pytorch_architect.py
│   │   └── node3_client.py      # Bridge → nodes/node3_hardware_blueprint.py
│   ├── routes/
│   │   ├── sessions.py          # CRUD + artifact endpoints
│   │   ├── pipeline.py          # /run/pipeline/stream SSE
│   │   ├── chat.py              # /chat/stream SSE
│   │   └── health.py            # GET /health
│   ├── services/
│   │   ├── pipeline_runtime.py  # Shared codegen node runner + SSE helpers
│   │   ├── pipeline_state.py    # Applies node results into SessionData
│   │   └── session_views.py     # Shapes SessionData into API responses
│   ├── chat/
│   │   └── chat_handler.py      # Context builder + Gemini streaming chat
│   ├── schemas/                 # Pydantic models
│   └── tools/
│       ├── gemini_client.py     # google-genai SDK wrapper
│       └── artifact_store.py    # GCS artifact upload/download
├── nodes/                       # Real Node 2 & 3 implementations
│   ├── node2_pytorch_architect.py   # Qwen2.5-Coder scaffold generator
│   └── node3_hardware_blueprint.py  # Nemotron CUDA stub generator
├── nat/                         # NVIDIA NIM API client package
│   ├── nat_client.py            # HTTP caller, NATError/NATTimeoutError/NATAuthError
│   ├── nat_config.py            # Reads NVIDIA_API_KEY, model names from .env
│   └── __init__.py              # Exports make_nat_caller_code/reason factories
├── contracts/
│   └── architecture_blueprint_schema.json
├── outputs/                     # Local artefact root (gitignored)
│   └── sessions/{session_id}/
│       ├── pytorch_scaffold/
│       └── hardware_blueprint/
└── frontend/
    ├── src/
    │   ├── App.jsx              # Root state, routing
    │   ├── api/client.js        # All fetch/SSE calls
    │   ├── components/
    │   │   ├── Sidebar.jsx      # Session list, search, rename
    │   │   ├── UploadPanel.jsx  # PDF upload + ArXiv URL
    │   │   ├── PipelineProgress.jsx
    │   │   ├── ChatView.jsx     # Messages, starter questions
    │   │   ├── ChatMessage.jsx  # Markdown, copy button
    │   │   ├── CodePanel.jsx    # File tree + syntax viewer
    │   │   ├── MessageInput.jsx
    │   │   ├── SplashScreen.jsx
    │   │   └── Toast.jsx
    │   ├── hooks/useToast.js
    │   └── index.css
    └── package.json

Prerequisites

Requirement	Version / notes
Python	3.11+
Node.js	18+
Google Cloud SDK	any recent
GCP project	with Vertex AI + GCS enabled

Setup

1. Clone and create the virtual environment

git clone <repo-url>
cd PaperCut
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Configure environment variables

Create a .env file at the repo root (you can copy .env.example as a starting point):

# Google Cloud
GCP_PROJECT_ID=your-project-id
GCP_BUCKET_NAME=your-gcs-bucket
GCP_REGION=us-central1
GOOGLE_APPLICATION_CREDENTIALS=./gcp-key.json

# Gemini (defaults shown)
GEMINI_FLASH_MODEL=gemini-2.5-flash
GEMINI_PRO_MODEL=gemini-2.5-pro

# NVIDIA NIM
NVIDIA_API_KEY=nvapi-...
NVIDIA_API_BASE=https://integrate.api.nvidia.com/v1
NAT_MODEL_CODE=qwen/qwen2.5-coder-32b-instruct
NAT_MODEL_REASON=nvidia/nemotron-super-49b-v1
NAT_TIMEOUT=120

Place your GCP service-account key at gcp-key.json (gitignored). The key needs Storage Object Admin and Vertex AI User roles.
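The variables above can be loaded into one typed settings object. Below is a minimal sketch of what backend/config.py might do, assuming the variables are already exported into the process environment (loading the .env file itself is left out). The Settings class and load_settings function are hypothetical. Gemini defaults come from the block above; the other defaults simply reuse the example values and are assumptions.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    gcp_project_id: str
    gcp_bucket_name: str
    gcp_region: str
    gemini_flash_model: str
    gemini_pro_model: str
    nvidia_api_base: str
    nat_model_code: str
    nat_model_reason: str
    nat_timeout: float
    # Excluded from repr() so the key does not leak into logs.
    nvidia_api_key: str = field(repr=False)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables (hypothetical helper)."""
    return Settings(
        gcp_project_id=env["GCP_PROJECT_ID"],    # required: KeyError if missing
        gcp_bucket_name=env["GCP_BUCKET_NAME"],  # required
        gcp_region=env.get("GCP_REGION", "us-central1"),
        gemini_flash_model=env.get("GEMINI_FLASH_MODEL", "gemini-2.5-flash"),
        gemini_pro_model=env.get("GEMINI_PRO_MODEL", "gemini-2.5-pro"),
        nvidia_api_base=env.get("NVIDIA_API_BASE", "https://integrate.api.nvidia.com/v1"),
        nat_model_code=env.get("NAT_MODEL_CODE", "qwen/qwen2.5-coder-32b-instruct"),
        nat_model_reason=env.get("NAT_MODEL_REASON", "nvidia/nemotron-super-49b-v1"),
        nat_timeout=float(env.get("NAT_TIMEOUT", "120")),  # seconds
        nvidia_api_key=env["NVIDIA_API_KEY"],    # required
    )
```

Accepting the mapping as a parameter keeps the loader testable without touching the real environment.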

3. Create the GCS bucket

gcloud storage buckets create gs://your-gcs-bucket --location=us-central1

4. Install frontend dependencies

cd frontend
npm install

Running

Backend

# from repo root
uvicorn backend.main:app --reload

Server starts on http://localhost:8000. Interactive API docs at http://localhost:8000/docs.

Frontend

cd frontend
npm run dev

Vite dev server starts on http://localhost:5173.

API reference

Method	Path	Description
`GET`	`/health`	Health check
`GET`	`/sessions`	List all sessions
`POST`	`/sessions/upload`	Upload PDF, create session
`GET`	`/sessions/{id}`	Get session detail
`PATCH`	`/sessions/{id}/name`	Rename session
`DELETE`	`/sessions/{id}`	Delete session + GCS artefacts
`GET`	`/sessions/{id}/history`	Chat message history
`GET`	`/sessions/{id}/artifacts`	List generated files
`GET`	`/sessions/{id}/artifacts/{group}/{file}`	Download file content
`POST`	`/run/pipeline/stream`	Run full pipeline (SSE)
`POST`	`/run/node0`	Run scope validation only
`POST`	`/run/node1`	Run extraction only
`POST`	`/run/node2`	Run scaffold generation only
`POST`	`/run/node3`	Run CUDA blueprint only
`POST`	`/chat/stream`	Chat with paper context (SSE)
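As an illustration of calling these routes from Python with only the standard library, here is a sketch that builds a rename request for PATCH /sessions/{id}/name. The JSON body shape {"name": ...} is an assumption; check backend/routes/sessions.py for the actual request schema. build_rename_request is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # default uvicorn address from "Running"


def build_rename_request(session_id: str, new_name: str,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that renames a session."""
    # Assumed body shape; the real route may expect a different field name.
    body = json.dumps({"name": new_name}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sessions/{urllib.parse.quote(session_id, safe='')}/name",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
```

Sending it is then a matter of `with urllib.request.urlopen(build_rename_request("abc123", "Transformer")) as resp: print(resp.status)` against a running backend.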

SSE event shapes

Pipeline stream (/run/pipeline/stream):

data: {"type":"node_start","node":0}
data: {"type":"node_done","node":0,"scope_valid":true}
data: {"type":"node_done","node":1,"model_type":"transformer"}
data: {"type":"done","scope_valid":true}
data: {"type":"error","message":"..."}

Chat stream (/chat/stream):

data: {"type":"status","text":"Thinking..."}
data: {"type":"token","text":"<chunk>"}
data: {"type":"done"}
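Both streams carry one JSON object per data: line, so a client only needs to pick out those lines and dispatch on "type". Below is a minimal sketch, assuming single-line JSON payloads as shown above (the SSE spec also allows multi-line data fields, which this does not join). iter_sse_events and collect_chat_reply are hypothetical names, and treating an "error" event on the chat stream is an assumption borrowed from the pipeline stream's shape.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line, skipping everything else."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, ":" comments, other SSE fields
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # the SSE spec strips one leading space
        if payload:
            yield json.loads(payload)


def collect_chat_reply(lines: Iterable[str]) -> str:
    """Concatenate token chunks from /chat/stream until a done event."""
    parts: List[str] = []
    for event in iter_sse_events(lines):
        kind = event.get("type")
        if kind == "token":
            parts.append(event["text"])
        elif kind == "done":
            break
        elif kind == "error":  # assumed: mirrors the pipeline stream's error event
            raise RuntimeError(event.get("message", "stream error"))
        # "status" events (e.g. "Thinking...") are progress hints only.
    return "".join(parts)
```

When reading from a live HTTP response, decode each byte line (for example with .decode("utf-8")) before passing it in.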

GCS layout

gs://{bucket}/
└── papers/
    └── {session_id}/
        ├── paper.pdf
        ├── session.json
        ├── scaffold/        # Node 2 output files
        └── hardware/        # Node 3 output files
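The layout above implies a small set of object-name helpers. Below is a sketch, assuming the group names "scaffold" and "hardware" are the ones stored in GCS; the mapping from the local outputs/ folder names (pytorch_scaffold/, hardware_blueprint/) is an assumption, and all function names are hypothetical. Because every object for a session lives under one prefix, deleting a session can be done by listing that prefix (for example with google-cloud-storage's Bucket.list_blobs(prefix=...)) and deleting each blob.

```python
from typing import Dict

ARTIFACT_GROUPS = ("scaffold", "hardware")

# Assumed mapping from local outputs/ folder names to GCS group names.
LOCAL_TO_GCS_GROUP: Dict[str, str] = {
    "pytorch_scaffold": "scaffold",
    "hardware_blueprint": "hardware",
}


def session_prefix(session_id: str) -> str:
    return f"papers/{session_id}/"


def pdf_blob_name(session_id: str) -> str:
    return session_prefix(session_id) + "paper.pdf"


def session_json_blob_name(session_id: str) -> str:
    return session_prefix(session_id) + "session.json"


def artifact_blob_name(session_id: str, group: str, filename: str) -> str:
    if group not in ARTIFACT_GROUPS:
        raise ValueError(f"unknown artifact group: {group!r}")
    # Defensive check (an assumption): keep filenames flat so a request
    # cannot escape the session's group folder.
    if "/" in filename or filename in ("", ".", ".."):
        raise ValueError(f"invalid artifact filename: {filename!r}")
    return f"{session_prefix(session_id)}{group}/{filename}"


def gcs_uri(bucket: str, blob_name: str) -> str:
    return f"gs://{bucket}/{blob_name}"
```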

Scope validation

Two-layer validation prevents wasting Pro tokens on non-ML papers:

Layer 1 — Node 0 (Flash): Strict prompt requiring the paper's primary contribution to be a new ML/AI model or architecture. Papers that merely apply ML as a tool, surveys, and non-ML papers all fail.
Layer 2 — Node 1 (Pro): If Node 0 was not run beforehand, the extraction model validates scope itself. If Node 0 already returned PASS, extraction is forced and the Pro model does not re-judge scope, which prevents the two models from disagreeing nondeterministically.
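The Layer 2 rule reduces to a three-way decision on Node 0's verdict. Here is a minimal sketch of that decision; node1_should_self_validate is a hypothetical name, and representing "Node 0 not run" as None is an assumption. The no-verdict case plausibly arises when extraction is invoked on its own, for example via POST /run/node1.

```python
from typing import Optional


def node1_should_self_validate(node0_result: Optional[str]) -> bool:
    """Decide whether Node 1 must judge scope itself.

    node0_result is None when Node 0 was not run, else "PASS" or "FAIL".
    """
    if node0_result is None:
        return True   # no prior verdict: the Pro model self-validates
    if node0_result == "PASS":
        return False  # verdict already given: force extraction, never re-judge
    # A FAIL should have stopped the pipeline before Node 1 ran.
    raise ValueError(f"Node 1 should not run after Node 0 returned {node0_result!r}")
```

Raising on FAIL, instead of silently self-validating, surfaces routing bugs where a rejected paper slips through to the expensive extraction step.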

Key data models

ResearchContract (backend/schemas/contract.py) — structured extraction from Node 1. Contains model type, architecture details, training recipe, datasets, evaluation metrics, and novelty claims.

AgentState (backend/schemas/state.py) — LangGraph TypedDict threaded through all nodes. Holds pdf_gcs_uri, scope_valid, scope_reason, blueprint, scaffold_code, cuda_blueprint, session_id, and error.

ScopeValidationResult (backend/schemas/validator.py) — Node 0 output: result (PASS/FAIL) and reason string.
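To make the shapes concrete, here is a stdlib-only sketch of AgentState and ScopeValidationResult. The real models are Pydantic and TypedDict classes in backend/schemas/; the field names below come from the descriptions above, but the field types (for example scaffold_code as a filename-to-source dict) are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Literal, Optional, TypedDict


class AgentState(TypedDict, total=False):
    pdf_gcs_uri: str
    scope_valid: bool
    scope_reason: str
    blueprint: Dict[str, Any]      # ResearchContract as JSON (assumed type)
    scaffold_code: Dict[str, str]  # filename -> source (assumed type)
    cuda_blueprint: str            # assumed type
    session_id: str
    error: Optional[str]


@dataclass(frozen=True)
class ScopeValidationResult:
    result: Literal["PASS", "FAIL"]
    reason: str

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime, so check explicitly.
        if self.result not in ("PASS", "FAIL"):
            raise ValueError(f"result must be PASS or FAIL, got {self.result!r}")

    @property
    def passed(self) -> bool:
        return self.result == "PASS"


# Applying a Node 0 result to the shared state.
state: AgentState = {"session_id": "abc", "pdf_gcs_uri": "gs://bucket/papers/abc/paper.pdf"}
verdict = ScopeValidationResult("PASS", "Proposes a new attention architecture")
state.update(scope_valid=verdict.passed, scope_reason=verdict.reason)
```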

Testing

# Run all tests
pytest

# Run a specific test file
pytest backend/tests/test_node0.py -v

Notes

Sessions are loaded from GCS on server start, so they survive restarts.
Node 2 runs Qwen2.5-Coder 32B via NVIDIA NIM; Node 3 runs Nemotron Super 49B. Both are called through the nat/ package, which handles timeouts, retries, and auth errors uniformly.
Node 3 retries up to 2 times on NATError — Nemotron occasionally returns null content on the first attempt.
All Gemini JSON responses are passed through strip_markdown_fences() before json.loads() to handle code-fenced model responses.
All I/O is non-blocking: generate_content_async for Gemini, asyncio.to_thread for the synchronous GCS and Node 2/3 calls (so they do not block the event loop), and fire-and-forget asyncio.create_task for background artefact uploads.
Chat context includes the blueprint JSON plus up to 3000 chars each from model.py, train.py, dataset.py, config.yaml. Files are read from local outputs/ first, falling back to GCS download.
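Two of the notes above describe small, reusable patterns: stripping code fences before json.loads(), and retrying Node 3 when Nemotron returns null content. Below is a sketch of both. The real strip_markdown_fences() and the retry loop are not reproduced in this README, so these bodies are assumptions; NATError here is a local stand-in for the class in nat/nat_client.py, and require_content and call_with_retries are hypothetical helpers. "Retries up to 2 times" is read as one initial call plus two retries.

```python
import json
import re
import time
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")

# Matches a whole response wrapped in a fence with an optional language tag.
_FENCE_RE = re.compile(r"^\s*```[\w-]*\s*\n(.*?)\n?\s*```\s*$", re.DOTALL)


def strip_markdown_fences(text: str) -> str:
    """Return the fenced body if the whole text is one code fence, else the text."""
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text.strip()


def parse_model_json(text: str) -> Any:
    return json.loads(strip_markdown_fences(text))


class NATError(Exception):
    """Local stand-in for nat.nat_client.NATError."""


def require_content(content: Optional[str]) -> str:
    # Treat a null completion as a retryable NAT failure.
    if content is None:
        raise NATError("model returned null content")
    return content


def call_with_retries(call: Callable[[], T], retries: int = 2,
                      delay_s: float = 0.0) -> T:
    """Call once, then retry up to `retries` more times on NATError."""
    for attempt in range(retries + 1):
        try:
            return call()
        except NATError:
            if attempt == retries:
                raise  # out of retries: surface the last error
            time.sleep(delay_s)
    raise AssertionError("unreachable")
```

Keeping the null-content check separate from the retry loop means the same loop can wrap any NIM call, while only Node 3 opts into treating an empty completion as an error.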
