This dataset contains 23 research papers, each provided in two formats:
- PDF file: the readable paper.
- tar.gz archive: LaTeX source code from arXiv.
This collection is curated as part of the Meta PaperBench dataset initiative, which aims to provide standardized research paper corpora for evaluation, benchmarking, and reproducible experimentation. Some papers include only a PDF because no source archive or supplementary material was available.
dataset/
│
├── paper_01/
│ ├── paper_01.pdf
│ └── paper_01.tar.gz
│
├── paper_02/
│ ├── paper_02.pdf
│ └── paper_02.tar.gz
│
...
│
├── paper_23/
│ ├── paper_23.pdf
│ └── paper_23.tar.gz
│
└── README.md
The following paper in the dataset does not include a .tar.gz archive and contains only the PDF version:
| Data | File Name | Paper Name |
|---|---|---|
| bridging-data-gaps | icml2024.pdf | Bridging Data Gaps in Diffusion Models with Adversarial Noise-Based Transfer Learning |
Each .pdf file contains the complete research paper.
Each .tar.gz archive may contain one or more of:
- LaTeX or manuscript source files
- Figures and images
- Other supplementary materials
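
To see what a given archive actually contains before extracting it, a short script along these lines may help. It is a minimal sketch using Python's standard `tarfile` module; the function name `summarize_archive` is hypothetical and not part of the dataset.

```python
import tarfile
from collections import Counter
from pathlib import Path


def summarize_archive(archive_path):
    """Count the regular files in a .tar.gz archive by lowercase extension.

    Hypothetical helper for illustration; nothing is extracted to disk.
    """
    counts = Counter()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories, symlinks, and other non-file entries.
            if member.isfile():
                suffix = Path(member.name).suffix.lower() or "(no extension)"
                counts[suffix] += 1
    return dict(counts)
```

Calling `summarize_archive("dataset/paper_01/paper_01.tar.gz")` (a path assumed from the layout above) would return a dictionary mapping extensions such as `.tex` or `.png` to file counts, which gives a quick view of whether an archive holds sources, figures, or other material.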
- 23 PDF files
- Up to 22 tar.gz files
- Structured into 23 directories
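
The counts above can be checked against a local copy with a sketch like the following. It assumes one subdirectory per paper directly under the dataset root and matches files by extension rather than by name, since file names may differ between papers. The function names `count_files` and `pdf_only_dirs` are hypothetical.

```python
from pathlib import Path


def count_files(dataset_root):
    """Return (paper directories, PDF files, tar.gz archives) under dataset_root."""
    root = Path(dataset_root)
    # Only subdirectories count as papers; README.md at the root is ignored.
    paper_dirs = [d for d in sorted(root.iterdir()) if d.is_dir()]
    n_pdf = sum(len(list(d.glob("*.pdf"))) for d in paper_dirs)
    n_tar = sum(len(list(d.glob("*.tar.gz"))) for d in paper_dirs)
    return len(paper_dirs), n_pdf, n_tar


def pdf_only_dirs(dataset_root):
    """Return names of paper directories that have a PDF but no tar.gz archive."""
    root = Path(dataset_root)
    return [
        d.name
        for d in sorted(root.iterdir())
        if d.is_dir() and any(d.glob("*.pdf")) and not any(d.glob("*.tar.gz"))
    ]
```

For a complete copy, `count_files("dataset")` would be expected to report 23 directories, 23 PDFs, and at most 22 archives, and `pdf_only_dirs("dataset")` should list the PDF-only entries from the table above.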