A Python script that reads a Google Scholar profile URL, fetches all publications, and:

- generates a `publications.bib` file containing full bibliometric information plus the abstract of every paper;
- downloads every available open-access PDF;
- downloads the LaTeX source from arXiv (or a compatible repository) for each paper that has one, and stores it in a dedicated folder.
Python 3.10+ and the packages listed in `requirements.txt`:

```
pip install -r requirements.txt
```

Run the script with a Scholar profile URL:

```
python fetch_publications.py "https://scholar.google.com/citations?user=XXXXXXXXX"
```

| Flag | Default | Description |
|---|---|---|
| `--output-dir DIR` | `.` | Directory where `publications.bib` and downloaded files are written |
| `--no-pdf` | – | Skip open-access PDF downloads |
| `--no-source` | – | Skip LaTeX source downloads |
| `--delay SECONDS` | `2.0` | Pause between Scholar requests (increase if your IP is rate-limited) |
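Before any fetching can start, the script has to pull the Scholar user ID out of the profile URL's `user` query parameter. A minimal sketch of that parsing step using the standard library (the function name `extract_user_id` is illustrative, not part of the script's documented API):

```python
from urllib.parse import parse_qs, urlparse


def extract_user_id(profile_url: str) -> str:
    """Return the Scholar user ID from a profile URL's query string."""
    query = parse_qs(urlparse(profile_url).query)
    try:
        return query["user"][0]
    except (KeyError, IndexError):
        raise ValueError(f"no 'user' parameter in URL: {profile_url}")


print(extract_user_id("https://scholar.google.com/citations?user=AbCdEfGhIjK"))
# → AbCdEfGhIjK
```

Raising a `ValueError` early gives a clearer message than letting a malformed URL fail deep inside the fetch logic.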
```
python fetch_publications.py \
    "https://scholar.google.com/citations?user=AbCdEfGhIjK" \
    --output-dir ./my_publications \
    --delay 3
```

This will create:
```
my_publications/
├── publications.bib              # all entries with abstracts
├── 2023_Deep_Learning_for/
│   ├── paper.pdf                 # open-access PDF (if available)
│   └── main.tex                  # arXiv LaTeX source (if available)
├── 2021_Efficient_Neural/
│   └── paper.pdf
└── …
```
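Judging from the tree above, each paper's folder name combines the year with the first few words of the title. A sketch of how such a name could be derived (the three-word cutoff and the function name are assumptions inferred from the example, not the script's documented behavior):

```python
import re


def folder_name(year: int, title: str, max_words: int = 3) -> str:
    """Build a filesystem-safe folder name like '2023_Deep_Learning_for'.

    Keeps only alphanumeric runs from the title, joined by underscores,
    truncated to `max_words` words.
    """
    words = re.findall(r"[A-Za-z0-9]+", title)[:max_words]
    return f"{year}_" + "_".join(words)


print(folder_name(2023, "Deep Learning for Large-Scale Inference"))
# → 2023_Deep_Learning_for
```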
- Google Scholar aggressively rate-limits automated access. If you encounter errors, try increasing `--delay` or use a proxy (see the `scholarly` documentation).
- Only publications with an arXiv (or similar) link will have their LaTeX source downloaded.
- BibTeX entries use the `abstract` field, which is supported by most modern LaTeX bibliography backends (BibLaTeX, natbib, etc.).
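Since rate limiting is the most common failure mode, a fixed `--delay` is often paired with retries that back off exponentially. A minimal sketch of that pattern (the helper name `fetch_with_backoff` is illustrative; the script's actual retry behavior is not documented here):

```python
import time


def fetch_with_backoff(fetch, max_retries=4, base_delay=2.0):
    """Call `fetch()` and retry on failure, doubling the pause each time.

    `fetch` is any zero-argument callable (e.g. a wrapped Scholar request).
    The last failure is re-raised so callers still see the original error.
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


# Demo with a stand-in for a flaky HTTP call: fails twice, then succeeds.
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("HTTP 429: too many requests")
    return "ok"

print(fetch_with_backoff(flaky, base_delay=0.01))
# → ok
```

Doubling the delay between attempts gives the remote server progressively longer quiet periods, which is usually enough to clear a temporary rate limit.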