An intelligent citation assistant that analyzes your LaTeX document, searches academic databases, and generates a complete BibTeX file.
CiteBot automates the tedious process of finding and formatting references for academic papers. Give it your .tex file and a target number of references — it handles the rest: parsing your document, understanding what you're writing about, searching multiple academic databases in parallel, ranking results by relevance, and generating a ready-to-use .bib file.
- Multi-File Project Support: Pass your main `.tex` file or a project directory and CiteBot automatically tracks `\input{}`/`\include{}` to parse the entire project. Generates one unified `.bib` and inserts citations into each chapter file.
- LaTeX Parsing: Extracts title, abstract, sections, and existing citations from `.tex` files (supports `\chapter`, `\section`, Chinese documents).
- LLM-Powered Keyword Extraction: Uses DeepSeek/OpenAI to understand document semantics and extract precise English academic terms; per-chapter chunked extraction for large projects (100+ keywords). Falls back to NLP ensemble (KeyBERT + YAKE + spaCy) when no LLM API is configured.
- Multi-Source Search: Queries OpenAlex, Semantic Scholar, PubMed, arXiv, and BioRxiv in parallel via OpenCite.
- Smart Ranking: Composite scoring: keyword overlap (40%), citation count (25%), recency (20%), abstract similarity (15%).
- Deduplication: DOI-based and fuzzy title matching to eliminate duplicates.
- BibTeX Generation: Fetches authoritative BibTeX via DOI content negotiation with metadata fallback.
- Citation Insertion: Optionally inserts `\cite{}` commands into your document (writes to `.cited.tex`, never overwrites the original). For multi-file projects, each chapter gets its own `.cited.tex`.
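
The Smart Ranking weights can be read as a weighted sum of four signals, each normalized to the range 0 to 1. The sketch below illustrates that idea only. The `Paper` record, the log-scaled citation score capped at 1,000 citations, the 10-year linear recency window, and the Jaccard token overlap used as a stand-in for abstract similarity are all assumptions made for this example, not CiteBot's actual implementation.

```python
import math
from dataclasses import dataclass

# Weights come from the feature list above; every normalization below is assumed.
WEIGHTS = {"keyword": 0.40, "citations": 0.25, "recency": 0.20, "abstract": 0.15}


@dataclass(frozen=True)
class Paper:
    """Hypothetical record for illustration, not CiteBot's own type."""
    title: str
    abstract: str
    year: int
    citation_count: int


def _tokens(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w}


def keyword_overlap(paper: Paper, keywords: list[str]) -> float:
    """Fraction of keywords that occur in the title or abstract."""
    if not keywords:
        return 0.0
    haystack = f"{paper.title} {paper.abstract}".lower()
    return sum(k.lower() in haystack for k in keywords) / len(keywords)


def citation_score(paper: Paper, cap: int = 1000) -> float:
    """Log-scaled citation count that saturates at `cap` (assumed value)."""
    return min(math.log1p(paper.citation_count) / math.log1p(cap), 1.0)


def recency_score(paper: Paper, current_year: int, window: int = 10) -> float:
    """1.0 for the current year, falling linearly to 0.0 after `window` years."""
    age = max(current_year - paper.year, 0)
    return max(1.0 - age / window, 0.0)


def abstract_similarity(paper: Paper, doc_abstract: str) -> float:
    """Jaccard token overlap, a simple stand-in for a real similarity model."""
    a, b = _tokens(paper.abstract), _tokens(doc_abstract)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def composite_score(
    paper: Paper, keywords: list[str], doc_abstract: str, current_year: int
) -> float:
    """Weighted sum of the four normalized signals."""
    return (
        WEIGHTS["keyword"] * keyword_overlap(paper, keywords)
        + WEIGHTS["citations"] * citation_score(paper)
        + WEIGHTS["recency"] * recency_score(paper, current_year)
        + WEIGHTS["abstract"] * abstract_similarity(paper, doc_abstract)
    )
```

With a score like this, picking the final references reduces to sorting candidates by `composite_score` in descending order and keeping the first `--num-refs` entries.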
conda create -n citebot python=3.11 -y
conda activate citebot
git clone https://github.com/Hayden727/CiteBot.git
cd CiteBot
pip install -e .

Copy the example environment file and fill in your API keys:

cp .env.example .env

| Variable | Purpose | Required |
|---|---|---|
| `DEEPSEEK_API_KEY` | LLM keyword extraction (great for non-English docs) | Recommended |
| `OPENAI_API_KEY` | Alternative LLM (set `OPENAI_BASE_URL` + `OPENAI_MODEL` for compatible APIs) | Optional |
| `SEMANTIC_SCHOLAR_API_KEY` | Semantic Scholar API (free, recommended for CS) | Recommended |
| `OPENCITE_EMAIL` | OpenAlex polite pool (higher rate limits) | Recommended |
| `CROSSREF_EMAIL` | CrossRef polite pool | Optional |
| `PUBMED_API_KEY` | PubMed/NCBI access | Optional |

CiteBot works without API keys, but keyword quality and search rate limits will be degraded.
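
The fallback described above comes down to checking which keys are present. The sketch below shows one way to express that check. The function name, the returned labels, and the choice to prefer DeepSeek over OpenAI when both keys are set are assumptions for illustration, not documented CiteBot behaviour.

```python
import os
from collections.abc import Mapping


def choose_keyword_backend(env: Mapping[str, str] = os.environ) -> str:
    """Return a label for the keyword extractor implied by the configured keys.

    Hypothetical helper: the precedence between the two LLM keys is assumed.
    """
    if env.get("DEEPSEEK_API_KEY"):
        return "deepseek"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # No LLM key configured: use the NLP ensemble (KeyBERT + YAKE + spaCy).
    return "nlp-fallback"
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.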
# Single-file paper: generate 30 references
citebot paper.tex --num-refs 30 --output references.bib
# Multi-file thesis: pass the main file, auto-tracks \input/\include
citebot main.tex -n 100 -o references.bib -k 50
# Pass a directory: auto-finds main.tex / thesis.tex inside
citebot thesis/ -n 100 -o refs.bib
# Insert \cite{} into each chapter file (writes .cited.tex copies)
citebot main.tex -n 100 -o refs.bib --insert-cites
# Filter by year range
citebot paper.tex --year-from 2020 --year-to 2025
# Select specific data sources (CS recommended)
citebot paper.tex --sources s2,openalex,arxiv
# Verbose output with reference table
citebot paper.tex -n 20 -o refs.bib -v

| Option | Short | Default | Description |
|---|---|---|---|
| `--num-refs` | `-n` | 30 | Number of references to find |
| `--output` | `-o` | `references.bib` | Output .bib file path |
| `--insert-cites` | | off | Insert `\cite{}` into .tex file |
| `--year-from` | | none | Minimum publication year |
| `--year-to` | | none | Maximum publication year |
| `--sources` | | all | Comma-separated: openalex,s2,pubmed,arxiv,biorxiv |
| `--keywords` | `-k` | 15 | Number of keywords to extract |
| `--verbose` | `-v` | off | Show detailed output |

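
For readers who want to see how the options and defaults in this table fit together, here is a minimal sketch using the standard library's `argparse`. It mirrors the table only; the positional argument name `path`, the help strings, and the use of `argparse` itself are assumptions, and the real CLI in `main.py` may be built differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="citebot")
    parser.add_argument("path", help="Main .tex file or project directory")
    parser.add_argument("-n", "--num-refs", type=int, default=30,
                        help="Number of references to find")
    parser.add_argument("-o", "--output", default="references.bib",
                        help="Output .bib file path")
    parser.add_argument("--insert-cites", action="store_true",
                        help="Insert \\cite{} into .tex file")
    parser.add_argument("--year-from", type=int, default=None,
                        help="Minimum publication year")
    parser.add_argument("--year-to", type=int, default=None,
                        help="Maximum publication year")
    parser.add_argument("--sources", default="all",
                        help="Comma-separated: openalex,s2,pubmed,arxiv,biorxiv")
    parser.add_argument("-k", "--keywords", type=int, default=15,
                        help="Number of keywords to extract")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Show detailed output")
    return parser


# Equivalent of: citebot main.tex -n 100 -o refs.bib --insert-cites
args = build_parser().parse_args(
    ["main.tex", "-n", "100", "-o", "refs.bib", "--insert-cites"]
)
```

Options that are not passed keep the defaults from the table, so `args.keywords` is 15 and `args.sources` is `"all"` in this call.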
┌──────────┐    ┌───────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Parse   │───>│  Extract  │───>│  Search  │───>│  Rank &  │───>│ Generate │
│  .tex    │    │  Keywords │    │  Papers  │    │  Filter  │    │  .bib    │
└──────────┘    └───────────┘    └──────────┘    └──────────┘    └──────────┘
                                                                      │
                                                                      v
                                                                 ┌──────────┐
                                                                 │ (Insert  │
                                                                 │  cites)  │
                                                                 └──────────┘
- Parse: Reads your `.tex` file (or directory), auto-tracks `\input{}`/`\include{}` for multi-file projects, extracts title, abstract, sections from all files.
- Extract Keywords: Uses LLM (DeepSeek/OpenAI) for semantic keyword extraction; for multi-file projects, extracts per-chapter then merges (100+ keywords). NLP ensemble fallback (KeyBERT + YAKE + spaCy).
- Search: Builds queries scaled to keyword count and searches academic databases in parallel via OpenCite.
- Rank & Filter: Deduplicates results and scores each paper on keyword overlap (40%), citation count (25%), recency (20%), and abstract similarity (15%).
- Generate: Fetches authoritative BibTeX entries via DOI, falling back to metadata-based generation.
- Insert (optional): Adds `\cite{}` commands at relevant positions in each file (`.cited.tex` copies).
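
To make the Parse step more concrete, the sketch below walks a project by following `\input{}` and `\include{}` from the main file. It is an illustration under stated assumptions, not the code in `latex_parser.py`: the function name is hypothetical, paths are resolved relative to the main file's directory, a missing `.tex` extension is appended, and only simple `%` comments are stripped.

```python
import re
from pathlib import Path

# \input{name} or \include{name}; \includegraphics{...} does not match.
_INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")
# An unescaped % starts a LaTeX comment that runs to the end of the line.
_COMMENT_RE = re.compile(r"(?<!\\)%.*")


def collect_tex_files(main_file: Path) -> list[Path]:
    """Return the main file and every file it pulls in, in first-visit order."""
    root = main_file.resolve().parent
    ordered: list[Path] = []
    seen: set[Path] = set()

    def visit(path: Path) -> None:
        # Skip files already visited (guards against cycles) and missing files.
        if path in seen or not path.is_file():
            return
        seen.add(path)
        ordered.append(path)
        for line in path.read_text(encoding="utf-8").splitlines():
            code = _COMMENT_RE.sub("", line)
            for name in _INCLUDE_RE.findall(code):
                child = root / name.strip()
                if child.suffix != ".tex":
                    child = child.with_name(child.name + ".tex")
                visit(child.resolve())

    visit(main_file.resolve())
    return ordered
```

Each returned path can then be parsed on its own, which is what allows per-chapter keyword extraction and a separate `.cited.tex` copy for every file.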
| Source | Coverage | Access |
|---|---|---|
| OpenAlex | 250M+ works across all disciplines | Open, no key required |
| Semantic Scholar | 200M+ papers, CS/biomedical focus | Free API key recommended |
| PubMed | 36M+ biomedical citations | Free API key recommended |
| arXiv | 2M+ preprints in STEM fields | Open |
| BioRxiv | Biology preprints | Open |
Configurable via --sources. For CS papers, --sources s2,openalex,arxiv is recommended.
conda activate citebot
python -m pytest tests/ -v --cov=citebot --cov-report=term-missing

CiteBot/
├── citebot/
│   ├── __init__.py             Package init
│   ├── types.py                Frozen dataclasses + exception hierarchy
│   ├── config.py               Configuration (OpenCite + CLI params)
│   ├── latex_parser.py         .tex file parsing
│   ├── keyword_extractor.py    LLM-first keyword extraction + NLP fallback
│   ├── literature_searcher.py  Async multi-source search
│   ├── filter_ranker.py        Deduplication + composite scoring
│   ├── bib_generator.py        BibTeX generation + validation
│   ├── cite_inserter.py        Optional \cite{} insertion
│   ├── pipeline.py             Pipeline orchestration
│   └── main.py                 CLI entry point
├── tests/                      Unit + integration tests
├── pyproject.toml              Build configuration
├── requirements.txt            Pinned dependencies
└── .env.example                API key template
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/amazing-feature`)
- Commit your Changes (`git commit -m 'feat: add amazing feature'`)
- Push to the Branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
