paper2alpha

Extract alpha factors from research PDFs into Python code via an LLM, starting with Chinese brokerage reports.

paper2alpha turns a quant research PDF into structured ResearchCard metadata and generator-ready Python factor code, with automatic static-analysis guardrails from qtype.

Quick start

uv sync
export OPENAI_API_KEY=sk-...
uv run p2a run path/to/report.pdf --out ./out

Outputs:

out/card.json — structured ResearchCard
out/factor_<name>.py — generator code (qtype-clean)
out/qtype_report.json — static-analysis report

Pipeline

PDF → PyMuPDF parse → LLM extract (JSON mode) → Pydantic ResearchCard
                                                   │
                                                   ▼
                         Jinja2 template ← factor stubs ← qtype static check
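The card-to-stub stages can be pictured with a small stdlib sketch. It makes several assumptions. `MiniCard` is a hypothetical, cut-down card with `name`, `summary`, and `window` fields; the real ResearchCard schema is not described here. A dataclass with explicit checks stands in for Pydantic validation of the JSON-mode output. `string.Template` stands in for the Jinja2 template. `ast.parse` is only a cheap syntax check, not qtype. As in v0.1, the rendered body raises NotImplementedError.

```python
import ast
import json
from dataclasses import dataclass
from string import Template


# Hypothetical, cut-down card; the real ResearchCard is a Pydantic model with its own fields.
@dataclass(frozen=True)
class MiniCard:
    name: str
    summary: str
    window: int

    @classmethod
    def from_llm_json(cls, raw: str) -> "MiniCard":
        """Parse JSON-mode LLM output and validate the fields the stub needs."""
        data = json.loads(raw)
        name = data["name"]
        if not name.isidentifier():
            raise ValueError(f"factor name is not a valid identifier: {name!r}")
        window = int(data["window"])
        if window <= 0:
            raise ValueError("window must be positive")
        return cls(name=name, summary=str(data["summary"]), window=window)


# string.Template stands in for the Jinja2 template; $-placeholders are filled from the card.
STUB = Template('''\
WINDOW = $window

def factor_$name(prices):
    """$summary"""
    raise NotImplementedError("factor body not generated in v0.1")
''')


def render_stub(card: MiniCard) -> str:
    src = STUB.substitute(name=card.name, summary=card.summary, window=card.window)
    ast.parse(src)  # syntax check only; the real pipeline runs qtype afterwards
    return src


if __name__ == "__main__":
    raw = '{"name": "momentum_20d", "summary": "20-day price momentum.", "window": 20}'
    print(render_stub(MiniCard.from_llm_json(raw)))
```

Only the constant and docstring carry information from the card. The body is left for a later stage, which keeps the generated file importable and checkable.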

Configuration

Create p2a.toml in your working directory:

[llm]
provider = "openai"          # default; other providers (Anthropic, local models) need a custom LLMClient
model    = "gpt-4o-mini"

Set the matching env var (OPENAI_API_KEY or ANTHROPIC_API_KEY). API keys are never read from the TOML file.

Known limitations (v0.1)

Generated factor bodies are stubs that raise NotImplementedError; only the metadata, constants, and docstrings are filled in. LLM-generated factor bodies are planned for v0.2.
Table and formula-image extraction is heuristic, so numeric metrics may be missed in scanned PDFs.
Only OpenAI is wired up out of the box. To use another provider, write a small class (about 20 lines) that implements the paper2alpha.core.llm_client.LLMClient Protocol.
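Below is a rough sketch of what a provider adapter involves. This README does not document the Protocol's actual method names or signatures, so `LLMClientLike.complete_json` is an assumed stand-in for `paper2alpha.core.llm_client.LLMClient`. `CannedClient` and `extract_card` are also hypothetical. The point is structural typing: the adapter never inherits from the Protocol; it only provides a matching method.

```python
import json
from typing import Protocol, runtime_checkable


# ASSUMED shape: the real LLMClient Protocol may use different method names and
# arguments; check paper2alpha.core.llm_client before writing an adapter.
@runtime_checkable
class LLMClientLike(Protocol):
    def complete_json(self, system: str, user: str) -> dict: ...


class CannedClient:
    """Offline stand-in provider that returns a fixed JSON payload, e.g. for tests."""

    def __init__(self, payload: dict) -> None:
        self._raw = json.dumps(payload)  # mimic a provider that replies with JSON text

    def complete_json(self, system: str, user: str) -> dict:
        # A real adapter would call its provider's SDK here and parse the reply.
        return json.loads(self._raw)


def extract_card(client: LLMClientLike, report_text: str) -> dict:
    """Hypothetical call site: any object with a matching method can be swapped in."""
    return client.complete_json("Extract a research card as JSON.", report_text)
```

`runtime_checkable` makes `isinstance` check only that the method exists, not its signature. Type checkers such as mypy perform the full structural comparison.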

Roadmap

v0.2: support for English-language papers (arXiv / SSRN), plus LLM-generated factor bodies with context-distiller retrieval augmentation.
v0.3: batch processing + factor deduplication (rank correlation against an existing library).
v0.4: paper citation-graph traversal.

License

MIT — see LICENSE. Vendored components (qtype, context_distiller) retain their upstream licenses; see LICENSE-VENDORED.md.
