VernonOY/paper2alpha

paper2alpha

Extract executable alpha factors from research PDFs via LLM — Chinese brokerage reports first.

paper2alpha turns a quant research PDF into structured ResearchCard metadata and generator-ready Python factor code, with automatic static-analysis guardrails from qtype.

Quick start

uv sync
export OPENAI_API_KEY=sk-...
uv run p2a run path/to/report.pdf --out ./out

Outputs:

  • out/card.json — structured ResearchCard
  • out/factor_<name>.py — generator code (qtype-clean)
  • out/qtype_report.json — static-analysis report
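As a minimal sketch of consuming these files downstream, the helper below relies only on the file names listed above. `load_outputs` is a hypothetical helper, not part of paper2alpha, and because the JSON schemas are not documented here it returns both JSON files as opaque objects.

```python
import json
from pathlib import Path


def load_outputs(out_dir):
    """Collect the artifacts of one `p2a run` from its --out directory.

    Hypothetical helper: only the file names come from the README; the
    JSON contents are returned as-is without assuming any schema.
    """
    out = Path(out_dir)
    return {
        "card": json.loads((out / "card.json").read_text(encoding="utf-8")),
        "qtype_report": json.loads(
            (out / "qtype_report.json").read_text(encoding="utf-8")
        ),
        # One file per extracted factor, named factor_<name>.py.
        "factor_files": sorted(p.name for p in out.glob("factor_*.py")),
    }
```

Calling `load_outputs("./out")` after the quick-start command should then give a single dictionary to inspect or pass on to other tooling.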

Pipeline

PDF → PyMuPDF parse → LLM extract (JSON mode) → Pydantic ResearchCard
                                                   │
                                                   ▼
                         Jinja2 template ← factor stubs ← qtype static check
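The template, stub, and static-check stages can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the project's implementation: `string.Template` stands in for Jinja2, a small `ast` check stands in for qtype (whose actual rules are not described here), and the field names `name`, `description`, and `lookback` are hypothetical rather than the real ResearchCard schema.

```python
import ast
from string import Template

# Stand-in for the Jinja2 template; the layout is illustrative only.
STUB_TEMPLATE = Template('''\
LOOKBACK = $lookback


def factor_$name(data):
    """$description"""
    raise NotImplementedError
''')


def render_stub(card):
    """Render a factor stub from card-like metadata (a plain dict here)."""
    return STUB_TEMPLATE.substitute(
        name=card["name"],
        description=card["description"],
        lookback=card["lookback"],
    )


def _ends_with_not_implemented(fn):
    """True if the function's last statement is `raise NotImplementedError`."""
    last = fn.body[-1]
    return (
        isinstance(last, ast.Raise)
        and isinstance(last.exc, ast.Name)
        and last.exc.id == "NotImplementedError"
    )


def is_stub(source):
    """Parse `source` and check that every function in it is still a stub.

    ast.parse raises SyntaxError on invalid code, so this doubles as a
    syntax check. It is far weaker than a real static analyzer.
    """
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    return bool(funcs) and all(_ends_with_not_implemented(f) for f in funcs)
```

For example, `is_stub(render_stub({"name": "mom_20d", "description": "20-day momentum.", "lookback": 20}))` evaluates to `True`, while a function with a real body is reported as not a stub.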

Configuration

Create p2a.toml in your working directory:

[llm]
provider = "openai"          # default; Anthropic or local models require a custom LLMClient
model    = "gpt-4o-mini"

Set the environment variable that matches your provider: OPENAI_API_KEY for the built-in OpenAI client, or ANTHROPIC_API_KEY if you plug in an Anthropic client. API keys are never read from the TOML file.

Known limitations (v0.1)

  • Generated factor bodies are stubs (raise NotImplementedError); only the metadata, constants, and docstrings are filled in. LLM-bodied generation is scoped to v0.2.
  • Table / formula image extraction is heuristic — numeric metrics may be missed on scanned PDFs.
  • Only OpenAI is wired out of the box. Plug in another provider by writing a small class (≈20 LOC) against the paper2alpha.core.llm_client.LLMClient Protocol.
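To give a feel for what "a small class against a Protocol" means, here is a sketch built on `typing.Protocol`. The `LLMClient` defined below is a hypothetical stand-in: the real method names and signatures in paper2alpha.core.llm_client are not given in this README, so `complete(prompt) -> str` is an assumption, as are `CannedClient` and `extract`.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical stand-in for paper2alpha.core.llm_client.LLMClient.

    The real method names and signatures are not given in this README.
    """

    def complete(self, prompt: str) -> str: ...


class CannedClient:
    """Toy provider that returns a fixed string; handy for offline tests."""

    def __init__(self, response: str) -> None:
        self.response = response
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)  # record the call for inspection
        return self.response


def extract(client: LLMClient, text: str) -> str:
    """Call any object that structurally satisfies the Protocol."""
    return client.complete(f"Extract factors as JSON:\n{text}")
```

Because Protocols are structural, `CannedClient` never inherits from `LLMClient`; a real provider class would follow the same shape, replacing the canned response with an API call.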

Roadmap

  • v0.2: English (arXiv / SSRN) support + LLM-bodied factor code with context-distiller retrieval augmentation.
  • v0.3: batch processing + factor deduplication (rank correlation against an existing library).
  • v0.4: paper citation-graph traversal.
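The deduplication idea planned for v0.3 can be illustrated with a pure-Python Spearman rank correlation. This is only a sketch of the general technique, not the planned implementation: the function names and the 0.95 threshold are arbitrary assumptions, and the inputs are assumed to be equal-length, non-constant lists of factor values.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5  # assumes neither input is constant


def is_duplicate(candidate, library, threshold=0.95):
    """Flag a candidate whose |rank correlation| with any library factor is high.

    The 0.95 threshold is an arbitrary illustration.
    """
    return any(abs(spearman(candidate, f)) >= threshold for f in library)
```

Using the absolute value treats a sign-flipped copy of an existing factor as a duplicate too, which is usually what a factor library wants.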

License

MIT — see LICENSE. Vendored components (qtype, context_distiller) retain their upstream licenses; see LICENSE-VENDORED.md.
