Research-reproduction Agent: PDF → factor code → backtest → Red Team Validator → reproducibility score.
replicalpha takes a quant research PDF and produces a complete reproduction bundle in one command — structured paper metadata, executable Python factor code, backtest results, automated Red Team findings, and a plain-English reproducibility score.
A personal timeline for every quant paper you've ever read, with one verdict per paper: does the factor still work?
Multi-lane chronological view, color-coded by reproducibility verdict. Lanes by factor family (momentum / reversal / value / quality / volatility / liquidity / other). Filter by verdict, universe, or family. Hover for paper details.
For each paper: a one-sentence natural-language verdict, a score gauge, paper-claim vs reproduction side-by-side, cumret chart. One-click "Open in IDE" to inspect or fix the generated factor code. Up/down arrow keys to scrub through your timeline.
Daily-mark equity curve with CSI 300 benchmark overlay, drawdown panel, allocations table. Strategy +27% vs benchmark −0.14% on the simulated 252-session window above. Export rebalance orders to IBKR / Tiger / Futu CSV.
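The drawdown panel shows how far the equity curve sits below its running peak. Below is a minimal pandas sketch of that computation. It assumes the curve is a Series of daily marks. `drawdown` and `max_drawdown` are illustrative names, not replicalpha's API.

```python
import pandas as pd


def drawdown(equity: pd.Series) -> pd.Series:
    """Fractional drawdown: 0 at a new high, negative while below the running peak."""
    running_peak = equity.cummax()
    return equity / running_peak - 1.0


def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough decline, returned as a positive fraction (0.10 = 10%)."""
    return float(-drawdown(equity).min())


# Toy equity curve: rises to 1.10, falls to 0.99 (a 10% drawdown), then recovers.
curve = pd.Series([1.00, 1.10, 0.99, 1.05, 1.21])
print(drawdown(curve).round(3).tolist())
print(max_drawdown(curve))
```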
Watch any reproduced factor live. Alert when 30-day IC moving average crosses your threshold (decay, sign-flip, sample concentration). Sparkline shows raw IC + 7-pt MA against zero baseline.
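The decay alert reduces to a rolling mean of the daily IC plus a threshold-crossing test. The sketch below assumes one IC value per trading day and reads "30-day" as the last 30 observations. `ic_crossings` is a hypothetical helper, not part of replicalpha.

```python
import pandas as pd


def ic_crossings(ic: pd.Series, threshold: float, window: int = 30) -> pd.DataFrame:
    """Dates where the rolling-mean IC crosses `threshold`.

    Assumes `ic` holds one IC value per trading day, indexed by date, and treats
    a "30-day" average as the mean of the last `window` observations.
    direction == "down": the average fell to or below the threshold (decay).
    direction == "up":   the average rose back above it.
    """
    ma = ic.rolling(window, min_periods=window).mean()
    above = ma > threshold
    prev_above = above.shift(1, fill_value=False)
    # Only compare consecutive days where both moving-average values exist.
    valid = ma.notna() & ma.shift(1).notna()
    down = valid & prev_above & ~above
    up = valid & ~prev_above & above
    hit = down | up
    return pd.DataFrame(
        {"ic_ma": ma[hit], "direction": ["down" if d else "up" for d in down[hit]]},
        index=ma.index[hit],
    )
```

Setting `threshold=0.0` turns the same test into a sign-flip detector for the moving average.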
Don't have a PDF? Describe a research direction in plain English: "Latest momentum factors in CSI 500 (2024)" / "Post-publication decay of value anomalies". Agent ranks candidates and lets you pick which to reproduce.
26 papers in this snapshot. Sort by claimed IC, reproduced IC, score, family, year. Filter, multi-select, batch-rerun, export.
Every reproduction is a full editable workspace: PDF + Claude-generated factor.py / backtest.py / redteam.py, terminal, and an in-context agent that knows the verdict, the data, and the code. Quick-actions surface the right next move per verdict ("Why is the verdict not green?" / "Fix the sign-flip" / "Add 1/99 winsorization" / "Audit for look-ahead leakage").
Bilingual UI (EN / 中文) · dark mode default · keyboard-first (⌘K for everything).
Reproducing Zeng & Liu (2016) "Momentum and Reversal Effects on the Chinese Stock Market" on CSI 300 top-30 with Tushare Pro daily data, 2022-01 to 2024-12:
| | Paper claim (2010-2016, 554 stocks) | Our reproduction (2022-2024, CSI 300 top 30) |
|---|---|---|
| 6-month reversal IC | +0.013 | −0.022 |
| Sign | positive reversal | sign-flipped → momentum |
| Reproducibility score | n/a | 0.00 (weak; sign mismatch) |
The Red Team validator also fires one warning (sample concentration). This is exactly the kind of finding replicalpha is built to surface: a published A-share factor whose sign flipped over a later period and in a narrower universe.
Charts for this reproduction: cumulative long-short return · rolling IC · drawdown.
Full generated research report: docs/images/real-demo/report.md. The extracted (and manually polished) ResearchCard: docs/images/real-demo/research_card.json.
```bash
uv sync --extra demo   # installs tushare + matplotlib
export TUSHARE_TOKEN=... OPENAI_API_KEY=...
uv run python scripts/generate_real_demo.py   # auto-downloads the paper PDF
```

The first run pulls ~4 years × 30 tickers from Tushare (~1 min) and calls OpenAI once to extract the ResearchCard (~10 s). Subsequent runs use on-disk caches: no API calls, and the charts regenerate in seconds.
```bash
uv run replicalpha run your-paper.pdf --data your-csv.csv \
  --out ./out --start 2022-01-03 --end 2024-12-31
```

See below for the CLI and the DataAdapter Protocol.
Synthetic-data wiring test (no API key, fastest path)
For a 2-second smoke test on bundled synthetic data, see docs/images/demo-terminal.svg and docs/images/demo-report.md. This is what CI uses.
```mermaid
flowchart LR
  PDF[📄 Paper PDF] --> EXT[paper2alpha<br/>PyMuPDF + LLM]
  EXT --> RC[ResearchCard<br/>JSON schema]
  RC --> CG[DSL interpreter<br/>+ LLM fallback]
  CG --> QT[qtype lint<br/>look-ahead check]
  QT --> BT[pandas backtest<br/>quintile IC]
  BT --> RT[Red Team<br/>5 checks]
  RT --> SC[Reproducibility<br/>score]
  SC --> MD[📝 report.md]
  classDef in fill:#e3f2fd,stroke:#1976d2,color:#0d47a1
  classDef core fill:#fff3e0,stroke:#f57c00,color:#e65100
  classDef out fill:#e8f5e9,stroke:#388e3c,color:#1b5e20
  class PDF in
  class EXT,RC,CG,QT,BT,RT,SC core
  class MD out
```
```bash
uv sync
export OPENAI_API_KEY=sk-...   # required for PDF → ResearchCard extraction
uv run replicalpha run path/to/paper.pdf \
  --out ./out \
  --data path/to/market_data.csv \
  --start 2022-01-03 --end 2024-12-31
```

Outputs under `./out/`:
- `research_card.json`: structured paper metadata (paper2alpha schema)
- `factor_<name>.py`: executable Python factor (qtype-clean)
- `backtest.json`: IC time series + aggregate stats
- `validator.json`: findings from the 5 Red Team checks
- `reproducibility.json`: numeric score + interpretation
- `report.md`: aggregated markdown research report
```bash
# Generate a mock card JSON
cat > /tmp/card.json <<'JSON'
{
  "source": "demo",
  "factors": [{
    "name": "ma20", "chinese_name": "20日均线", "definition": "d",
    "formula": "rolling_mean(close, 20)", "data_fields": ["close"],
    "params": {"lookback": 20}, "universe": "全A",
    "reported_metrics": {"ic_mean": 0.045, "backtest_period": "2022~2024"}
  }]
}
JSON
uv run replicalpha run tests/cases/demo.pdf \
  --out ./demo-out \
  --data tests/cases/sample_market_data.csv \
  --start 2022-03-01 --end 2022-12-31 \
  --extractor-mock /tmp/card.json
```

Expected terminal output:
```text
replicalpha run — demo.pdf
✓ run r-1a2b3c4d finished
output: ./demo-out
codegen (dsl) ✓
backtest: IC 0.0123 cumret +2.45% maxDD 4.21% Sharpe 0.87
reproducibility: 0.42 — weak reproduction: claimed IC = 0.0450,
  reproduced = 0.0123 (72.7% deviation).
red team findings: 2 warning, 1 critical
```
See examples/reproduce_demo.md for the full walkthrough.
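The mock card's `rolling_mean(close, 20)` is a DSL formula, not Python. One way to build such an interpreter is to parse the formula with Python's `ast` module and evaluate only whitelisted nodes. The sketch below illustrates that approach under stated assumptions and is not replicalpha's interpreter. Inputs are assumed to be wide DataFrames (dates × tickers). Time-series operators roll down each ticker column, and `rank` / `zscore` are assumed to be cross-sectional. The operator names mirror the DSL whitelist documented further down.

```python
import ast
import operator

import pandas as pd

# Hypothetical semantics for the whitelisted operators. Time-series operators
# work down each ticker column; rank and zscore work across tickers per date.
FUNCS = {
    "pct_change": lambda x, n: x / x.shift(int(n)) - 1.0,
    "rolling_mean": lambda x, n: x.rolling(int(n)).mean(),
    "rolling_std": lambda x, n: x.rolling(int(n)).std(),
    "rolling_sum": lambda x, n: x.rolling(int(n)).sum(),
    "corr": lambda x, y, n: x.rolling(int(n)).corr(y, pairwise=False),
    "rank": lambda x: x.rank(axis=1, pct=True),
    "zscore": lambda x: x.sub(x.mean(axis=1), axis=0).div(x.std(axis=1), axis=0),
}
BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(formula: str, fields: dict[str, pd.DataFrame]):
    """Evaluate a whitelisted formula such as "rolling_mean(close, 20)".

    Anything outside the whitelist raises ValueError; nothing is passed to eval().
    """

    def walk(node: ast.AST):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in fields:
            return fields[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
            return BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in FUNCS
            and not node.keywords
        ):
            return FUNCS[node.func.id](*(walk(arg) for arg in node.args))
        raise ValueError(f"not in the DSL whitelist: {ast.dump(node)}")

    return walk(ast.parse(formula, mode="eval"))
```

Walking a parsed AST keeps the attack surface small. A formula such as `__import__('os')` is rejected because `__import__` is not a whitelisted name.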
| # | Check | Fires when |
|---|---|---|
| 1 | `overfitting_hint` | `lookback` is a non-round value (suggests tuning) |
| 2 | `small_cap_exposure` | held-leg median market cap < 0.5× universe median |
| 3 | `data_leakage` | generated code fails qtype (look-ahead / future function) |
| 4 | `sample_concentration` | a single year contributes > 50% of cumulative IC |
| 5 | `factor_redundancy` | paper declares multiple factors with near-identical formulas |
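Checks 1 and 4 are simple enough to sketch directly. This version is hypothetical in two ways. It assumes "cumulative IC" means the plain sum of per-period IC. It also invents its own definition of a "round" lookback: a multiple of 5 or a common trading-calendar window.

```python
import pandas as pd


def sample_concentration(ic: pd.Series, max_share: float = 0.5) -> bool:
    """Check 4: does a single calendar year contribute > max_share of cumulative IC?

    Assumes `ic` has one value per period on a DatetimeIndex and that
    "cumulative IC" is the plain sum of those values. Shares are taken relative
    to the total, so they can exceed 1 when other years offset each other.
    """
    total = ic.sum()
    if total == 0:
        return False
    yearly = ic.groupby(ic.index.year).sum()
    return bool((yearly / total).max() > max_share)


# Assumption: multiples of 5 plus common trading-calendar windows count as "round".
CALENDAR_WINDOWS = {21, 63, 126, 252}


def overfitting_hint(lookback: int) -> bool:
    """Check 1: flag a lookback that looks tuned rather than conventional."""
    return lookback % 5 != 0 and lookback not in CALENDAR_WINDOWS
```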
```bash
RUNS_ROOT=./runs REPLICALPHA_DATA_CSV=./market.csv \
  uv run uvicorn replicalpha.server.main:app --port 8000
```

Endpoints:
- `POST /runs`: upload a PDF, start the pipeline, return a `run_id`
- `GET /runs/{run_id}`: structured `PipelineReport`
- `GET /runs/{run_id}/report`: markdown report (`text/markdown`)
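A client needs only the standard library to download a finished report. This sketch covers just the `GET /runs/{run_id}/report` endpoint, because the upload format for `POST /runs` is not documented here. `report_url` and `fetch_report` are hypothetical helper names.

```python
import urllib.request
from urllib.parse import quote


def report_url(base_url: str, run_id: str) -> str:
    """URL of the markdown report endpoint, GET /runs/{run_id}/report."""
    return f"{base_url.rstrip('/')}/runs/{quote(run_id, safe='')}/report"


def fetch_report(base_url: str, run_id: str, timeout: float = 30.0) -> str:
    """Download the text/markdown report for one run."""
    with urllib.request.urlopen(report_url(base_url, run_id), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

With the server above running on port 8000, `fetch_report("http://localhost:8000", run_id)` should return the markdown for a run started through `POST /runs`.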
replicalpha ships a 20-ticker × 500-day synthetic sample for CI and demos.
For real results, implement the DataAdapter Protocol (`src/replicalpha/core/data.py`)
against your data source (Wind / Tushare / yfinance / Bloomberg / …). The
adapter contract consists of three methods: `get_price(field, start, end, universe)`,
`get_trading_days(start, end)`, and `get_metadata()`.
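You can express the three-method contract as a `typing.Protocol`. The parameter and return types below are assumptions, and the real Protocol in `src/replicalpha/core/data.py` may differ. `FrameAdapter` is a hypothetical toy adapter over an in-memory long-format DataFrame. It can serve as a starting template for a Tushare or yfinance adapter.

```python
from typing import Any, Protocol

import pandas as pd


class DataAdapter(Protocol):
    """Assumed typing of the three-method contract (the real one may differ)."""

    def get_price(
        self, field: str, start: str, end: str, universe: list[str]
    ) -> pd.DataFrame: ...

    def get_trading_days(self, start: str, end: str) -> list[pd.Timestamp]: ...

    def get_metadata(self) -> dict[str, Any]: ...


class FrameAdapter:
    """Hypothetical adapter over a long DataFrame with columns date, ticker, <fields>."""

    def __init__(self, frame: pd.DataFrame, source: str = "in-memory") -> None:
        self._frame = frame.assign(date=pd.to_datetime(frame["date"]))
        self._source = source

    def _in_range(self, start: str, end: str) -> pd.Series:
        # Inclusive on both ends.
        return self._frame["date"].between(pd.Timestamp(start), pd.Timestamp(end))

    def get_price(
        self, field: str, start: str, end: str, universe: list[str]
    ) -> pd.DataFrame:
        rows = self._frame[self._in_range(start, end) & self._frame["ticker"].isin(universe)]
        # Wide layout: one row per date, one column per ticker.
        return rows.pivot(index="date", columns="ticker", values=field).sort_index()

    def get_trading_days(self, start: str, end: str) -> list[pd.Timestamp]:
        days = self._frame.loc[self._in_range(start, end), "date"].unique()
        return sorted(pd.Timestamp(d) for d in days)

    def get_metadata(self) -> dict[str, Any]:
        return {"source": self._source, "tickers": sorted(self._frame["ticker"].unique())}
```

Because `Protocol` uses structural typing, `FrameAdapter` satisfies `DataAdapter` without inheriting from it. Annotating `adapter: DataAdapter = FrameAdapter(df)` lets `mypy --strict` check the match.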
- Factor formulas outside the DSL whitelist (`pct_change` / `rolling_mean` / `rolling_std` / `rolling_sum` / `corr` / `rank` / `zscore` + arithmetic) fall back to LLM codegen or produce a stub.
- Minimal pandas backtest (quintile long-short, weekly rebalance). No transaction costs, no slippage, no position-sizing constraints.
- Red Team is computational only; semantic LLM critique is v0.2.
- No real-time / intraday / trading-execution features (roadmap v1.5 → v2.0).
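Here is a minimal sketch of the kind of backtest described in the limitations above: a quintile long-short with rank IC, no costs, no slippage, and equal weights within each quintile. It makes several assumptions. Inputs are wide DataFrames (dates × tickers). Positions open at the close on which the factor is observed. "Weekly" means every 5 trading days. Spearman IC is computed as the Pearson correlation of ranks, which avoids a SciPy dependency. It illustrates the approach and is not replicalpha's backtester.

```python
import pandas as pd


def quintile_backtest(factor: pd.DataFrame, close: pd.DataFrame, hold: int = 5) -> pd.DataFrame:
    """Rank IC and top-minus-bottom quintile return at each rebalance.

    factor, close: wide frames (dates x tickers) with the same index and columns.
    The portfolio is rebalanced every `hold` rows and held until the next
    rebalance (5 trading days stands in for "weekly"). Equal weights, no costs.
    """
    # Forward return over the holding period that starts at each date.
    fwd = close.shift(-hold) / close - 1.0
    rows = []
    for date in factor.index[::hold]:
        f, r = factor.loc[date], fwd.loc[date]
        ok = f.notna() & r.notna()
        f, r = f[ok], r[ok]
        if len(f) < 5:  # need at least one name per quintile
            continue
        ic = f.rank().corr(r.rank())  # Spearman IC = Pearson correlation of ranks
        quintile = pd.qcut(f.rank(method="first"), 5, labels=False)
        long_short = r[quintile == 4].mean() - r[quintile == 0].mean()
        rows.append({"date": date, "ic": ic, "long_short": long_short})
    out = pd.DataFrame(rows, columns=["date", "ic", "long_short"]).set_index("date")
    # Holding periods do not overlap, so compounding them gives the cumulative return.
    out["cumret"] = (1.0 + out["long_short"]).cumprod() - 1.0
    return out
```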
- v0.2: LLM-based Red Team critique, transaction cost modeling, attribution (Barra factors), robustness (walk-forward / bootstrap), experiment memory (SQLite).
- v1.5: real-time factor monitoring, IC decay alerts, intraday signals.
- v2.0: broker API integration, paper trading → live.
| Component | Source | Purpose |
|---|---|---|
| paper2alpha | VernonOY/paper2alpha | PDF → ResearchCard extraction |
| qtype | VernonOY/qtype | static lint for look-ahead / future-function bugs |
Licenses preserved verbatim in LICENSE-VENDORED.md.
- 52 tests (unit + end-to-end subprocess) · 84% coverage · CI on 3.11 + 3.12
- `ruff check` · `ruff format --check` · `mypy --strict` all green
- See `tests/`: `unit/` for fast per-module tests, `e2e/` for full-pipeline subprocess tests
Bug reports, questions, feedback — open a GitHub Issue or start a Discussion. See SUPPORT.md.
MIT — see LICENSE. Vendored components retain upstream licenses.










