Predicting absolute 21-day forward returns on S&P 500 stocks with macro
features, and comparing the result head-to-head with cross-sectional ranking
in ml-cross-sectional.
The feature pipeline is independent from ml-cross-sectional. Raw data sources and low-level indicators overlap, but the feature matrix is rebuilt from scratch: cross-sectional z-scoring is removed, macro / beta / sector features are added, and the target is the continuous 21-day return rather than a cross-sectional rank.
Repo 2 proved that cross-sectional ranking ("which 100 will lead and which 100 will lag?") can be learned from price-volume features alone, because the relative comparison nets out market direction. This repo asks the harder version: can the same features predict the absolute forward return? That requires the model to know where the market is going, not just which stocks beat the median, so the feature design has to change — and, as notebooks 04 and 05 document, the answer is "barely, and with worse portfolio-construction properties than the ranker."
| Model | OOS MAE | OOS Pearson |
|---|---|---|
| linear_ridge | 0.079 | +0.09 |
| linear_lasso | 0.078 | +0.10 |
| lgbm_regressor | 0.089 | +0.13 |
| xgb_regressor | 0.092 | +0.11 |
| hist_mean | 0.074 | +0.00 |
Full OOS 2020–2024, all S&P 500 names. hist_mean is a per-symbol
training-period mean — note that it wins on MAE. Any learned model
must be judged on Pearson / IC, not MAE, because the target is heavy-tailed
enough that a constant prediction scores a deceptively low absolute error.
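A toy illustration of why the constant baseline wins on MAE while carrying no information — synthetic heavy-tailed "returns", not the repo's pipeline; all numbers below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Heavy-tailed "21-day return" target: mostly small moves, occasional blowups.
y = rng.standard_t(df=3, size=10_000) * 0.04

# A noisy but genuinely informative signal.
signal = 0.5 * y + rng.normal(0, 0.08, size=y.shape)

const_pred = np.full_like(y, np.median(y))        # hist_mean-style baseline
mae_const = np.mean(np.abs(y - const_pred))
mae_model = np.mean(np.abs(y - signal))
ic_model = np.corrcoef(signal, y)[0, 1]           # the constant's IC is 0 by definition

# The constant wins on MAE; the informative signal wins on correlation.
print(mae_const < mae_model, ic_model > 0.2)      # True True
```

The same inversion drives the table above: judge learned models on IC, not MAE.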
vs. ranking (notebook 05): when we pick the top-20 stocks each month
from xgb_regressor and from Repo 2's xgb_ranker (same model family on
both sides, to isolate target formulation), the baskets overlap only
≈ 0.19 by Jaccard on average over 60 rebalances — roughly 6–7 shared
names out of 20. The regressor's top-20 takes a deeper 2022 drawdown
(−29.6% vs −22.7% intra-year), but over the full 2020–2024 window the
two equity curves alternate leadership year by year — the measured mean
basket beta is actually slightly higher for the ranker (1.43 vs 1.32),
so the "ranking is more robust" claim is confined to the 2022 regime and
the first structural argument below (market-beta dominance), not a
universal beta-concentration story.
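The basket-overlap metric is plain Jaccard on the two top-20 sets. A minimal sketch with hypothetical tickers (6 shared names gives Jaccard 6/34 ≈ 0.176, near the measured 0.19):

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| between two baskets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical top-20 baskets sharing 6 names.
shared = {"AAA", "BBB", "CCC", "DDD", "EEE", "FFF"}
regressor_top20 = {f"REG{i:02d}" for i in range(14)} | shared
ranker_top20    = {f"RNK{i:02d}" for i in range(14)} | shared

print(round(jaccard(regressor_top20, ranker_top20), 3))  # 6 / 34 = 0.176
```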
- Universe. Current S&P 500 constituents (502 names), 2015-01 to 2025-07. Survivorship is acknowledged — results are an upper bound.
- Target. `fwd_ret_21d = close[t+21] / close[t] - 1`, raw continuous.
- Features (33 cols).
  - Stock (11): `mom_12_1`, `reversal_1w`, `ret_{21,63,126,252}d`, `vol_{20,60}d`, `rsi_14`, `macd_hist`, `volume_z_60`.
  - Macro (10), all lagged one business day: VIX level + 20d change, 10Y yield + 20d change, term slope (10Y−2Y), BAA credit spread (Moody's BAA − 10Y), S&P 3M / 12M trailing return + 60d vol, 6M fed-funds move count.
  - Exposure (12): 252d rolling beta vs ^GSPC, 11 GICS sector dummies.
- Models. Ridge / Lasso on standardised + median-imputed features; LightGBM & XGBoost regressors with RMSE loss; per-symbol HistMean as the zero-skill bar.
- Validation. Annual expanding-window walk-forward, OOS 2020–2024.
- Evaluation. MAE / RMSE / direction accuracy / Pearson / Spearman IC; threshold-long strategy with 5 bps one-way costs; Jaccard + signal correlation vs Repo 2.
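The validation scheme reduces to yearly expanding-window splits: train on everything before the test year, test on that calendar year. A schematic sketch (not the repo's `walk_forward_years` signature):

```python
import pandas as pd

def walk_forward_years(dates: pd.Series, first_test_year: int = 2020,
                       last_test_year: int = 2024):
    """Yield (train_mask, test_mask) pairs: train strictly before the
    test year, test on that calendar year. Note: with a 21-day forward
    target and no purging, the last ~21 training days overlap the test
    year's returns — the acknowledged leakage trade-off."""
    years = pd.to_datetime(dates).dt.year
    for test_year in range(first_test_year, last_test_year + 1):
        yield years < test_year, years == test_year

dates = pd.Series(pd.bdate_range("2015-01-01", "2024-12-31"))
folds = list(walk_forward_years(dates))
print(len(folds))  # 5 folds: test years 2020 ... 2024
```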
| # | Notebook | What it shows |
|---|---|---|
| 01 | 01_regression_eda.ipynb | Target σ ≈ 0.08, fat tails, per-stock R² vs market ≈ 0.3 — why macro matters |
| 02 | 02_training_walkforward.ipynb | Cross-model table + year-by-year + MAE vs VIX regime |
| 03 | 03_error_analysis.ipynb | Per-sector MAE, high/low-VIX split, worst 20 predictions |
| 04 | 04_threshold_strategy.ipynb | Long when pred > τ; τ sweep; net equity curves vs SPX |
| 05 | 05_vs_ranking.ipynb | Head-to-head with Repo 2: daily Spearman, top-20 Jaccard, drawdown behaviour |
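The threshold rule in notebook 04 reduces to: go long every name whose predicted return exceeds τ, equal-weight, and charge costs on turnover. A minimal sketch with synthetic predictions — the cost treatment here (per-name one-way cost amortised over the basket) is a simplification, not the notebook's exact accounting:

```python
import numpy as np

def threshold_long_returns(preds, rets, tau=0.02, cost_bps=5.0):
    """Per-rebalance net returns of an equal-weight basket of names
    with pred > tau; one-way costs charged on each position change."""
    cost = cost_bps / 1e4
    prev = np.zeros(preds.shape[1], dtype=bool)
    out = []
    for p, r in zip(preds, rets):
        held = p > tau
        gross = r[held].mean() if held.any() else 0.0
        traded = np.sum(held != prev)          # entries + exits this rebalance
        n = max(held.sum(), 1)
        out.append(gross - cost * traded / n)  # amortise costs over the basket
        prev = held
    return np.array(out)

rng = np.random.default_rng(1)
preds = rng.normal(0.01, 0.03, size=(60, 500))     # 60 rebalances x 500 names
rets = 0.2 * preds + rng.normal(0, 0.06, size=preds.shape)
print(threshold_long_returns(preds, rets).shape)   # (60,)
```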
Notebooks are built from `scripts/build_0N_*.py` — source diffs stay in
Python, not ipynb JSON. Re-run the build script and then
`jupyter nbconvert --execute --ExecutePreprocessor.kernel_name=ml-return-forecast`
to regenerate a notebook with outputs.
Absolute-return regression has three structural disadvantages vs. ranking:
- Market beta dominates the target. The median stock's 21-day return has R² ≈ 0.3 against the contemporaneous market return. A model that doesn't explicitly carry macro / beta features is learning the market, not the stock; a model that does carry them inherits macro look-ahead risks.
- Target distribution is fat-tailed. Squared-loss regressors over-fit outlier months (COVID, 2022). The per-fold MAE swings by 40% with regime — look at notebook 02's year breakdown, not the headline row.
- Thresholds don't beat quantiles. Notebook 04's τ sweep does not improve monotonically with τ: the top-predicted names aren't reliably better than the mass of positively-predicted names, because the regressor's "magnitude" is noisy. A proper ranker (Repo 2) uses top-quintile / long-short instead, which is more robust by construction.
Combined, these are the quantitative version of the industry folklore that signal research is dominated by ranking. Repo 5 exists to put numbers on that folklore.
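The first point — market beta dominating the target — can be checked directly by regressing each stock's 21-day return on the contemporaneous market return and inspecting R². A sketch on synthetic data calibrated so the median R² lands near the ~0.3 the EDA reports:

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 500, 50
mkt = rng.normal(0, 0.04, size=T)                       # market 21d returns
betas = rng.uniform(0.6, 1.6, size=N)                   # per-stock betas
stock = mkt[:, None] * betas + rng.normal(0, 0.067, size=(T, N))

def r2_vs_market(y, x):
    """R^2 of a univariate OLS of y on x (with intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1 - resid.var() / y.var()

r2 = np.array([r2_vs_market(stock[:, i], mkt) for i in range(N)])
print(round(float(np.median(r2)), 2))  # median near 0.3 under this calibration
```

With a third of per-stock variance explained by the market, an absolute-return model must get the market leg right before any stock-picking signal matters.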
- Survivorship bias: universe is the current S&P 500. Names that were delisted or removed between 2015–2024 are invisible.
- Credit spread choice: FRED's public CSV endpoint for ICE's BAMLH0A0HYM2 (HY OAS) only returns ~2 years due to a licensing change. BAA10Y (Moody's BAA − 10Y) is used instead — a reasonable IG-spread proxy that covers the full window.
- Macro look-ahead: every macro series is lagged one business day. Some series (e.g. FEDFUNDS) are monthly and forward-filled — the look-ahead guard is conservative but not airtight.
- Timing convention: predictions are assumed to be acted on at the close of day t (same-day close-to-close frame), so beta_252d uses unlagged returns up to t. Macro series, which are released at a different cadence than equity prices, are shifted by one business day as an extra safety margin rather than to match this frame.
- Sector snapshot: GICS sector is the current assignment, not a point-in-time mapping.
conda create -n ml-return-forecast python=3.13
conda activate ml-return-forecast
pip install -e .
# register the kernel so nbconvert executes notebooks in the right env
python -m ipykernel install --user --name ml-return-forecast
# data (writes to data/raw/)
python scripts/download_data.py
python scripts/download_macro.py
# features (writes to data/processed/)
python scripts/build_features.py
# train OOS 2020-2024
python scripts/train.py # writes reports/predictions/oos_2020_2024.parquet
# regenerate any notebook
python scripts/build_04_threshold_strategy.py
python -m jupyter nbconvert --to notebook --execute \
--ExecutePreprocessor.kernel_name=ml-return-forecast \
    notebooks/04_threshold_strategy.ipynb --output 04_threshold_strategy.ipynb

ml-return-forecast/
├── data/
│ ├── raw/ # sp500_ohlcv_*.parquet, macro_*.parquet, sp500_sectors.csv
│ └── processed/ # features_*.parquet
├── notebooks/ # 01–05, executed
├── reports/
│ └── predictions/ # oos_2020_2024.parquet
├── scripts/
│ ├── download_data.py
│ ├── download_macro.py
│ ├── build_features.py
│ ├── train.py
│ └── build_0{1-5}_*.py # notebook source-of-truth
└── src/mlr/
├── features_stock.py
├── features_macro.py
├── features.py # assembly + beta + sector + target
├── model.py # 4 wrapper classes, 5 model instantiations
└── validation.py # walk_forward_years
Cross-sectional absolute-return regression (direct benchmark)
- Gu, S., Kelly, B., & Xiu, D. (2020). Empirical asset pricing via machine
  learning. Review of Financial Studies, 33(5), 2223–2273.
  doi:10.1093/rfs/hhaa009 — predicts absolute monthly US equity returns with
  94 firm characteristics plus 8 macro predictors, comparing linear / tree /
  neural models. This repo is a scaled-down version of the same setup
  (11 stock + 10 macro + 12 exposure features, 21-day horizon,
  Ridge / Lasso / LGBM / XGB), and its pairing with ml-cross-sectional is
  the direct ranking-vs-regression comparison GKX does not make explicitly.
Macro predictability of returns (why macro doesn't save the regression)
- Welch, I., & Goyal, A. (2008). A comprehensive look at the empirical performance of equity premium prediction. Review of Financial Studies, 21(4), 1455–1508. doi:10.1093/rfs/hhm014 — finds that the canonical macro predictor set (term spread, credit spread, dividend yield, etc.) offers almost no reliable out-of-sample forecast power for the aggregate market premium. The variables we feed as per-stock macro features here (VIX, 10Y, term slope, BAA credit spread, S&P trailing return/vol) are drawn from the same pool. We use them cross-sectionally rather than to time the index, but notebook 02's year-by-year MAE swings and notebook 04's flat threshold sweep are consistent with the W&G finding that these series carry less forward information than their contemporaneous correlation suggests.
Validation methodology
- López de Prado, M. (2018). Advances in financial machine learning. Wiley. Chapter 7 argues for purging + embargo (with CPCV as the recommended scheme) in financial cross-validation. We use plain annual expanding-window walk-forward with no purging — the same deliberate deviation made in Repo 2, justifiable at a 21-day target horizon and annual retrain where fold-to-fold IC noise dominates leakage, but a design choice a production setup should revisit.
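For reference, the purging step the chapter recommends would be a small change to the yearly splits: drop the last `horizon` business days of each training window so no training label overlaps the test year. A schematic sketch, not the repo's implementation:

```python
import pandas as pd

def purged_walk_forward(dates: pd.Series, test_years, horizon_days: int = 21):
    """Expanding-window yearly folds with a purge: training ends
    `horizon_days` business days before the test year starts, so no
    21-day training label reaches into the test period."""
    dates = pd.to_datetime(dates)
    for year in test_years:
        test_start = pd.Timestamp(f"{year}-01-01")
        purge_start = test_start - pd.tseries.offsets.BDay(horizon_days)
        train = dates < purge_start
        test = (dates >= test_start) & (dates < pd.Timestamp(f"{year + 1}-01-01"))
        yield train, test

dates = pd.Series(pd.bdate_range("2015-01-01", "2024-12-31"))
train, test = next(purged_walk_forward(dates, [2020]))
# The purged train set ends ~21 business days before 2020 begins.
print(dates[train].max() < pd.Timestamp("2020-01-01") - pd.tseries.offsets.BDay(20))
```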