LLapDiff is a Laplace-domain latent diffusion model for irregular, partially observed panel time series. The public mainline contains joint forecast/imputation training and generic training/evaluation entry points.
Repository layout:

- `Dataset/`: dataset-specific cache builders, loaders, and dataset statistics tools.
- `Latent_Space/`: the latent VAE implementation and utilities.
- `Model/`: the summarizer, LLapDiff backbone, and diffusion utilities.
- `config.py`: generic global defaults only.
- `dataset_defaults.py`: the per-dataset preset table.
- `train_val_pipeline.py`: canonical end-to-end runner for VAE + summarizer + LLapDiff.
- `llapdiff_checkpoint_eval.py`: generic checkpoint evaluation for forecast and imputation.
- `run_multidataset_artifact_prep.py`: VAE/summarizer artifact preparation and health audit.
- `Viz/plot_llapdiff_poles.py`: pole-visualization utility for trained checkpoints.
The canonical public recipe is:
- `PREDICT_TYPE="v"`
- `PRIMARY_EVAL_METRIC="crps"`
- `LOSS_WEIGHT_SCHEME="weighted_min_snr"`
- `BASE_LR=1.5e-4`
- `DATES_PER_BATCH=1`
- joint target-mask training enabled by default
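As background on what `LOSS_WEIGHT_SCHEME="weighted_min_snr"` computes, here is a minimal sketch of min-SNR loss weighting for v-prediction. The schedule and function names below are illustrative assumptions (a cosine alpha-bar schedule), not code from `Model/`:

```python
import math

def snr(t: float) -> float:
    """Signal-to-noise ratio under an assumed cosine alpha-bar schedule, t in (0, 1)."""
    alpha_bar = math.cos(t * math.pi / 2) ** 2   # fraction of signal variance at time t
    return alpha_bar / (1.0 - alpha_bar)         # SNR(t) = alpha_bar / (1 - alpha_bar)

def min_snr_weight(t: float, gamma: float = 5.0) -> float:
    """Min-SNR loss weight for v-prediction: min(SNR, gamma) / (SNR + 1).

    Clipping at gamma stops easy high-SNR timesteps from dominating the loss.
    """
    s = snr(t)
    return min(s, gamma) / (s + 1.0)

# Early timesteps (high SNR) are clipped by gamma; late ones fade smoothly.
weights = [min_snr_weight(t) for t in (0.1, 0.5, 0.9)]
```

The `gamma = 5.0` default here is the value commonly used in the min-SNR literature; the repo's actual value may differ.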
Dataset-specific values are centralized in `dataset_defaults.py`; wherever a value is explicitly listed in that preset table, it takes precedence over the generic defaults in `config.py`.
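The precedence rule can be pictured as a plain dict merge. This is a sketch only; the actual field names in `config.py` and `dataset_defaults.py` may differ:

```python
# Stand-ins for generic defaults in config.py.
CONFIG_DEFAULTS = {"base_lr": 1.5e-4, "dates_per_batch": 1, "context": 336}

# Stand-ins for preset entries in dataset_defaults.py.
DATASET_DEFAULTS = {
    "physionet": {"context": 24, "horizons": (4, 8, 10, 12)},
    "crypto": {"context": 200, "horizons": (5, 20, 60, 100)},
}

def resolve(dataset_key: str) -> dict:
    """Preset values win over generic defaults wherever they are present."""
    cfg = dict(CONFIG_DEFAULTS)
    cfg.update(DATASET_DEFAULTS.get(dataset_key, {}))
    return cfg

cfg = resolve("physionet")   # context becomes 24, base_lr stays 1.5e-4
```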
The preset registry currently supports:
`bms_air`, `uci_air`, `physionet`, `noaa_us`, `noaa_uk`, `us_equity`, `crypto`
Default horizons and context lengths:
| Dataset | Horizons | Context |
|---|---|---|
| bms_air | 24, 48, 96, 168 | 336 |
| uci_air | 24, 48, 96, 168 | 336 |
| physionet | 4, 8, 10, 12 | 24 |
| noaa_us | 24, 48, 96, 168 | 336 |
| noaa_uk | 24, 48, 96, 168 | 336 |
| us_equity | 5, 20, 60, 100 | 200 |
| crypto | 5, 20, 60, 100 | 200 |
Financial ticker lists used by the cache builders live at:
The training stack expects Python 3.11 plus the standard scientific/PyTorch stack used throughout the repo, including:
`torch`, `numpy`, `pandas`, `matplotlib`, `pyarrow`, `fastparquet`, `yfinance`, `requests`, `tqdm`
Main cache-builder entrypoints:
- financial datasets: `prepare_features_and_index_cache(...)` in `Dataset/fin_dataset.py`
- BMS: `prepare_bms_air_cache(...)` in `Dataset/bms_air_dataset.py`
- UCI Air: `prepare_uci_air_cache(...)` in `Dataset/uci_air_quality_dataset.py`
- NOAA: `prepare_isd_cache(...)` in `Dataset/noaa_isd_dataset.py`
- PhysioNet: `prepare_physionet_cinc_cache(...)` in `Dataset/physionet_cinc_dataset.py`
Several dataset loaders, including the financial ones, can also rebuild the window index on demand through `run_experiment(..., reindex=True)`.
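Conceptually, reindexing scans each series for (context, horizon) windows whose context portion meets a coverage threshold. The sketch below is an assumed simplification, not the logic in `Dataset/fin_dataset.py`:

```python
def build_window_index(observed, context, horizon, coverage=0.0):
    """Return start offsets of (context + horizon)-length windows whose
    context portion has at least `coverage` fraction of observed steps.

    `observed` is a per-step boolean mask for one series.
    """
    starts = []
    for s in range(len(observed) - context - horizon + 1):
        ctx = observed[s : s + context]
        if sum(ctx) / context >= coverage:
            starts.append(s)
    return starts

mask = [True, True, True, True, True, False, True, True]
idx_all = build_window_index(mask, context=4, horizon=2, coverage=0.0)    # every window
idx_dense = build_window_index(mask, context=4, horizon=2, coverage=0.9)  # dense contexts only
```

Raising `coverage` shrinks the index, which is why the summary tool below reports coverage-sensitive step counts.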
Use the public cache summary tool:
```shell
cd /path/to/LLapDiff
python Dataset/dataset_summary.py \
    --data-dir /path/to/LLapDiff/Dataset/fin_dataset/crypto \
    --coverage 0.0 \
    --per-asset
```

This reads the prepared `cache_ratio_index/` tree and reports panel size, split counts, missingness, and coverage-sensitive step counts.
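For intuition about the missingness numbers in that report, a toy per-asset and overall missingness computation might look like the following (a sketch; the real tool reads the parquet cache rather than in-memory lists):

```python
def missingness_report(panel: dict) -> dict:
    """Per-asset and overall missing-value fractions for a panel given as
    asset -> list of observations, where None marks a missing step."""
    report = {}
    total_steps = total_missing = 0
    for asset, series in panel.items():
        missing = sum(v is None for v in series)
        report[asset] = missing / len(series)
        total_steps += len(series)
        total_missing += missing
    report["__overall__"] = total_missing / total_steps
    return report

panel = {"BTC": [1.0, None, 1.2, 1.3], "ETH": [None, None, 2.0, 2.1]}
rep = missingness_report(panel)
```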
To dry-run the multi-dataset artifact plan:
```shell
cd /path/to/LLapDiff
python run_multidataset_artifact_prep.py \
    --dry-run \
    --datasets physionet bms_air
```

To train or reuse VAE and summarizer artifacts and emit a health report:
```shell
cd /path/to/LLapDiff
python run_multidataset_artifact_prep.py \
    --datasets bms_air uci_air physionet noaa_us noaa_uk us_equity \
    --summary-json /tmp/multidataset_artifact_prep_summary.json
```

Artifacts are written under:

- `ldt/vae/saved_model/<dataset>/`
- `ldt/summarizer/saved_model/<dataset>/`
The canonical training entrypoint is train_val_pipeline.py.
Example: train the full crypto stack for all preset horizons:
```shell
cd /path/to/LLapDiff
python train_val_pipeline.py \
    --dataset-key crypto \
    --summary-json /tmp/crypto_pipeline_summary.json
```

Example: run only one horizon and force artifact recomputation:
```shell
cd /path/to/LLapDiff
python train_val_pipeline.py \
    --dataset-key us_equity \
    --preds 100 \
    --recompute-vae \
    --recompute-summarizer \
    --summary-json /tmp/us_equity_pred100.json
```

Use the generic evaluator:
```shell
cd /path/to/LLapDiff
python llapdiff_checkpoint_eval.py \
    --dataset-key crypto \
    --pred 100 \
    --checkpoint /path/to/LLapDiff/ldt/output/crypto/llapdiff_pred-100_best_raw.pt \
    --out-json /tmp/crypto_eval.json
```

The output includes:

- `forecast_test`
- `regular_keep25`
- `random_keep50`
- `balanced_summary`
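Since `PRIMARY_EVAL_METRIC="crps"`, it may help to recall the standard sample-based CRPS estimator, CRPS ≈ E|X − y| − ½·E|X − X′| over forecast samples X. This sketch shows the formula only; the evaluator's exact estimator is not reproduced here:

```python
def empirical_crps(samples, y):
    """Sample-based CRPS for scalar target y: E|X - y| - 0.5 * E|X - X'|.

    Lower is better; a point forecast exactly at y scores 0.
    """
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2

score = empirical_crps([0.9, 1.0, 1.1], y=1.0)
```

CRPS rewards both calibration and sharpness, which is why it is a reasonable primary metric for probabilistic forecast/imputation evaluation.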
Pole plotting is handled by Viz/plot_llapdiff_poles.py.
Example:
```shell
cd /path/to/LLapDiff
python Viz/plot_llapdiff_poles.py \
    --dataset-key crypto \
    --pred 100 \
    --checkpoint /path/to/LLapDiff/ldt/output/crypto_cov0_jointmix_vpred_dates1_lr15e4/llapdiff_pred-100_best_raw.pt \
    --output-dir /tmp/pole_plot_smoke
```

This writes a PDF into the requested output directory.
Use this order when changing the training recipe:
- dataset/cache sanity
- normalization
- training objective / parameterization
- architecture last
For practical tuning in this repo:
- dataset-specific defaults belong in `dataset_defaults.py`
- generic defaults belong in `config.py`
- training/evaluation logic belongs in `train_val_pipeline.py` and `llapdiff_checkpoint_eval.py`