LoBiFlow is a history-conditioned flow-matching model for generating level-2 limit order books. This project includes the training code, benchmark scripts, dataset preparation utilities, and final experiment artifacts used for the paper-ready evaluation.
- model: conditional L2 LOB generation in parameterized book space
- objective: flow matching with minibatch optimal transport matching
- conditioning: transformer or hybrid history encoder
- evaluation: 4 primary metrics + 7 diagnostic metrics
- LOBSTER-calibrated synthetic data
- Optiver Realized Volatility Prediction
- Binance crypto LOB snapshots from Tardis
- Databento ES futures MBP-10
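The objective named above, flow matching with minibatch optimal transport matching, can be illustrated with a small sketch. It makes several assumptions that the README does not confirm: noise samples serve as path start points, paths are straight lines, and the pairing cost is squared Euclidean distance. All function names are hypothetical, history conditioning is omitted, and the brute-force permutation search stands in for a proper assignment solver, so it only suits tiny batches.

```python
import itertools
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def minibatch_ot_pairing(noise, data):
    """Return the permutation of data indices with the lowest total cost.

    Brute force over all permutations (assumption for illustration only);
    a real implementation would use an assignment solver.
    """
    n = len(noise)
    best_perm, best_cost = None, float("inf")
    for perm in itertools.permutations(range(n)):
        cost = sum(squared_distance(noise[i], data[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm)


def flow_matching_targets(noise, data, rng):
    """Build (x_t, t, target_velocity) triples along straight paired paths."""
    perm = minibatch_ot_pairing(noise, data)
    triples = []
    for i, j in enumerate(perm):
        x0, x1 = noise[i], data[j]
        t = rng.random()  # time drawn uniformly from [0, 1)
        x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
        velocity = [b - a for a, b in zip(x0, x1)]  # regression target
        triples.append((x_t, t, velocity))
    return triples


rng = random.Random(0)
noise = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
data = [[9.0, 9.0], [1.0, 1.0], [4.0, 6.0]]
pairing = minibatch_ot_pairing(noise, data)
targets = flow_matching_targets(noise, data, rng)
```

In this toy batch each noise point is paired with its nearest data point, which shortens the paths the velocity model has to learn compared with random pairing.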
Primary metrics:
- TSTR MacroF1
- Disc.AUC Gap
- Unconditional W1
- Conditional W1
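For readers unfamiliar with the W1 metrics, the following is a minimal sketch of an empirical Wasserstein-1 distance between real and generated samples. It assumes one-dimensional features and equal sample sizes, and it averages over feature columns. How this repository aggregates W1 across features, or how it conditions the samples for Conditional W1, is not stated here, so the function names and the averaging step are illustrative assumptions.

```python
def wasserstein_1d(real, generated):
    """Empirical 1-D Wasserstein-1 distance for two equal-size samples."""
    if len(real) != len(generated):
        raise ValueError("this sketch assumes equal sample sizes")
    if not real:
        raise ValueError("samples must be non-empty")
    # For equal-size samples, W1 is the mean gap between sorted values.
    pairs = zip(sorted(real), sorted(generated))
    return sum(abs(r - g) for r, g in pairs) / len(real)


def mean_per_feature_w1(real_rows, generated_rows):
    """Average the 1-D W1 over feature columns (an assumed aggregation)."""
    n_features = len(real_rows[0])
    distances = [
        wasserstein_1d(
            [row[k] for row in real_rows],
            [row[k] for row in generated_rows],
        )
        for k in range(n_features)
    ]
    return sum(distances) / n_features


real_spreads = [1.0, 2.0, 3.0, 4.0]
shifted_spreads = [1.5, 2.5, 3.5, 4.5]
w1_shift = wasserstein_1d(real_spreads, shifted_spreads)
w1_same = wasserstein_1d(real_spreads, list(reversed(real_spreads)))
w1_rows = mean_per_feature_w1([[0.0, 0.0], [1.0, 2.0]], [[1.0, 0.0], [2.0, 2.0]])
```

A uniform shift of 0.5 yields a distance of 0.5, while reordering the same values yields 0, since the metric compares distributions and ignores sample order.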
Additional diagnostics:
- U-L1
- C-L1
- spread_specific_error
- imbalance_specific_error
- ret_vol_acf_error
- impact_response_error
- efficiency_ms_per_sample
The benchmark summaries also report the composite score_main used for model
ranking.
- scripts/experiments_lobiflow.py: main LoBiFlow runner
- scripts/benchmark_lobiflow_paper_ready.py: final quality / speed / architecture benchmark
- scripts/export_model_metric_catalogs.py: flat metric catalog export
- scripts/generate_final_metric_summary.py: regenerate the published metric summary from the paper-ready catalogs
- scripts/make_regularization_ablation_plots.py: generate pilot ablation figures
- scripts/test_lobiflow.py: smoke and regression suite
Run the main LoBiFlow suite with dataset-specific defaults:
cd scripts
python experiments_lobiflow.py --dataset synthetic --out_dir results_synth
python experiments_lobiflow.py --dataset optiver --out_dir results_optiver
python experiments_lobiflow.py --dataset cryptos --out_dir results_cryptos
python experiments_lobiflow.py --dataset es_mbp_10 --out_dir results_es

Run the faster NFE=1 speed variant:
cd scripts
python experiments_lobiflow.py --dataset cryptos --lobiflow_variant speed --out_dir results_cryptos_speed

Run the paper-ready benchmark bundle:
cd scripts
python benchmark_lobiflow_paper_ready.py

Export flat CSV/JSON metric catalogs:
cd scripts
python export_model_metric_catalogs.py

LoBiFlow applies dataset presets first, then CLI overrides. The main knobs are:
- data: --dataset, --data_path, --synthetic_length
- optimization: --steps, --batch_size, --lr, --weight_decay
- context: --history_len, --ctx_encoder, --ctx_causal, --ctx_local_kernel, --ctx_pool_scales
- sampling: --eval_nfe, --solver, --lobiflow_variant
- evaluation: --eval_horizon, --rollout_horizons, --eval_windows_*
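The "presets first, then CLI overrides" order can be sketched with argparse as follows. This is an assumption about the general pattern, not the runner's actual code: the `PRESETS` dictionary and the `resolve_config` helper are hypothetical, only a few of the flags are shown, and the preset values are borrowed from the published quality presets in this README.

```python
import argparse

# Hypothetical preset table; the real runner may store presets differently.
PRESETS = {
    "synthetic": {"ctx_encoder": "transformer", "history_len": 128,
                  "solver": "euler", "eval_nfe": 2},
    "cryptos": {"ctx_encoder": "hybrid", "history_len": 256,
                "solver": "dpmpp2m", "eval_nfe": 2},
}


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", required=True, choices=sorted(PRESETS))
    # default=None distinguishes "flag omitted" from "flag given".
    parser.add_argument("--history_len", type=int, default=None)
    parser.add_argument("--ctx_encoder", default=None)
    parser.add_argument("--solver", default=None)
    parser.add_argument("--eval_nfe", type=int, default=None)
    return parser


def resolve_config(argv):
    """Start from the dataset preset, then apply explicitly passed flags."""
    args = vars(build_parser().parse_args(argv))
    config = dict(PRESETS[args["dataset"]])
    for key, value in args.items():
        if value is not None:
            config[key] = value
    return config


config = resolve_config(["--dataset", "cryptos", "--history_len", "384"])
```

Here `history_len` comes from the command line (384) while `ctx_encoder`, `solver`, and `eval_nfe` fall back to the cryptos preset. Using `None` as the argparse default is what makes an explicit flag distinguishable from an omitted one.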
Typical examples:
cd scripts
python experiments_lobiflow.py --dataset cryptos --history_len 384 --ctx_encoder hybrid --ctx_local_kernel 7 --ctx_pool_scales 8,32
python experiments_lobiflow.py --dataset optiver --eval_nfe 4 --solver dpmpp2m
python experiments_lobiflow.py --dataset synthetic --synthetic_length 5000000 --steps 20000

Published paper-ready quality presets:
- synthetic: transformer, history_len=128, solver=euler, eval_nfe=2
- optiver: transformer, history_len=128, solver=dpmpp2m, eval_nfe=4
- cryptos: hybrid, history_len=256, solver=dpmpp2m, eval_nfe=2
- es_mbp_10: hybrid, history_len=256, solver=euler, eval_nfe=1
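To make the `solver=euler` and `eval_nfe` settings concrete, here is a minimal sketch of Euler sampling for a flow model, assuming NFE means the number of velocity-function evaluations (the conventional reading) and that sampling integrates from t=0 to t=1. The function names and toy velocity fields are hypothetical, and the `dpmpp2m` solver is not shown.

```python
def euler_sample(velocity_fn, x0, nfe):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 in nfe Euler steps."""
    if nfe < 1:
        raise ValueError("nfe must be at least 1")
    x = list(x0)
    dt = 1.0 / nfe
    for step in range(nfe):
        t = step * dt
        v = velocity_fn(x, t)  # one function evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def straight_field(x, t):
    """Constant velocity: an ideal straight path from [0, 0] to [2, -1]."""
    return [2.0, -1.0]


def curved_field(x, t):
    """State-dependent velocity, so coarse Euler steps incur error."""
    return [-xi for xi in x]


one_step = euler_sample(straight_field, [0.0, 0.0], nfe=1)
coarse = euler_sample(curved_field, [1.0], nfe=1)
fine = euler_sample(curved_field, [1.0], nfe=4)
```

With a constant velocity field a single Euler step lands exactly on the endpoint, which is the intuition behind low-NFE sampling when learned paths are nearly straight. With the state-dependent field, one step gives 0.0 and four steps give about 0.316, against an exact value of about 0.368, so more evaluations trade speed for accuracy.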
Paper-ready benchmark outputs are written under:
- scripts/results_benchmark_lobiflow_paper_ready_20260315
- scripts/results_model_metric_catalogs_20260316
- scripts/results_regularization_ablation_20260324
The flat CSVs in results_model_metric_catalogs_20260316 are the easiest entry
point for comparing LoBiFlow against all baselines.
Key summaries:
- scripts/results_benchmark_lobiflow_paper_ready_20260315/final_metric_summary.md
- scripts/results_regularization_ablation_20260324/structured_conditional_regularization_ablation.md
We evaluated several structured conditional regularizers on top of the final LoBiFlow architecture:
- history-local causal OT
- global causal OT
- conditional current matching
- MI regularization
- path-space conditional FM
The conclusion is narrow but useful:
- none of these regularizers replaced the accepted final LoBiFlow defaults
- history-local causal OT was the strongest candidate
- its benefits were dataset-specific and strongest on cryptos
- the effect was largest at shorter optimization budgets and weakened later
The detailed summary and supporting diagnostics are in:
- scripts/results_regularization_ablation_20260324/structured_conditional_regularization_ablation.md
- scripts/results_regularization_ablation_20260324/structured_conditional_regularization_ablation.json
- scripts/results_regularization_ablation_20260324/causal_ot_applicability.png
- scripts/results_regularization_ablation_20260324/current_matching_applicability.png
- scripts/results_regularization_ablation_20260324/causal_ot_checkpoint_curve_cryptos.png
Pilot figures: