A benchmark for evaluating learning-augmented caching algorithms, featuring multiple implemented predictors, including PLECO, POPU, LRB, and Parrot.
The experimental scripts, training/evaluation pipelines, data preprocessing, and result analysis were implemented and maintained by Jiaji Zhang.
Some parts of the project code were inspired by or adapted from existing open-source libraries, including:
Download the traces from the Dropbox link provided by ML caching with guarantees [1], and organize them into per-workload directories, e.g., `./traces/xalanc` for `xalanc_test.csv`.
We have also reorganized the Brightkite and Citi datasets to conform to a unified format.
You can easily download all trace files, the boost-trace pickles, and the GBM model checkpoints from the Releases page.
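The per-workload layout can be produced with a short script. The sketch below assumes only what the example above states: trace files are named `<workload>_test.csv` and live under a `./traces/<workload>/` directory; everything else (function name, download directory) is hypothetical:

```python
from pathlib import Path
import shutil

def organize_traces(download_dir: str, traces_root: str = "./traces") -> list:
    """Move each '<workload>_test.csv' into '<traces_root>/<workload>/'."""
    moved = []
    for csv_file in Path(download_dir).glob("*_test.csv"):
        workload = csv_file.stem.removesuffix("_test")  # 'xalanc_test' -> 'xalanc'
        target_dir = Path(traces_root) / workload
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / csv_file.name
        shutil.move(str(csv_file), str(target))
        moved.append(target)
    return moved
```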
We use Python 3.10.16 with Anaconda3 (Anaconda3-2021.05-Linux-x86_64.sh); we strongly recommend Anaconda for Python package management and virtual environments.
You can also set up your own environment with another Python version, but note that:
- If you want to use the `ParrotModel`, you need to install PyTorch yourself (in our benchmark, we use CUDA 12.4 to enable Torch GPU support).
- Because pickle formats differ across Python versions, the released `boost-traces.zip` may not work in your environment, so you may need to generate your own boost trace file.
A simple usage:

```shell
python -m benchmark --dataset xalanc --real --pred pleco --boost --boost_fr
```

You will see the benchmark results of all supported algorithms on the console.
```shell
python -m benchmark [--dataset DATASET] [--test_all] [--device DEVICE] (--oracle | --real)
                    [--pred {parrot,pleco,popu,pleco-bin,lrb,oracle_bin,oracle_dis}]
                    [--noise_type {dis,bin,logdis}] [--dump_file] [--output_root_dir OUTPUT_ROOT_DIR] [--verbose]
                    [--boost] [--num_workers NUM_WORKERS] [--boost_fr] [--boost_preds_dir BOOST_PREDS_DIR]
                    [--model_fraction MODEL_FRACTION] [--checkpoints_root_dir CHECKPOINTS_ROOT_DIR]
                    [--parrot_config_path PARROT_CONFIG_PATH] [--lightgbm_config_path LIGHTGBM_CONFIG_PATH]
```
Dataset

- `--dataset`: Benchmark source dataset name. Supported names: `brightkite`, `citi`, `astar`, `bwaves`, `bzip`, `cactusadm`, `gems`, `lbm`, `leslie3d`, `libq`, `mcf`, `milc`, `omnetpp`, `sphinx3`, `xalanc`. You can also add support for your own dataset.
- `--test_all`: This option only works for the `brightkite` and `citi` datasets, meaning the entire traces will be used. It cannot be used with the `parrot` or `lrb` predictors, as they require a training set.
Device

- `--device`: Model target device, like `cpu`, `cuda:0`, `cuda:1`, etc. This option only works for the `parrot` predictor; in all other cases we only use the CPU.
Mode

- `--real` and `--oracle`: You must specify exactly one of them on the command line.
Real Mode

- Supported predictors: `parrot`, `pleco`, `popu`, `pleco-bin`, `gbm`
- `verbose` defaults to True; `boost` uses pickle traces (`num_workers` has no effect); `noise_type` has no effect.
Oracle Mode

- Supported predictors: `oracle_dis`, `oracle_bin`
- `verbose` defaults to False; `boost` uses a multiprocessing pool (`num_workers` takes effect); `boost_preds_dir`, `model_fraction`, `checkpoints_root_dir`, `parrot_config_path`, and `lightgbm_config_path` have no effect.
Noise Type

- `logdis`: log-normal noise on the next request time (reuse distance)
- `dis`: normal noise on the next request time (reuse distance)
- `bin`: binary flipping noise on Belady's label (FBP label)
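For intuition, the three noise models above can be sketched as follows. This is a guess at the mechanics from the descriptions (additive Gaussian, multiplicative log-normal, Bernoulli flip); the benchmark's actual parameterization may differ:

```python
import math
import random

def noisy_nrt_dis(nrt: float, sigma: float, rng: random.Random) -> float:
    """'dis': additive normal noise on the next request time."""
    return max(0.0, nrt + rng.gauss(0.0, sigma))

def noisy_nrt_logdis(nrt: float, sigma: float, rng: random.Random) -> float:
    """'logdis': multiplicative log-normal noise on the next request time."""
    return nrt * math.exp(rng.gauss(0.0, sigma))

def noisy_label_bin(label: int, flip_prob: float, rng: random.Random) -> int:
    """'bin': flip Belady's binary label with probability flip_prob."""
    return 1 - label if rng.random() < flip_prob else label
```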
Algorithm

We implemented most of the existing algorithms in our benchmark; you can also implement and test your own algorithm in the benchmark.

| Algorithm | PLECO | POPU | Parrot | LRB | Oracle-Dis | Oracle-Bin |
| --- | --- | --- | --- | --- | --- | --- |
| Rand | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LRU | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Marker | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Predict (BlindOracle/LRB/Parrot) | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| PredictiveMarker [2] | ✔ | ✔ | ❌ | ❌ | ✔ | ❌ |
| LMarker [3] | ✔ | ✔ | ❌ | ❌ | ✔ | ❌ |
| LNonMarker [3] | ✔ | ✔ | ❌ | ❌ | ✔ | ❌ |
| Follower&Robust (F&R) [4] | ✔ | ✔ | ❌ | ❌ | ✔ | ❌ |
| Mark0 [5] | ❌ | ❌ | ❌ | ✔ | ❌ | ✔ |
| Mark&Predict [5] | ❌ | ❌ | ❌ | ❌ | ❌ | ✔ |
| CombineDeterministic [6] | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| CombineRandom [6] | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| The Guard Framework | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Guard&BlindOracle | ✔ | ✔ | ❌ | ❌ | ✔ | ❌ |
| Guard&LRB | ❌ | ❌ | ❌ | ✔ | ❌ | ✔ |
| Guard&Parrot | ❌ | ❌ | ✔ | ❌ | ❌ | ❌ |
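As a flavor of the simplest entry in the table, a BlindOracle-style `Predict` policy evicts the cached page whose predicted next request time (NRT) is furthest in the future. The sketch below is illustrative only (the function names and the dict-per-step prediction interface are our assumptions, not the benchmark's API):

```python
def predict_evict(cache: set, predicted_nrt: dict):
    """Evict the cached page with the largest predicted next request time.

    Pages missing from predicted_nrt are treated as never requested again.
    """
    return max(cache, key=lambda p: predicted_nrt.get(p, float("inf")))

def run_predict(requests, cache_size, predictions):
    """Count hits; predictions[i] maps pages to the NRT predicted at step i."""
    cache, hits = set(), 0
    for i, page in enumerate(requests):
        if page in cache:
            hits += 1
        else:
            if len(cache) >= cache_size:
                cache.discard(predict_evict(cache, predictions[i]))
            cache.add(page)
    return hits
```

With perfect predictions this policy coincides with Belady's offline optimum, which is why every predictor column in the table supports it.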
Predictor

- `pleco`: a PLECO model that predicts next request times (NRT).
- `popu`: a popularity model that predicts next request times (NRT).
- `lrb`: a gradient boosting machine based on Delta and EDC features that predicts Belady's binary labels.
- `oracle_dis`: an offline predictor that predicts next request times (NRT), potentially affected by noise (`dis`, `logdis`).
- `oracle_bin`: an offline predictor that predicts Belady's binary labels, potentially affected by noise (`bin`).
- `parrot`: a Parrot imitation model that predicts an eviction priority for each page.
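To make the NRT interface concrete, a popularity-style predictor in the spirit of `popu` could estimate each page's request rate from the history seen so far and predict its next request at `now + 1/rate`. This is a rough sketch of the idea, not the benchmark's exact model:

```python
from collections import defaultdict

class PopularityPredictor:
    """Predict next request time as now + 1 / empirical request rate."""

    def __init__(self):
        self.counts = defaultdict(int)  # requests seen per page
        self.now = 0                    # logical time = total requests seen

    def record(self, page) -> None:
        self.counts[page] += 1
        self.now += 1

    def predict_nrt(self, page) -> float:
        if self.counts[page] == 0:
            return float("inf")  # never seen: assume never requested again
        rate = self.counts[page] / self.now
        return self.now + 1.0 / rate
```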
Dump and Verbose

- `dump_file`: enables result dumping, saving test results as `$output_root_dir/$dataset/$fraction/$predictor.csv`.
- `verbose`: enables detailed output. When enabled, all statistics (hits, misses, hit rates, and competitive ratios) are displayed; otherwise, only the hit rate and competitive ratio are shown.
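A minimal sketch of what such a dump might look like, assuming the hit rate is `hits / (hits + misses)` and the empirical competitive ratio is the algorithm's miss count over OPT's (the helper names and CSV layout are our invention, only the output path format comes from the option description above):

```python
import csv
from pathlib import Path

def summarize(hits: int, misses: int, opt_misses: int) -> dict:
    """Hit rate and empirical competitive ratio (algorithm misses / OPT misses)."""
    total = hits + misses
    return {
        "hits": hits,
        "misses": misses,
        "hit_rate": hits / total if total else 0.0,
        "competitive_ratio": misses / opt_misses if opt_misses else float("inf"),
    }

def dump_result(row: dict, output_root_dir, dataset, fraction, predictor) -> Path:
    """Write one result row to $output_root_dir/$dataset/$fraction/$predictor.csv."""
    out = Path(output_root_dir) / dataset / str(fraction) / f"{predictor}.csv"
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        writer.writeheader()
        writer.writerow(row)
    return out
```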
Boost

- In `real` mode, `boost` uses a pickle file from `boost_preds_dir` (creating it if it doesn't exist) to accelerate testing.
- In `oracle` mode, `boost` utilizes a multiprocessing pool to handle all test tasks concurrently.
- We also provide the `boost_fr` parameter, which enables the `Follower&Robust` algorithm to use the Belady trace for calculating the Belady cost within the algorithm. Without this optimization, the entire test process would be significantly slower.
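The oracle-mode pooling pattern can be sketched as follows (a generic illustration of fanning test tasks out over `num_workers` processes; the task function and its arguments are placeholders, not the benchmark's internals):

```python
from multiprocessing import Pool

def run_tasks(task_fn, args_list, num_workers: int = 1):
    """Run one benchmark task per argument, concurrently when num_workers > 1.

    task_fn must be picklable (a top-level function) for the pool to work.
    """
    if num_workers <= 1:
        return [task_fn(args) for args in args_list]  # serial fallback
    with Pool(processes=num_workers) as pool:
        return pool.map(task_fn, args_list)  # preserves input order
```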
Model Config

To use the Parrot or GBM predictors, you must specify the following parameters (or use the defaults):

- `model_fraction` (default 1): the fraction of the training set used for training. (In benchmarking, you must select a fraction corresponding to an existing pre-trained model.)
- `checkpoints_root_dir`: the checkpoints directory.
- `parrot_config_path`: the `parrot` predictor's config file.
- `lightgbm_config_path`: the `gbm` predictor's config file.
```shell
python -m model.lightgbm [--dataset DATASET] [--device DEVICE]
                         [--model_fraction MODEL_FRACTION] [--model_config_path MODEL_CONFIG_PATH]
                         [--checkpoints_root_dir CHECKPOINTS_ROOT_DIR] [--traces_root_dir TRACES_ROOT_DIR]
                         [--iter_threshold] [--real_test]
```

You can use our GBM training tool to generate and test the cache model.
Params:

- `dataset`: source dataset.
- `device`: model load device (for GBM, only `cpu` works).
- `model_fraction`: the proportion of the training set used for training; the default value is 1, indicating that the entire training set will be used.
- `model_config_path`: the model's config file.
- `checkpoints_root_dir`: the directory where checkpoint files are saved after training completes.
- `traces_root_dir`: the directory where the datasets are stored.
- `iter_threshold`: when enabled, iteratively adjusts the binary splitting threshold from 0 to 1 to determine the optimal threshold for binary classification; if disabled, the threshold defaults to 0.5.
- `real_test`: after training completes, the test set will be loaded to evaluate the model's performance.
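The `iter_threshold` idea can be illustrated with a simple sweep over candidate thresholds, scoring each by binary classification accuracy. This is a sketch under our own assumptions (grid granularity, accuracy as the metric); the tool's actual search may differ:

```python
def best_threshold(scores, labels, steps: int = 101):
    """Sweep thresholds in [0, 1] and return (threshold, accuracy) that
    maximizes accuracy of the rule `score >= threshold` against labels."""
    best_t, best_acc = 0.5, -1.0
    for i in range(steps):
        t = i / (steps - 1)
        acc = sum((s >= t) == bool(y) for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:  # keep the first threshold achieving the best accuracy
            best_t, best_acc = t, acc
    return best_t, best_acc
```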
Config Example:

```json
{
  "delta_nums": 10,
  "edc_nums": 10,
  "training": {
    "boosting_type": "gbdt",
    "objective": "binary",
    "metric": "l2",
    "learning_rate": 0.01,
    "num_boost_round": 8000,
    "num_leaves": 31,
    "max_depth": 6,
    "subsample": 0.8,
    "colsample_bytree": 0.8
  }
}
```

- `delta_nums` and `edc_nums`: the dimensions of the Delta and EDC features, respectively. For more details, refer to webcachesim2 and Learning Relaxed Belady for Content Distribution Network Caching [7].
- `training`: the training parameters that will be used by LightGBM.
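Roughly, deltas are a page's most recent inter-request gaps and EDCs are exponentially decayed counters of its past accesses at increasing decay horizons. The sketch below is a simplified reading of the feature definitions in [7]; the benchmark's exact formulas (padding values, decay bases) may differ:

```python
def delta_features(access_times, delta_nums: int = 10):
    """Most recent delta_nums inter-request gaps, newest first, padded with -1."""
    gaps = [b - a for a, b in zip(access_times, access_times[1:])]
    gaps = gaps[::-1][:delta_nums]
    return gaps + [-1] * (delta_nums - len(gaps))

def edc_features(access_times, now, edc_nums: int = 10):
    """Exponentially decayed counters: sum of 2^(-age / 2^k) over past accesses,
    one counter per decay horizon k = 0 .. edc_nums-1."""
    return [
        sum(2.0 ** (-(now - t) / (2 ** k)) for t in access_times)
        for k in range(edc_nums)
    ]
```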
```shell
python -m model.parrot [--dataset DATASET] [--device DEVICE]
                       [--model_fraction MODEL_FRACTION] [--model_config_path MODEL_CONFIG_PATH]
                       [--checkpoints_root_dir CHECKPOINTS_ROOT_DIR] [--traces_root_dir TRACES_ROOT_DIR]
```
Params:

- `dataset`: source dataset.
- `device`: model load device (Parrot uses `CUDA_VISIBLE_DEVICES` to set the target device).
- `model_fraction`: the proportion of the training set used for training; the default value is 1, indicating that the entire training set will be used.
- `model_config_path`: the model's config file.
- `checkpoints_root_dir`: the directory where checkpoint files are saved after training completes.
- `traces_root_dir`: the directory where the datasets are stored.
Config Example:

```json
{
  "address_embedder": {
    "embed_dim": 64,
    "max_vocab_size": 5000,
    "type": "dynamic-vocab"
  },
  "cache_line_embedder": "address_embedder",
  "cache_pc_embedder": "none",
  "loss": ["ndcg", "reuse_dist"],
  "lstm_hidden_size": 128,
  "max_attention_history": 30,
  "pc_embedder": {
    "embed_dim": 64,
    "max_vocab_size": 5000,
    "type": "dynamic-vocab"
  },
  "positional_embedder": {
    "embed_dim": 128,
    "type": "positional"
  },
  "sequence_length": 80,
  "lr": 0.001,
  "total_steps": 25000,
  "eval_freq": 5000,
  "save_freq": 5000,
  "batch_size": 32,
  "collection_multiplier": 5,
  "dagger_init": 1,
  "dagger_final": 1,
  "dagger_steps": 200000,
  "dagger_update_freq": 50000
}
```

You can find more details about Parrot and its training in cache_replacement [8].
We also provide some scripts for training and testing (for reference only); you can find them in the `scripts` folder.
[1] Jakub Chłędowski, Adam Polak, Bartosz Szabucki, and Konrad Zolna. 2021. Robust Learning-Augmented Caching: An Experimental Study. arXiv:2106.14693. https://doi.org/10.48550/arXiv.2106.14693
[2] Thodoris Lykouris and Sergei Vassilvitskii. 2021. Competitive Caching with Machine Learned Advice. J. ACM 68, 4, Article 24 (August 2021), 25 pages. https://doi.org/10.1145/3447579
[3] Dhruv Rohatgi. 2020. Near-optimal bounds for online caching with machine-learned advice. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '20). Society for Industrial and Applied Mathematics, USA, 1834–1845.
[4] Karim Abdel Sadek and Marek Eliáš. 2024. Algorithms for Caching and MTS with Reduced Number of Predictions. arXiv preprint arXiv:2404.06280.
[5] Antonios Antoniadis, Joan Boyar, Marek Eliáš, Lene M. Favrholdt, Ruben Hoeksma, Kim S. Larsen, Adam Polak, and Bertrand Simon. 2023. Paging with succinct predictions. In Proceedings of the 40th International Conference on Machine Learning (ICML'23), Vol. 202. JMLR.org, Article 39, 952–968.
[6] Alexander Wei. 2020. Better and Simpler Learning-Augmented Online Caching. arXiv preprint arXiv:2005.13716.
[7] Zhenyu Song, Daniel S. Berger, Kai Li, and Wyatt Lloyd. 2020. Learning relaxed Belady for content distribution network caching. In Proceedings of the 17th USENIX Conference on Networked Systems Design and Implementation (NSDI'20). USENIX Association, USA, 529–544.
[8] Evan Zheran Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, and Junwhan Ahn. 2020. An imitation learning approach for cache replacement. In Proceedings of the 37th International Conference on Machine Learning (ICML'20), Vol. 119. JMLR.org, Article 579, 6237–6247.
If you have any questions regarding the paper, feel free to contact me at pgchen@zju.edu.cn.
```bibtex
@misc{chen2025robustifyinglearningaugmentedcachingefficiently,
  title={Robustifying Learning-Augmented Caching Efficiently without Compromising 1-Consistency},
  author={Peng Chen and Hailiang Zhao and Jiaji Zhang and Xueyan Tang and Yixuan Wang and Shuiguang Deng},
  year={2025},
  eprint={2507.16242},
  archivePrefix={arXiv},
  primaryClass={cs.DS},
  url={https://arxiv.org/abs/2507.16242},
}
```