This is the official implementation of REASONING COMPILER: LLM-Guided Optimizations for Efficient Model Serving (NeurIPS 2025).

Our implementation builds on TVM, an open-source compiler stack for deep learning systems released under the Apache-2.0 license.
To run this repo, follow these steps:
- Clone this repo, then configure the environment as detailed in TVM's installation guide: https://tvm.apache.org/docs/install/index.html
- Instead of using the default search strategy, create the LLM-guided MCTS search strategy object:

```python
llm_mcts_strategy = MCTSSearchPyFull(
    use_llm=True,
    llm_budget=600,
    llm_model_name="API_MODEL_NAME",
)
```
- To run pure MCTS search without LLM guidance, set `use_llm=False`.
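For intuition on what an MCTS search strategy does during selection, here is a minimal, self-contained sketch of the standard UCT (Upper Confidence bound for Trees) rule that such strategies typically use to balance exploring new schedule transformations against exploiting ones that measured well. The names `uct_score`, `Node`, and `select_child` are illustrative only, not the repo's API:

```python
import math

def uct_score(child_value, child_visits, parent_visits, c=1.41):
    """UCT: average reward (exploitation) plus a bonus that grows
    for rarely-visited children (exploration)."""
    if child_visits == 0:
        return float("inf")  # unvisited children are always tried first
    return child_value / child_visits + c * math.sqrt(
        math.log(parent_visits) / child_visits
    )

class Node:
    def __init__(self, action=None):
        self.action = action   # e.g., a candidate schedule transformation
        self.value = 0.0       # cumulative measured reward
        self.visits = 0
        self.children = []

def select_child(parent):
    """Descend the tree by picking the child with the highest UCT score."""
    return max(
        parent.children,
        key=lambda ch: uct_score(ch.value, ch.visits, parent.visits),
    )
```

In the pure-MCTS setting the reward would come solely from measured performance of candidate schedules; with `use_llm=True`, the LLM additionally guides which candidates are worth expanding.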
- Pass `llm_mcts_strategy` as a parameter to `tune_tir` when tuning:

```python
from tvm import meta_schedule as ms

database = ms.tune_tir(
    mod=MyModule,
    target="llvm --num-cores=1",
    max_trials_global=64,
    num_trials_per_iter=64,
    work_dir="./tune_tmp",
    strategy=llm_mcts_strategy,
)
```
If you find this work useful, please cite:

```bibtex
@inproceedings{tang2025reasoning,
  title={{REASONING COMPILER}: {LLM}-Guided Optimizations for Efficient Model Serving},
  author={Annabelle Sujun Tang and Christopher Priebe and Rohan Mahapatra and Lianhui Qin and Hadi Esmaeilzadeh},
  booktitle={The Thirty-Ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}
```