# [ICLR'25] Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search
This repository is the official implementation of *Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search*.
LLM-GS combines large language models and search algorithms to solve Programmatic Reinforcement Learning (PRL) problems. LLM-GS achieves strong sample efficiency in the Karel environments, extends well to novel tasks, and adapts to the novel environments of Minigrid.
## Installation

After cloning the repository, please initialize the `leaps` submodule:
```bash
git submodule update --init --recursive
```

We recommend using conda to install the dependencies:
```bash
conda env create --name llm_gs_env --file environment.yml
pip install -r requirements.txt
```

If conda is not available, you can also install the dependencies with pip on Python 3.8:
```bash
pip install -r requirements.txt
```

After installing the environment, please export your OpenAI API key, which is required to run our main method:
```bash
export OPENAI_KEY="YOUR_API_KEY"
```

## Usage

To run our main method and the baselines, use the script below. You can change the method and task inside the script. (LLM-GS is our main method.)
```bash
bash scripts/run_main_results.sh
```

Alternatively, you can run a specific algorithm and task:
```bash
# All scripts are in scripts/{baseline}/run_{task}.sh
bash scripts/LLM-GS/run_DoorKey.sh
```

You can also run a revision method on the DoorKey task:
```bash
# The revision scripts are in scripts/LLM-Revision/run_{revision_method}.sh
bash scripts/LLM-Revision/run_regeneration.sh
```

Please note that the results of LLM-GS might not exactly match those reported in our paper due to the randomness of the LLMs.
The experiment results will be written to the `output` directory.
## Custom PRL tasks

To use LLM-GS for your custom PRL task:
1. **Define your DSL**: Create a new DSL in `prog_policies/your_dsl/` and specify its production rules.
2. **Register your environment**: Add it to `prog_policies/utils/__init__.py`.
3. **Implement your PRL environment**: Write your environment in `prog_policies/your_environment/`:
   - Option A: Subclass `BaseEnvironment` in `prog_policies/base/environment.py`.
   - Option B: Use `gymnasium.core.Wrapper` (a sketch of this option follows the list).
4. **Write your prompt template**: Follow the structure of `llm/prompt_template.py` to write your system prompt and user prompt (see the template sketch after this list).
5. **Set up the search space (if needed)**: Create a custom search space in `prog_policies/search_space`, where you can specify the mutation method used for local search (a toy mutation sketch follows the list). If your production rules are more complicated than Karel's, writing your own search space is necessary.
6. **Parse the LLM output**: Use `convert()` and `get_program_str_from_llm_response_dsl()` in `llm/utils.py` to post-process the Python and DSL programs.
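For Option B in step 3, the following is a minimal sketch of a Gymnasium wrapper. The class name, the episode-return bookkeeping, and the use of `CartPole-v1` as a stand-in task are our illustrative assumptions, not code from this repository; see `prog_policies/base/environment.py` for the interface LLM-GS actually expects.

```python
# A minimal, hypothetical sketch of Option B: wrapping a Gymnasium environment.
import gymnasium as gym
from gymnasium.core import Wrapper

class ProgramEvalWrapper(Wrapper):
    """Tracks the episode return so a program-search loop can score rollouts."""

    def __init__(self, env: gym.Env):
        super().__init__(env)
        self.episode_return = 0.0

    def reset(self, **kwargs):
        self.episode_return = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.episode_return += float(reward)
        return obs, reward, terminated, truncated, info

if __name__ == "__main__":
    # CartPole-v1 is only a stand-in; swap in your registered PRL task.
    env = ProgramEvalWrapper(gym.make("CartPole-v1"))
    obs, info = env.reset(seed=0)
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    print(env.episode_return)
```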
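For step 4, the sketch below only illustrates the system/user prompt split; every string and function name in it is hypothetical, so mirror the real structure in `llm/prompt_template.py` rather than copying this verbatim.

```python
# Hypothetical system/user prompt pair for DSL program synthesis.
from typing import Dict, List

SYSTEM_PROMPT = (
    "You are a program synthesizer. Write a policy in the following DSL.\n"
    "Grammar:\n{dsl_grammar}"
)
USER_PROMPT = (
    "Task: {task_description}\n"
    "Return only a single DSL program that solves the task."
)

def build_messages(dsl_grammar: str, task_description: str) -> List[Dict[str, str]]:
    """Assemble OpenAI-style chat messages from the two templates."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT.format(dsl_grammar=dsl_grammar)},
        {"role": "user", "content": USER_PROMPT.format(task_description=task_description)},
    ]
```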
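For step 5, local search needs a neighborhood function over programs. The toy below mutates a flat token list; the token vocabulary and program representation are stand-ins, since the repository's real search spaces under `prog_policies/search_space` operate on DSL production rules.

```python
import random
from typing import List

# Stand-in token vocabulary; a real search space would derive candidate
# replacements from your DSL's production rules instead.
TOKENS = ["move", "turnLeft", "turnRight", "putMarker", "pickMarker"]

def mutate(program: List[str], rng: random.Random) -> List[str]:
    """Return a neighbor: the parent with one randomly chosen token replaced."""
    child = list(program)
    idx = rng.randrange(len(child))
    child[idx] = rng.choice(TOKENS)
    return child

rng = random.Random(0)
parent = ["move", "move", "turnLeft"]
print(mutate(parent, rng))  # one token differs from the parent
```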
## Acknowledgements

- The baseline implementations in `prog_policies` are from *Reclaiming the Source of Programmatic Policies: Programmatic versus Latent Spaces*. The baseline (CEM, CEBS, HC) code under `prog_policies` follows the GPL-3.0 license.
- The HPRL baseline implementation is not in this repository; we ran the HPRL experiments in its original repository.
## Citation

```bibtex
@inproceedings{liu2025synthesizing,
title = {Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search},
author = {Max Liu and Chan-Hung Yu and Wei-Hsu Lee and Cheng-Wei Hung and Yen-Chun Chen and Shao-Hua Sun},
booktitle = {The Thirteenth International Conference on Learning Representations},
year = {2025},
}
```