nikhilsab/LLMFE

[TMLR 2026] LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers

License: MIT

Official implementation of LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers.

📄 Overview

LLM-FE is a framework that uses Large Language Models (LLMs) as evolutionary optimizers to automate feature engineering for tabular datasets. It iteratively generates and refines features using structured prompts, keeping the transformations that most improve model performance. The resulting features are interpretable and improve the performance of various machine learning models across diverse classification and regression tasks.
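The generate, score, and select cycle described above can be illustrated with a minimal sketch. This is not the repository's implementation: the fixed `CANDIDATES` pool and the `propose` function are hypothetical stand-ins for the LLM call, the toy data is synthetic, and absolute Pearson correlation with the target is used as an assumed proxy for "model performance".

```python
import random
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Feature = Tuple[str, Callable[[Row], float]]  # (name, transformation)

# Hypothetical candidate pool standing in for LLM-generated transformations.
CANDIDATES: List[Feature] = [
    ("a_plus_b", lambda r: r["a"] + r["b"]),
    ("a_over_b", lambda r: r["a"] / r["b"]),
    ("a_times_b", lambda r: r["a"] * r["b"]),
    ("a_squared", lambda r: r["a"] ** 2),
]


def make_data(n: int = 200, seed: int = 0) -> Tuple[List[Row], List[float]]:
    """Build toy regression data whose target is a * b plus small noise."""
    rng = random.Random(seed)
    rows = [{"a": rng.uniform(1, 5), "b": rng.uniform(1, 5)} for _ in range(n)]
    targets = [r["a"] * r["b"] + rng.gauss(0, 0.1) for r in rows]
    return rows, targets


def propose(step: int) -> List[Feature]:
    """Stand-in for the LLM call: return two candidates per step."""
    k = len(CANDIDATES)
    return [CANDIDATES[step % k], CANDIDATES[(step + 1) % k]]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / ((vx * vy) ** 0.5)


def evolve(rows: List[Row], targets: List[float],
           steps: int = 4, keep: int = 2) -> Dict[str, float]:
    """Score proposed features each step and keep only the top `keep`."""
    population: Dict[str, float] = {}
    for step in range(steps):
        for name, fn in propose(step):
            values = [fn(r) for r in rows]
            # Assumed proxy score; the real pipeline scores by model performance.
            population[name] = abs(pearson(values, targets))
        ranked = sorted(population.items(), key=lambda kv: kv[1], reverse=True)
        population = dict(ranked[:keep])
    return population


if __name__ == "__main__":
    rows, targets = make_data()
    for name, score in evolve(rows, targets).items():
        print(f"{name}: {score:.3f}")
```

In this toy setup the target is built from the product of the two columns, so the product feature ends up ranked first. In the actual framework the proposals come from an LLM prompted with the task and prior candidates, rather than from a fixed list.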

⚙️ Installation

To run the code, create a conda environment and install the dependencies using requirements.txt:

conda create -n llmfe python=3.11.7
conda activate llmfe
pip install -r requirements.txt

🔧 Usage

In the run_llmfe.sh file, set your OpenAI API key on the following line:

export API_KEY=<ENTER YOUR API KEY>

To run the LLM-FE pipeline on a sample dataset:

bash run_llmfe.sh

✅ Evaluation

  • The generated features for each dataset are stored in the logs under the samples/ folder.

  • Install caafe into your environment before running the evaluation.

    conda activate llmfe
    pip install caafe
    
  • To run the evaluation, open the evaluation.ipynb notebook and set the pb_name variable to the dataset name you want to evaluate (i.e., replace the existing dataset name in pb_name with your target dataset), then run the notebook cells.

📝 Citation

@article{
abhyankar2026llmfe,
title={{LLM}-{FE}: Automated Feature Engineering for Tabular Data with {LLM}s as Evolutionary Optimizers},
author={Nikhil Abhyankar and Parshin Shojaee and Chandan K. Reddy},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2026},
url={https://openreview.net/forum?id=qvI35hkpOO}
}

📄 License

This repository is licensed under the MIT License.

This work builds on other open-source projects, including FunSearch and LLM-SR. We thank the original contributors of these works for open-sourcing their code.

📬 Contact Us

For any questions or issues, you are welcome to open an issue in this repo, or contact us at nikhilsa@vt.edu and parshinshojaee@vt.edu.
