This repository contains the implementation of CAEF, introduced in the paper "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines".
- `arithmetic`: Execution structure for each arithmetic operator.
- `data`: Generates individual samples for each operator.
- `eval`: Evaluation logic.
- `synthetic`: Scripts to generate datasets for training and testing.
- `turing_machine`: Turing machine prototype implementation for each operator.
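To give a flavor of what the `turing_machine` prototypes model, here is a toy one-step transition function for a generic Turing machine (purely illustrative; the repository's per-operator prototypes are more elaborate):

```python
def tm_step(state, tape, head, delta):
    """Apply a single Turing machine transition (illustrative sketch).

    `delta` maps (state, symbol) -> (new_state, write_symbol, move),
    where move is "L" or "R"; `tape` is a mutable list of symbols.
    """
    new_state, write, move = delta[(state, tape[head])]
    tape[head] = write
    head += 1 if move == "R" else -1
    return new_state, tape, head
```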
- Install the required packages:

```bash
pip install -r requirements.txt
```

- Download the LoRA adapters and evaluation datasets from our HuggingFace page:

```bash
python download.py
```

- Update the paths in the following files to match your local setup:
  - `turing_machine/tm_path.py`
  - `utils.py`
Example:

```python
base_model_3_path = "/model/meta-llama/Meta-Llama-3-8B/"
base_model_31_path = "/model/meta-llama/Meta-Llama-3.1-8B/"
```
```python
add = dict(
    lora_path = '',                                          # optional, for testing first-stage executor
    lora_path_no_prompt = 'ckpt/executors/addition',         # LoRA path of the executor
    aligner_path = 'ckpt/aligners/addition',                 # LoRA path of the aligner
    task_path = '',                                          # optional, for testing first-stage executor
    task_path_no_prompt = '',                                # optional, executor for one-step transition
    task_path_executor = '',                                 # optional, executor for full process
    task_path_raw = 'datasets/raw/addition/test_5_5.jsonl',  # for testing the whole process
    aligner_input_path = '',                                 # optional, for testing aligner input
    aligner_output_path = '',                                # optional, for testing aligner output
)
```

- (Optional) Generate separate test sets for executors and aligners:
```bash
python generate.py --task add --split test --setting separate
```

Then, configure `task_path_executor`, `aligner_input_path`, and `aligner_output_path` in `turing_machine/tm_path.py`.
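For example, the corresponding entries might then look like this (the file names below are hypothetical; use the paths that `generate.py` actually writes):

```python
# Hypothetical paths -- check the output of generate.py for the real file names.
task_path_executor = 'datasets/separate/addition/executor_test.jsonl'
aligner_input_path = 'datasets/separate/addition/aligner_input_test.jsonl'
aligner_output_path = 'datasets/separate/addition/aligner_output_test.jsonl'
```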
To evaluate the complete process:

```bash
python eval_tm.py --model 3.1 --task add --batch_size 64 --no_prompt --execute --alignment
```

To evaluate only the executors:

```bash
python eval_tm.py --model 3.1 --task add --batch_size 64 --no_prompt --execute
```

To evaluate only the aligners (input or output):

```bash
# input evaluation
python eval_tm.py --model 3.1 --task add --batch_size 64 --no_prompt --aligner_input

# output evaluation
python eval_tm.py --model 3.1 --task add --batch_size 64 --no_prompt --aligner_output
```

You can modify the `--task` and `--batch_size` parameters as needed.
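If you want to inspect a checkpoint outside of `eval_tm.py`, a base model and a LoRA adapter can be combined with `transformers` and `peft` roughly as follows (a minimal sketch using the paths configured above; this is not necessarily how `eval_tm.py` loads them):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "/model/meta-llama/Meta-Llama-3.1-8B/"  # base_model_31_path from utils.py

# Load the frozen base model, then attach the executor's LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "ckpt/executors/addition")
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
```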
If you want to train executors or aligners on your own, follow the instructions below to generate the necessary training data. Both JSON and JSONL formats are supported. Check the files in the `synthetic` directory for examples.
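Both formats can be read with a few lines of Python; the loader below is a convenience sketch, not the repository's actual data-loading code:

```python
import json
from pathlib import Path

def load_samples(path):
    """Load samples from a .json or .jsonl file (illustrative sketch)."""
    p = Path(path)
    if p.suffix == ".jsonl":
        with p.open() as f:
            return [json.loads(line) for line in f if line.strip()]
    with p.open() as f:
        return json.load(f)
```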
Executor data:

- First stage:

```bash
python generate.py --task add --min 1 --max 100 --num 20 --split train --setting execute
```

- Second stage:

```bash
python generate.py --task add --min 1 --max 100 --num 20 --split train --no_prompt --setting execute
```

Aligner data:

- First stage:

```bash
python generate.py --task add --min 1 --max 100 --num 20 --split train --setting alignment
```

- Second stage:

```bash
python generate.py --task add --min 1 --max 100 --num 20 --split train --no_prompt --setting alignment
```

The expressions are divided into equivalence classes based on the pair `(len(a), len(b))`. The `--min` and `--max` parameters set the minimum and maximum operand lengths, respectively, and `--num` defines how many expressions are generated per class, though the actual number may vary based on the sampling and balancing strategies.
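The sampling scheme can be pictured roughly as follows (a minimal sketch; `generate.py` may apply additional balancing, and the rendering of each operand pair into a training expression is omitted):

```python
import random

def sample_operand_pairs(min_len=1, max_len=100, num=20, seed=0):
    """Draw `num` operand pairs per equivalence class (len(a), len(b))."""
    rng = random.Random(seed)

    def rand_operand(n_digits):
        # Uniform n-digit integer; no leading zero unless single-digit.
        lo = 0 if n_digits == 1 else 10 ** (n_digits - 1)
        return rng.randint(lo, 10 ** n_digits - 1)

    pairs = []
    for len_a in range(min_len, max_len + 1):
        for len_b in range(min_len, max_len + 1):
            pairs.extend(
                (rand_operand(len_a), rand_operand(len_b)) for _ in range(num)
            )
    return pairs
```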
If you use CAEF for your research, please cite our paper:
```bibtex
@misc{lai2024executing,
      title={Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines},
      author={Junyu Lai and Jiahe Xu and Yao Yang and Yunpeng Huang and Chun Cao and Jingwei Xu},
      year={2024},
      eprint={2410.07896},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2410.07896},
}
```