# Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization
This repository contains the PyTorch source code for the EMNLP 2024 paper *Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization*.
Our implementation is based on EBFT, Wanda, SparseGPT, and LLM-QAT.
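All of the pruning runs below minimize some form of reconstruction error: the discrepancy between a dense layer's output and the pruned layer's output on calibration inputs. The following NumPy sketch illustrates the quantity being minimized; the function name and the magnitude-pruning mask are illustrative only, not this repository's API.

```python
import numpy as np

def reconstruction_error(W, W_pruned, X):
    """Frobenius reconstruction error ||W X - W_pruned X||_F^2
    between a dense layer and its pruned counterpart on inputs X."""
    return float(np.linalg.norm(W @ X - W_pruned @ X) ** 2)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))   # dense weight matrix
X = rng.standard_normal((16, 32))  # calibration activations

# Illustrative magnitude pruning at roughly 50% sparsity:
# zero out the smallest-magnitude half of the weights.
threshold = np.median(np.abs(W))
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

err = reconstruction_error(W, W_pruned, X)
assert err > 0.0                                # pruning perturbs the output
assert reconstruction_error(W, W, X) == 0.0     # dense vs. itself: no error
```

Layer-wise methods minimize this error one layer at a time, while block-wise variants measure it over larger blocks of the network.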
## Requirements

- python 3.9

```shell
pip3 install torch torchvision torchaudio
pip install -r requirements.txt
```

## Usage

### LR

```shell
python main.py --config=./configs/llama.py --config.epochs=0
```

### BR

```shell
python main.py --config=./configs/llama.py
```

### BR + GP

```shell
python main.py --config=./configs/llama.py --config.use_gp=True
```

### BR + GP + CR

```shell
python main.py --config=./configs/llama.py --config.use_gp=True --config.use_cr=True
```

### OPT

```shell
python main.py --config=./configs/opt.py
```

## Self-generated calibration data

First, generate the data as follows.

```shell
python generate_data.py --config=./configs/data.py
```

Then, set `config.self_nsamples` to a positive number.

```shell
python main.py --config=./configs/llama.py --config.self_nsamples=256
```

## Zero-shot evaluation

First, download the directory from the link provided in the Wanda repository. Next, rename the directory to `lm_eval`. Then, set `config.eval_zero_shot` to `True`.

```shell
python main.py --config=./configs/llama.py --config.eval_zero_shot=True
```