Official implementation for the paper “ICP: Immediate Compensation Pruning for Mid‑to‑High Sparsity.”
Xin Luo, Xueming Fu, Zihang Jiang, S. Kevin Zhou University of Science and Technology of China, MIRACLE Center & CAS ICT
Figure 1. ICP pipeline: a two‑block sliding window alternates prune → compensate; sparsity is rearranged across blocks (α) and within a block (β).
- Sliding‑window pruning: We load only two Transformer blocks at a time. Block Bi is pruned; immediately after, Bi+1 is fine‑tuned for a few steps to compensate the error introduced in Bi. The window then slides forward.
- Inter‑block sparsity rearrangement (α): To avoid uncompensated error in the last block, we prune it less aggressively and redistribute the removed weights to earlier blocks.
- Intra‑block sparsity rearrangement (β): Inside each block we prune attention (QKV) matrices more conservatively than the feed‑forward second linear layer (fc2), because their errors are harder to fix in the next block.
- Memory efficiency: At most one full block is resident on GPU at any moment, so the entire workflow fits comfortably on a single 24 GB card.
Extensive experiments show superior perplexity / accuracy / IoU versus SparseGPT, Wanda, and magnitude pruning across sparsity 50–90 % and mixed 2:4 / 4:8 + int3/int4 compression.
Important: this project was developed and validated with
transformers==4.29.1andlm_eval==0.4.2. Other transformers versions sometimes alter how intermediate blocks exposeposition_ids, which can break our scripts.
- OPT / Llama models are auto‑downloaded from Hugging Face on first run.
- SAM models ― download the Meta SAM checkpoints and COCO 2017 + SA‑1B splits:
sa_000001.tar,sa_000003.tar→ extract tosa_000001/,sa_000003/and update the paths below.
# Language models
bash bash/icp_opt.sh # prune OPT-125M/1.3B/2.7B/6.7B
bash bash/icp_llama.sh # prune Llama-2-7B
# Vision model
bash bash/icp_sam.sh # prune SAM-B/L/HEach script exposes CLI flags for sparsity (--sparsity), block window (--alpha), intra-block β, calibration size, epochs, quantisation bits, etc.
Below are the complete results from the paper for all six core tables. More results and ablation experiments are provided in the paper.
| Method | Model | 10 % | 20 % | 30 % | 40 % | 50 % | 60 % | 70 % | 80 % | 90 % |
|---|---|---|---|---|---|---|---|---|---|---|
| Magnitude | 125M | 28.25 | 29.65 | 34.51 | 54.60 | 193.4 | 920.0 | 3806 | 4890 | 6614 |
| SparseGPT | 27.88 | 27.98 | 28.86 | 30.56 | 37.02 | 59.55 | 220.9 | 2378 | 4666 | |
| Wanda | 27.48 | 27.76 | 28.11 | 30.67 | 38.92 | 75.17 | 328.2 | 1918 | 4609 | |
| ICP (ours) | 27.37 | 27.87 | 28.66 | 29.66 | 33.90 | 45.78 | 65.20 | 270.4 | 1612 | |
| Magnitude | 1.3B | 14.72 | 15.62 | 24.74 | 388.0 | 1713 | 9392 | 9443 | 16344 | 28871 |
| SparseGPT | 14.67 | 14.75 | 15.16 | 16.49 | 17.50 | 22.08 | 51.75 | 752.4 | 6797 | |
| Wanda | 14.63 | 14.69 | 15.01 | 15.89 | 18.41 | 26.55 | 99.53 | 2258 | 16868 | |
| ICP (ours) | 14.59 | 14.60 | 14.67 | 15.04 | 16.49 | 21.17 | 36.78 | 117.1 | 1099 | |
| Magnitude | 2.7B | 12.59 | 13.13 | 15.58 | 30.32 | 265.2 | 3604 | 7251 | 9614 | 16668 |
| SparseGPT | 12.32 | 12.39 | 12.66 | 12.66 | 13.46 | 16.04 | 26.92 | 138.4 | 5818 | |
| Wanda | 12.23 | 12.26 | 12.36 | 12.86 | 14.21 | 19.83 | 387.9 | 5905 | 16527 | |
| ICP (ours) | 12.22 | 12.25 | 12.34 | 12.62 | 13.32 | 15.77 | 21.79 | 54.67 | 764.2 | |
| Magnitude | 6.7B | 10.92 | 11.27 | 12.53 | 31.89 | 968.7 | 12639 | 16975 | 25591 | 5297 |
| SparseGPT | 10.83 | 10.76 | 10.75 | 10.96 | 11.59 | 13.48 | 20.48 | 96.25 | 11938 | |
| Wanda | 10.73 | 10.63 | 10.64 | 10.96 | 11.98 | 15.19 | 159.2 | 3954 | 16076 | |
| ICP (ours) | 10.56 | 10.58 | 10.59 | 10.85 | 11.27 | 13.03 | 17.82 | 44.69 | 1469 |
| Method | Sparsity | BQ | RTE | HS | WG | ARC‑e | ARC‑c | OBQA | Mean |
|---|---|---|---|---|---|---|---|---|---|
| Dense | 0 | 77.74 | 63.18 | 57.10 | 68.98 | 76.26 | 43.43 | 31.40 | 59.73 |
| Magnitude | 50 % | 62.94 | 57.04 | 49.12 | 63.38 | 64.06 | 34.64 | 26.80 | 51.14 |
| SparseGPT | 50 % | 76.33 | 55.96 | 52.89 | 68.98 | 72.01 | 38.23 | 28.40 | 56.11 |
| Wanda | 50 % | 76.42 | 53.43 | 52.45 | 68.67 | 72.22 | 39.33 | 31.00 | 56.22 |
| ICP (ours) | 50 % | 76.50 | 57.04 | 53.03 | 69.11 | 72.39 | 40.44 | 31.00 | 56.96 |
| Magnitude | 70 % | 37.95 | 53.07 | 25.93 | 49.25 | 27.82 | 22.87 | 17.00 | 33.41 |
| SparseGPT | 70 % | 64.65 | 53.79 | 33.45 | 58.64 | 43.31 | 21.67 | 17.00 | 41.79 |
| Wanda | 70 % | 48.47 | 52.71 | 27.97 | 49.64 | 30.81 | 18.60 | 12.20 | 34.34 |
| ICP (ours) | 70 % | 67.71 | 55.23 | 41.45 | 60.22 | 55.56 | 27.05 | 21.40 | 46.94 |
SAM‑H shown; see Tables below for full SAM‑B/L.
| Method | Sparsity | 10 % | 20 % | 30 % | 40 % | 50 % | 60 % | 70 % | 80 % | 90 % |
|---|---|---|---|---|---|---|---|---|---|---|
| Magnitude | SAM‑H | 74.80 | 74.57 | 73.87 | 72.26 | 69.45 | 61.51 | 28.22 | 2.65 | 1.70 |
| SparseGPT | 74.81 | 74.77 | 74.60 | 74.20 | 73.30 | 71.15 | 65.69 | 50.97 | 18.15 | |
| Wanda | 74.82 | 74.76 | 74.35 | 73.60 | 71.65 | 67.65 | 53.00 | 21.84 | 1.20 | |
| ICP (ours) | 74.82 | 74.78 | 74.66 | 74.60 | 74.42 | 73.82 | 72.85 | 69.73 | 57.60 |
Complete SAM‑B/L/H tables are included in the paper.
| Method | Strategy | OPT‑125M | OPT‑1.3B | OPT‑2.7B | OPT‑6.7B | ||||
|---|---|---|---|---|---|---|---|---|---|
| WT | PTB | WT | PTB | WT | PTB | WT | PTB | ||
| SparseGPT | 2:4 | 59.35 | 93.04 | 23.87 | 38.09 | 17.11 | 26.98 | 14.15 | 21.54 |
| Wanda | 2:4 | 80.01 | 111.97 | 28.20 | 43.37 | 21.18 | 34.55 | 15.90 | 25.09 |
| ICP (ours) | 2:4 | 46.90 | 67.56 | 22.03 | 33.95 | 16.39 | 25.31 | 14.04 | 21.19 |
| SparseGPT | 4:8 | 43.90 | 71.99 | 20.21 | 31.36 | 15.00 | 23.05 | 12.49 | 18.86 |
| Wanda | 4:8 | 53.18 | 79.02 | 22.17 | 34.56 | 16.79 | 26.13 | 13.55 | 20.19 |
| ICP (ours) | 4:8 | 39.91 | 59.34 | 19.15 | 30.41 | 14.88 | 22.88 | 12.38 | 18.53 |
| SparseGPT | 2:4+int3 | 122.33 | 196.44 | 40.58 | 64.53 | 24.70 | 41.06 | 20.07 | 31.91 |
| ICP (ours) | 2:4+int3 | 74.07 | 114.93 | 27.88 | 46.14 | 19.57 | 30.28 | 18.38 | 26.82 |
| SparseGPT | 2:4+int4 | 70.88 | 107.39 | 25.96 | 41.69 | 18.56 | 30.68 | 15.24 | 23.01 |
| ICP (ours) | 2:4+int4 | 51.24 | 75.39 | 23.49 | 34.42 | 17.05 | 27.04 | 15.10 | 22.54 |
| SparseGPT | 4:8+int3 | 84.02 | 120.16 | 32.41 | 49.27 | 20.24 | 32.99 | 17.08 | 26.36 |
| ICP (ours) | 4:8+int3 | 59.67 | 86.01 | 24.73 | 40.19 | 17.89 | 26.80 | 16.16 | 24.30 |
| SparseGPT | 4:8+int4 | 50.58 | 80.97 | 23.32 | 34.38 | 15.74 | 24.76 | 13.44 | 19.72 |
| ICP (ours) | 4:8+int4 | 42.03 | 60.60 | 20.86 | 31.58 | 15.33 | 23.80 | 13.24 | 19.49 |
| Method | Strategy | BQ | RTE | HS | WG | ARC‑e | ARC‑c | OBQA | Mean |
|---|---|---|---|---|---|---|---|---|---|
| SparseGPT | 2:4 | 68.07 | 58.84 | 43.38 | 65.59 | 64.18 | 31.91 | 24.80 | 50.97 |
| Wanda | 2:4 | 68.13 | 53.43 | 41.42 | 62.43 | 63.13 | 30.63 | 23.80 | 49.00 |
| ICP (ours) | 2:4 | 69.72 | 58.84 | 46.75 | 64.48 | 66.04 | 33.70 | 26.20 | 52.25 |
| SparseGPT | 4:8 | 71.07 | 54.87 | 48.35 | 67.80 | 68.69 | 34.47 | 27.20 | 53.21 |
| Wanda | 4:8 | 73.00 | 53.79 | 47.03 | 66.93 | 67.47 | 34.22 | 27.00 | 52.78 |
| ICP (ours) | 4:8 | 73.12 | 58.12 | 50.13 | 64.88 | 68.90 | 36.01 | 28.00 | 54.16 |
| SparseGPT | 2:4+int3 | 64.65 | 55.23 | 38.58 | 59.04 | 55.47 | 25.34 | 19.40 | 45.39 |
| ICP (ours) | 2:4+int3 | 66.82 | 54.15 | 43.36 | 60.77 | 59.55 | 28.67 | 21.00 | 47.76 |
| SparseGPT | 2:4+int4 | 66.73 | 54.15 | 42.34 | 65.19 | 61.74 | 28.92 | 23.80 | 48.98 |
| ICP (ours) | 2:4+int4 | 70.46 | 56.68 | 46.22 | 62.90 | 64.81 | 30.63 | 22.80 | 50.64 |
| SparseGPT | 4:8+int3 | 68.59 | 53.43 | 42.12 | 62.35 | 60.10 | 29.10 | 21.60 | 48.18 |
| ICP (ours) | 4:8+int3 | 68.38 | 54.87 | 43.54 | 62.85 | 63.38 | 32.34 | 22.40 | 49.68 |
| SparseGPT | 4:8+int4 | 69.27 | 54.15 | 46.58 | 66.06 | 67.55 | 34.39 | 27.20 | 52.17 |
| ICP (ours) | 4:8+int4 | 72.29 | 61.01 | 48.86 | 62.75 | 66.88 | 33.87 | 28.00 | 53.38 |
| Method | Strategy | SAM‑B | SAM‑L | SAM‑H | |||
|---|---|---|---|---|---|---|---|
| COCO | SA‑1B | COCO | SA‑1B | COCO | SA‑1B | ||
| SparseGPT | 2:4 | 62.80 | 67.59 | 65.01 | 69.07 | 67.02 | 69.62 |
| Wanda | 2:4 | 60.59 | 62.06 | 62.37 | 65.84 | 64.68 | 66.36 |
| ICP (ours) | 2:4 | 64.95 | 71.30 | 67.83 | 73.56 | 69.09 | 73.64 |
| SparseGPT | 4:8 | 63.93 | 69.69 | 66.56 | 71.27 | 67.98 | 71.41 |
| Wanda | 4:8 | 62.62 | 66.05 | 65.02 | 69.75 | 66.80 | 69.55 |
| ICP (ours) | 4:8 | 65.37 | 71.72 | 68.17 | 73.86 | 69.44 | 74.03 |
| SparseGPT | 2:4+int3 | 62.09 | 64.77 | 63.45 | 65.67 | 66.39 | 67.85 |
| ICP (ours) | 2:4+int3 | 64.47 | 69.71 | 67.13 | 72.60 | 68.75 | 73.26 |
| SparseGPT | 2:4+int4 | 62.66 | 66.76 | 64.78 | 68.69 | 66.81 | 69.44 |
| ICP (ours) | 2:4+int4 | 64.89 | 70.64 | 67.64 | 73.09 | 69.02 | 73.72 |
| SparseGPT | 4:8+int3 | 63.02 | 67.27 | 64.74 | 68.02 | 67.25 | 69.69 |
| ICP (ours) | 4:8+int3 | 64.84 | 69.86 | 67.68 | 73.28 | 69.10 | 73.35 |
| SparseGPT | 4:8+int4 | 63.77 | 68.98 | 66.04 | 70.31 | 67.87 | 71.05 |
| ICP (ours) | 4:8+int4 | 65.24 | 70.98 | 68.02 | 73.82 | 69.34 | 73.78 |
More results and exhaustive ablations are provided in the paper. and exhaustive ablations are provided in the paper.
This project is released under the MIT License for both research and commercial use. See the LICENSE file for the full text.
Parts of the code are adapted from SparseGPT, Wanda and Meta’s segment-anything. We thank the open‑source community!