- 2025/06/18 : Big day! 🥰 Our Birkhoff code has officially gone open source. ✨ We'll be pushing updates as the saga continues. Thanks for watching and supporting! 😄 You are also welcome to read our previous work *Hyper-Compression: Model Compression via Hyperfunction*, which is the theoretical foundation of Birkhoff.
Owing to its excellent performance in yielding high-quality, zero-shot segmentation, the Segment Anything Model (SAM) and its variants have been widely applied in diverse scenarios such as healthcare and intelligent manufacturing. Effectively compressing SAM and its variants has therefore become an increasingly pressing practical need. Unlike quantization, pruning, distillation, and low-rank decomposition, we propose the Birkhoff algorithm for systematically compressing SAM and its variants. Specifically, Birkhoff introduces a novel compression algorithm, Hyper-Compression, whose core principle is to find a dense trajectory that turns a high-dimensional parameter vector into a low-dimensional scalar. Furthermore, Birkhoff designs a dedicated linear-layer operator, HyperLinear, which fuses decompression and matrix multiplication to significantly accelerate inference of the compressed SAMs. Birkhoff is a universal, data-free, fast compression algorithm that achieves high compression ratios with high accuracy. Extensive experiments on 18 SAMs across the COCO, LVIS, and SA-1B datasets show that Birkhoff performs consistently and competitively in compression time, compression ratio, post-compression performance, and inference speed. For example, Birkhoff achieves a compression ratio of 5.17× on SAM2-B with less than a 1% performance drop, without using any fine-tuning data. Moreover, compression finishes within 60 seconds for all models.
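To give intuition for the "dense trajectory" principle, here is a minimal NumPy sketch: by Weyl's equidistribution theorem, the orbit frac(n·α) of an irrational winding is dense in the unit cube, so a small parameter block can be approximated by a single integer index. This is only a toy illustration; the actual Hyper-Compression encoder is more sophisticated, and the names and constants below are purely illustrative.

```python
import numpy as np

# Toy illustration of the "dense trajectory" principle: the orbit
# frac(n * alpha) of an irrational winding is dense in the unit cube,
# so a small parameter block can be approximated by one integer n.
alpha = np.sqrt(np.array([2.0, 3.0, 5.0]))  # rationally independent steps
target = np.array([0.21, 0.74, 0.35])       # a 3-d "parameter block" in [0, 1)

# Brute-force encoding: pick the integer whose orbit point is closest.
n_candidates = np.arange(1, 200_000)
orbit = np.modf(np.outer(n_candidates, alpha))[0]          # frac(n * alpha)
best = int(n_candidates[np.argmin(np.abs(orbit - target).max(axis=1))])

# Decompression is trivial: one integer expands back into 3 floats.
recovered = np.modf(best * alpha)[0]
print(best, np.abs(recovered - target).max())
```

Here a 3-dimensional block collapses into one integer; the approximation error shrinks as the search range grows, which is the trade-off the real encoder manages far more efficiently.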
Fig. Overview of our Birkhoff compression framework.
- Versatility across model types
- Agility in model deployment
- Faithfulness to the original model
- Compactness in model size
The previous Hyper-Compression is vector-wise, which is incompatible with the matrix product in GPU acceleration, a "block-wise" operation. Hyper-Compression therefore has to decompress the entire matrix before performing the matrix product, which leaves large room for improvement. In the Birkhoff algorithm, we design "block-wise" compression and decompression. We then propose HyperLinear to fuse decompression and the matrix product, which greatly expedites inference, because linear layers, including the Q, K, V projections in attention modules, usually dominate SAM's computation.
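The fusion idea can be sketched as follows: decode one weight block at a time and immediately accumulate its contribution to the output, instead of materializing the full weight matrix first. The `decode_block` function below is a hypothetical toy stand-in, not the real HyperLinear kernel (which is implemented in Triton); it only shows the block-wise fusion pattern.

```python
import numpy as np

def decode_block(scalars, block_shape):
    # Hypothetical toy decoder: turns a few stored scalars into a weight
    # block via fractional parts of an irrational winding. The real
    # HyperLinear decoder differs; this only shows the fusion pattern.
    rows, cols = block_shape
    k = np.arange(1, rows * cols + 1)
    frac = np.modf(np.outer(scalars, k * np.sqrt(2.0)))[0]
    return frac.mean(axis=0).reshape(rows, cols) - 0.5

def fused_linear(x, compressed_blocks, block_shape, n_block_rows, n_block_cols):
    # Decompress one block at a time and immediately multiply-accumulate,
    # instead of materializing the full weight matrix first.
    br, bc = block_shape
    out = np.zeros((x.shape[0], n_block_cols * bc))
    for i in range(n_block_rows):
        for j in range(n_block_cols):
            w_block = decode_block(compressed_blocks[i][j], block_shape)
            out[:, j * bc:(j + 1) * bc] += x[:, i * br:(i + 1) * br] @ w_block
    return out

# Demo: the fused result matches decompress-everything-then-multiply.
rng = np.random.default_rng(0)
blocks = [[rng.random(4) for _ in range(3)] for _ in range(2)]
x = rng.standard_normal((5, 16))
out = fused_linear(x, blocks, (8, 8), 2, 3)
full_w = np.block([[decode_block(blocks[i][j], (8, 8)) for j in range(3)]
                   for i in range(2)])
```

In a real GPU kernel the per-block decode happens in registers or shared memory inside the matmul tile loop, so the full weight matrix never has to be written back to global memory.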
Fig. Technical evolution of decompression: from theoretical element-wise operation to vector-wise operation, and finally to our proposed block-wise one (HyperLinear) to facilitate operator fusion.
Visualization comparisons indicate that the Birkhoff-compressed models preserve mask boundaries so well that virtually no visual differences from the original models are discernible.
Fig. We select SAM-B, SAM-L, and SAM-H for a visual comparison of the Segment Everything task before and after Birkhoff compression.
The full table reports the accuracy and sizes (MB) of 18 SAMs and their corresponding Birkhoff-compressed versions on three datasets (COCO, LVIS, and SA-1B). For example, the following tables show the results for SAM-B, SAM-L, and SAM-H:
The majority of models can be fully compressed within 60 seconds. In terms of compression ratio, most models can be compressed by more than 4.3×, with the highest ratio reaching 5.17×. Even for variants such as EdgeSAM, MobileSAM, and SAM-HQ-Tiny, which have little remaining compressibility after being distilled from SAM, our method still achieves compression ratios in excess of 3.3×.
Fig. The compression time and achieved compression ratios for 18 SAMs using Birkhoff.
We compare Birkhoff with other data-free methods, including MinMax, Percentile, and OMSE, as well as fine-tuning-based methods such as PTQ4SAM, AdaRound, BRECQ, and QDrop. The following table compares the mAP scores before and after compressing the SAM-B, SAM-L, and SAM-H models on the COCO dataset using box prompts. Although Birkhoff does not yet exceed the compression ratio of INT6 quantization, it demonstrates compelling merits under stringent constraints: it requires no fine-tuning data and renders almost lossless compression.
Fig. Image segmentation results (mAP) on the COCO dataset using box prompts obtained from object detection results produced by YOLOX.
We evaluate the inference speed of the models before and after replacing their linear layers with the HyperLinear operator. As shown in the table, we conduct comparative experiments on six different models, using a randomly sampled small subset of the SA-1B dataset to measure the average inference time per segmentation. The results indicate that, although inference slows slightly after incorporating the HyperLinear operator, the gap is minimal and barely noticeable to human perception. The data further show that the larger the model, the smaller the inference gap. For example, for SAM-H, the inference speed decreases by only 3.94%. This is attributed to the nature of our method, in which a single memory access decodes two parameters, thereby improving the operator's L2 cache hit rate. For smaller models, most parameters can be accessed directly from registers, ensuring fast memory access; for larger models, our method becomes increasingly beneficial by improving memory-access efficiency. Moreover, we emphasize that although the Birkhoff-compressed SAMs still infer more slowly than the originals, the differences are at the millisecond level, which should not hurt the user experience.
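As a rough illustration of this kind of measurement, the sketch below times a plain matrix product against a decode-then-multiply variant in pure NumPy. The decode step is a toy stand-in that, echoing the description above, yields two parameters per stored code; it does not reflect the real Triton kernel or its performance.

```python
import time
import numpy as np

def time_call(fn, *args, repeat=20):
    # Average wall-clock time of fn(*args) over `repeat` runs (one warm-up).
    fn(*args)
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    return (time.perf_counter() - start) / repeat

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 1024))
w = rng.standard_normal((1024, 1024))
codes = rng.standard_normal(1024 * 1024 // 2)  # toy: one code per two params

def plain_linear(x):
    return x @ w

def decode_then_linear(x):
    # Toy decode: each stored code yields two parameters, mimicking the
    # "one memory access decodes two parameters" behavior described above.
    pair = np.stack([np.modf(codes)[0], np.modf(codes * np.sqrt(2.0))[0]], axis=1)
    return x @ (pair.reshape(1024, 1024) - 0.5)

t_plain = time_call(plain_linear, x)
t_decoded = time_call(decode_then_linear, x)
print(f"plain: {t_plain * 1e3:.3f} ms, decode+matmul: {t_decoded * 1e3:.3f} ms")
```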
Fig. Comparison of inference time between the compressed model and the original.
The code requires python>=3.8.0, and we use torch==2.4.0 and torchvision==0.19.0. To use the HyperLinear operator, triton==3.0.0 is also required.
- python>=3.8.0
- torch==2.4.0
- torchvision==0.19.0
- albumentations==2.0.8
- numpy==1.24.3
- safetensors==0.5.3
- triton==3.0.0
- Download the checkpoint of SAM-HQ-H into the directory ./sam_family/checkpoints/.
- Run the demo code for compressing SAM-HQ-H.

```shell
python main.py --mode='encode' --model_name='sam_hq_vit_h' --checkpoint_path='./sam_family/checkpoints/sam_hq_vit_h.pth'
```
- Run the demo code for the Segment Everything task; the results will be saved to ./test_data/seg_results.

```shell
python main.py --mode='inference' --model_name='sam_hq_vit_h'
```
We thank the following projects: SAM, SAM2, SAM-HQ, MobileSAM, MobileSAMv2, EdgeSAM, EfficientSAM, TinySAM, MedSAM.
This project is licensed under Apache License 2.0. Redistribution and use should follow this license.






