LinLLLL/DeltaEnergy

Delta Energy: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization

Official implementation of *Delta Energy: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization*. The paper has been accepted by NeurIPS 2025.

Ads

If you are interested in concurrent optimization for OOD generalization and OOD detection, check out our

T-PAMI 2025 work: InfoBound: A Provable Information-Bounds Inspired Framework for Both OoD Generalization and OoD Detection

ICML 2024 work: CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection

Abstract

Recent approaches for vision-language models (VLMs) have shown remarkable success in achieving fast downstream adaptation. When applied to real-world downstream tasks, VLMs inevitably encounter both the in-distribution (ID) data and out-of-distribution (OOD) data. The OOD datasets often include both covariate shifts (e.g., known classes with changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of improving VLMs' generalization ability to covariate-shifted OOD data, while effectively detecting open-set semantic-shifted OOD classes. In this paper, inspired by the substantial energy change observed in closed-set data when re-aligning vision-language modalities—specifically by directly reducing the maximum cosine similarity to a low value—we introduce a novel OOD score, named $\Delta\mathrm{Energy}$. $\Delta\mathrm{Energy}$ significantly outperforms the vanilla energy-based OOD score and provides a more reliable approach for OOD detection. Furthermore, $\Delta\mathrm{Energy}$ can simultaneously improve OOD generalization under covariate shifts, which is achieved by lower-bound maximization for $\Delta\mathrm{Energy}$ (termed EBM). EBM is theoretically proven to not only enhance OOD detection but also yields a domain-consistent Hessian, which serves as a strong indicator for OOD generalization. Based on this finding, we developed a unified fine-tuning framework that allows for improving VLMs' robustness in both OOD generalization and OOD detection. Extensive experiments on challenging OOD detection and generalization benchmarks demonstrate the superiority of our method, outperforming recent approaches by **10%–25%** in AUROC.

Motivation

masking-effect

(A) Illustration of $\Delta\mathrm{Energy}$ for OOD detection. Significant differences in $\Delta\mathrm{Energy}$ are observed between closed-set data and open-set OOD data when the maximum cosine similarity is reset to zero.

(B) Illustration of the $\Delta\mathrm{Energy}$ for OOD generalization. We introduce the EBM method to achieve domain-consistent Hessians, which simultaneously triggers bound optimization for $\Delta\mathrm{Energy}$.

(C) Comparison between our $\Delta\mathrm{Energy}$ and EBM with state-of-the-art methods. In the radar plots, all values are normalized to the range [0, 1]. It is observed that recent methods aimed at improving VLMs' OOD detection may not scale well to handling different types of distribution shifts in challenging ImageNet-1k OOD datasets.

Pipeline

pipeline_new2

Overview of the proposed method. Based on the prompt-tuning approach, we freeze both the image encoder and the text encoder, making only the context vectors ($\theta=[\theta_1, \cdots, \theta_n]$) learnable under the proposed objective function, as shown in Equation 8. During fine-tuning, we apply a masking operation to each ID image feature based on the top-1 similarity, as defined in Equation 6. We then compute the resulting energy change after modifying the vision-language alignment via masking, which allows us to perform bound optimization on $\Delta \mathrm{Energy}$. In the inference phase, following Equation 1, we reset the top-$c$ cosine similarities and then compute $\Delta\mathrm{Energy}$ for OOD detection. Simultaneously, we use the fine-tuned text feature and unmasked image feature for classification at test time. The complete algorithm can be seen in Appendix G.
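The inference-time scoring step above can be sketched as follows. This is a minimal NumPy illustration of the idea, not the authors' exact implementation: the function name `delta_energy`, the `temperature` value, and the default `top_c=1` are assumptions for the sketch.

```python
import numpy as np
from scipy.special import logsumexp

def delta_energy(cos_sims, temperature=0.01, top_c=1):
    """Illustrative Delta Energy OOD score for a single image.

    cos_sims: 1-D array of cosine similarities between the image
    feature and every class text feature.
    """
    # Free energy of the original vision-language alignment
    # (negative logsumexp of the temperature-scaled similarities).
    energy_before = -logsumexp(cos_sims / temperature)

    # Reset the top-c cosine similarities to zero, breaking the
    # alignment for the best-matching class(es).
    masked = cos_sims.copy()
    masked[np.argsort(masked)[-top_c:]] = 0.0
    energy_after = -logsumexp(masked / temperature)

    # ID images lose a strongly aligned class, so their energy jump
    # is large; OOD images with flat similarities change little.
    return energy_after - energy_before
```

Under this sketch, a higher score indicates ID data: masking the dominant similarity of a closed-set image removes most of its alignment energy, while an open-set image with no dominant class is barely affected.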

How to Run

This code is developed based on CRoFT. For environment setup and datasets used to evaluate both OOD generalization and OOD detection, please refer to the original CRoFT repository.

(1) Zero-Shot OOD Detection with Delta Energy:

We provide the running scripts in CoOp/scripts. We take Delta Energy as an example; other methods can be evaluated similarly. Make sure to change the DATA path in the shell files under CoOp/scripts/DeltaEnergy, then run the commands from CoOp/scripts/DeltaEnergy.

# For evaluating Delta Energy on the SETUP-I:
python test_setup1.py
# For evaluating Delta Energy on the SETUP-II:
python test_setup2.py

This workflow is the same for the other baselines (such as MCM, MaxLogits, MSP, Energy Score, CLIPN, ReAct, ODIN). For example, to evaluate MCM, navigate to its corresponding directory, CoOp/scripts/MCM, and execute the provided shell scripts:

bash test_openood_setup1.sh gpu_id
bash test_openood_setup1_osr.sh gpu_id

(2) Few-shot Fine-tuning to Enhance both OOD Generalization and OOD Detection with EBM:

We provide the running scripts in CoOp/scripts. We take EBM as an example; other methods can be evaluated similarly. Make sure to change the DATA path in the shell files under CoOp/scripts/EBM, then run the commands from CoOp/scripts/EBM.

# For training EBM on the in-distribution ImageNet46 datasets:
python run_setup1.py

# For evaluating EBM on the closed-set OOD datasets and open-set OOD datasets:
python test_setup1.py

# For training EBM on the in-distribution PACS or VLCS datasets:
python run_setup2.py

# For evaluating EBM on the closed-set OOD datasets and open-set OOD datasets:
python test_setup2.py

(3) Collect Results

To collect the EBM results across different experimental setups, please use the scripts provided in the original CRoFT repository. For example:

  • To collect OOD generalization results in SETUP-I, run:
# run the commands under CoOp/
python collect_result_set1_oodg.py

Acknowledgement

This repo benefits from CLIP, CoOp/CoCoOp (KaiyangZhou/CoOp: Prompt Learning for Vision-Language Models, IJCV'22/CVPR'22), MCM, etc.

Thanks for their wonderful works.

Citation

If you use this code in your research, please cite the following papers:

@inproceedings{zhu2025DeltaEnergy,
  title={Delta Energy: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization},
  author={Zhu, Lin and Yang, Yifeng and Wang, Xinbing and Gu, Qinying and Ye, Nanyang},
  booktitle={Advances in Neural Information Processing Systems},
  year={2025}
}

@article{zhu2024croft,
  title={CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection},
  author={Zhu, Lin and Yang, Yifeng and Gu, Qinying and Wang, Xinbing and Zhou, Chenghu and Ye, Nanyang},
  journal={arXiv preprint arXiv:2405.16417},
  year={2024}
} 

Contact

If you have any questions about this project, please feel free to contact zhulin_sjtu@sjtu.edu.cn.
