SVIP: Semantically Contextualized Visual Patches for Zero-Shot Learning

Zhi Chen¹*   Zecheng Zhao²*   Jingcai Guo³   Jingjing Li⁴   Zi Huang²

¹University of Southern Queensland   ²University of Queensland   ³The Hong Kong Polytechnic University   ⁴University of Electronic Science and Technology of China

 

About

SVIP is a transformer-based framework designed to enhance visual-semantic alignment for zero-shot learning. Specifically, we propose a self-supervised patch selection mechanism that preemptively learns to identify semantically unrelated patches in the input space. It is trained with supervision from attention scores aggregated across all transformer layers, which estimate each patch's semantic relevance. Because removing semantically unrelated patches from the input sequence may disrupt object structure, we instead replace them with learnable patch embeddings. Initialized from word embeddings, these embeddings remain semantically meaningful throughout feature extraction. Extensive experiments on ZSL benchmarks demonstrate that SVIP achieves state-of-the-art performance while providing more interpretable and semantically rich feature representations.
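
As a rough illustration of the idea above (not the official implementation), the sketch below shows how per-patch attention aggregated across layers could supervise a patch scorer, with the lowest-scoring patches swapped for learnable embeddings initialized from word embeddings. All names (`PatchSelector`, `num_replace`, the tensor shapes) are assumptions for illustration, and the word embeddings are assumed to be already projected to the patch dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchSelector(nn.Module):
    """Illustrative sketch only: score patches, replace the least semantic ones."""

    def __init__(self, dim: int, num_replace: int, word_emb: torch.Tensor):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)  # predicts a per-patch semantic score
        # learnable replacement patches, initialized from word embeddings (assumed shape (>=num_replace, dim))
        self.sem_patches = nn.Parameter(word_emb[:num_replace].clone())
        self.num_replace = num_replace

    def forward(self, patches: torch.Tensor, agg_attn: torch.Tensor):
        # patches:  (B, N, D) patch tokens from a ViT backbone
        # agg_attn: (B, N) attention to each patch, aggregated over all transformer layers
        pred = self.scorer(patches).squeeze(-1)                        # (B, N)
        # self-supervised target: aggregated attention acts as the semantic-score supervision
        score_loss = F.mse_loss(pred.softmax(-1), agg_attn.softmax(-1))
        # replace (rather than drop) the lowest-scoring patches, keeping the sequence length
        idx = pred.topk(self.num_replace, dim=1, largest=False).indices    # (B, k)
        replacement = self.sem_patches.unsqueeze(0).expand(patches.size(0), -1, -1)
        out = patches.clone()
        out.scatter_(1, idx.unsqueeze(-1).expand(-1, -1, patches.size(-1)), replacement)
        return out, score_loss
```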

⚙️ Installation

  • Python 3.12
  • PyTorch 2.5.1
  • All experiments are tested with a single NVIDIA RTX 3090 GPU.
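
A quick way to confirm the environment matches the tested setup (this snippet is purely illustrative):

```python
import sys
import torch

print("Python :", sys.version.split()[0])   # tested with 3.12
print("PyTorch:", torch.__version__)        # tested with 2.5.1
if torch.cuda.is_available():
    print("GPU    :", torch.cuda.get_device_name(0))  # experiments used a single RTX 3090
else:
    print("No CUDA device found")
```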

♨️ Data Preparation

  • Datasets: please download the datasets (CUB, AWA2, and SUN) and put them in the ./data/ folder.
  • Data splits and metadata: please download the info-files folder and place it in ./info-files/.
  • Attribute w2v: use the scripts in ./tools to generate the attribute word2vec embeddings and place them in the ./attribute/w2v folder (an illustrative sketch follows this list).
  • Pre-trained models: please download the pre-trained models and place them in ./pretrained_models/.
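
The exact generation procedure lives in the scripts under ./tools; the snippet below is only a hypothetical sketch of the usual approach, averaging pretrained word2vec vectors over the words of each attribute phrase. The file names and the gensim model are assumptions, not necessarily what the repo uses.

```python
import numpy as np
from gensim.models import KeyedVectors

# Hypothetical inputs: a pretrained word2vec model and a plain-text attribute list.
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
attributes = open("attributes.txt").read().splitlines()

def attribute_vector(phrase: str) -> np.ndarray:
    # average the vectors of the in-vocabulary words making up the attribute phrase
    words = [w for w in phrase.replace("_", " ").replace("::", " ").split() if w in w2v]
    return np.mean([w2v[w] for w in words], axis=0) if words else np.zeros(w2v.vector_size)

np.save("./attribute/w2v/attribute_w2v.npy",   # hypothetical output file name
        np.stack([attribute_vector(a) for a in attributes]))
```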

📊 Main Results

We provide the trained ZSL and GZSL model checkpoints for the three datasets:

| Dataset | ZSL Accuracy | Download link | GZSL Accuracy | Download link |
|---------|--------------|---------------|---------------|---------------|
| CUB     | 79.8         | Download      | 75.0          | Download      |
| AWA2    | 69.8         | Download      | 74.9          | Download      |
| SUN     | 71.6         | Download      | 50.7          | Download      |
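
In the GZSL setting, accuracy is conventionally reported as the harmonic mean H of the per-class accuracies on unseen (U) and seen (S) classes; assuming the table above follows that convention, H is computed as in the small helper below (the numbers in the example are placeholders, not the paper's results).

```python
def gzsl_harmonic_mean(acc_unseen: float, acc_seen: float) -> float:
    """Standard GZSL summary metric: H = 2 * U * S / (U + S)."""
    return 2 * acc_unseen * acc_seen / (acc_unseen + acc_seen)

print(round(gzsl_harmonic_mean(60.0, 75.0), 1))  # placeholder values -> 66.7
```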

License

This work is released under the Apache License, Version 2.0, while some specific implementations in this codebase may be covered by other licenses.

Please refer to LICENSE.md for a more careful check if you intend to use our code for commercial purposes.

Citation

If you find this work helpful for your research, please consider citing our paper:

@inproceedings{chen2025svip,
    title = {SVIP: Semantically Contextualized Visual Patches for Zero-Shot Learning},
    author = {Chen, Zhi and Zhao, Zecheng and Guo, Jingcai and Li, Jingjing and Huang, Zi},
    booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
    year = {2025}
}
