Xiwei Xuan, Ziquan Deng, and Kwan-Liu Ma
Training-free open-vocabulary semantic segmentation (OVS) aims to segment images given a set of arbitrary textual categories without costly model fine-tuning. Existing solutions often explore attention mechanisms of pre-trained models, such as CLIP, or generate synthetic data and design complex retrieval processes to perform OVS. However, their performance is limited by the capabilities of the models they rely on or by the suboptimal quality of their reference sets. In this work, we investigate the largely overlooked data quality problem for this challenging dense scene understanding task, and identify that a high-quality reference set can significantly benefit training-free OVS. With this observation, we introduce a data-quality-oriented framework, comprising a data pipeline to construct a reference set with well-paired segment-text embeddings and a simple similarity-based retrieval to unveil the essential effect of data. Remarkably, extensive evaluations on ten benchmark datasets demonstrate that our method outperforms all existing training-free OVS approaches, highlighting the importance of data-centric design for advancing OVS without training.
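At a high level, the retrieval stage amounts to a nearest-neighbor lookup over the reference set: each query segment embedding is matched against the reference segment-text embeddings and takes the class of its most similar entry. Below is a minimal, illustrative sketch of such a similarity-based retrieval; the function and tensor names are hypothetical and do not correspond to the repository's actual API.

```python
import torch
import torch.nn.functional as F

def retrieve_labels(segment_emb, reference_emb, reference_labels):
    """Assign each query segment the class of its most similar reference entry.

    segment_emb:      (N, D) embeddings of segments from the query image
    reference_emb:    (M, D) embeddings of the curated reference set
    reference_labels: (M,)   class indices paired with the reference embeddings
    """
    # Cosine similarity between every segment and every reference entry.
    seg = F.normalize(segment_emb, dim=-1)
    ref = F.normalize(reference_emb, dim=-1)
    sim = seg @ ref.T                # (N, M)
    # Take the class of the best-matching reference entry for each segment.
    best = sim.argmax(dim=-1)        # (N,)
    return reference_labels[best]
```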
- Linux with Python ≥ 3.10
- PyTorch ≥ 2.5.1 is recommended, together with a torchvision version that matches the PyTorch installation. Install them together following pytorch.org to ensure compatibility. An example installation is shown below:
git clone https://github.com/xiweix/ReME.git
cd ReME
conda create -n reme python=3.10 -y
conda activate reme
conda install pip
bash install.sh
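After installation, you can optionally verify that PyTorch and torchvision come from matching builds and that CUDA is visible (a generic check, not part of install.sh):

```python
import torch
import torchvision

# Versions should come from the same build matrix (see pytorch.org).
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```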
This document explains how to download and organize datasets.
We refer to detectron2.data for data preparation. Please follow the Detectron2 installation instructions to install Detectron2. Note that, with Detectron2, a dataset can be used by accessing DatasetCatalog for its data, or MetadataCatalog for its metadata (class names, etc.). The Use Custom Datasets documentation gives a deeper dive on how to use DatasetCatalog and MetadataCatalog, and how to add new datasets to them.
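As a quick illustration of the two catalogs (using a builtin Detectron2 dataset name as an example; it assumes that dataset is already set up on disk):

```python
from detectron2.data import DatasetCatalog, MetadataCatalog

# List of per-image dicts (file names, annotations, ...) for a registered dataset.
dataset_dicts = DatasetCatalog.get("coco_2017_val")

# Metadata such as class names for the same dataset.
metadata = MetadataCatalog.get("coco_2017_val")
print(metadata.thing_classes[:5])
```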
The datasets are assumed to exist in a directory specified by the environment variable DETECTRON2_DATASETS. You can set the location by export DETECTRON2_DATASETS=/path/to/datasets. If left unset, the default is ./datasets relative to your current working directory.
If Detectron2 is not used, modify dataset_dir in scripts/datasets/prepare*.py to point to your dataset path.
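For reference, a minimal sketch of resolving the dataset root in your own script, assuming the same convention (DETECTRON2_DATASETS with a ./datasets fallback); the coco_stuff_dir variable is only an example:

```python
import os

# Honors DETECTRON2_DATASETS if set, otherwise falls back to ./datasets.
dataset_root = os.path.expanduser(os.getenv("DETECTRON2_DATASETS", "datasets"))
coco_stuff_dir = os.path.join(dataset_root, "coco_stuff164k")
print(coco_stuff_dir)
```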
We expect datasets in the structure described below.
data/ # Specify this location by DETECTRON2_DATASETS or dataset_dir
coco_stuff164k/ # COCO-Stuff
coco_object/ # COCO-Object
ADEChallengeData2016/ # ADE20K-150
ADE20K_2021_17_01/ # ADE20K-847
VOCdevkit/
VOC2010/ # PASCAL Context
VOC2012/ # PASCAL VOC
cityscapes/ # Cityscapes
Prepare data for COCO-Stuff:
coco_stuff164k/
annotations/
val2017/
images/
train2017/ #### For train split, only images are needed for curating the reference set
val2017/
# below are generated by prepare_coco_stuff.py
annotations_detectron2/
val2017/
Download the COCO (2017) images from https://cocodataset.org/
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
Download the COCO-Stuff annotations from https://github.com/nightrome/cocostuff.
wget http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip
Unzip val2017.zip and stuffthingmaps_trainval2017.zip. Then put them in the correct locations listed above and generate the labels for testing.
python datasets/prepare_coco_stuff.py
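To sanity-check the generated labels, you can load one of the produced masks and inspect the label IDs it contains (an illustrative check, not a script shipped with this repository; the path assumes the structure above, with 255 commonly used as the ignore value):

```python
import glob
import numpy as np
from PIL import Image

# Adjust the root if your DETECTRON2_DATASETS / dataset_dir differs.
masks = sorted(glob.glob("data/coco_stuff164k/annotations_detectron2/val2017/*.png"))
print("found", len(masks), "label maps")

label = np.array(Image.open(masks[0]))
print("shape:", label.shape, "unique ids:", np.unique(label))
```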
Prepare data for ADE20K-150:
ADEChallengeData2016/
annotations/
validation/
images/
validation/
# below are generated by prepare_ade20k_150.py
annotations_detectron2/
validation/
Download the data of ADE20K-150 from http://sceneparsing.csail.mit.edu.
wget http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip
Unzip ADEChallengeData2016.zip and generate the labels for testing.
python datasets/prepare_ade20k_150.py
Prepare data for ADE20K-847:
ADE20K_2021_17_01/
images/
ADE/
validation/
index_ade20k.mat
index_ade20k.pkl
# below are generated by prepare_ade20k_847.py
annotations_detectron2/
validation/
Download the data of ADE20K-Full from https://groups.csail.mit.edu/vision/datasets/ADE20K/request_data/. Unzip the dataset and generate the labels for testing.
python datasets/prepare_ade20k_847.py
Prepare data for PASCAL VOC 2012:
VOCdevkit/
VOC2012/
Annotations/
ImageSets/
JPEGImages/
SegmentationClass/
SegmentationClassAug/
SegmentationObject/
# below are generated by prepare_voc.py
annotations_detectron2
annotations_detectron2_bg
Download the data of PASCAL VOC from http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit.
We use the SBD-augmented training data as SegmentationClassAug, following DeepLab.
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget https://www.dropbox.com/s/oeu149j8qtbs1x0/SegmentationClassAug.zip
Unzip VOCtrainval_11-May-2012.tar and SegmentationClassAug.zip. Then put them in the correct locations listed above and generate the labels for testing.
python datasets/prepare_voc.py
Prepare data for PASCAL Context:
VOCdevkit/
VOC2010/
Annotations/
ImageSets/
JPEGImages/
SegmentationClass/
SegmentationObject/
trainval/
labels.txt
pascalcontext_val.txt
trainval_merged.json
# below are generated by prepare_pascal_context_59.py and prepare_pascal_context_459.py
annotations_detectron2/
pc459_val
pc59_val
Download the data of PASCAL VOC 2010 from https://www.cs.stanford.edu/~roozbeh/pascal-context/.
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar
Download the annotation for 59 and 459 classes.
wget https://codalabuser.blob.core.windows.net/public/trainval_merged.json
wget https://roozbehm.info/pascal-context/trainval.tar.gz
Unzip VOCtrainval_03-May-2010.tar and trainval.tar.gz. Then put them in the correct locations listed above and generate the labels for testing.
python datasets/prepare_pascal_context_59.py
python datasets/prepare_pascal_context_459.py
Prepare data for Cityscapes:
cityscapes/
leftImg8bit/
val/
gtFine/
val/
Follow https://www.cityscapes-dataset.com/downloads/ for downloading data and annotations. Registration is needed.
Download leftImg8bit_trainvaltest.zip and gtFine_trainvaltest.zip. Then unzip them and put the validation split in the correct location listed above.