[TIP 2026] DiCLIP: Diffusion Model Enhances CLIP’s Dense Knowledge for Weakly Supervised Semantic Segmentation

News

  • Mar. 28th, 2025: DiCLIP was submitted.
  • May. 4th, 2026: DiCLIP was accepted by IEEE Transactions on Image Processing!
  • The code will be released later this week. Please stay tuned. 🔥🔥🔥
  • If you find this work helpful, please give us a 🌟 to receive updates!

Overview

We propose DiCLIP, a novel WSSS framework that leverages a generative diffusion model to enhance CLIP's dense knowledge across the vision and text modalities.

DiCLIP pipeline

Data Preparation

PASCAL VOC 2012

1. Download

```bash
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
```

2. Segmentation Labels

The augmented annotations are from the SBD dataset; a download link for them is available at DropBox. After downloading SegmentationClassAug.zip, unzip it and move the resulting folder to VOCdevkit/VOC2012/. The final layout should be:

```
VOCdevkit/
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    ├── SegmentationClassAug
    └── SegmentationObject
```
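A quick sanity check of the layout above can save a failed training run later. The helper below is our own illustration, not part of the repo's scripts; it simply reports which expected VOC2012 subfolders are missing:

```python
from pathlib import Path

# Expected subfolders under VOCdevkit/VOC2012 after adding SegmentationClassAug
EXPECTED = [
    "Annotations", "ImageSets", "JPEGImages",
    "SegmentationClass", "SegmentationClassAug", "SegmentationObject",
]

def missing_voc_dirs(root):
    """Return the expected VOC2012 subfolders that are absent under `root`."""
    voc = Path(root) / "VOCdevkit" / "VOC2012"
    return [d for d in EXPECTED if not (voc / d).is_dir()]
```

An empty return value means the dataset is laid out as the training scripts expect.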

MSCOCO 2014

1. Download

```bash
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
```

2. Segmentation Labels

To generate VOC-style segmentation labels for COCO, you can use the scripts provided at this repo, or simply download the pre-generated masks from Google Drive. The final layout should be:

```
COCO/
├── JPEGImages
│    ├── train2014
│    └── val2014
└── SegmentationClass
     ├── train2014
     └── val2014
```
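"VOC-style" labels are single-channel palette PNGs whose pixel values are class IDs and whose palette follows the standard PASCAL bit-interleaving rule. As a minimal sketch of writing one such mask (these helpers are illustrative, not the repo's conversion scripts):

```python
import numpy as np
from PIL import Image

def voc_colormap(n=256):
    """Standard PASCAL VOC color palette, built by the usual bit-interleaving rule."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        c, r, g, b = i, 0, 0, 0
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = (r, g, b)
    return cmap

def save_voc_mask(label_map, path):
    """Save an HxW uint8 class-ID map as a palette ("P" mode) PNG, VOC style."""
    img = Image.fromarray(label_map.astype(np.uint8), mode="P")
    img.putpalette(voc_colormap().flatten().tolist())
    img.save(path)
```

Because the PNG stores class indices directly, reading it back with `np.array(Image.open(path))` recovers the label map unchanged.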

Requirement

Please refer to the requirements.txt.

Train DiCLIP

```bash
### train voc
bash run_train.sh scripts/train_voc.py [gpu_device] [gpu_number] [master_port] train_voc

### train coco
bash run_train.sh scripts/train_coco.py [gpu_devices] [gpu_numbers] [master_port] train_coco
```

Evaluate DiCLIP

```bash
### eval voc seg and LAM
bash run_evaluate_voc.sh tools/infer_lam.py [gpu_device] [gpu_number] [infer_set] [checkpoint_path]

### eval coco seg
bash run_evaluate_seg_coco.sh tools/infer_seg_coco.py [gpu_device] [gpu_number] [infer_set] [checkpoint_path]
```
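The segmentation scores reported below are mIoU. As a reference for what that number measures, here is a minimal sketch of mIoU computed from a confusion matrix over predicted/ground-truth mask pairs (this helper is illustrative, not the repo's evaluator, and it averages over all classes rather than only those present):

```python
import numpy as np

def miou(preds, gts, num_classes, ignore_index=255):
    """Mean IoU over a list of (H, W) prediction and ground-truth label maps."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, gt in zip(preds, gts):
        valid = gt != ignore_index              # drop ignored pixels (VOC uses 255)
        idx = num_classes * gt[valid].astype(np.int64) + pred[valid]
        conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)                       # true positives per class
    union = conf.sum(0) + conf.sum(1) - inter   # pred + gt - overlap
    iou = inter / np.maximum(union, 1)          # guard against empty classes
    return iou.mean()
```

A perfect prediction yields an mIoU of 1.0; the 78.8 VOC score below corresponds to 0.788 on this scale.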

Main Results

  • Quantitative Results

Semantic segmentation performance on VOC and COCO. Logs are available now; checkpoints will be available soon.

| Dataset    | Backbone | Val  | Test | Log | Weight     |
|------------|----------|------|------|-----|------------|
| PASCAL VOC | ViT-B    | 78.8 | 78.9 | log | Checkpoint |
| MS COCO    | ViT-B    | 48.7 | -    | log | Checkpoint |
  • Qualitative Results
  1. CAM Comparison

DiCLIP results

  2. VOC Segmentation

DiCLIP results

  3. COCO Segmentation

DiCLIP results

Citation

Please cite our work if you find it helpful to your research. 💕
