[TIP 2026] DiCLIP: Diffusion Model Enhances CLIP’s Dense Knowledge for Weakly Supervised Semantic Segmentation

News

  • Mar. 28th, 2025: DiCLIP was submitted.
  • May. 4th, 2026: DiCLIP was accepted by IEEE Transactions on Image Processing!
  • The code will be released later this week. Please stay tuned. 🔥🔥🔥
  • If you find this work helpful, please give us a 🌟 to receive updates!

Overview

We propose DiCLIP, a novel WSSS framework that leverages a generative diffusion model to enhance CLIP's dense knowledge across the vision and text modalities.

DiCLIP pipeline

Data Preparation

PASCAL VOC 2012

1. Download

```bash
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
```

2. Segmentation Labels

The augmented annotations are from the SBD dataset; a download link for them is available at DropBox. After downloading SegmentationClassAug.zip, unzip it and move the resulting folder to VOCdevkit/VOC2012/. The final layout should be:

```
VOCdevkit/
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    ├── SegmentationClassAug
    └── SegmentationObject
```
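A quick sanity check of the layout above can save a failed training run later. The helper below is our own illustration, not part of the repo's scripts; it simply reports which expected VOC2012 subfolders are missing:

```python
from pathlib import Path

# Expected subfolders under VOCdevkit/VOC2012 after adding SegmentationClassAug
EXPECTED = [
    "Annotations", "ImageSets", "JPEGImages",
    "SegmentationClass", "SegmentationClassAug", "SegmentationObject",
]

def missing_voc_dirs(root):
    """Return the expected VOC2012 subfolders that are absent under `root`."""
    voc = Path(root) / "VOCdevkit" / "VOC2012"
    return [d for d in EXPECTED if not (voc / d).is_dir()]
```

An empty return value means the dataset is laid out as the training scripts expect.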

MSCOCO 2014

1. Download

```bash
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
```

2. Segmentation Labels

To generate VOC-style segmentation labels for COCO, you can use the scripts provided at this repo, or simply download the pre-generated masks from Google Drive. The final layout should be:

```
COCO/
├── JPEGImages
│    ├── train2014
│    └── val2014
└── SegmentationClass
     ├── train2014
     └── val2014
```
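"VOC-style" labels are single-channel palette PNGs whose pixel values are class IDs and whose palette follows the standard PASCAL bit-interleaving rule. As a minimal sketch of writing one such mask (these helpers are illustrative, not the repo's conversion scripts):

```python
import numpy as np
from PIL import Image

def voc_colormap(n=256):
    """Standard PASCAL VOC color palette, built by the usual bit-interleaving rule."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        c, r, g, b = i, 0, 0, 0
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = (r, g, b)
    return cmap

def save_voc_mask(label_map, path):
    """Save an HxW uint8 class-ID map as a palette ("P" mode) PNG, VOC style."""
    img = Image.fromarray(label_map.astype(np.uint8), mode="P")
    img.putpalette(voc_colormap().flatten().tolist())
    img.save(path)
```

Because the PNG stores class indices directly, reading it back with `np.array(Image.open(path))` recovers the label map unchanged.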

Requirement

Please refer to the requirements.txt.

Train DiCLIP

```bash
### train voc
bash run_train.sh scripts/train_voc.py [gpu_device] [gpu_number] [master_port] train_voc

### train coco
bash run_train.sh scripts/train_coco.py [gpu_devices] [gpu_numbers] [master_port] train_coco
```

Evaluate DiCLIP

```bash
### eval voc seg and LAM
bash run_evaluate_voc.sh tools/infer_lam.py [gpu_device] [gpu_number] [infer_set] [checkpoint_path]

### eval coco seg
bash run_evaluate_seg_coco.sh tools/infer_seg_coco.py [gpu_device] [gpu_number] [infer_set] [checkpoint_path]
```
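The segmentation scores reported below are mIoU. As a reference for what that number measures, here is a minimal sketch of mIoU computed from a confusion matrix over predicted/ground-truth mask pairs (this helper is illustrative, not the repo's evaluator, and it averages over all classes rather than only those present):

```python
import numpy as np

def miou(preds, gts, num_classes, ignore_index=255):
    """Mean IoU over a list of (H, W) prediction and ground-truth label maps."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, gt in zip(preds, gts):
        valid = gt != ignore_index              # drop ignored pixels (VOC uses 255)
        idx = num_classes * gt[valid].astype(np.int64) + pred[valid]
        conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)                       # true positives per class
    union = conf.sum(0) + conf.sum(1) - inter   # pred + gt - overlap
    iou = inter / np.maximum(union, 1)          # guard against empty classes
    return iou.mean()
```

A perfect prediction yields an mIoU of 1.0; the 78.8 VOC score below corresponds to 0.788 on this scale.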

Main Results

  • Quantitative Results

Semantic segmentation performance on VOC and COCO. Logs are available now; checkpoints will be available soon.

| Dataset    | Backbone | Val  | Test | Log | Weight     |
|------------|----------|------|------|-----|------------|
| PASCAL VOC | ViT-B    | 78.8 | 78.9 | log | Checkpoint |
| MS COCO    | ViT-B    | 48.7 | -    | log | Checkpoint |
  • Qualitative Results
  1. CAM Comparison

DiCLIP results

  2. VOC Segmentation

DiCLIP results

  3. COCO Segmentation

DiCLIP results

Citation

Please cite our work if you find it helpful to your research. 💕
