This is the PyTorch implementation of "Credible Dual-Experts for Weakly Supervised Semantic Segmentation (IJCV 2023)"
- MS COCO dataset and PASCAL VOC 2012 dataset are supported.
- This project heavily relies on kazuto1011's work; for more details, please see the link.
- Many thanks to kazuto1011 for the great work!
- For the code of Res2Net, please email me at Bingfeng.Z@outlook.com, as the original backbone code comes from someone else's private codebase.
- For any problem related to this project, please email me at Bingfeng.Z@outlook.com. Thanks.
Please follow the "Setup" part in the link to prepare your environment.
### Download datasets
* Download PASCAL VOC 2012 Aug from this [link](https://drive.google.com/file/d/1jhtdjj3xrEp60zO3B7jZ14yxxZkCJMeM/view)
* Download MS COCO from this [link](https://cocodataset.org/#download)
* Download the GT mask of MS COCO from this [link](https://drive.google.com/file/d/1pRE9SEYkZKVg0Rgz2pi9tg48j7GlinPV/view)
* All used [pseudo labels] on PASCAL VOC 2012 can be found in this [link](https://drive.google.com/file/d/1jeYdx3oCcsfmjiPkeIPVXU0GH2HGqhxH/view?usp=sharing)
* All used [pseudo labels] on MS COCO can be found in this [link](https://drive.google.com/file/d/1xxQjfQZeqSXvjTtEkAVfh-5AxQsqorrA/view?usp=sharing)
For PASCAL VOC, during training, please put the used [pseudo label] folder into path/to/VOCdevkit/VOC2012 and rename the folder to "SegmentationClassAug".
For COCO 2014, during training, please put the used [pseudo label] folder into path/to/COCO2014 and rename the folder to "coco_seg_anno".
During testing or inference, please use the original "SegmentationClassAug" and "coco_seg_anno" folders, which are the GTs.
The final structure of the dataset is:
Dataset
├── VOCdevkit/VOC2012
│   ├── ImageSets
│   ├── JPEGImages
│   ├── SegmentationClass
│   └── SegmentationClassAug (pseudo labels or GT)
└── coco
    ├── coco_seg_anno (pseudo labels or GT)
    └── JPEGImages
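The folder-renaming steps above can be sketched as a small helper. This is an illustrative sketch: `install_pseudo_labels` and its arguments are invented names; only the target folder names ("SegmentationClassAug", "coco_seg_anno") come from the instructions.

```python
import shutil
from pathlib import Path

# Expected pseudo-label folder name per dataset (from the instructions above).
TARGET_NAME = {
    "voc": "SegmentationClassAug",   # goes under path/to/VOCdevkit/VOC2012
    "coco": "coco_seg_anno",         # goes under path/to/COCO2014
}

def install_pseudo_labels(pseudo_dir, dataset_root, dataset):
    """Move a downloaded pseudo-label folder into the dataset root and
    rename it to the name the data loader expects."""
    target = Path(dataset_root) / TARGET_NAME[dataset]
    if target.exists():
        # Don't clobber the GT folder of the same name used at test time.
        raise FileExistsError(f"{target} already exists; back it up first")
    shutil.move(str(pseudo_dir), str(target))
    return target
```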
The COCO-pretrained ResNet-101 init weights can be downloaded from this [link](https://drive.google.com/file/d/196TkMJPL3GA3CaXxKJrtsgX0CcXvWqsn/view?usp=sharing)
The ImageNet-pretrained ResNet-101 init weights can be downloaded from this [link](https://drive.google.com/file/d/14soMKDnIZ_crXQTlol9sNHVPozcQQpMn/view)
To train DeepLab v2 on PASCAL VOC 2012 with CDL-b/e and COCO init weights:

    python main_CDL_b/e.py train \
        --config-path configs/voc12_coco_pretrain.yaml

To train DeepLab v2 on PASCAL VOC 2012 with CDL-b/e and ImageNet init weights:

    python main_CDL_b/e.py train \
        --config-path configs/voc12_imageNet_pretrain.yaml

To train DeepLab v2 on MS COCO with CDL-b/e and ImageNet init weights:

    python main_CDL_b/e.py train \
        --config-path configs/coco.yaml
To evaluate the performance on a validation set with CDL-b/e:

    python main_CDL_b/e.py test \
        --config-path configs/voc12_coco_pretrain.yaml \
        --model-path data/models/voc12/deeplabv2_resnet101_msc/train_aug/checkpoint_final.pth

Note: This command saves the predicted logit maps (.npy) and the scores (.json).
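The saved logit maps and scores can be inspected offline with a short script. This is a hedged sketch: the file names and the (C, H, W) channel layout of the saved logits are assumptions, not confirmed by the repository.

```python
import json
import numpy as np

def logits_to_labels(npy_path):
    """Load a saved logit map (.npy, assumed shape (C, H, W)) and take the
    per-pixel argmax over the class channel to get a label map."""
    logits = np.load(npy_path)
    return logits.argmax(axis=0).astype(np.uint8)

def read_scores(json_path):
    """Load the evaluation scores saved as JSON."""
    with open(json_path) as f:
        return json.load(f)
```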
To re-evaluate with CRF post-processing:

    python main_CDL_b/e.py crf \
        --config-path configs/voc12_coco_pretrain.yaml

For CDL-e or ImageNet pretraining, follow the operations above and change the corresponding name to "main_CDL-e" or "voc12_ImageNet_pretrain.yaml", or both of them.
To monitor the loss, run the following command in a separate terminal:

    tensorboard --logdir data/logs

Common settings:
- Model: DeepLab v2 with ResNet-101 backbone. Dilated rates of ASPP are (6, 12, 18, 24). Output stride is 8.
- GPU: All the GPUs visible to the process are used. Please specify the scope with `CUDA_VISIBLE_DEVICES=`.
- Multi-scale loss: Loss is defined as a sum of responses from multi-scale inputs (1x, 0.75x, 0.5x) and the element-wise max across the scales. The unlabeled class is ignored in the loss computation.
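The multi-scale loss above can be sketched roughly as follows. This is an illustrative approximation rather than the repository's exact implementation: the function and variable names are invented, and details such as the resizing mode may differ.

```python
import torch
import torch.nn.functional as F

def multi_scale_loss(logits_per_scale, target, ignore_index=255):
    """Sum of cross-entropy losses over multi-scale outputs (1x, 0.75x, 0.5x)
    plus the loss on the element-wise max across the scales, with all logits
    resized to the target resolution. Pixels labeled `ignore_index` (the
    unlabeled class) are excluded from the loss."""
    h, w = target.shape[-2:]
    resized = [
        F.interpolate(l, size=(h, w), mode="bilinear", align_corners=False)
        for l in logits_per_scale
    ]
    # Per-scale responses contribute individually...
    loss = sum(F.cross_entropy(l, target, ignore_index=ignore_index) for l in resized)
    # ...and so does the element-wise max across scales.
    fused = torch.stack(resized, dim=0).max(dim=0).values
    return loss + F.cross_entropy(fused, target, ignore_index=ignore_index)
```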
- Gradient accumulation: The mini-batch of 10 samples is not processed at once due to the high occupancy of GPU memory. Instead, gradients of small batches of 5 samples are accumulated for 2 iterations, and the weight update is performed at the end (`batch_size * iter_size = 10`). GPU memory usage is approx. 11.2 GB with the default setting (tested on a single Titan X). You can reduce it with a smaller `batch_size`.
- Learning rate: Stochastic gradient descent (SGD) is used with a momentum of 0.9 and an initial learning rate of 2.5e-4. Polynomial learning rate decay is employed; the learning rate is multiplied by `(1 - iter / iter_max) ** power` every 10 iterations.
- Monitoring: The moving average loss (`average_loss` in Caffe) can be monitored in TensorBoard.
- Preprocessing: Input images are randomly re-scaled by factors ranging from 0.5 to 1.5, padded if needed, and randomly cropped to 321x321.
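The gradient-accumulation and polynomial-decay settings above can be sketched as a minimal training loop. The toy model and loop bounds are placeholders; only the numeric settings (sub-batches of 5 accumulated over 2 steps, momentum 0.9, initial lr 2.5e-4, decay every 10 iterations) mirror the text.

```python
import torch
import torch.nn.functional as F

def poly_lr(base_lr, it, max_it, power=0.9):
    # Polynomial decay: base_lr * (1 - iter / iter_max) ** power
    return base_lr * (1 - it / max_it) ** power

# Toy model standing in for DeepLab v2 (placeholder).
model = torch.nn.Linear(4, 21)
optimizer = torch.optim.SGD(model.parameters(), lr=2.5e-4, momentum=0.9)
base_lr, power, iter_size, max_iter = 2.5e-4, 0.9, 2, 100

for it in range(1, max_iter + 1):
    optimizer.zero_grad()
    for _ in range(iter_size):                # 2 sub-batches of 5 samples each
        x = torch.randn(5, 4)                 # placeholder data
        y = torch.randint(0, 21, (5,))
        loss = F.cross_entropy(model(x), y)
        (loss / iter_size).backward()         # accumulate averaged gradients
    optimizer.step()                          # one update per 10 samples
    if it % 10 == 0:                          # decay every 10 iterations
        for group in optimizer.param_groups:
            group["lr"] = poly_lr(base_lr, it, max_iter, power)
```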
- L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE TPAMI, 2018.
- H. Caesar, J. Uijlings, V. Ferrari. COCO-Stuff: Thing and Stuff Classes in Context. In CVPR, 2018.
- M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV, 2010.
- [A2GNN] B. Zhang, J. Xiao, J. Jiao, et al. Affinity Attention Graph Neural Network for Weakly Supervised Semantic Segmentation. IEEE TPAMI, 44(11): 8082-8096, 2021.
- [EPS] S. Lee, M. Lee, J. Lee, et al. Railroad Is Not a Train: Saliency as Pseudo-Pixel Supervision for Weakly Supervised Semantic Segmentation. In CVPR, 2021: 5495-5505.
- [RIB] J. Lee, J. Choi, J. Mok, et al. Reducing Information Bottleneck for Weakly Supervised Semantic Segmentation. NeurIPS, 34: 27408-27421, 2021.
- [VML] L. Ru, B. Du, Y. Zhan, et al. Weakly-Supervised Semantic Segmentation with Visual Words Learning and Hybrid Pooling. IJCV, 130(4): 1127-1144, 2022.