- UNet: backbones MobileNetV2 (all alphas and expansions), ResNetV1 (all num_layers)
- DeepLab3+: backbones ResNetV1 (num_layers=18,34,50,101), VGG16_bn
- BiSeNet: backbones ResNetV1 (num_layers=18)
- PSPNet: backbones ResNetV1 (num_layers=18,34,50,101)
- ICNet: backbones ResNetV1 (num_layers=18,34,50,101)
To inspect a network's architecture, memory usage, forward time (on either CPU or GPU), number of parameters, and number of FLOPs, use this command:
python measure_model.py
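To give a feel for what the parameter and FLOP counts mean, here is a minimal sketch that works them out by hand for a single convolution layer. It is only an illustration and not how `measure_model.py` computes its figures; the helper names `conv2d_output_size` and `conv2d_cost` are hypothetical, and the layer shape (3x3 kernel, 3 to 32 channels, stride 2) is an assumed example. Only the input size of 320 comes from this README.

```python
def conv2d_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_cost(in_ch, out_ch, kernel, in_size, stride=1, padding=0, bias=True):
    """Return (parameters, multiply-accumulates) for one square Conv2d layer."""
    out_size = conv2d_output_size(in_size, kernel, stride, padding)
    # One kernel*kernel*in_ch filter per output channel.
    weights = kernel * kernel * in_ch * out_ch
    params = weights + (out_ch if bias else 0)
    # Every weight is applied once per output pixel.
    macs = weights * out_size * out_size
    return params, macs


# Assumed example layer: 3x3 conv, 3 -> 32 channels, stride 2, on a 320x320 input.
params, macs = conv2d_cost(in_ch=3, out_ch=32, kernel=3, in_size=320, stride=2, padding=1)
print(params)  # 896
print(macs)    # 22118400
```

Conventions differ between tools: some report multiply-accumulates as FLOPs, others count each multiply-accumulate as two FLOPs, so compare figures only when they come from the same tool.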
- This repository uses Python 3.6.x.
- Clone the repository:
git clone --recursive https://github.com/YuantingMaSC/Image-Segmentation.git
cd Image-Segmentation
git submodule sync
git submodule update --init --recursive
- To install required packages, use pip:
workon humanseg
pip install -r requirements.txt
pip install -e models/pytorch-image-models
- Use the "Labelme" tool to annotate the images, then convert the annotations with the following script:
./original_data> python labelme2voc.py ./imgs_cut ./labelme_dataset --labels ./labels.txt
- For training a network from scratch, for example DeepLab3+, use this command:
python train.py --config config/config_DeepLab.json --device 0
where config/config_DeepLab.json is the configuration file which contains network, dataloader, optimizer, losses, metrics, and visualization configurations.
- For resuming training the network from a checkpoint, use this command:
python train.py --config config/config_DeepLab.json --device 0 --resume path_to_checkpoint/model_best.pth
- To monitor training progress, enable the visualization mode in the configuration file and open TensorBoard.
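Before launching a long training run, it can help to check that a configuration file has the sections described above. The sketch below assumes the top-level keys are named exactly after those sections (`network`, `dataloader`, `optimizer`, `losses`, `metrics`, `visualization`); these key names are a guess, and the real keys in `config/config_DeepLab.json` may be spelled differently. The helpers `load_config` and `missing_sections` are hypothetical and not part of this repository.

```python
import json

# Hypothetical section names taken from the description above; adjust them
# to match the actual keys in the configuration file.
EXPECTED_SECTIONS = ["network", "dataloader", "optimizer", "losses", "metrics", "visualization"]


def load_config(path):
    """Read a JSON configuration file into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def missing_sections(config, expected=EXPECTED_SECTIONS):
    """Return the expected section names that are not top-level keys of config."""
    return [name for name in expected if name not in config]


# Inline stand-in for a config file, so the example runs without the repository.
example = json.loads('{"network": {}, "optimizer": {}}')
print(missing_sections(example))  # ['dataloader', 'losses', 'metrics', 'visualization']
```

With the repository checked out, the same check would be `missing_sections(load_config("config/config_DeepLab.json"))`.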
- Use imgs2video.py to combine images into a video:
python imgs2video.py
There are two inference modes, video file and webcam:
python inference_video.py --watch --use_cuda --checkpoint path_to_checkpoint/model_best.pth
python inference_webcam.py --use_cuda --checkpoint path_to_checkpoint/model_best.pth
- Networks are trained on a dataset that combines the two datasets mentioned above.
- The model input size is set to 320.
- The CPU and GPU times are inference times averaged over 10 runs with batch size 1, measured after 10 warm-up runs.
- The mIoU is measured on the testing subset (737 images) from the combined dataset.
- Hardware configuration for benchmarking:
CPU: 12th Gen Intel(R) Core(TM) i9-12900K 3.20 GHz
GPU: NVIDIA GeForce RTX 3090
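The timing protocol in the notes above (10 untimed warm-up runs, then the average of 10 timed runs) can be sketched as follows. This is a generic illustration under stated assumptions, not the repository's benchmarking code: `average_runtime` and `dummy_forward` are hypothetical names, and the dummy workload stands in for one forward pass on a batch of size 1.

```python
import time


def average_runtime(fn, warmup=10, runs=10):
    """Average wall-clock seconds per call of fn() over `runs` calls,
    after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # warm-up: results discarded, nothing timed
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


# Stand-in workload; replace with one forward pass of the network.
calls = []


def dummy_forward():
    calls.append(sum(i * i for i in range(1000)))


seconds = average_runtime(dummy_forward)
print(f"{seconds * 1e3:.3f} ms per run")
```

When timing on a GPU with PyTorch, CUDA kernels run asynchronously, so `torch.cuda.synchronize()` would need to be called before each clock reading for the measurement to reflect the actual compute time.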