We apply two different models for the detection of cloud types from the Understanding Clouds from Satellite Images Kaggle competition:
- We modify the architecture from the object detection example in a PyTorch tutorial by adding an evaluation function for the test data and by using a better data augmentation layer.
- We use a model from the segmentation models library (preferred approach).
Set the environment variables in `.env`:

```shell
source .env
```

Set up the environment:

```shell
pip install -r requirements.txt
python setup.py install
```

Generate a dummy `test.csv` that is required by the dataloader:

```shell
python exec/generate_test_csv.py
```

Train one of the three training models (`exec/train_v1.py`, `exec/train_v2.py`, `exec/train_v3.py`):
```shell
python exec/train_v3.py \
    --size_tr_val 20 \
    --size_val 8 \
    --batch_size 2 \
    --print_freq 2 \
    --num_epochs 3 \
    --seed 1
```

To make predictions we have created a dummy `test.csv` file that has the same structure as the `train.csv` file. It is created in order to use the same dataloader for training and for making predictions. To make a prediction, use `exec/predict_v1.py`, `exec/predict_v2.py`, or `exec/predict_v3.py`:

```shell
python exec/predict_v3.py \
    --nrows 10 \
    --load_epoch 2
```

## Faster RCNN Architecture
The examples (`train_v1.py`, `train_v2.py`) use a Faster R-CNN with a pretrained ResNet-50 backbone. The model predicts object classes, masks, and bounding boxes. Since the training data provided in the Kaggle competition contains only masks of four different object types and no bounding boxes, we have used an algorithm that detects non-connected regions in the masks and assigns a bounding box to each of them (this just seemed easier than modifying the loss function of the model).
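The region-to-box step can be sketched as follows. This is an illustrative pure-Python reimplementation of the idea, not the project's actual code; it labels 4-connected regions with a BFS and returns boxes in pascal_voc format:

```python
from collections import deque

def mask_to_boxes(mask):
    """Assign a bounding box [x_min, y_min, x_max, y_max] to each
    4-connected region of a binary mask (list of rows of 0/1)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # BFS over this connected region, tracking its extent
                q = deque([(y, x)])
                seen[y][x] = True
                y0, y1, x0, x1 = y, y, x, x
                while q:
                    cy, cx = q.popleft()
                    y0, y1 = min(y0, cy), max(y1, cy)
                    x0, x1 = min(x0, cx), max(x1, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append([x0, y0, x1, y1])
    return boxes
```

In practice a library routine such as connected-component labeling from an image-processing package would do the same job on full-size masks much faster.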
The original model has several weaknesses:
- The loss function depends on the bounding box loss, which is irrelevant for the current task (WIP).
- We use only random horizontal and vertical image flips to augment the data.
- The default `evaluate()` function cannot run on a GPU.
- We do not know if the model is overfitting.
The default `forward()` method of the classes used returns different output depending on whether the model is in train or eval mode; in eval mode the losses are not calculated. In the current implementation we have derived new classes from:
- `torchvision.models.detection.rpn.RegionProposalNetwork`
- `torchvision.models.detection.roi_heads.RoIHeads`
- `torchvision.models.detection.generalized_rcnn.GeneralizedRCNN`

which have a modified `forward()` method with an additional argument `return_loss` (default `False`) that allows the losses to be returned in eval mode. See `/clouds/myclasses.py` for the new class definitions. These classes are used in both `train_v1.py` and `train_v2.py`.
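The pattern behind these subclasses can be illustrated with a stripped-down sketch. The base class below is a stand-in for torchvision's real detection classes, and all method names and return values are simplified assumptions, not the actual `myclasses.py` code:

```python
class DetectorBase:
    """Stand-in for a torchvision detection class (hypothetical, simplified)."""

    def compute_losses(self, images, targets):
        # Placeholder: the real classes compute RPN / RoI head losses here.
        return {"loss_box": 0.5, "loss_mask": 0.3}

    def compute_detections(self, images):
        # Placeholder: the real classes post-process proposals here.
        return [{"boxes": [], "masks": []} for _ in images]


class MyDetector(DetectorBase):
    """Adds a return_loss flag so losses are available in eval mode."""

    def __init__(self):
        self.training = True  # mimics nn.Module.training

    def forward(self, images, targets=None, return_loss=False):
        # Stock torchvision skips loss computation entirely in eval mode;
        # with return_loss=True we compute the losses there as well.
        losses = {}
        if self.training or return_loss:
            losses = self.compute_losses(images, targets)
        detections = []
        if not self.training:
            detections = self.compute_detections(images)
        return losses, detections
```

With this flag a validation loss can be tracked during evaluation, which is what makes overfitting detectable at all.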
We have used the albumentations library for data augmentation. The library takes care of all transformations of the masks and bounding boxes. The `Dataset` defined in `/clouds/dataset_v2.py` was modified to take the data transformations from this library into account.

Bounding box formats:
- `coco`: `[x_min, y_min, width, height]`
- `pascal_voc`: `[x_min, y_min, x_max, y_max]`

The data augmentation is used only in `train_v2.py`.
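For reference, converting between the two formats is just an addition or subtraction of the box extents (the helper names below are ours, not part of albumentations):

```python
def coco_to_pascal_voc(box):
    """[x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

def pascal_voc_to_coco(box):
    """[x_min, y_min, x_max, y_max] -> [x_min, y_min, width, height]."""
    x0, y0, x1, y1 = box
    return [x0, y0, x1 - x0, y1 - y0]
```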
## Segmentation Model Architecture (preferred)

The example `train_v3.py` makes use of the segmentation models library. In this case we work only with the different U-shaped convolutional neural networks, which previously served only as the backbone of the R-CNN, i.e. there is no region proposal network and no RoI heads for bounding boxes and masks.