Add invert_channels option to TransformData #1139

Closed: sguada wants to merge 1 commit into BVLC:dev from sguada:invert_channels

Conversation

@sguada (Contributor) commented Sep 22, 2014

This adds an invert_channels option to TransformData. It inverts the order of the channels in the datum and the mean, so that the models released in #1138 can be used with the current level_db/lmdb and image mean stored in BGR format.
This is a simple fix; what do you think, @ksimonyan @shelhamer?

Example:

name: "VGG_CNN_F"
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "examples/imagenet/ilsvrc12_val_leveldb"
    batch_size: 50
  }
  transform_param {
    crop_size: 224
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
    mirror: false
    invert_channels: true
  }
  include: { phase: TEST }
}
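
(For illustration only, not part of the PR code: a minimal numpy sketch of what inverting the channels amounts to for a C x H x W datum and its mean; the array names here are hypothetical.)

import numpy as np

# Hypothetical example: reversing the channel axis turns RGB data into BGR
# (and vice versa) for both the image and the mean, leaving weights untouched.
image = np.random.rand(3, 224, 224)   # C x H x W, assume RGB order
mean = np.random.rand(3, 224, 224)

image_bgr = image[::-1]               # reverse along the channel axis
mean_bgr = mean[::-1]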

@shelhamer (Member)

I think it is simpler to do a one time conversion of the VGG models / mean to BGR to keep a standard throughout instead of introducing another configuration switch. I can do the surgery today, or it may be simplest if I send you the script so that you can upload the transformed models yourself and update the model zoo entries.

@ksimonyan (Contributor)

I agree with @shelhamer. Actually, it would be nice to have a (Python) script which does the conversion, so that it could be used for other models trained on RGB images.

@shelhamer (Member)

@ksimonyan I will share the script -- it is quite simple to do these conversions since parameters are mutable in the Python wrapper, as the editing model parameters example shows.
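
(A minimal sketch of that surgery, assuming the 2014-era pycaffe API, a first convolution layer named 'conv1', and hypothetical file names; the actual script is linked later in this thread.)

import caffe

# Load the released RGB model (file names are placeholders).
net = caffe.Net('VGG_CNN_F_deploy.prototxt', 'VGG_CNN_F.caffemodel')

# Reverse the input-channel axis of the first conv layer's filters so the
# network expects BGR input instead of RGB.
conv1 = net.params['conv1'][0].data
conv1[...] = conv1[:, ::-1, :, :]

# Assumes Net.save is available in the Python wrapper.
net.save('VGG_CNN_F_bgr.caffemodel')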

@sguada (Contributor, Author) commented Sep 22, 2014

Sure, I don't mind; I just wanted to give it a try quickly. With net surgery, @ksimonyan would need to update his models and change his Matlab code to switch the channels.

@ksimonyan, with #501 it is also very easy to do net surgery in Matlab, in case you want to do it yourself.

So far I have tried two networks and they work well. These are the errors on the val set using only the center crop (differences from #1138 are probably due to the use of 10 crops per image and to images resized to 256x256):

  • VGG_CNN_F: top-1 (42.92%) top-5 (20.4%)
  • VGG_CNN_S: top-1 (38.14%) top-5 (16.73%)

@ksimonyan (Contributor)

@sguada, thanks for trying out the models on ILSVRC.

I've added the errors, achieved using a single test crop, to the gist: https://gist.github.com/ksimonyan/5c9129cfb8f0359eaf67
They are as follows:

  • CNN_S: 15.4%
  • CNN_M: 15.7%
  • CNN_M_2048: 15.6%
  • CNN_M_1024: 16.0%
  • CNN_M_128: 18.3%
  • CNN_F: 18.9%

So you seem to be losing ~1% somewhere. I used the central crop from the images rescaled so that the smallest side is 256 (while preserving the aspect ratio). So maybe the aspect ratio distortion is the reason for the drop.
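
(A sketch of that preprocessing, as an assumption about the procedure rather than the released evaluation code: rescale so the smallest side is 256 while keeping the aspect ratio, then take the central 224x224 crop.)

import numpy as np
from PIL import Image

def center_crop(path, small_side=256, crop=224):
    # Resize so the smallest side equals small_side, preserving aspect ratio.
    im = Image.open(path)
    w, h = im.size
    scale = float(small_side) / min(w, h)
    im = im.resize((int(round(w * scale)), int(round(h * scale))), Image.BILINEAR)
    # Take the central crop x crop window.
    w, h = im.size
    left, top = (w - crop) // 2, (h - crop) // 2
    return np.asarray(im.crop((left, top, left + crop, top + crop)))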

@sguada (Contributor, Author) commented Sep 23, 2014

@ksimonyan Thanks for sharing your errors with a single test crop.
As you said, since the model was trained on images that preserve the aspect ratio, testing on images with a distorted aspect ratio (resized to 256x256) performs ~1.5% worse. I will try testing with images that maintain the aspect ratio.

@ksimonyan (Contributor)

@sguada ideally you'll also need a new mean image for that, or you can convert the one I released (which is stored in the mat file as a 224x224 RGB image).
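
(A sketch of that conversion, assuming the mean is stored in the .mat file as an H x W x 3 RGB array; the variable and file names here are guesses, not the released format.)

import numpy as np
import scipy.io
import caffe

mat = scipy.io.loadmat('VGG_mean.mat')                  # hypothetical file name
mean_rgb = mat['image_mean']                            # assumed H x W x 3, RGB
mean_bgr = mean_rgb[:, :, ::-1].transpose((2, 0, 1))    # to 3 x H x W, BGR

# Write it out in the binaryproto format expected by mean_file.
blob = caffe.io.array_to_blobproto(mean_bgr[np.newaxis])
with open('VGG_mean_bgr.binaryproto', 'wb') as f:
    f.write(blob.SerializeToString())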

@sguada (Contributor, Author) commented Sep 23, 2014

I have a question about how you computed the mean image, given that images have different sizes and the crops can be at different positions. Or did you just compute the mean of the center crop?

@ksimonyan (Contributor)

did you just compute the mean of the center crop?

Yes.

@shelhamer (Member)

@ksimonyan regarding the mean, have you also tried training with a channel mean averaged over the spatial dimensions (since that simplifies preprocessing for differently-sized inputs)?
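
(What that channel mean looks like in practice: a couple of numpy lines, assuming the spatial mean is available as a 3 x H x W array; the file name is hypothetical.)

import numpy as np

mean_image = np.load('mean.npy')              # assumed 3 x H x W spatial mean
channel_mean = mean_image.mean(axis=(1, 2))   # one value per channel, shape (3,)
# channel_mean can be subtracted from inputs of any spatial size.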

@ksimonyan (Contributor)

@shelhamer I did it for the ILSVRC-2014 submission (see http://arxiv.org/abs/1409.1556/), but not for the BMVC models. Is it supported by caffe?

@sguada (Contributor, Author) commented Sep 23, 2014

Not yet, I did it in a private branch, but I could do a PR and share it.

@ksimonyan (Contributor)

Not yet, I did it in a private branch, but I could do a PR and share it.

Please do, it might come in handy.

@sguada (Contributor, Author) commented Sep 23, 2014

Sure I will. Are you planning on releasing your models for ILSVRC-2014?

@ksimonyan (Contributor)

are you planning on releasing your models for ILSVRC-2014?

Possibly, but let's sort out the BMVC-2014 models first.

@shelhamer (Member)

@sguada here's a script to swap channels: https://gist.github.com/shelhamer/bee2a5b2b739fe6cee6f#file-swap_input_channels-py

PR your channel-wise mean transformer when you have a chance -- as we've talked about, the channel mean will be a better choice than these spatial means.

@sguada (Contributor, Author) commented Sep 23, 2014

Thanks to @ksimonyan, the VGG models in #1138 are now in BGR format, so there is no need for this PR.

sguada closed this Sep 23, 2014