Description
BatchNorm in NVIDIA/caffe is not compatible with BatchNorm in BVLC/caffe.
There is also no compatibility between engine: CAFFE and engine: CUDNN BatchNorm within NVIDIA/caffe itself (the blob shapes are different).
Please fix these issues, so that pre-trained models can be used for fine-tuning.
Please refer to NVIDIA/DIGITS#629 and BVLC#3919 as well, where similar issues are discussed.
I have some suggestions to fix these issues:
1. Rename NVIDIA/caffe's BatchNorm to BatchNormScale, since it now includes scaling as well.
2. Add a check/exit in the CUDNN BatchNormScale Reshape function when the top and bottom blobs are the same, so that the user gets a warning about in-place computation.
3. Fix the inconsistency in blob shapes between engine: CAFFE and engine: CUDNN.
4. Currently I have to specify many parameters in the new BatchNorm layer, which is unnecessary:
layer {
  name: "bn_conv1"
  bottom: "conv1"
  top: "conv1"
  type: "BatchNorm"
  param { # scale
    lr_mult: 1
    decay_mult: 1
  }
  param { # shift/bias
    lr_mult: 1
    decay_mult: 1
  }
  param { # global mean
    lr_mult: 0
    decay_mult: 0
  }
  param { # global variance
    lr_mult: 0
    decay_mult: 0
  }
  batch_norm_param {
    scale_filler {
      type: "constant"
      value: 1
    }
    bias_filler {
      type: "constant"
      value: 0
    }
    engine: CUDNN
  }
}
(4a). In BatchNormScale, if you change the order of the blobs to global_mean, global_variance, scale, bias, global_counter, then I don't have to specify 4 param fields for lr_mult and decay_mult, but only 2.
(4b). If the definition of the scale and bias fields in BatchNormParameter is changed to:
optional float scale_filler = 5 [default = 1];
optional float bias_filler = 6 [default = 0];
then I don't have to specify these in the prototxt either.
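To illustrate, assuming (4a) and (4b) were adopted (together with the rename to BatchNormScale from suggestion 1), the layer above could shrink to roughly the following sketch. The two param fields freeze the statistics blobs, which now come first; scale and bias can be omitted because unspecified params fall back to Caffe's defaults of lr_mult: 1 and decay_mult: 1, and per (4b) their initial values default to 1 and 0:
layer {
  name: "bn_conv1"
  bottom: "conv1"
  top: "conv1"
  type: "BatchNormScale"
  param { # global mean (frozen)
    lr_mult: 0
    decay_mult: 0
  }
  param { # global variance (frozen)
    lr_mult: 0
    decay_mult: 0
  }
  # scale and bias params omitted: they default to lr_mult: 1 / decay_mult: 1,
  # and per (4b) no explicit fillers are needed
  batch_norm_param {
    engine: CUDNN
  }
}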
5. Keep the original BatchNorm from BVLC/caffe as it is, untouched, so that compatibility with BVLC/caffe is not affected and old BVLC/caffe models can be used for fine-tuning. If possible, also provide a CUDNN version of this original BatchNorm (without scaling), so that it can be accelerated.
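For reference, this is the standard BVLC/caffe pattern (a statistics-only BatchNorm with three frozen blobs, followed by a separate learnable Scale layer) that existing pre-trained models depend on; a CUDNN-accelerated variant would only need to accept these same three blobs:
layer {
  name: "bn_conv1"
  bottom: "conv1"
  top: "conv1"
  type: "BatchNorm"
  param { lr_mult: 0 } # global mean
  param { lr_mult: 0 } # global variance
  param { lr_mult: 0 } # moving-average factor
}
layer {
  name: "scale_conv1"
  bottom: "conv1"
  top: "conv1"
  type: "Scale"
  scale_param {
    bias_term: true # learnable shift, analogous to the bias blob in BatchNormScale
  }
}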