Fixing harsh upgrade_proto for "BatchNorm" layer #5184
Conversation
…"name" in param, only set lr_mult and decay_mult to zero
@shelhamer would you please have a look at this issue/proposed fix? Thanks.

Switching to zeroing the …

@shelhamer Thanks for merging this PR!
@shaibagon Thanks Shai for the fix. I am not sure about the internal structure, just a quick question: does the upgraded proto of a BN layer have the same interface as before this upgrade?

@antran89 There is no interface change. The actions …

@shaibagon @shelhamer What will happen if we share parameters in a "BatchNorm" layer? Since the mean and variance are computed from the input, and a Siamese network has two inputs during training, there would be two means and two variances for the different inputs. So which of them will be used as the parameters of "BatchNorm", or do we just average them?

@Jiangfeng-Xiong You obviously cannot have two means and two variances in the same layer; it makes no sense.
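For reference, the sharing being discussed works through the `param` names that this PR preserves: giving the `"BatchNorm"` layers of both branches the same blob names makes them read and update a single set of statistics, accumulated from both streams. A minimal sketch of such sharing (all layer and blob names here are illustrative, not taken from the thread):

```
# Two branches of a Siamese network sharing one set of "BatchNorm" statistics.
layer {
  name: "bn_a"
  type: "BatchNorm"
  bottom: "conv_a"
  top: "conv_a"
  param { name: "shared_bn_mean"   lr_mult: 0 decay_mult: 0 }
  param { name: "shared_bn_var"    lr_mult: 0 decay_mult: 0 }
  param { name: "shared_bn_factor" lr_mult: 0 decay_mult: 0 }
}
layer {
  name: "bn_b"
  type: "BatchNorm"
  bottom: "conv_b"
  top: "conv_b"
  # Same param names as "bn_a": both layers share the same underlying blobs.
  param { name: "shared_bn_mean"   lr_mult: 0 decay_mult: 0 }
  param { name: "shared_bn_var"    lr_mult: 0 decay_mult: 0 }
  param { name: "shared_bn_factor" lr_mult: 0 decay_mult: 0 }
}
```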
This PR attempts to fix issues #5171 and #5120 caused by PR #4704:

PR #4704 completely removes all `param` arguments of `"BatchNorm"` layers and resets them to `param { lr_mult: 0 }`. This "upgrade" is too harsh, as it discards the `"name"` argument that might have been set by the user.

This PR fixes `upgrade_proto.cpp` for the `"BatchNorm"` layer to be more conservative: it leaves `"name"` in `param` and only sets `lr_mult` and `decay_mult` to zero.

Example of such an upgrade:
Input prototxt:
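A minimal sketch of such an input, assuming the user explicitly named the three internal `"BatchNorm"` blobs (layer and blob names are illustrative, not the original snippet):

```
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # The user named the internal blobs (mean, variance, moving-average factor)
  # so they can be shared with another layer.
  param { name: "bn1_mean" }
  param { name: "bn1_var" }
  param { name: "bn1_factor" }
}
```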
"Upgraded" prorotxt:
As you can see, `lr_mult` and `decay_mult` are set to zero, leaving `name` intact when explicitly set by the user.