Added batchnorm layer. #1867

Closed
ghost wants to merge 1 commit into master from an unknown repository

Conversation

@ghost ghost commented Feb 15, 2015

No description provided.

ghost commented Feb 15, 2015

Looks like some #ifdef CPU and cmake support need to be added.

@ChenglongChen

It seems to me that you are doing per-neuron batch normalization, which is feasible for a fully connected layer. But in a conv layer (and also for a fully connected layer), you should do per-channel batch normalization according to Google's paper.

Update: Here is my quick and messy implementation based on MVNLayer: https://github.com/ChenglongChen/batch_normalization
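
For readers skimming the thread, here is a minimal sketch of what "per-channel" statistics mean for a conv blob in the usual NCHW layout; the function name, eps value, and variable names are illustrative and not taken from the linked implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalize a blob of shape (num, channels, height, width), NCHW order:
// one (mean, variance) pair per channel, computed over the batch AND the
// spatial dimensions, so every location of a feature map shares the same
// statistics. eps guards the division.
void batch_norm_per_channel(const std::vector<float>& x, std::vector<float>& y,
                            int num, int channels, int height, int width,
                            float eps = 1e-5f) {
  const int spatial = height * width;
  const int count = num * spatial;
  for (int c = 0; c < channels; ++c) {
    double mean = 0.0, var = 0.0;
    for (int n = 0; n < num; ++n)
      for (int s = 0; s < spatial; ++s)
        mean += x[static_cast<std::size_t>(n * channels + c) * spatial + s];
    mean /= count;
    for (int n = 0; n < num; ++n)
      for (int s = 0; s < spatial; ++s) {
        const double d =
            x[static_cast<std::size_t>(n * channels + c) * spatial + s] - mean;
        var += d * d;
      }
    var /= count;
    const double inv_std = 1.0 / std::sqrt(var + eps);
    for (int n = 0; n < num; ++n)
      for (int s = 0; s < spatial; ++s) {
        const std::size_t i =
            static_cast<std::size_t>(n * channels + c) * spatial + s;
        y[i] = static_cast<float>((x[i] - mean) * inv_std);
      }
  }
}
```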

@ducha-aiki

@ChenglongChen,
have you tested your layer? I have tried it and LeNet hasn't converged at all.

@ChenglongChen

@ducha-aiki,
I have used it a few times with MNIST and my own dataset, and it seems to work fine in my case. For LeNet, you can find the model file, solver, and training log in the repo above.

@ducha-aiki

@ChenglongChen,
Thank you, the problem was in the fillers. Now it converges.

ghost commented Feb 15, 2015

@ChenglongChen Perhaps having the layer normalize over num() and channels() is a better default. But by my reading, it's not actually possible to do proper convolutional batch normalization without deep integration into the conv layer itself; I assumed Caffe would have to implement that separately. The paper applies normalization to each feature map independently, and feature maps overlap in the convolution operation, so it's not possible to get the desired result by composing a separate layer with the current conv layer. Essentially, the normalization would be not just over num() and channels(), but also over the 3x3 or 5x5 patches of width() and height(). Correct me where I'm wrong.

(The paper in question: Ioffe & Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift".)

Edit: It looks like others are interpreting a feature map as the entire width() and height() for a given channel, rather than k×k kernel regions, so you can get away with implementing the layer separately after all. It seems more natural to me to normalize across the local receptive fields, but perhaps the authors ran into the same difficulty in implementing that route. It would be interesting to know whether they tried.
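
For contrast with the per-channel sketch above, here is a minimal sketch of the per-activation ("per-neuron") case that fits a fully connected blob of shape (num, dim): each output unit gets its own statistics, computed over the mini-batch dimension only. Names and the eps value are illustrative.

```cpp
#include <cmath>
#include <vector>

// Normalize a fully connected blob of shape (num, dim): each output unit d
// gets its own (mean, variance), computed over the mini-batch only.
void batch_norm_per_activation(const std::vector<float>& x, std::vector<float>& y,
                               int num, int dim, float eps = 1e-5f) {
  for (int d = 0; d < dim; ++d) {
    double mean = 0.0, var = 0.0;
    for (int n = 0; n < num; ++n) mean += x[n * dim + d];
    mean /= num;
    for (int n = 0; n < num; ++n) {
      const double diff = x[n * dim + d] - mean;
      var += diff * diff;
    }
    var /= num;
    const double inv_std = 1.0 / std::sqrt(var + eps);
    for (int n = 0; n < num; ++n)
      y[n * dim + d] = static_cast<float>((x[n * dim + d] - mean) * inv_std);
  }
}
```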

@ducha-aiki

@Russell91, the implementation by @ChenglongChen looks like it handles this part properly. Actually, it is very similar to yours.
@ChenglongChen, I have tested your layer on different datasets and the network sometimes drops to zero accuracy; this behaviour depends greatly on the hyperparameters (especially with leaky ReLUs, but also with normal ReLUs). However, your implementation looks straightforward and correct. Have you experienced any trouble? If you agree, I will rebase your implementation onto the current dev branch.

pannous added a commit to pannous/caffe that referenced this pull request Feb 16, 2015
@ducha-aiki

@ChenglongChen rebased and fixed in #1891.
You had used y instead of x_norm in the backward propagation.
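
For context on the fix, a rough sketch of the piece in question: when back-propagating through the scale/shift, the gradient of gamma has to be accumulated against the normalized activations x_norm rather than the output y = gamma * x_norm + beta. The function and variable names below are illustrative, not the actual code in #1891.

```cpp
#include <vector>

// Gradients of the learned scale (gamma) and shift (beta), one per channel.
// top_diff holds dL/dy; x_norm holds the normalized activations x_hat.
// Key point of the fix: dL/dgamma multiplies top_diff by x_norm, not by y.
void scale_shift_backward(const std::vector<float>& top_diff,
                          const std::vector<float>& x_norm,
                          std::vector<float>& gamma_diff,
                          std::vector<float>& beta_diff,
                          int num, int channels, int spatial) {
  for (int c = 0; c < channels; ++c) {
    double dgamma = 0.0, dbeta = 0.0;
    for (int n = 0; n < num; ++n)
      for (int s = 0; s < spatial; ++s) {
        const int i = (n * channels + c) * spatial + s;
        dgamma += top_diff[i] * x_norm[i];  // correct: x_norm, not y
        dbeta  += top_diff[i];
      }
    gamma_diff[c] = static_cast<float>(dgamma);
    beta_diff[c]  = static_cast<float>(dbeta);
  }
}
```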

@ChenglongChen

@ducha-aiki,
Yes, you are right. Thanks for the fix!

melgor commented Feb 18, 2015

Could you clarify something for me, @ChenglongChen?
In your LeNet example you do not have any activation function after the BN layer, while the paper writes the transform as:
[image: equation from the paper, z = g(BN(Wu)), with the nonlinearity g applied after BN]

So, am I missing something, or should there be a ReLU or sigmoid function after the BN layer?

I tried @ChenglongChen's version and @ducha-aiki's, and neither of them converges on my data. I do not know why.

@ducha-aiki

I tried @ChenglongChen's version and @ducha-aiki's, and neither of them converges on my data. I do not know why.

@melgor you are right. My version passes the tests, but it also shows strange behaviour on some datasets, so any help is appreciated.

In your LeNet example you do not have any activation function after the BN layer.

Just because the original caffe-lenet also has no activation function there, I think.

@ChenglongChen

@melgor,
As indicated by @ducha-aiki, caffe-lenet doesn't have an activation function there, but of course you can add ReLU or another activation function right after the BN layer.

I also observed some sudden blow-ups on some mini-batches during training (with my older version). I suspect it is due to the division by the variance, but I am not very sure.
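
A tiny illustration of the suspected failure mode: if a mini-batch variance is numerically zero and no small epsilon is added before the division, the normalization produces inf/NaN and training blows up. The eps value here is illustrative.

```cpp
#include <cmath>
#include <cstdio>

int main() {
  const double var = 0.0;   // a degenerate mini-batch: zero variance
  const double eps = 1e-5;  // illustrative value
  std::printf("without eps: %f\n", 1.0 / std::sqrt(var));        // inf
  std::printf("with eps   : %f\n", 1.0 / std::sqrt(var + eps));  // ~316.23
  return 0;
}
```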

@weiliu89

According to Algorithm 2, lines 8-12, before testing we should compute the mean and variance from some training mini-batches. I think this piece is still missing; should it be integrated in solver.cpp?
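
A rough sketch of what that missing step could look like, assuming per-channel statistics: accumulate the mini-batch means and unbiased variances over a number of training batches and use the frozen values at test time. Algorithm 2 in the paper simply averages the moments; the exponential moving average below is a common alternative. The struct, field names, and decay value are all illustrative.

```cpp
#include <cstddef>
#include <vector>

// Keep running per-channel statistics during training and freeze them for
// test-time normalization.
struct BNRunningStats {
  std::vector<double> mean, var;
  double decay;  // illustrative value

  explicit BNRunningStats(int channels)
      : mean(channels, 0.0), var(channels, 1.0), decay(0.99) {}

  // Call once per training mini-batch with that batch's per-channel moments;
  // m is the number of values each moment was computed from (num * spatial).
  void update(const std::vector<double>& batch_mean,
              const std::vector<double>& batch_var, int m) {
    const double unbias = m > 1 ? static_cast<double>(m) / (m - 1) : 1.0;
    for (std::size_t c = 0; c < mean.size(); ++c) {
      mean[c] = decay * mean[c] + (1.0 - decay) * batch_mean[c];
      var[c]  = decay * var[c]  + (1.0 - decay) * unbias * batch_var[c];
    }
  }
};
```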

@ducha-aiki

@weiliu89, it works when we extract probabilities or features in mini-batches; with a single input it does not work yet. See #1965.

@shelhamer

Closing in favor of #1965 as it seems to be more complete -- comment if not.

@shelhamer shelhamer closed this Mar 10, 2015