Update Xavier #610
Conversation
python/mxnet/initializer.py
Outdated
That should probably be `gaussian`.
Both Gaussian and Uniform are fine. Personally, I prefer uniform.
I guess @vchuravy means there is a typo :)
python/mxnet/initializer.py
Outdated
So if I read the original paper correctly, it should be (fan_in + fan_out)/2 (that is also what Caffe does), and then magnitude can default to 3.
It doesn't matter in practice. We need to tune it a little to avoid weights that are too large or zero. The default is not always good and leads to failed training in many cases.
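To make the two points above concrete, here is a minimal sketch of the averaged-fan rule with a tunable magnitude. This is illustrative code, not this PR's; the function name and the fan convention (first axis is fan_out, the rest fold into fan_in) are assumptions. With a uniform distribution, magnitude=3 recovers the Glorot limit sqrt(6 / (fan_in + fan_out)).

```python
import numpy as np

def xavier_uniform(shape, magnitude=3.0, rng=np.random):
    # Assumed convention: shape[0] is fan_out, remaining dims fold into fan_in
    # (for conv kernels the spatial dims are part of the fans).
    fan_out, fan_in = shape[0], int(np.prod(shape[1:]))
    factor = (fan_in + fan_out) / 2.0    # averaged fan, as in the paper
    scale = np.sqrt(magnitude / factor)  # magnitude=3 gives sqrt(6/(fan_in+fan_out))
    return rng.uniform(-scale, scale, size=shape)
```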
So I am not sure about that. Also see BVLC/caffe#1946 and BVLC/caffe#1883 for some more discussion, and http://arxiv.org/abs/1312.6120v3 for the paper where orthogonal initialization comes from.
Also note that Caffe calculates fan_in and fan_out differently from us:
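Roughly, as I recall Caffe's filler code at the time (treat the details as an assumption, not a quotation), for a 4-D blob of shape (num_output, channels, height, width):

```python
def caffe_fans(blob_shape):
    # blob_shape assumed to be (num_output, channels, height, width).
    # From memory of Caffe's filler code:
    #   fan_in  = count / num_output  -> channels * height * width
    #   fan_out = count / channels    -> num_output * height * width
    num_output, channels = blob_shape[0], blob_shape[1]
    count = 1
    for dim in blob_shape:
        count *= dim
    return count // num_output, count // channels
```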
At least I think using fan_out directly is useless and will make training fail because the weights get too large. I prefer manually adjusting the magnitude instead of sticking to some magic formula. E.g., if you check Keras, you will find yet another definition of these formulas.
For me the concerns are consistency and flexibility. I think we shouldn't move too far away from prior work with our definitions. The most flexible thing is to separate it into regularization, magnitude, and distribution (see the sketch below). That also gets rid of the naming issue, and in the documentation we can simply describe how to obtain the traditional variants of Xavier.
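To illustrate the split being proposed, a minimal sketch with hypothetical parameter names (rnd_type, factor_type, magnitude are my assumptions about where the discussion is heading, not this PR's code):

```python
import numpy as np

class Xavier(object):
    """Sketch of the split: distribution x fan-factor x magnitude."""
    def __init__(self, rnd_type="uniform", factor_type="avg", magnitude=3.0):
        self.rnd_type = rnd_type        # "uniform" or "gaussian"
        self.factor_type = factor_type  # "avg", "in", or "out"
        self.magnitude = magnitude      # tunable instead of a magic constant

    def __call__(self, shape, rng=np.random):
        fan_out, fan_in = shape[0], int(np.prod(shape[1:]))
        factor = {"avg": (fan_in + fan_out) / 2.0,
                  "in": fan_in,
                  "out": fan_out}[self.factor_type]
        scale = np.sqrt(self.magnitude / factor)
        if self.rnd_type == "uniform":
            return rng.uniform(-scale, scale, size=shape)
        return rng.normal(0.0, scale, size=shape)
```

With this split, the traditional variants become parameter choices: the defaults reproduce the averaged-fan uniform rule, while e.g. `Xavier(rnd_type="gaussian", factor_type="in")` gives a fan_in-scaled Gaussian.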
@vchuravy I agree; currently everyone is using their own way to name it, which is a mess. As for the name …
Take a look at vchuravy/MXNet.jl@fc6b4ed. It also allows us to side-step the naming issue.
@antinucleon Thanks for this, and thank you for the productive debate.
Make magnitude a variable, which is more flexible