This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Update Xavier #610

Merged
antinucleon merged 1 commit into apache:master from antinucleon:min-net on Nov 20, 2015

Conversation

@antinucleon (Contributor)

Make magnitude a variable, which is more flexible.

Contributor

That should probably be gaussian

Contributor Author

Both Gaussian and Uniform are fine. Personally, I prefer uniform.

Contributor

I guess vchuravy means there is a typo :)

Contributor Author

lol, thanks @pluskid @vchuravy

Contributor

So if I read the original paper correctly, it should be (fan_in + fan_out) / 2 (which is also what Caffe does), and then magnitude can default to 3.
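
For reference, a minimal sketch of the scale this describes, assuming the averaged fan factor and a default magnitude of 3 (illustrative only, not the PR's code):

```python
import math

# Sketch only: with factor = (fan_in + fan_out) / 2 and magnitude defaulting
# to 3, a uniform Xavier draw spans
#   scale = sqrt(magnitude / factor) = sqrt(6 / (fan_in + fan_out)).
def xavier_uniform_scale(fan_in, fan_out, magnitude=3.0):
    factor = (fan_in + fan_out) / 2.0
    return math.sqrt(magnitude / factor)

# e.g. xavier_uniform_scale(256, 128) == math.sqrt(6 / 384) == 0.125
```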

Contributor Author

It doesn't matter in practice. We need to tune it a little to avoid weights that are too large or zero. The default is not always good and leads to training failures in many cases.

@vchuravy (Contributor)

So I am not sure that kaiming is the correct name. I would prefer specifying the regularization as avg = (fan_in + fan_out) / 2; in = fan_in; out = fan_out, so that the difference between xavier and kaiming is just the magnitude.

Also see BVLC/caffe#1946 and BVLC/caffe#1883 for some more discussion, and http://arxiv.org/abs/1312.6120v3 for the paper where kaiming (the A in MSRA) was first proposed.
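
As a rough illustration of how the traditional recipes fall out of such a split (the parameter names below are assumptions, not this PR's API):

```python
# Illustrative mapping only: the classic schemes can be recovered from a
# fan factor, a magnitude, and a distribution.
SCHEMES = {
    # Glorot & Bengio 2010: uniform over +/- sqrt(6 / (fan_in + fan_out))
    "xavier":  {"rnd_type": "uniform",  "factor_type": "avg", "magnitude": 3.0},
    # He et al. 2015 (MSRA): gaussian with std = sqrt(2 / fan_in)
    "kaiming": {"rnd_type": "gaussian", "factor_type": "in",  "magnitude": 2.0},
}
```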

@vchuravy (Contributor)

Also note that Caffe calculates fan_in and fan_out differently from us:

You should make sure the input blob has shape (num, a, b, c), where a * b * c = fan_in and num * b * c = fan_out.
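
A tiny sketch of the fan computation implied by that convention (illustrative only):

```python
# Sketch of the fan convention quoted above for a blob of shape (num, a, b, c):
# fan_in = a * b * c and fan_out = num * b * c.
def fans(shape):
    num, a, b, c = shape
    return a * b * c, num * b * c  # (fan_in, fan_out)

# e.g. a 3x3 conv with 3 input channels and 64 filters -> shape (64, 3, 3, 3):
# fans((64, 3, 3, 3)) == (27, 576)
```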

@antinucleon (Contributor Author)

At least I think using fan_out directly is useless and will make training fail because it is too large. I prefer manually adjusting the magnitude instead of sticking to some magic formula. For example, if you check Keras you will find a different definition of these formulas.

@vchuravy (Contributor)

For me the concerns are consistency and flexibility. I think we shouldn't move too far away from prior work with our definitions. The most flexible thing is to separate it into regularization, magnitude, and distribution. That also gets rid of the naming issue, and in the documentation we can just describe how to get the traditional Xavier or MSRA variants. I can implement my idea in MXNet.jl and we can then discuss which is the better way.
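
A minimal NumPy sketch of that separation (class and parameter names here are illustrative assumptions, not the mxnet or MXNet.jl code):

```python
import numpy as np

class FlexibleXavier:
    """Hypothetical initializer with distribution, fan factor, and magnitude
    as independent knobs, as proposed above."""

    def __init__(self, rnd_type="uniform", factor_type="avg", magnitude=3.0):
        self.rnd_type = rnd_type        # "uniform" or "gaussian"
        self.factor_type = factor_type  # "avg", "in", or "out"
        self.magnitude = magnitude

    def __call__(self, shape):
        # shape is (num, a, b, c); fan convention as quoted above.
        fan_in = int(np.prod(shape[1:]))
        fan_out = shape[0] * int(np.prod(shape[2:]))
        factor = {"avg": (fan_in + fan_out) / 2.0,
                  "in": float(fan_in),
                  "out": float(fan_out)}[self.factor_type]
        scale = np.sqrt(self.magnitude / factor)
        if self.rnd_type == "uniform":
            return np.random.uniform(-scale, scale, size=shape)
        return np.random.normal(0.0, scale, size=shape)

# FlexibleXavier()((64, 3, 3, 3)) reproduces the classic Xavier uniform setting;
# FlexibleXavier("gaussian", "in", 2.0)((64, 3, 3, 3)) gives an MSRA-style gaussian.
```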

@antinucleon (Contributor Author)

@vchuravy I agree that currently everyone is using their own way to name it, which is a mess. The name kaiming comes from cxxnet, and I just want to follow the cxxnet way rather than the Caffe way. I just changed the default magnitude, and I suggest we use it this way first; then we can discuss a better approach and refactor.

@vchuravy (Contributor)

Take a look at vchuravy/MXNet.jl@fc6b4ed. It also allows us to side-step the naming issue.

antinucleon added a commit that referenced this pull request Nov 20, 2015
antinucleon merged commit 4aa7d3b into apache:master on Nov 20, 2015
@vchuravy (Contributor)

@antinucleon Thanks for this and also thank you for the productive debate.
