This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Update Xavier #610

Merged
antinucleon merged 1 commit into apache:master from antinucleon:min-net on Nov 20, 2015

Conversation

@antinucleon (Contributor)

Make magnitude a variable, which is more flexible.

Contributor

That should probably be gaussian

Contributor Author

Both Gaussian and Uniform are fine. Personally, I prefer uniform.

Contributor

I guess vchuravy means there is a typo :)

Contributor Author

lol, thanks @pluskid @vchuravy

Contributor

So if I read the original paper correctly, it should be (fan_in + fan_out) / 2 (which is also what Caffe does), and then magnitude can default to 3.
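
For reference, a minimal sketch of the scale this describes, assuming the averaged fan factor and a default magnitude of 3 (illustrative only, not the PR's code):

```python
import math

# Sketch only: with factor = (fan_in + fan_out) / 2 and magnitude defaulting
# to 3, a uniform Xavier draw spans
#   scale = sqrt(magnitude / factor) = sqrt(6 / (fan_in + fan_out)).
def xavier_uniform_scale(fan_in, fan_out, magnitude=3.0):
    factor = (fan_in + fan_out) / 2.0
    return math.sqrt(magnitude / factor)

# e.g. xavier_uniform_scale(256, 128) == math.sqrt(6 / 384) == 0.125
```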

Contributor Author

It doesn't matter in practice. We need to tune it a little to avoid weights that are too large or zero. The default is not always good and leads to training failures in many cases.

@vchuravy (Contributor)

So I am not sure that kaiming is the correct name. I would prefer specifying the regularization as avg = (fan_in + fan_out) / 2; in = fan_in; out = fan_out, so that the difference between xavier and kaiming is just the magnitude.

Also see BVLC/caffe#1946 and BVLC/caffe#1883 for some more discussion, and http://arxiv.org/abs/1312.6120v3 for the paper where kaiming (the A in MSRA) was first proposed.
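
As a rough illustration of how the traditional recipes fall out of such a split (the parameter names below are assumptions, not this PR's API):

```python
# Illustrative mapping only: the classic schemes can be recovered from a
# fan factor, a magnitude, and a distribution.
SCHEMES = {
    # Glorot & Bengio 2010: uniform over +/- sqrt(6 / (fan_in + fan_out))
    "xavier":  {"rnd_type": "uniform",  "factor_type": "avg", "magnitude": 3.0},
    # He et al. 2015 (MSRA): gaussian with std = sqrt(2 / fan_in)
    "kaiming": {"rnd_type": "gaussian", "factor_type": "in",  "magnitude": 2.0},
}
```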

@vchuravy (Contributor)

Also note that Caffe calculates fan_in and fan_out differently from us:

You should make sure the input blob has shape (num, a, b, c), where a * b * c = fan_in and num * b * c = fan_out.
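
A tiny sketch of the fan computation implied by that convention (illustrative only):

```python
# Sketch of the fan convention quoted above for a blob of shape (num, a, b, c):
# fan_in = a * b * c and fan_out = num * b * c.
def fans(shape):
    num, a, b, c = shape
    return a * b * c, num * b * c  # (fan_in, fan_out)

# e.g. a 3x3 conv with 3 input channels and 64 filters -> shape (64, 3, 3, 3):
# fans((64, 3, 3, 3)) == (27, 576)
```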

@antinucleon (Contributor Author)

At least I think using fan_out directly is useless and will make training fail because it is too large. I prefer manually adjusting the magnitude instead of sticking to some magic formula. For example, if you check Keras you will find a different definition of these formulas.

@vchuravy (Contributor)

For me the concerns are consistency and flexibility. I think we shouldn't move too far away from prior work with our definitions. The most flexible thing is to separate it into regularization, magnitude, and distribution. That also gets rid of the naming issue, and in the documentation we can just describe how to get the traditional Xavier or MSRA variants. I can implement my idea in MXNet.jl and we can then discuss which is the better way.
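
A minimal NumPy sketch of that separation (class and parameter names here are illustrative assumptions, not the mxnet or MXNet.jl code):

```python
import numpy as np

class FlexibleXavier:
    """Hypothetical initializer with distribution, fan factor, and magnitude
    as independent knobs, as proposed above."""

    def __init__(self, rnd_type="uniform", factor_type="avg", magnitude=3.0):
        self.rnd_type = rnd_type        # "uniform" or "gaussian"
        self.factor_type = factor_type  # "avg", "in", or "out"
        self.magnitude = magnitude

    def __call__(self, shape):
        # shape is (num, a, b, c); fan convention as quoted above.
        fan_in = int(np.prod(shape[1:]))
        fan_out = shape[0] * int(np.prod(shape[2:]))
        factor = {"avg": (fan_in + fan_out) / 2.0,
                  "in": float(fan_in),
                  "out": float(fan_out)}[self.factor_type]
        scale = np.sqrt(self.magnitude / factor)
        if self.rnd_type == "uniform":
            return np.random.uniform(-scale, scale, size=shape)
        return np.random.normal(0.0, scale, size=shape)

# FlexibleXavier()((64, 3, 3, 3)) reproduces the classic Xavier uniform setting;
# FlexibleXavier("gaussian", "in", 2.0)((64, 3, 3, 3)) gives an MSRA-style gaussian.
```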

@antinucleon (Contributor Author)

@vchuravy I agree that currently everyone is using their own way to name it, which is a mess. The name kaiming comes from cxxnet, and I just want to follow the cxxnet way rather than the Caffe way. I just changed the default magnitude, and I suggest we use it this way first; then we can discuss a better approach and refactor.

@vchuravy (Contributor)

Take a look at vchuravy/MXNet.jl@fc6b4ed. It also allows us to side-step the naming issue.

antinucleon added a commit that referenced this pull request Nov 20, 2015
antinucleon merged commit 4aa7d3b into apache:master on Nov 20, 2015
@vchuravy (Contributor)

@antinucleon Thanks for this and also thank you for the productive debate.
