Rework Xavier to be more flexibility #32

Merged
pluskid merged 2 commits into dmlc:master from vchuravy:vc/xavier on Nov 20, 2015

Conversation

@vchuravy
Collaborator

Following the discussion in apache/mxnet#610 I took another swing at Xavier.

The idea is that the schemes proposed in the papers [1, 2, 3] can be generalized by choosing the regularization factor (1/fan_in, 1/fan_out, or 2/(fan_out + fan_in)), the distribution to sample from, and a magnitude scaling factor. [1] proposes 3/fan_in and 6/(fan_out + fan_in), while [2, 3] propose 2/fan_in. A sketch of what this generalization could look like follows the references below.

[1] X. Glorot and Y. Bengio (2010) http://jmlr.csail.mit.edu/proceedings/papers/v9/glorot10a.html
[2] K. He, X. Zhang, S. Ren, and J. Sun (2015) http://arxiv.org/abs/1502.01852
[3] A. M. Saxe, J. L. McClelland, and S. Ganguli (2013/2014) http://arxiv.org/abs/1312.6120v3
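
To make the idea concrete, here is a minimal sketch of such a generalized initializer in Julia. The function name, keyword arguments, and defaults are illustrative assumptions of mine, not the actual MXNet.jl API proposed in this PR:

function xavier_sketch(dims; distribution = :uniform, regularization = :avg, magnitude = 3.0)
    fan_in  = prod(dims[2:end])
    fan_out = dims[1]

    # regularization factor: 1/fan_in, 1/fan_out, or 2/(fan_in + fan_out)
    if regularization == :in
        factor = fan_in
    elseif regularization == :out
        factor = fan_out
    else
        factor = (fan_in + fan_out) / 2
    end

    scale = sqrt(magnitude / factor)

    if distribution == :uniform
        return (rand(dims...) .* 2 .- 1) .* scale   # uniform on [-scale, scale]
    else
        return randn(dims...) .* scale              # normal with std = scale
    end
end

With the defaults this reproduces the sqrt(6 / (fan_out + fan_in)) uniform bound of [1], and e.g. xavier_sketch((64, 3, 5, 5); distribution = :normal, regularization = :in, magnitude = 2.0) would correspond to the 2/fan_in scheme of [2, 3].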

@vchuravy
Collaborator Author

Another point we should discuss is the calculation of fan_out.

Currently we have:

fan_in  = prod(dims[2:end])
fan_out = dims[1]

But following [1] and [4], the input blob has shape (num, a, b, c) where a * b * c = fan_in and num * b * c = fan_out.

Maybe we should have the following (if somebody could double-check my logic :) ):

fan_in = prod(dims[2:end])
fan_out = prod(dims[1:end]) / dims[2]

[4] https://github.com/BVLC/caffe/blob/603cbfb97767d1b9ebf102200646f5df237d1749/include/caffe/filler.hpp#L150-L151
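
For concreteness, here is how the two conventions differ on a hypothetical 5x5 convolution with 3 input channels and 64 output filters (the shape and numbers are illustrative only):

dims = (64, 3, 5, 5)                           # (num, a, b, c)

fan_in        = prod(dims[2:end])              # a * b * c   =   75
fan_out_cur   = dims[1]                        # num         =   64  (current code)
fan_out_caffe = prod(dims[1:end]) / dims[2]    # num * b * c = 1600  (caffe-style, [4])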

@pluskid
Member

pluskid commented Nov 19, 2015

The caffe code you cited looks definitely weird to me. The most intuitive interpretation of fan-out is the number of output units, which for convolution filters is just the number of output filters. I'm not sure why caffe chose to include the kernel size in this calculation. If consistency is the goal, I guess we should really check what the cited papers say.

Also, things are quite different when it comes to the FullyConnected layer: there the weights are a matrix (instead of a 4D tensor), and the fan-in/fan-out calculation should handle this gracefully.
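
A small sketch of how a single helper could cover both cases, assuming the FullyConnected weight has shape (n_out, n_in) analogous to the 4D layout above (illustrative, not the code in this PR):

function fans(dims)
    fan_in  = prod(dims[2:end])   # for a 2D weight matrix this is just n_in
    fan_out = dims[1]             # number of output units / output filters
    return fan_in, fan_out
end

fans((128, 784))      # FullyConnected weight: (fan_in, fan_out) = (784, 128)
fans((64, 3, 5, 5))   # convolution weight:    (fan_in, fan_out) = (75, 64)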

@vchuravy
Collaborator Author

Yeah, I am unsure about 710dd01. In [1], fan_out is the size of the next layer, and [2] only uses fan_in.

@vchuravy
Collaborator Author

So for me this would be ready.

@codecov-io

Current coverage is 76.39%

Merging #32 into master will increase coverage by +0.26% as of 813436d

@@            master     #32   diff @@
======================================
  Files           20      20       
  Stmts         1454    1449     -5
  Branches         0       0       
  Methods          0       0       
======================================
  Hit           1107    1107       
  Partial          0       0       
+ Missed         347     342     -5

Review entire Coverage Diff as of 813436d


pluskid added a commit that referenced this pull request Nov 20, 2015
Rework Xavier to be more flexibility
pluskid merged commit ea85774 into dmlc:master on Nov 20, 2015