Add MSRAFiller, an Xavier-like filler designed for use with ReLUs

This PR adds MSRAFiller, which implements an Xavier-like filler designed for use with ReLUs instead of tanh, based on the paper: He et al., "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," 2015.

It also adds a VarianceNorm option to FillerParameter, which allows one to normalize by fan_in, fan_out, or their average. VarianceNorm applies to both MSRAFiller and XavierFiller (default behavior unchanged). It also adds tests for MSRAFiller and XavierFiller.

Replaces #1883 (updated based on that discussion and rebased against master).

As with the XavierFiller, the fan_in and fan_out dimensions are not correct for InnerProduct layers (as pointed out by @seanbell in #1883); however, I did update the documentation to note this.
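For concreteness, here is a minimal usage sketch in a net prototxt. It assumes the filler type string is "msra" and the enum field is named variance_norm with values FAN_IN (default), FAN_OUT, and AVERAGE; these names are my reading of the patch, not guaranteed by the description above:

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64
    kernel_size: 3
    weight_filler {
      # Zero-mean Gaussian with std = sqrt(2 / n), per He et al. (2015)
      type: "msra"
      # n is chosen by the new option: FAN_IN (default), FAN_OUT, or AVERAGE
      variance_norm: FAN_IN
    }
  }
}
```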
Conversation
Note that #1970 should fix the fan_in / fan_out issue for InnerProduct layers.
#1970 is in, so this filler is now right for InnerProduct layers too.
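For readers following the fan_in / fan_out point: for a convolution weight blob of shape (N, C, k_h, k_w), the quantities the filler normalizes by are (a sketch of my understanding, not text from either PR):

```latex
% Fan quantities for a convolution weight blob of shape (N, C, k_h, k_w)
\mathrm{fan\_in} = C \, k_h \, k_w, \qquad \mathrm{fan\_out} = N \, k_h \, k_w
```

Before #1970, InnerProduct weights were stored as 4-D 1 × 1 × N × K blobs, so an axis-based fan_in came out as N·K rather than K; with #1970's N-D blobs the weights are 2-D (N × K) and the same computation gives the intended K (again, my reading of the two changes).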
@nickcarlevaris thanks -- this looks good. The only potential issue is naming and attribution. I am not certain, but if I understand correctly, @nickcarlevaris, you suggested "ReLU" since this filler is intended for use with the so-named nonlinearity. It could be that this is the right choice. @longjon?
#1940 has been merged for a month. Can these two work together to reproduce the paper's results?
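For anyone attempting the reproduction: pairing the filler with #1940's PReLU layer would look something like the fragment below, following a convolution layer initialized as in the sketch above (layer names here are illustrative, not taken from either PR):

```
layer {
  name: "relu1"
  # PReLU from #1940: ReLU with a learned negative slope
  type: "PReLU"
  bottom: "conv1"
  top: "conv1"
}
```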
This PR has been open for a long time. Hope it gets merged quickly.
Why hasn't this been merged into master? Is anything wrong?
Merged to master in c255709. Thanks @nickcarlevaris! I did a manual merge to re-format the commit message and add my own commit to note potentially related work. Closing since my edit threw off the GitHub merge.
Why is there no parameter to specify the \alpha defined in Equation 15?
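For context on the question: if \alpha refers to the PReLU negative slope (denoted a in the paper), my reading is that the generalized condition and the resulting variance are

```latex
\frac{1}{2}\left(1 + a^{2}\right) n_l \operatorname{Var}\left[w_l\right] = 1
\quad\Rightarrow\quad
\operatorname{Var}\left[w_l\right] = \frac{2}{\left(1 + a^{2}\right) n_l},
```

which reduces to the filler's std = sqrt(2 / n_l) at a = 0 (plain ReLU). A filler parameter for a would generalize this, but the PR as described does not expose one.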