Conversation
src/caffe/test/test_neuron_layer.cpp
I think the value of the relu_param isn't actually being used here, because you set it after the layer is constructed (on the previous line, 84).
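For concreteness, here is a minimal, self-contained sketch of the pitfall being pointed out. It is a toy stand-in rather than actual Caffe code, but it models the relevant behavior: the layer takes its parameter at construction time, so a negative_slope set afterwards is never seen.

```cpp
// Toy illustration of the ordering bug: the "layer" reads its parameter at
// construction time, so mutating the parameter afterwards has no effect.
#include <cassert>

struct ToyReLUParam { float negative_slope = 0.f; };

struct ToyReLULayer {
  explicit ToyReLULayer(const ToyReLUParam& p) : slope_(p.negative_slope) {}
  float Forward(float x) const { return x > 0.f ? x : slope_ * x; }
  float slope_;
};

int main() {
  ToyReLUParam param;
  ToyReLULayer layer(param);           // constructed with negative_slope == 0
  param.negative_slope = 0.01f;        // too late: the layer already copied 0
  assert(layer.Forward(-1.f) == 0.f);  // still behaves like a plain ReLU

  // The fix: set the field first, then construct the layer from it.
  ToyReLUParam fixed;
  fixed.negative_slope = 0.01f;
  ToyReLULayer leaky(fixed);
  assert(leaky.Forward(-1.f) == -0.01f);
  return 0;
}
```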
Looks good, thanks @qipeng! See my nitpicky comments though. I just realized this can also be used as an absolute value neuron, with negative_slope = -1.
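To spell out the absolute value observation: the parameterized unit computes max(x, 0) + negative_slope * min(x, 0), so a slope of -1 gives max(x, 0) - min(x, 0) = |x|, and a slope of 0 recovers the plain ReLU. A quick standalone check of this identity (a plain function for illustration, not the actual layer code):

```cpp
// Standalone check that f(x) = max(x, 0) + negative_slope * min(x, 0)
// reduces to |x| when negative_slope == -1, and to the plain ReLU when 0.
#include <algorithm>
#include <cassert>
#include <cmath>
#include <initializer_list>

float leaky_relu(float x, float negative_slope) {
  return std::max(x, 0.f) + negative_slope * std::min(x, 0.f);
}

int main() {
  for (float x : {-2.5f, -0.1f, 0.f, 3.f}) {
    assert(leaky_relu(x, -1.f) == std::fabs(x));     // absolute value neuron
    assert(leaky_relu(x, 0.f) == std::max(x, 0.f));  // standard ReLU
  }
  return 0;
}
```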
@jeffdonahue Many thanks for the helpful comments! I've made changes accordingly and fixed the bug in the unit test code.
Since I was the one who made @qipeng go back and merge this into the ReLU layer, I wanted to do some benchmarking before merging this, to make sure the cost to architectures using the existing ReLU layer wasn't too high. I ran (All above times are in milliseconds.) So by those numbers this incurs a performance hit of about 0.27% for the ReLU layer. The total benchmark run time for the full ImageNet architecture was around 76530ms, so the performance hit for the full ImageNet architecture is about 0.004%. This is a pretty small cost -- do we care? (If so, I guess we'd have to re-split this into a separate LReLU layer... sorry for the inconvenience @qipeng; I can redo the splitting for you if you'd like, should that turn out to be our decision.)
@jeffdonahue Thanks for the tests and insightful comments!
Actually I'm not too familiar with how Caffe works: does it do any kind of just-in-time (JIT) compilation? I.e., does the compilation happen before or after it knows the value of negative_slope?
There is no JIT compilation, it's all a single thread, so as you say it's not surprising that the extra multiplication and addition incurs some cost. The impact on extra time per day is a good way to look at it, but I think the percentage I'd actually look at is the total cost in ImageNet training, which is an even more negligible 0.004%, or ~3 seconds per day. I agree this is quite negligible and will merge this once it's rebased, unless another Caffe dev says otherwise. @qipeng, please rebase this once more and comment when done. If it passes Travis after the rebase I'll merge immediately so you won't have to worry about it anymore (and if anything else gets merged before then that causes more conflicts, I'll redo the rebase myself). Thanks again!
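(For the record, the '~3 seconds per day' figure is just that percentage applied to a day of training: 0.004% of 86,400 s, i.e. 0.00004 × 86,400 s ≈ 3.5 s.)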
Hi @jeffdonahue, I've just finished rebasing and Travis CI seems to be passing. Let me know if any last-minute changes are needed! :)
Hi @qipeng, it seems your history for this PR includes some of dev's history (probably due to recent history rewriting). Can you remove these commits from your PR? The way I do this is an interactive rebase:
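A typical version of that workflow, for illustration only (the remote and branch names here are assumptions, with upstream pointing at BVLC/caffe): run git fetch upstream, then git rebase -i upstream/dev on the PR branch, delete the todo lines for the commits that came from dev rather than from this PR, and finally force-push the branch with git push -f origin <branch>.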
Hi @jeffdonahue, I was stupid to have merged the branch from a rebase. It should be fixed now. :)
Great, thanks again. Merging now.
Leaky ReLU
(for the record: bug fixed at #1417)
Implemented the Leaky ReLU unit described in:
Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng. "Rectifier nonlinearities improve neural network acoustic models." ICML Workshop on Deep Learning for Audio, Speech, and Language Processing, 2013.
The unit shares similar sparse activation properties with the ReLU, but was shown to be easier to optimize.
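For reference, the unit computes the following, where a is a small fixed slope in the paper and the configurable negative_slope parameter in this implementation (a = 0 recovers the standard ReLU, and a = -1 gives the absolute value unit mentioned above):

$$
f(x) = \begin{cases} x, & x > 0 \\ a\,x, & x \le 0 \end{cases}
$$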