Conversation
This kernel is the same as caffe_gpu_exp, isn't it? Let's remove it and replace it with caffe_gpu_exp, unless I'm misunderstanding somehow. (I know it wasn't added in this PR, but I just noticed it from seeing the diff.)
I can't find caffe_gpu_exp. I only found caffe_exp, which calls vsExp in MKL.
Whoops, my bad, I think I was thinking of caffe_gpu_powx. caffe_gpu_exp should probably exist, but device abstraction (#610) will probably take care of this, so never mind, sorry!
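For reference, a CPU `caffe_exp` of that sort is just a thin wrapper over the vendor math library. A minimal sketch, assuming the MKL `vsExp`/`vdExp` signatures; the generic fallback below is illustrative, not the actual Caffe code:

```cpp
// Minimal sketch of a CPU elementwise exponential in the math_functions style.
// The generic version falls back to std::exp; with MKL available, the
// float/double cases could instead delegate to vsExp / vdExp.
#include <cmath>

template <typename Dtype>
void caffe_exp(const int n, const Dtype* a, Dtype* y) {
  for (int i = 0; i < n; ++i) {
    y[i] = std::exp(a[i]);  // portable fallback
  }
}

// With MKL, e.g.:
//   template <> void caffe_exp<float>(const int n, const float* a, float* y) {
//     vsExp(n, a, y);
//   }
```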
Assigning to @longjon; go ahead and merge when you're happy with everything.
@shelhamer suggested offline adding a switch to provide the original "normalize over everything" mode. So, @shelhamer, if you still want to do that, you can append to or rewrite this PR. @shelhamer and others, which mode do we think should be the default? It seems like the channel normalization is usually what is desired, and I doubt anyone is relying on the current behavior, although it is a little jarring to change what layers do. If we do want the default to be the channel normalization, we could go ahead and merge this, and add a switch in a later PR.
src/caffe/layers/softmax_layer.cpp
Outdated
Also note that the updated SoftmaxLayer CPU implementation no longer allows in-place computation, since the CPU implementation first changes the bottom diff and then restores it with
`caffe_mul(top[0]->count(), bottom_diff, top_data, bottom_diff);`
while the GPU implementation still allows in-place computation.
@jeffdonahue @shelhamer should we allow in-place computations in SoftmaxLayer?
Yeah, good catch. I did this to avoid an extra loop, but I've added it back now to allow in-place computation. There should be no performance regression in the 1x1 case, and probably not a noticeable one in the general case, and anyway the GPU implementation is available.
In order to do this, I had to add functions to math_functions for strided dot products (which of course cblas already supports, but we didn't previously have an interface for).
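For reference, such an interface is presumably a thin shim over CBLAS. A minimal sketch, assuming the standard `cblas_sdot`/`cblas_ddot` signatures; the wrapper name and exact signature are a guess at the interface described, not copied from the diff:

```cpp
// Sketch of a strided dot product wrapper in the math_functions style.
// CBLAS already takes increments for both vectors, so the wrapper only
// dispatches on the element type.
#include <cblas.h>

template <typename Dtype>
Dtype caffe_cpu_strided_dot(const int n, const Dtype* x, const int incx,
    const Dtype* y, const int incy);

template <>
float caffe_cpu_strided_dot<float>(const int n, const float* x,
    const int incx, const float* y, const int incy) {
  return cblas_sdot(n, x, incx, y, incy);
}

template <>
double caffe_cpu_strided_dot<double>(const int n, const double* x,
    const int incx, const double* y, const int incy) {
  return cblas_ddot(n, x, incx, y, incy);
}
```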
@longjon merge as you please, as the switch can follow. I agree channel is the right default.
This provides a more direct interface to the cblas_?dot functions. This is useful, for example, for taking dot products across channels.
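As a usage illustration, and to show how strided dots enable the in-place-safe backward pass described above, here is a rough sketch. It reuses the hypothetical `caffe_cpu_strided_dot` from the previous snippet; the function and variable names are illustrative, not the PR's code:

```cpp
// Illustrative in-place-safe backward pass for channel-wise softmax on NCHW
// blobs. Gradient per spatial position s: dE/dx_c = y_c * (dE/dy_c - dot),
// where dot = sum_j dE/dy_j * y_j over channels. top_diff may alias
// bottom_diff (in-place); every needed top_diff value is read via the strided
// dot before the corresponding elements are overwritten.
#include <algorithm>

void channel_softmax_backward(const float* top_data, const float* top_diff,
                              float* bottom_diff,
                              int num, int channels, int height, int width) {
  const int spatial_dim = height * width;
  const int dim = channels * spatial_dim;
  if (bottom_diff != top_diff) {
    std::copy(top_diff, top_diff + num * dim, bottom_diff);
  }
  for (int n = 0; n < num; ++n) {
    for (int s = 0; s < spatial_dim; ++s) {
      // Dot product of top_diff and top_data across channels; consecutive
      // channel entries at a fixed spatial position are spatial_dim apart.
      const float dot = caffe_cpu_strided_dot<float>(channels,
          bottom_diff + n * dim + s, spatial_dim,
          top_data + n * dim + s, spatial_dim);
      for (int c = 0; c < channels; ++c) {
        const int idx = n * dim + c * spatial_dim + s;
        bottom_diff[idx] = top_data[idx] * (bottom_diff[idx] - dot);
      }
    }
  }
}
```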
@ronghanghu I amended your commit with some aesthetic changes (make all the channel kernels have the form …).
Fixed order of specialization and instantiation for the clang++ build in ac64a7b. You can't call …
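The clang++ issue here is presumably the usual C++ rule that an explicit specialization must be declared before any use that would trigger an implicit instantiation; clang enforces this where some other compilers are lenient. A minimal standalone illustration (not the actual Caffe code):

```cpp
// If the call in main() appeared before the explicit specialization below
// (e.g. inside another function defined above it), the template would be
// implicitly instantiated first and clang++ would reject the program with
// "explicit specialization ... after instantiation". Declaring the
// specialization before any use keeps the build happy.
template <typename Dtype>
Dtype scale(Dtype x);

template <>
float scale<float>(float x) { return 2.0f * x; }

int main() {
  float y = scale(2.0f);  // OK: the specialization is already declared.
  return static_cast<int>(y);
}
```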
Softmax works across channels
In this pull request, the behavior of SoftmaxLayer is changed from softmax over `channels*height*width` elements (all elements within a num) to softmax over `channels` elements (all elements at a spatial position within a num). This is for the purpose of running fully-connected layers as convolutions (see Net Surgery: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb). It won't damage existing Caffe examples, since fully-connected layer top blobs have `width==1` and `height==1`.

The CPU version was implemented by @longjon, and I implemented the GPU version, including the GPU backward pass.
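In other words, each vector of channel values at a fixed (n, h, w) is now normalized independently. A standalone sketch of that computation on an NCHW blob (illustrative only, not the PR's CPU or GPU code):

```cpp
// Channel-wise softmax over an NCHW blob: every (n, h, w) position is
// normalized independently across its channels.
#include <algorithm>
#include <cmath>

void channel_softmax(const float* bottom, float* top,
                     int num, int channels, int height, int width) {
  const int spatial_dim = height * width;
  const int dim = channels * spatial_dim;
  for (int n = 0; n < num; ++n) {
    for (int s = 0; s < spatial_dim; ++s) {
      // Subtract the per-position maximum for numerical stability.
      float max_val = bottom[n * dim + s];
      for (int c = 1; c < channels; ++c) {
        max_val = std::max(max_val, bottom[n * dim + c * spatial_dim + s]);
      }
      float sum = 0.f;
      for (int c = 0; c < channels; ++c) {
        const int idx = n * dim + c * spatial_dim + s;
        top[idx] = std::exp(bottom[idx] - max_val);
        sum += top[idx];
      }
      for (int c = 0; c < channels; ++c) {
        top[n * dim + c * spatial_dim + s] /= sum;
      }
    }
  }
}
```

When `height==1` and `width==1` (the fully-connected case), there is only one spatial position per num, so this reduces to the previous behavior, which is why existing examples are unaffected.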