GPU version of SoftmaxWithLossLayer #1789
Conversation
Hey @SaganBolliger, thanks for addressing this longtime TODO with your first contribution. Check the suggestions at http://caffe.berkeleyvision.org/development.html, and in particular run the lint check and fix any complaints it makes. Once that's done, please squash these changes into a single commit and push to your branch for us to review.
(force-pushed from 9545437 to d770cda)
You're most welcome. I've fixed the lint issues and squashed the commits. Let me know if you have any further feedback.
Once there are tests written for #1654, I can update the GPU code to pass the tests.
Just an FYI, I've been using the code I submitted in this PR in my own modelling experiments and I'm running into some issues that are not being caught by the test cases. I'm trying to debug it and write some additional test cases to catch the issue. I'll update here when I've fixed the issue, so please don't merge until then. Thanks!
This looks pretty nice. Since @SaganBolliger has volunteered to update for #1654, that plan sounds good to me. Just two comments from a quick glance:
(force-pushed from d770cda to 4c894d0)
I've updated the code to reflect the changes in #1654, and it passes the new test cases that were added there. @shelhamer Looks like the issues I was running into aren't related to this code, so I'd say this is safe to merge.

@longjon I've renamed the kernels SoftmaxLoss{Backward,Forward}GPU to avoid confusion with the SoftmaxLayer itself. It's a bit tricky to find a concise name for the exact computation the kernels perform, though I'm open to suggestions.

I'm currently working on an older MacBook with a slow GPU, so benchmarking the CPU vs. GPU versions produces comparable results; that's par for the course with this GPU, as most networks take slightly longer to train on the GPU than on the CPU. If anyone with access to better hardware would be willing to benchmark it, that would be great. Otherwise, I'm planning to set up an AWS instance in the next week for some more serious computation and could run a benchmark there.
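For context, the computation in the forward kernel is roughly the following (a sketch, assuming Caffe's CUDA_KERNEL_LOOP grid-stride macro and a prob blob laid out as num x channels x spatial positions; the signature in the actual diff may differ):

```cuda
// Sketch of the per-position forward kernel. Each thread handles one
// (image n, spatial position s) pair and writes -log(prob of the true
// class) into a per-position loss buffer, which can then be summed on
// the device (e.g. with caffe_gpu_asum) to avoid any host roundtrip.
#include <cfloat>                            // FLT_MIN
#include "caffe/util/device_alternate.hpp"   // CUDA_KERNEL_LOOP

template <typename Dtype>
__global__ void SoftmaxLossForwardGPU(const int nthreads,
    const Dtype* prob_data, const Dtype* label, Dtype* loss,
    const int dim, const int spatial_dim) {
  CUDA_KERNEL_LOOP(index, nthreads) {
    const int n = index / spatial_dim;  // image in the batch
    const int s = index % spatial_dim;  // position within the image
    const int label_value = static_cast<int>(label[n * spatial_dim + s]);
    // Clamp away from zero so log() stays finite for vanishing probabilities.
    loss[index] = -log(max(prob_data[n * dim + label_value * spatial_dim + s],
                           Dtype(FLT_MIN)));
  }
}
```

The kernel would be launched with Caffe's usual CAFFE_GET_BLOCKS(nthreads) blocks of CAFFE_CUDA_NUM_THREADS threads, one thread per (image, position) pair.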
I benchmarked a 1000 x 1000 x 1 x 1 softmax loss over 100 iterations in three configurations: CPU, dev GPU (which does the host-device roundtrip), and this PR, which does give a speedup. The improvement will scale as the output grows, as it does for sliding-window detection or semantic segmentation, since it saves the roundtrip time of sending the output map back to the host.

CPU mode:
dev GPU mode (device-host communication):
This PR:
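For anyone who wants to repeat the measurement, a rough harness using caffe::Timer could look like this (a sketch: layer and blob setup are elided and assumed to match the SoftmaxWithLossLayer test fixtures):

```cpp
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/util/benchmark.hpp"  // caffe::Timer

// Time `iterations` forward/backward passes of an already set-up layer.
// In GPU mode caffe::Timer uses CUDA events, so MilliSeconds() reflects
// completed device work rather than just kernel-launch time.
template <typename Dtype>
float TimeForwardBackward(caffe::Layer<Dtype>* layer,
    const std::vector<caffe::Blob<Dtype>*>& bottom,
    const std::vector<caffe::Blob<Dtype>*>& top,
    const std::vector<bool>& propagate_down,
    const int iterations) {
  caffe::Timer timer;
  timer.Start();
  for (int i = 0; i < iterations; ++i) {
    layer->Forward(bottom, top);
    layer->Backward(top, propagate_down, bottom);
  }
  return timer.MilliSeconds();
}
```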
Thanks @SaganBolliger (and don't worry about the scalar for the loss weight)!
Just a couple post-mortem notes on an excellent PR:
Just thought I'd contribute something to what seems like a great community. The new forward and backward functions pass the existing test cases for SoftmaxWithLossLayer.
I'm not entirely sure about the lines where I call cpu_data() on a blob of count() 1 so that I can then call caffe_gpu_scal with the retrieved value as the scalar. I'm not proficient at profiling CUDA code, so I don't know whether this is a bottleneck that justifies writing a solution with no device-to-host data transfer at all, or whether transferring a single scalar incurs such a minor performance hit that it isn't worth optimizing.
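Concretely, the pattern in question looks something like this (a sketch with illustrative names, not the exact diff):

```cpp
// `weight` is a blob with count() == 1 whose data lives on the device.
// cpu_data() triggers a device-to-host copy of that single value so it
// can be passed to caffe_gpu_scal as a plain host-side scalar; the array
// being scaled never leaves the device.
const Dtype alpha = weight.cpu_data()[0];        // tiny D2H transfer
caffe_gpu_scal<Dtype>(diff.count(), alpha,       // scale in place
                      diff.mutable_gpu_diff());  // data stays on the GPU
```

One way to avoid the transfer entirely would be cuBLAS's device pointer mode (cublasSetPointerMode with CUBLAS_POINTER_MODE_DEVICE), which lets the scalar itself live in device memory, though for one 4- or 8-byte copy per backward pass the savings are likely negligible.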
Cheers!