clear bottom diffs after used as temp memory in softmax/sigmoid loss …#6186

Closed
orzlibo wants to merge 1 commit into BVLC:master from orzlibo:master

Conversation

orzlibo commented Jan 18, 2018

This is basically the same problem as the one recently found in the Accuracy layer. The softmax/sigmoid loss layers use the bottom diff as temporary memory during Forward; when such a layer does not propagate down (i.e. Backward_gpu is never executed) but the bottom blob is shared with other layers, the leftover temporary data leads to invalid gradients.

This is common when multiple losses are used and some of them have loss_weight set to 0. A simple example shows it: duplicate the SoftmaxWithLoss layer in the MNIST example and set the copy's loss_weight to 0, and the loss goes to inf (~87) after a few hundred iterations.
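
To make the proposal concrete, here is a minimal sketch of where the clearing would go, assuming (as described above) that the GPU forward of the loss layer uses bottom[0]'s diff as scratch space. It is illustrative only, not the exact diff in this commit:

```cpp
// Sketch: at the end of SoftmaxWithLossLayer<Dtype>::Forward_gpu (and
// likewise SigmoidCrossEntropyLossLayer<Dtype>::Forward_gpu), after the
// per-element loss written into bottom[0]->mutable_gpu_diff() has been
// reduced into top[0], zero the borrowed scratch memory so that a shared
// (split) bottom never picks up stale values when this layer's Backward is
// skipped (e.g. because loss_weight is 0).
caffe_gpu_set(bottom[0]->count(), Dtype(0), bottom[0]->mutable_gpu_diff());
```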

I have also read the recent discussions about the Accuracy layer. In my view it is reasonable to use the bottom diff as temporary memory and then clear it (at least for now), because data diffs (i.e. those of top and bottom blobs) are meant to be **set**, while parameter diffs are meant to be **accumulated**. Some evidence supports this:

  1. Some widely used layers already follow this convention: they set the bottom diff and accumulate the parameter diff, the InnerProduct layer being one example. Likewise, the current solvers clear parameter diffs after iter_size ForwardBackward passes, but never clear the diffs of data blobs.
  2. Caffe handles shared bottoms with (automatically inserted) Split layers, which can be seen in the training log. In the Split layer, multiple tops share data with a single bottom, but their diffs are not shared; instead the Split layer's Backward accumulates the top diffs into the bottom diff (a rough sketch follows this list). This is exactly where the problem enters: in the current framework a layer cannot easily tell which of its tops actually need back-prop (only propagate_down is given in LayerParameter), so the Split layer has to treat all of its top diffs alike and ends up accumulating stale values from tops whose consuming layer never executed Backward().
  3. This design (bottom/top diffs set, parameter diffs accumulated) has its advantages. Consider the implementation of iter_size: if diffs of shared bottoms were accumulated instead, they would still have to be cleared between iterations, and bottom/top blobs are much larger than parameters (e.g. feature maps vs. weights in CNNs), so it is not obvious that this would be faster than the current Split-layer design.
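
For context on point 2, the gradient accumulation in the Split layer looks roughly like the following. This is a paraphrase of SplitLayer<Dtype>::Backward_cpu written for illustration, not a verbatim copy of the Caffe source:

```cpp
// The bottom diff is *set* to the sum of all top diffs. There is no way to
// skip a top whose consumer never ran Backward() and therefore left stale
// (scratch) data in its diff.
template <typename Dtype>
void SplitLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  if (!propagate_down[0]) { return; }
  if (top.size() == 1) {
    // Single consumer: copy its diff straight down.
    caffe_copy(count_, top[0]->cpu_diff(), bottom[0]->mutable_cpu_diff());
    return;
  }
  // Two or more consumers: sum the first two diffs, then add the rest.
  caffe_add(count_, top[0]->cpu_diff(), top[1]->cpu_diff(),
            bottom[0]->mutable_cpu_diff());
  for (int i = 2; i < top.size(); ++i) {
    caffe_axpy(count_, Dtype(1.), top[i]->cpu_diff(),
               bottom[0]->mutable_cpu_diff());
  }
}
```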

That is as far as my understanding goes for now. My use of Caffe is limited to vision tasks, so further discussion may be needed for other tasks. I am also a newbie on GitHub, and I apologize for any inconvenience or offense in the discussion above.

orzlibo (author) commented Jan 18, 2018

Allocating internal buffers (as recently suggested) and clearing bottom diffs can be regarded as two sides of a trade-off between memory usage and computation. In my opinion, neither is absolutely the preferable choice.
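
To illustrate the other side of that trade-off, a dedicated scratch blob owned by the loss layer would look roughly like this. This is a hypothetical sketch (the member name loss_scratch_ is made up, not taken from any Caffe PR): it costs extra memory on the order of the bottom blob, but leaves the bottom diff untouched so no clearing is needed:

```cpp
// Hypothetical alternative: keep a per-layer scratch Blob instead of
// borrowing bottom[0]->mutable_gpu_diff() as temporary memory.
Blob<Dtype> loss_scratch_;  // hypothetical member of the loss layer

// In Reshape(): size the scratch like the input.
loss_scratch_.ReshapeLike(*bottom[0]);

// In Forward_gpu(): write the per-element loss here instead of into the diff.
Dtype* loss_data = loss_scratch_.mutable_gpu_data();
```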

shelhamer (member) commented:

See #6202 for a combined fix with further comments. It includes this fix and makes the analogous fix for the Accuracy layer.

shelhamer (member) commented:

@orzlibo Thanks for your thoroughly explained and straightforward pull request. Your interpretation and fix were right so I included them in a slightly more systematic PR #6202. That has just been merged so I'm closing this.
