clear bottom diffs after used as temp memory in softmax/sigmoid loss … #6186
Closed
orzlibo wants to merge 1 commit into BVLC:master from
Conversation
Author
Allocating internal buffers (as recently mentioned) or clearing bottom diffs can be regarded as a trade-off between memory usage and computation. My opinion is that neither of them is absolutely the preferable choice.
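To make the trade-off concrete, a minimal sketch of what the "internal buffer" alternative might look like is below. It trades extra GPU memory for never touching a shared diff; the `loss_buffer_` member is hypothetical and not part of the upstream layer.

```cpp
// Hypothetical sketch: give the loss layer its own scratch blob instead of
// borrowing bottom[0]'s diff during Forward. "loss_buffer_" is an assumed
// Blob<Dtype> member added to the layer for illustration only.
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Reshape(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // ... existing Reshape logic ...
  // Extra memory, but no hazard if bottom[0]'s diff is shared with others.
  loss_buffer_.ReshapeLike(*bottom[0]);
}

template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Forward_gpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // ... compute the probabilities as before ...
  // Private scratch: bottom[0]->gpu_diff() is never written during Forward.
  Dtype* loss_data = loss_buffer_.mutable_gpu_data();
  // ... launch the per-element loss kernel into loss_data and reduce it ...
}
```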
Member
See #6202 for a combined fix with further comments. It includes this fix and makes the analogous fix for the …
This is basically the same problem as the one recently found in the accuracy layer. The problem with the loss layers is that when they do not propagate down (i.e. Backward_gpu will not execute) but the bottom blob is shared with other layers, the temporary data left in the bottom diff leads to invalid gradients.
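For concreteness, here is a minimal sketch of the kind of change being proposed (not the literal patch). It assumes, as in the upstream GPU implementation, that the per-element losses are written into bottom[0]'s diff during Forward.

```cpp
// Excerpt-style sketch of SoftmaxWithLossLayer<Dtype>::Forward_gpu,
// assuming it borrows bottom[0]'s diff as scratch, as the upstream GPU code does.
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Forward_gpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  softmax_layer_->Forward(softmax_bottom_vec_, softmax_top_vec_);
  // Upstream code reuses bottom[0]'s diff as scratch for the per-element
  // losses, relying on Backward to overwrite it later.
  Dtype* loss_data = bottom[0]->mutable_gpu_diff();
  // ... launch the per-element loss kernel into loss_data and reduce the
  //     result into top[0] ...

  // The kind of fix proposed here: zero the borrowed diff once the scratch
  // values have been reduced, so that a skipped Backward (e.g. loss_weight 0)
  // cannot leave garbage in a diff that other layers see.
  caffe_gpu_set(bottom[0]->count(), Dtype(0), bottom[0]->mutable_gpu_diff());
}
```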
This is common when multiple losses are used and some of them have loss_weight set to 0. A simple example shows this: copy the SoftmaxWithLoss layer in the mnist example and set the copy's loss_weight to 0, and the loss blows up to about 87 (effectively infinity, since -log of the smallest positive float is roughly 87.3) after hundreds of iterations.
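As a hypothetical illustration of that reproduction (the layer/top names such as "loss_aux" are made up; the rest follows the LeNet prototxt in examples/mnist), the duplicated layer would look roughly like this:

```
# Hypothetical repro sketch based on examples/mnist/lenet_train_test.prototxt:
# the original loss layer stays as-is; a copy with loss_weight: 0 is added.
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
layer {
  name: "loss_aux"
  type: "SoftmaxWithLoss"
  bottom: "ip2"        # shares ip2 with the real loss
  bottom: "label"
  top: "loss_aux"
  loss_weight: 0       # Backward for this layer is skipped, so the scratch
                       # values written into the diff are never overwritten
}
```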
I have also read the recent discussions about the accuracy layer. From my point of view it is reasonable to use the bottom diff as temporary memory and clear it afterwards (at least for now), because the data diffs (i.e. those of the top and bottom blobs) are meant to be **set** (while the parameter diffs are meant to be **accumulated**). Some evidence might support that, as sketched below:
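For instance, here is a simplified, paraphrased sketch modeled on Caffe's InnerProductLayer::Backward_cpu (not the verbatim upstream code, which also handles bias and a transpose option): the parameter gradient is accumulated (GEMM with beta = 1), while the bottom diff is overwritten (beta = 0).

```cpp
// Simplified sketch modeled on InnerProductLayer<Dtype>::Backward_cpu.
template <typename Dtype>
void InnerProductLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  const Dtype* top_diff = top[0]->cpu_diff();
  const Dtype* bottom_data = bottom[0]->cpu_data();
  if (this->param_propagate_down_[0]) {
    // Parameter diff: beta = 1, so the weight gradient is ACCUMULATED into
    // blobs_[0]->diff (the solver clears param diffs between iterations).
    caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, N_, K_, M_,
        (Dtype)1., top_diff, bottom_data,
        (Dtype)1., this->blobs_[0]->mutable_cpu_diff());
  }
  if (propagate_down[0]) {
    // Data diff: beta = 0, so the bottom gradient is SET (overwritten), which
    // is why reusing it as scratch is only safe if it is later overwritten
    // or cleared.
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, K_, N_,
        (Dtype)1., top_diff, this->blobs_[0]->cpu_data(),
        (Dtype)0., bottom[0]->mutable_cpu_diff());
  }
}
```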
That is as far as my understanding goes for now. My usage of Caffe is limited to vision tasks, so further discussion may be necessary for other tasks. Also, I am a newbie on GitHub, and I apologize for any inconvenience or offense in the discussion above.