Clear Scratch Diffs to Prevent Contaminating Backward through Splits #6202
Merged
shelhamer merged 3 commits into BVLC:master on Jan 29, 2018
A few layers use otherwise unused diffs as scratch space to accumulate results. Unless those diffs are cleared in forward, they contaminate the gradients when these layers share a bottom and their backward is skipped.
This was referenced Jan 29, 2018
Member
This is a neat, systemic solution. Performance impact of I have added a reference to close #6141 which attempted to solve the
Member
Author
Thanks all for your work on this and thanks again @Noiredd for the final review!
beniz pushed a commit to jolibrain/caffe that referenced this pull request on Feb 3, 2018
Clear Scratch Diffs to Prevent Contaminating Backward through Splits
XinYao1994 pushed a commit to XinYao1994/caffe that referenced this pull request on Aug 29, 2018
Clear Scratch Diffs to Prevent Contaminating Backward through Splits
Certain layers (`SoftmaxWithLossLayer`, `SigmoidCrossEntropyLossLayer`, and `AccuracyLayer`) save memory during forward by making use of bottom diffs that are otherwise unused or overwritten during backward. The trouble is that these scratch diffs can be mistakenly propagated by backward through split layers. All of the top diffs of a split are accumulated even when backward is not called for the losses (when their loss weight is zero) or accuracy (which has no backward step). This was missed at first because it requires the interaction of split layers and the backward pruning that prevents computation of unnecessary gradients.

This fix zeros out the scratch diffs to prevent this kind of contamination. Zeroing costs a little computation, but not much. It is preferable to allocating separate internal buffers, because the additional memory usage might cause an unexpected crash for a previously good configuration.
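The contamination described above can be reproduced in a toy model. The following is a minimal sketch, not Caffe code: `Diff`, `split_backward`, `accuracy_like_forward`, and `run` are hypothetical names invented for illustration, and plain `std::vector<float>` buffers stand in for blob diffs. It assumes only what the description states: a split sums all of its top diffs, and a layer whose backward is skipped never overwrites the scratch values it left in its bottom diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a blob's diff buffer (not Caffe's Blob class).
using Diff = std::vector<float>;

// Toy split backward: the bottom diff is the sum of every top diff,
// regardless of whether the consumer of that top ran its own backward.
Diff split_backward(const std::vector<const Diff*>& top_diffs) {
  assert(!top_diffs.empty());
  Diff bottom(top_diffs.front()->size(), 0.0f);
  for (const Diff* top : top_diffs) {
    for (std::size_t i = 0; i < bottom.size(); ++i) {
      bottom[i] += (*top)[i];
    }
  }
  return bottom;
}

// Toy accuracy-style forward: borrows its bottom diff as scratch memory.
// When clear_scratch is true, it zeros the buffer before returning,
// which is the idea behind the fix.
void accuracy_like_forward(Diff& scratch, bool clear_scratch) {
  // Intermediate values that are NOT gradients.
  std::fill(scratch.begin(), scratch.end(), 7.0f);
  if (clear_scratch) {
    std::fill(scratch.begin(), scratch.end(), 0.0f);
  }
}

// One forward/backward pass over a bottom shared by a loss branch and an
// accuracy-like branch. Returns the gradient that reaches the shared bottom.
Diff run(bool clear_scratch) {
  Diff loss_branch = {0.5f, -0.25f, 0.125f};  // true gradient from the loss
  Diff accuracy_branch(3, 0.0f);
  accuracy_like_forward(accuracy_branch, clear_scratch);
  // Backward is skipped for the accuracy-like branch, so whatever forward
  // left in accuracy_branch is what the split accumulates.
  return split_backward({&loss_branch, &accuracy_branch});
}
```

In this sketch, `run(false)` returns a gradient polluted by the scratch values, while `run(true)` returns exactly the loss branch's gradient. The trade-off mirrors the one in the description: one extra pass over the buffer in forward instead of a second buffer held in memory.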
See #2895 (comment) for the first explanation of this issue.
Related:
Credits:
Accuracy layer issue in New Accuracy Layer on GPU interferes with training #5981.
cherry-pick of that fix in this PR.