The gpu_diff of bottom[0], used as a temporary buffer, needs to be set back to zeros. Otherwise the AccuracyLayer may propagate wrong gradient values backward during training.
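A minimal sketch of the proposed reset, assuming (as the issue text describes) that `Forward_gpu` borrows `bottom[0]->mutable_gpu_diff()` as scratch space; the surrounding accuracy computation is elided:

```cpp
#include "caffe/layers/accuracy_layer.hpp"
#include "caffe/util/math_functions.hpp"

namespace caffe {

template <typename Dtype>
void AccuracyLayer<Dtype>::Forward_gpu(
    const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // ... accuracy computation that borrows bottom[0]'s gpu_diff
  //     as a temporary buffer goes here ...

  // Zero the borrowed buffer before returning, so stale scratch
  // values are not mistaken for gradients during Backward.
  caffe_gpu_set(bottom[0]->count(), Dtype(0),
                bottom[0]->mutable_gpu_diff());
}

}  // namespace caffe
```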
Member:
This is not a good fix: if the prediction layer's output is shared with other layers (e.g., "SoftmaxWithLoss"), resetting its diff to zeros would wipe out the gradient those layers have written. Please see PR #5987 for a fix for this issue.
Author:
I see your point, though I think split layers are automatically added to avoid such cases. A better fix is to use bottom[0]->diff().use_count() to check whether the diff memory is shared with other layers. If so, initialize and use an internal buffer; otherwise, reuse bottom[0]->diff() and reset it to zeros after use.
Member:
@ryanflower this is an interesting point you are making. Can you please move this discussion to the open PR #5987?