Sparse Linear now does sparse updates from the last input #725
soumith merged 1 commit into torch:master
Conversation
(force-pushed 63fcaa6 to 1156255)
This is super awesome! Thanks @ebetica!
| function SparseLinear:updateOutput(input) | ||
| if self.sparseUpdate == ONE_LAST_INPUT then | ||
| self.sparseUpdate = ACC_MULTIPLE_TIMES |
this is wrong. afaik this has to be self.sparseUpdate = NO_LAST_INPUT
The rewritten logic in this file never resets self.sparseUpdate back to NO_LAST_INPUT, as lines 195 and 208 have been removed.
Practically, this PR will enable sparse optimizations only for the first mini-batch and take the dense path afterwards, because the layer will always be in ACC_MULTIPLE_TIMES mode.
However, I suspect that I misunderstand this PR; can you explain the state transitions and their expected behavior?
Yes, that is the intended behavior. After enough non-zero elements have been passed through, it becomes more efficient to simply update every parameter instead of finding the unique non-zeros first. A smarter method would probably be to make this transition after accumulating too many non-zero elements, but to simplify it, my design decision was to only do sparse updates after one mini-batch.
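The state transitions being discussed can be sketched as a small state machine. This is a hypothetical Python model for illustration only, not the actual Lua code; the constant names mirror the ones in SparseLinear.lua:

```python
# Illustrative model of the sparseUpdate state machine described above.
NO_LAST_INPUT, ONE_LAST_INPUT, ACC_MULTIPLE_TIMES = 0, 1, 2

class SparseUpdateState:
    def __init__(self):
        self.state = NO_LAST_INPUT

    def update_output(self):
        # A forward pass while one input is already remembered means we can
        # no longer rely on a single last input: accumulate densely from now on.
        if self.state == ONE_LAST_INPUT:
            self.state = ACC_MULTIPLE_TIMES

    def acc_grad_parameters(self):
        # After the first backward pass exactly one last input is known,
        # so the sparse update path is valid.
        if self.state == NO_LAST_INPUT:
            self.state = ONE_LAST_INPUT

    def sparse_path_available(self):
        return self.state == ONE_LAST_INPUT

m = SparseUpdateState()
m.update_output()        # first forward: still NO_LAST_INPUT
m.acc_grad_parameters()  # backward: ONE_LAST_INPUT, sparse path is valid
assert m.sparse_path_available()
m.update_output()        # another forward without a reset: dense from now on
assert m.state == ACC_MULTIPLE_TIMES
```

This also makes the reported problem visible: without some transition back to NO_LAST_INPUT, the machine is stuck in ACC_MULTIPLE_TIMES after the first cycle.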
yes, if I understand correctly, the intention is to only do sparse updates for a pattern of: forward + backward + update, forward + backward + update, ...
However, after processing two mini-batches of FW + BW + UP, the third mini-batch no longer sees sparse updates, as you never reset the state to NO_LAST_INPUT (lines 195 and 208 were removed).
(force-pushed b2c927b to 617aa0a)
Patched with the bugfix. State is reset on zeroGradParameters.
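Resetting the state in zeroGradParameters lets every FW + BW + UP cycle take the sparse path again. A minimal sketch of that fix, again as a hypothetical Python model rather than the actual Lua implementation:

```python
# Illustrative model of the bugfix: reset sparseUpdate in zeroGradParameters.
NO_LAST_INPUT, ONE_LAST_INPUT, ACC_MULTIPLE_TIMES = 0, 1, 2

class Layer:
    def __init__(self):
        self.sparse_update = NO_LAST_INPUT
        self.took_sparse_path = []

    def forward(self):
        if self.sparse_update == ONE_LAST_INPUT:
            self.sparse_update = ACC_MULTIPLE_TIMES

    def backward(self):
        if self.sparse_update == NO_LAST_INPUT:
            self.sparse_update = ONE_LAST_INPUT

    def update(self):
        # Record whether this update could use the sparse path.
        self.took_sparse_path.append(self.sparse_update == ONE_LAST_INPUT)

    def zero_grad_parameters(self):
        # The bugfix: reset here so the next mini-batch starts fresh.
        self.sparse_update = NO_LAST_INPUT

layer = Layer()
for _ in range(3):
    layer.forward(); layer.backward(); layer.update()
    layer.zero_grad_parameters()
assert layer.took_sparse_path == [True, True, True]
```

Without the reset in zero_grad_parameters, the same loop records [True, False, False], which is exactly the behavior reported in the review: sparse updates only on the first mini-batch.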
Sparse Linear now does sparse updates from the last input
See discussion @ #698
Speeds up updateGradParameters and zeroGradParameters when only one forward/backward pass has been run, by keeping track of the immediately previous input.
The following snippet
gives us