
Sparse Linear now does sparse updates from the last input#725

Merged
soumith merged 1 commit into torch:master from ebetica:sparselinear_fix
Mar 22, 2016

Conversation

@ebetica (Contributor) commented Mar 20, 2016

See discussion at #698.

Speeds up updateGradParameters and zeroGradParameters when one forward/backward update has been run by keeping track of the immediately previous input.

The following snippet

local nn = require('nn')
local sys = require('sys')

local model = nn.SparseLinear(65536, 256)

-- sparse input: 5 (index, value) pairs; column 1 holds indices in [1, 65536]
local input = torch.rand(5, 2)
input:select(2, 1):mul(65536):ceil()

sys.tic()
local output = model:forward(input)
sys.toc(true)

local gradOutput = torch.rand(output:size())

sys.tic()
local gradInput = model:backward(input, gradOutput)
sys.toc(true)

sys.tic()
model:updateParameters(1e-3)
sys.toc(true)

sys.tic()
model:zeroGradParameters()
sys.toc(true)

gives the following timings (in seconds):

0.00010514259338379 
6.2942504882812e-05 
5.6028366088867e-05 
4.6968460083008e-05

@ebetica (Contributor, Author) commented Mar 20, 2016

@zhangxiangxiao

@zhangxiangxiao (Contributor) commented:

This is super awesome! Thanks @ebetica !

Comment thread: SparseLinear.lua

function SparseLinear:updateOutput(input)
   if self.sparseUpdate == ONE_LAST_INPUT then
      self.sparseUpdate = ACC_MULTIPLE_TIMES
@soumith (Member) commented:
This is wrong; afaik this has to be self.sparseUpdate = NO_LAST_INPUT.
The rewritten logic in this file never resets self.sparseUpdate back to NO_LAST_INPUT, since lines 195 and 208 have been removed.
Practically, this PR will enable sparse optimizations only for the first mini-batch and take the dense path afterwards, because the module will always be in ACC_MULTIPLE_TIMES mode.
However, I suspect that I misunderstand this PR; can you explain the state transitions and their expected behavior?

@ebetica (Contributor, Author) replied:

Yes, that is the intended behavior. After enough non-zero elements have been passed through, it becomes more efficient to simply update every parameter instead of finding the unique non-zeros first. A smarter method would probably be to make this transition only after accumulating too many non-zero elements, but to keep things simple, my design decision was to only do sparse updates after one minibatch.
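The trade-off described here can be made concrete with a rough cost model (an illustrative Python sketch, not code from the PR; the function names `dense_update_cost` and `sparse_update_cost` are hypothetical): a sparse update touches only the weight rows indexed by the accumulated input, so it wins as long as the number of unique non-zero indices stays well below the full input dimension.

```python
# Rough cost model (illustrative, not from the PR): per-output-unit work for
# updating the weights of an nn.SparseLinear(n_input, n_output) layer.

def dense_update_cost(n_input):
    # the dense path touches every weight row
    return n_input

def sparse_update_cost(nonzero_indices):
    # the sparse path touches only the rows seen in the accumulated input(s),
    # after de-duplicating the indices
    return len(set(nonzero_indices))

# With n_input = 65536 and a mini-batch touching only a handful of indices,
# the sparse path updates ~5 rows instead of 65536.
assert sparse_update_cost([3, 17, 17, 42, 99, 100]) == 5
assert sparse_update_cost([3, 17, 17, 42, 99, 100]) < dense_update_cost(65536)
```

As more inputs accumulate, `set(nonzero_indices)` grows toward `n_input` and the de-duplication overhead stops paying off, which is why the module falls back to the dense path.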


@soumith (Member) replied:

Yes, if I understand correctly, the intention is to only do sparse updates when you have a pattern of forward + backward + update, forward + backward + update, ...

However, after processing two mini-batches of FW + BW + UP, the third mini-batch no longer sees sparse updates, because the state is never reset to NO_LAST_INPUT (lines 195 and 208 were removed).

@ebetica (Contributor, Author) commented Mar 21, 2016

Patched with the bugfix. State is reset on zeroGradParameters.
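The resulting state machine, as described in the thread, can be sketched as follows (a minimal Python model; the state names NO_LAST_INPUT, ONE_LAST_INPUT, and ACC_MULTIPLE_TIMES come from SparseLinear.lua, but the class, method names, and transition placement here are illustrative, not the module's actual API):

```python
# Illustrative model of the sparse-update state machine discussed above.
# Assumed behavior: one forward/backward per update keeps the sparse path;
# a second forward before zeroGradParameters falls back to the dense path;
# zeroGradParameters resets the state (the bugfix in this PR).

NO_LAST_INPUT = 0       # no gradients accumulated yet
ONE_LAST_INPUT = 1      # exactly one input accumulated -> sparse update possible
ACC_MULTIPLE_TIMES = 2  # several inputs accumulated -> must take the dense path

class SparseLinearState:
    def __init__(self):
        self.sparse_update = NO_LAST_INPUT

    def forward(self):
        # A second forward without an intervening reset means gradients
        # from multiple inputs will accumulate: go dense.
        if self.sparse_update == ONE_LAST_INPUT:
            self.sparse_update = ACC_MULTIPLE_TIMES

    def acc_grad_parameters(self):
        if self.sparse_update == NO_LAST_INPUT:
            self.sparse_update = ONE_LAST_INPUT

    def update_parameters(self):
        # The sparse update touches only the rows seen in the last input.
        return "sparse" if self.sparse_update == ONE_LAST_INPUT else "dense"

    def zero_grad_parameters(self):
        # The bugfix: reset here, so the next mini-batch is sparse again.
        self.sparse_update = NO_LAST_INPUT

m = SparseLinearState()
m.forward(); m.acc_grad_parameters()
assert m.update_parameters() == "sparse"
m.zero_grad_parameters()
m.forward(); m.acc_grad_parameters()
assert m.update_parameters() == "sparse"  # still sparse after the reset
```

Without the `zero_grad_parameters` reset, the model would stay in ACC_MULTIPLE_TIMES after the second mini-batch, which is exactly the bug the review caught.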

soumith added a commit that referenced this pull request Mar 22, 2016
Sparse Linear now does sparse updates from the last input
@soumith merged commit 32318fb into torch:master Mar 22, 2016