Reshape Layers before calling Forward / Backward #35

Merged
lukeyeager merged 1 commit into NVIDIA:master from slayton58:reshape_fix on Nov 20, 2015
Conversation

@slayton58

Fixes an issue where cuDNN convolution algorithms were chosen at setup time, and subsequent layer reshapes then caused allocations that left too little memory available to actually allocate the workspace(s) in CuDNNConvolutionLayer::{Forward,Backward}_gpu().
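
In essence, the change is to re-run each layer's Reshape() immediately before its Forward()/Backward() call, so that cuDNN algorithm and workspace-size selection reflects the memory actually available at execution time rather than at network-setup time. A minimal sketch of the idea against upstream Caffe's Net class (not the literal diff; member names like `layers_` and `bottom_vecs_` are the upstream ones, and the real change covers the backward pass as well):

```cpp
// Sketch only: the key point is the Reshape() call added ahead of Forward().
template <typename Dtype>
Dtype Net<Dtype>::ForwardFromTo(int start, int end) {
  Dtype loss = 0;
  for (int i = start; i <= end; ++i) {
    // Re-run Reshape() so layers such as CuDNNConvolutionLayer re-pick
    // their algorithms and workspace sizes against the memory that is
    // free now, not what was free during net construction.
    layers_[i]->Reshape(bottom_vecs_[i], top_vecs_[i]);
    loss += layers_[i]->Forward(bottom_vecs_[i], top_vecs_[i]);
  }
  return loss;
}
```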

@lukeyeager added the bug label on Sep 17, 2015
@lukeyeager
Member

This fixes the bug that Andrew reported. Is it safe enough to merge? I don't understand all the implications of making this change.

@borisfom

Looks safe to me - definitely should help with memory issues. Luke, it's your call to merge.

@lukeyeager
Member

I'll defer to @thatguymike - he said this required some more thought.

@thatguymike

I think this is okay now.

@thatguymike

So the question is whether we need/want this path on 0.14 as well. With the CUB path I don't think we need it (and I don't think it causes an issue either), but it may still be needed with CNMeM. @borisfom and @slayton58, comments?

@borisfom

Doesn't it recalculate the actual workspace size needed for the forward/backward pass?
If so, it should affect performance (or even success?) regardless of which pool is used.
Also, with the change I merged last week (adding a buffer class that retains memory locally in the absence of a pool), any pool strategy should perform about the same.

@slayton58
Author

I think it'll still be needed (unless @borisfom has made changes I haven't kept up with) -- we can still hit the case where Reshape() on a convolution layer during network initialization sees that a large amount of memory is currently free and therefore picks a costly algorithm. We then finish initializing the network, potentially allocating all remaining memory for the subsequent layers and their parameters. When we later try to grab that workspace from the allocator during the forward or backward pass, there may not be enough memory left, causing the error.
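
For background, the cuDNN layers derive their workspace limit from the device memory that is free at the moment Reshape() runs, which is why an init-time choice can go stale. A simplified fragment of the pattern (not the actual CuDNNConvolutionLayer code: handle and descriptor setup are omitted, the variable names are illustrative, and the heuristic shown is an assumption, but the cuDNN calls are the 2015-era API the layer relies on):

```cpp
// Simplified illustration of why an init-time algorithm choice can go stale.
// handle and the tensor/filter/convolution descriptors are assumed to be
// set up already; names are illustrative, not the real member names.
size_t free_bytes = 0, total_bytes = 0;
CUDA_CHECK(cudaMemGetInfo(&free_bytes, &total_bytes));

// At setup time much of the device may be free, so a generous limit is
// passed and cuDNN may pick a fast but workspace-hungry algorithm.
size_t workspace_limit = free_bytes / 8;  // illustrative heuristic only

cudnnConvolutionFwdAlgo_t fwd_algo;
CUDNN_CHECK(cudnnGetConvolutionForwardAlgorithm(
    handle, bottom_desc, filter_desc, conv_desc, top_desc,
    CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,
    workspace_limit, &fwd_algo));

size_t workspace_size = 0;
CUDNN_CHECK(cudnnGetConvolutionForwardWorkspaceSize(
    handle, bottom_desc, filter_desc, conv_desc, top_desc,
    fwd_algo, &workspace_size));

// The rest of the net is then constructed, consuming device memory, so by
// the time Forward_gpu() asks the allocator for workspace_size bytes, the
// memory that justified the choice may already be gone. Re-running
// Reshape() right before Forward()/Backward() (this PR) recomputes the
// limit against what is actually free at that point.
```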

@borisfom

Agreed with Simon - dynamics may be very different and recalculation would not hurt.
When using a pool we are actually more prone to the error: in the no-pool case the memory is already retained, so as long as the calculated workspace size hasn't changed since the last call, allocation won't fail. The pool case will fail if it tries to allocate the workspace using stale workspace-size information.

lukeyeager added a commit that referenced this pull request Nov 20, 2015
Reshape Layers before calling Forward / Backward
@lukeyeager merged commit a8cf019 into NVIDIA:master on Nov 20, 2015
@lukeyeager
Member

Merged into master and caffe-0.13.

