Fallback to different cuDNN algorithm when under memory pressure #2211
shelhamer merged 1 commit into BVLC:master
Conversation
CUDNN_CONVOLUTION_FWD_PREFER_FASTEST requires a lot of GPU memory, which may not always be available. Add a fallback path that uses CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM when the allocation fails.
I can confirm this works for models like CaffeNet where #2038 runs out of memory.
Will this always rule out the cuDNN GEMM convolution? At least in the Caffe GEMM convolution the workspace is the kernel dimensions (kernel_h * kernel_w * channels) times the output dimensions (height_out_ * width_out_), as in https://github.com/BVLC/caffe/blob/master/src/caffe/layers/base_conv_layer.cpp#L147, although in the cuDNN implementation I suppose the workspace could be just the input data, so the +1 allows it here.
Yes, my understanding is that the workspace size depends only on the input data size.
The intent is that GEMM convolution will still be chosen if possible. However, in practice, we expect that there won't be enough memory available in many use cases.
Thanks for the fix @nsubtil! Please address my inline comments for merge.

It sounds like no further code changes are actually required here. If you're satisfied with my replies, I think this should be ready to merge. Thanks!
Fallback to different cuDNN algorithm when under memory pressure; fix #2197
CUDNN_CONVOLUTION_FWD_PREFER_FASTEST requires a lot of GPU memory, which may
not always be available. Add a fallback path that uses
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM when the allocation fails.