Conversation
Travis seems to dislike cudaMalloc when built WITH_CUDA=true ("CUDA driver version is insufficient for CUDA runtime version"). @shelhamer any ideas on how to work around this? |
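One possible workaround (sketch only; the helpers below are illustrative, not code from this PR): only touch the CUDA runtime after confirming a usable device and driver are present, and fall back to a host allocation on CPU-only CI workers.

```cpp
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical guard, not the actual Caffe/PR code: Travis workers have no GPU
// (or an outdated driver), so cudaGetDeviceCount fails with
// cudaErrorInsufficientDriver / cudaErrorNoDevice instead of succeeding.
static bool cuda_device_available() {
  int count = 0;
  cudaError_t err = cudaGetDeviceCount(&count);
  return err == cudaSuccess && count > 0;
}

// Allocate the temporary buffer on the device only when that is actually possible.
static void* allocate_temporary(size_t size) {
  void* ptr = NULL;
  if (cuda_device_available()) {
    cudaMalloc(&ptr, size);   // normal GPU path
  } else {
    ptr = malloc(size);       // host fallback keeps WITH_CUDA=true CI builds alive
  }
  return ptr;
}
```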
Currently this PR doesn't play too nicely with parallelism. It will just throw an exception if two threads are trying to access the temporary memory concurrently. |
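For reference, the failure mode is roughly this (a simplified sketch with made-up names, not the PR's actual classes): the shared buffer has a single owner at a time, and a second concurrent acquire throws instead of silently overwriting another layer's im2col data.

```cpp
#include <cstddef>
#include <mutex>
#include <stdexcept>

// Illustration only: one shared scratch buffer, handed out to a single user at a time.
class SharedScratch {
 public:
  void* Acquire(size_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (in_use_) {
      // This is the exception two concurrent threads would run into.
      throw std::runtime_error("temporary memory is already in use");
    }
    in_use_ = true;
    // ... ensure the underlying allocation is at least `size` bytes ...
    return data_;
  }

  void Release() {
    std::lock_guard<std::mutex> lock(mutex_);
    in_use_ = false;
  }

 private:
  std::mutex mutex_;
  bool in_use_ = false;
  void* data_ = nullptr;
};
```

A parallel-friendly version would need per-thread buffers or would have to block until the buffer is released; the sketch above just fails loudly, matching the behaviour described.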
Hi philkr, |
@nian-liu this seems a bit odd, as cudnn doesn't seem to be using col_buffer_. What version of cuDNN are you using? |
@philkr I am using cuDNN v1. |
Saving memory by reusing col_buffer_.
Closing since master has drifted and this was always potentially a bit dangerous. Thanks for the bespoke conv memory pool all the same, Philipp!
For fully convolutional nets the col_buffer_ in the BaseConvolutionLayer can get quite large (1GB or more in total). Currently this temporary buffer is allocated for each convolution separately. This PR introduces a TemporaryMemory class which allows all temporary memory to be shared between convolutional layers. This could be used for other layers that use temporary memory, but I couldn't find any other memory-hungry ones... This saves up to 1GB of memory for fully convolutional VGG models.
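Roughly, the idea looks like this (a minimal sketch with simplified names and CPU-only allocation; the real class would also need to cover GPU memory and Caffe's Blob/SyncedMemory machinery):

```cpp
#include <cstddef>
#include <cstdlib>

// Sketch of a process-wide temporary-memory pool: every convolution layer asks
// this pool for its col_buffer_ storage instead of holding its own allocation,
// so the whole net pays for one buffer sized to the largest single request.
class TemporaryMemory {
 public:
  static TemporaryMemory& Get() {
    static TemporaryMemory instance;   // one pool per process
    return instance;
  }

  void* data(size_t size) {
    if (size > capacity_) {            // grow to the largest demand seen so far
      std::free(data_);
      data_ = std::malloc(size);
      capacity_ = size;
    }
    return data_;
  }

 private:
  TemporaryMemory() : data_(nullptr), capacity_(0) {}
  ~TemporaryMemory() { std::free(data_); }
  void* data_;
  size_t capacity_;
};

// A conv layer's forward pass would then fetch its im2col scratch space like:
//   void* col_buff = TemporaryMemory::Get().data(col_buffer_bytes);
// rather than keeping a per-layer col_buffer_ alive for the lifetime of the net.
```

Since layers run one after another within a forward/backward pass they can all reuse the same allocation, which is also why the concurrent-access limitation noted earlier in the thread exists.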