Conversation
@kouyoumin Have you tested the forward time?
I have restarted the tests that failed prior to #5973; they pass now. However, I'm not sure whether Travis is even able to test cuDNN 7 right now. Could you take a look at #5972, where the dependencies script was also modified, and compare? Also, this needs a squash before merging, but for a review it's good as is.
Not fast enough. Has anyone tested this on MobileNet?
@kouyoumin Please note before rebasing that #5972 modified the
Noiredd left a comment:
@kouyoumin I did more thorough testing of this PR. From what I see (both on synthetic benchmark nets, such as a 1x256x256x256 cube convolved with 5x5 filters in a group of 256, and on more realistic nets), it does not seem to make things faster (on my old GTX TITAN Z), but the RAM saving is quite significant: from 4% up to almost 60%, and the larger the group, the better. Please take a look at the comments I left in the code review, then squash and rebase, and I will merge this.
        cudnn::dataType<Dtype>::one,
        bias_desc_, bias_diff + bias_offset_ * g));
#else
      CUDNN_CHECK(cudnnConvolutionBackwardBias(handle_[g],
Shouldn't we leave CUDNN_CHECK(cudnnConvolutionBackwardBias(handle_[0*this->group_ + g], here?
    case CUDNN_STATUS_RUNTIME_IN_PROGRESS:
      return "CUDNN_STATUS_RUNTIME_IN_PROGRESS";
    case CUDNN_STATUS_RUNTIME_FP_OVERFLOW:
      return "CUDNN_STATUS_RUNTIME_FP_OVERFLOW";
Adding this fragment is no longer necessary (since #5972); please remove it while rebasing.
Uses cuDNN 7's native grouped convolution instead of a for loop over the groups.
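For background: cuDNN 7 added cudnnSetConvolutionGroupCount(), which lets a single convolution descriptor express a grouped convolution that previously had to be emulated with a per-group loop (one call per group, as in the handle_[g] loop discussed above). The semantics being replaced can be sketched in NumPy; this is an illustrative reference only, with a naive valid cross-correlation standing in for the actual cuDNN kernels:

```python
import numpy as np

def conv2d(x, w):
    """Naive dense 2D cross-correlation, 'valid' padding.
    x: (C_in, H, W), w: (C_out, C_in, kH, kW) -> (C_out, H-kH+1, W-kW+1)."""
    C_out, C_in, kH, kW = w.shape
    H, W = x.shape[1] - kH + 1, x.shape[2] - kW + 1
    out = np.zeros((C_out, H, W))
    for o in range(C_out):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(x[:, i:i+kH, j:j+kW] * w[o])
    return out

def grouped_conv2d(x, w, groups):
    """Grouped convolution as a loop over groups (what this PR removes in
    favor of one grouped cuDNN call): split the input channels into `groups`
    slices, convolve each slice with its own filter slice, concatenate.
    w: (C_out, C_in/groups, kH, kW)."""
    ci = x.shape[0] // groups   # input channels per group
    co = w.shape[0] // groups   # output channels per group
    outs = [conv2d(x[g*ci:(g+1)*ci], w[g*co:(g+1)*co])
            for g in range(groups)]
    return np.concatenate(outs, axis=0)
```

With groups == 1 this reduces to a dense convolution; with groups equal to the channel count it is a depthwise convolution, which is the MobileNet case asked about above.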