Use new cuDNN R2 API #1739

Closed

slayton58 wants to merge 4 commits into BVLC:master from slayton58:master

Conversation

@slayton58
Contributor

Modify the cuDNN layers to use the cuDNN R2 API.

@bhack
Contributor

bhack commented Jan 15, 2015

See #1731

@shelhamer
Member

@slayton58 thanks for the API integration! I'll cherry-pick your commit for these changes into #1731.

@slayton58
Contributor Author

Let me know if there are more changes you need, or any API questions that need clearing up.

@shelhamer
Member

Please lint your branch by running make lint and make any corrections -- thanks.

@slayton58
Contributor Author

Ok, done

@shelhamer shelhamer mentioned this pull request Jan 16, 2015
@immars

immars commented Jan 21, 2015

Hi Layton,
I got an error compiling:

./include/caffe/util/cudnn.hpp:113:13: error: ‘CUDNN_POOLING_AVERAGE’ was not declared in this scope
     *mode = CUDNN_POOLING_AVERAGE;

I have cudnn-6.5-linux-x64-v2-rc2 installed (libcudnn.so.6.5.41), with

grep CUDNN_POOLING_AVERAGE -R /usr/local/cuda/include/
/usr/local/cuda/include/cudnn.h:    CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING = 1, // count for average includes padded values
/usr/local/cuda/include/cudnn.h:    CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING = 2 // count for average does not include padded values

Contributor

Why all the reinterpret_casts?

  • If we must cast, static_cast should work here, no?
  • Shouldn't the cast be unneeded here, just as it's not needed for the data arguments?

Contributor Author

We always explicitly cast to (void *), and reinterpret_cast<void *> captures the inherent type-unsafe nature of that operation. The original commit used (void *) casting; the change from (void *) to reinterpret_cast<void *> was requested by your lint tool.

Contributor

Right, C-style casting is right out; whether one should use reinterpret_cast or static_cast here is a stylistic issue that can be argued either way. What bothers me is that the alpha and beta arguments are being treated differently than the data arguments, even though (according to the header I am looking at), all of those arguments have type void* and are subject to the same danger. Is there a reason for that?

Contributor Author

I don't see a reason that the casts have to be there; it was a stylistic decision by the author of some of our code that I followed when I wrote this patch. I personally prefer the explicit cast, but it is extraneous.

Contributor

Right, I'm not asking why the casts are there, but why the coefficients are being treated differently from the data. The code suggests that the coefficients are being passed untyped, while the data is typed, so that the data is safer than the coefficients. But from the headers it seems that everything is untyped.

Ideally one could gain a little safety with (templated, perhaps) wrappers, or perhaps y'all will introduce a typed interface at some point. But, to me it seems misleading to mark some arguments as unsafe when all are unsafe. Of course, marking every argument with reinterpret_cast would be noisy, so maybe it is better to leave them out and make a note of the unsafe interface...

@slayton58
Contributor Author

The compile issue is because the new cuDNN drop last night changed the pooling enums slightly -- this pull request was originally written for the older RC.

This was referenced Jan 23, 2015
@kerkilchoi

EDIT: I fixed this issue by installing CUDA 7.0.

While building @slayton58's commits I got this error. Can someone help me resolve it? Am I missing any dependencies?

/usr/local/cuda/include/thrust/iterator/detail/iterator_traits.inl:60:53: error: no type named 'iterator_category' in 'thrust::iterator_traits<thrust::device_ptr<void> >'
        typename thrust::iterator_traits<Iterator>::iterator_category
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
/usr/local/cuda/include/thrust/detail/device_malloc.inl:36:19: note: in instantiation of template class 'thrust::iterator_system<thrust::device_ptr<void> >' requested here
  typedef thrust::iterator_system< thrust::device_ptr<void> >::type system;
                  ^
/usr/local/cuda/include/thrust/detail/device_malloc.inl:41:35: error: no matching function for call to 'malloc'
  return thrust::device_ptr<void>(thrust::malloc(s, n).get());
                                  ^~~~~~~~~~~~~~
/usr/local/cuda/include/thrust/memory.h:304:29: note: candidate template ignored: could not match 'execution_policy_base<type-parameter-0-0>' against 'int'
pointer<void,DerivedPolicy> malloc(const thrust::detail::execution_policy_base<DerivedPolicy> &system, std::size_t n);
                            ^
/usr/local/cuda/include/thrust/memory.h:341:26: note: candidate template ignored: could not match 'execution_policy_base<type-parameter-0-1>' against 'int'
pointer<T,DerivedPolicy> malloc(const thrust::detail::execution_policy_base<DerivedPolicy> &system, std::size_t n);
                         ^
In file included from src/caffe/layers/cudnn_softmax_layer.cpp:6:
In file included from /usr/local/cuda/include/thrust/device_vector.h:25:
In file included from /usr/local/cuda/include/thrust/device_malloc_allocator.h:27:
In file included from /usr/local/cuda/include/thrust/device_malloc.h:102:
/usr/local/cuda/include/thrust/detail/device_malloc.inl:50:64: error: no type named 'type' in 'thrust::iterator_system<thrust::device_ptr<void> >'
  typedef thrust::iterator_system< thrust::device_ptr<void> >::type system;
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

@shelhamer
Member

Hey @slayton58 I only just now realized this is based on master. This branch should be rebased on dev and then submitted as a replacement PR since that is where new development comes in and master is far out of date. Comment if the rebase causes any trouble -- I'm happy to help.

@kerkilchoi

Got a new error while building @slayton58's commit:
Can anyone help?

src/caffe/layers/cudnn_conv_layer.cu(113): error: no operator "*" matches these operands
            operand types are: * const std::__1::vector<caffe::Blob<float> *, std::__1::allocator<caffe::Blob<float> *>>
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Backward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<__nv_bool, std::__1::allocator<__nv_bool>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=float]" 
(142): here

src/caffe/layers/cudnn_conv_layer.cu(126): error: no operator "*" matches these operands
            operand types are: * const std::__1::vector<caffe::Blob<float> *, std::__1::allocator<caffe::Blob<float> *>>
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Backward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<__nv_bool, std::__1::allocator<__nv_bool>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=float]" 
(142): here

src/caffe/layers/cudnn_conv_layer.cu(113): error: no operator "*" matches these operands
            operand types are: * const std::__1::vector<caffe::Blob<double> *, std::__1::allocator<caffe::Blob<double> *>>
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Backward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<__nv_bool, std::__1::allocator<__nv_bool>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=double]" 
(142): here

src/caffe/layers/cudnn_conv_layer.cu(126): error: no operator "*" matches these operands
            operand types are: * const std::__1::vector<caffe::Blob<double> *, std::__1::allocator<caffe::Blob<double> *>>
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Backward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<__nv_bool, std::__1::allocator<__nv_bool>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=double]" 
(142): here

@slayton58
Contributor Author

Have you pulled the latest version of the code? I’m using CUDA 6.5, gcc 4.6.3 and cuDNN v2 RC2 and it compiles without issue on my machine


@kerkilchoi

@slayton58 I pulled slayton58@bb00447.

I am using

  1. CUDA 7.0
  2. cuDNN v2 RC2
  3. Clang:
    Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
    Target: x86_64-apple-darwin14.0.0
    Thread model: posix
    on Mac OS X Yosemite.

@slayton58 slayton58 closed this Feb 10, 2015
@shelhamer
Member

Replaced by #1854 to dev.
