Use new cuDNN R2 API #1739

Closed

slayton58 wants to merge 4 commits into BVLC:master from slayton58:master

Conversation

@slayton58
Contributor

Modify the cuDNN layers to use the cuDNN R2 API.

@bhack
Contributor

bhack commented Jan 15, 2015

See #1731

@shelhamer
Member

@slayton58 thanks for the API integration! I'll cherry-pick your commit for these changes into #1731.

@slayton58
Contributor Author

Let me know if there are more changes you need, or any API questions that need clearing up.

@shelhamer
Member

Please lint your branch by running make lint and make any corrections -- thanks.

@slayton58
Contributor Author

Ok, done

@shelhamer shelhamer mentioned this pull request Jan 16, 2015
@immars

immars commented Jan 21, 2015

Hi Layton,
I got an error compiling:

./include/caffe/util/cudnn.hpp:113:13: error: ‘CUDNN_POOLING_AVERAGE’ was not declared in this scope
     *mode = CUDNN_POOLING_AVERAGE;

I have cudnn-6.5-linux-x64-v2-rc2 installed (libcudnn.so.6.5.41), with

grep CUDNN_POOLING_AVERAGE -R /usr/local/cuda/include/
/usr/local/cuda/include/cudnn.h:    CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING = 1, // count for average includes padded values
/usr/local/cuda/include/cudnn.h:    CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING = 2 // count for average does not include padded values

Contributor

Why all the reinterpret_casts?

  • If we must cast, static_cast should work here, no?
  • Shouldn't the cast be unneeded here, just as it's not needed for the data arguments?

Contributor Author

We always explicitly cast to (void *), and reinterpret_cast<void *> captures the inherent type-unsafe nature of that operation. The original commit used (void *) casting; the change from (void *) to reinterpret_cast<void *> was requested by your lint tool.

Contributor

Right, C-style casting is right out; whether one should use reinterpret_cast or static_cast here is a stylistic issue that can be argued either way. What bothers me is that the alpha and beta arguments are being treated differently than the data arguments, even though (according to the header I am looking at), all of those arguments have type void* and are subject to the same danger. Is there a reason for that?

Contributor Author

I don't see a reason that the casts have to be there; it was a stylistic decision by the author of some of our code that I followed when I wrote this patch. I personally prefer the explicit cast, but it is extraneous.

Contributor

Right, I'm not asking why the casts are there, but why the coefficients are being treated differently from the data. The code suggests that the coefficients are being passed untyped, while the data is typed, so that the data is safer than the coefficients. But from the headers it seems that everything is untyped.

Ideally one could gain a little safety with (templated, perhaps) wrappers, or perhaps y'all will introduce a typed interface at some point. But, to me it seems misleading to mark some arguments as unsafe when all are unsafe. Of course, marking every argument with reinterpret_cast would be noisy, so maybe it is better to leave them out and make a note of the unsafe interface...

@slayton58
Contributor Author

The compile issue is because the new cuDNN drop last night changed the pooling enums slightly -- this pull request was originally written for the older RC.

This was referenced Jan 23, 2015
@kerkilchoi

EDIT: I fixed this issue by installing CUDA 7.0.

While building @slayton58's commits I got this error. Can someone help me resolve it? Am I missing any dependencies?

/usr/local/cuda/include/thrust/iterator/detail/iterator_traits.inl:60:53: error: no type named 'iterator_category' in 'thrust::iterator_traits<thrust::device_ptr<void> >'
        typename thrust::iterator_traits<Iterator>::iterator_category
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
/usr/local/cuda/include/thrust/detail/device_malloc.inl:36:19: note: in instantiation of template class 'thrust::iterator_system<thrust::device_ptr<void> >' requested here
  typedef thrust::iterator_system< thrust::device_ptr<void> >::type system;
                  ^
/usr/local/cuda/include/thrust/detail/device_malloc.inl:41:35: error: no matching function for call to 'malloc'
  return thrust::device_ptr<void>(thrust::malloc(s, n).get());
                                  ^~~~~~~~~~~~~~
/usr/local/cuda/include/thrust/memory.h:304:29: note: candidate template ignored: could not match 'execution_policy_base<type-parameter-0-0>' against 'int'
pointer<void,DerivedPolicy> malloc(const thrust::detail::execution_policy_base<DerivedPolicy> &system, std::size_t n);
                            ^
/usr/local/cuda/include/thrust/memory.h:341:26: note: candidate template ignored: could not match 'execution_policy_base<type-parameter-0-1>' against 'int'
pointer<T,DerivedPolicy> malloc(const thrust::detail::execution_policy_base<DerivedPolicy> &system, std::size_t n);
                         ^
In file included from src/caffe/layers/cudnn_softmax_layer.cpp:6:
In file included from /usr/local/cuda/include/thrust/device_vector.h:25:
In file included from /usr/local/cuda/include/thrust/device_malloc_allocator.h:27:
In file included from /usr/local/cuda/include/thrust/device_malloc.h:102:
/usr/local/cuda/include/thrust/detail/device_malloc.inl:50:64: error: no type named 'type' in 'thrust::iterator_system<thrust::device_ptr<void> >'
  typedef thrust::iterator_system< thrust::device_ptr<void> >::type system;
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

@shelhamer
Member

Hey @slayton58 I only just now realized this is based on master. This branch should be rebased on dev and then submitted as a replacement PR since that is where new development comes in and master is far out of date. Comment if the rebase causes any trouble -- I'm happy to help.

@kerkilchoi

Got a new error while building @slayton58's commit:
Can anyone help?

src/caffe/layers/cudnn_conv_layer.cu(113): error: no operator "*" matches these operands
            operand types are: * const std::__1::vector<caffe::Blob<float> *, std::__1::allocator<caffe::Blob<float> *>>
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Backward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<__nv_bool, std::__1::allocator<__nv_bool>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=float]" 
(142): here

src/caffe/layers/cudnn_conv_layer.cu(126): error: no operator "*" matches these operands
            operand types are: * const std::__1::vector<caffe::Blob<float> *, std::__1::allocator<caffe::Blob<float> *>>
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Backward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<__nv_bool, std::__1::allocator<__nv_bool>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=float]" 
(142): here

src/caffe/layers/cudnn_conv_layer.cu(113): error: no operator "*" matches these operands
            operand types are: * const std::__1::vector<caffe::Blob<double> *, std::__1::allocator<caffe::Blob<double> *>>
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Backward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<__nv_bool, std::__1::allocator<__nv_bool>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=double]" 
(142): here

src/caffe/layers/cudnn_conv_layer.cu(126): error: no operator "*" matches these operands
            operand types are: * const std::__1::vector<caffe::Blob<double> *, std::__1::allocator<caffe::Blob<double> *>>
          detected during instantiation of "void caffe::CuDNNConvolutionLayer<Dtype>::Backward_gpu(const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &, const std::__1::vector<__nv_bool, std::__1::allocator<__nv_bool>> &, const std::__1::vector<caffe::Blob<Dtype> *, std::__1::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=double]" 
(142): here

@slayton58
Contributor Author

Have you pulled the latest version of the code? I’m using CUDA 6.5, gcc 4.6.3 and cuDNN v2 RC2 and it compiles without issue on my machine


@kerkilchoi

@slayton58 I pulled slayton58@bb00447.

I am using

  1. CUDA 7.0
  2. cuDNN v2 RC2
  3. Clang:
    Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
    Target: x86_64-apple-darwin14.0.0
    Thread model: posix
    on Mac OS X Yosemite.

@slayton58 slayton58 closed this Feb 10, 2015
@shelhamer
Member

Replaced by #1854 to dev.
