Hi, I found this library while looking for a trivially copyable tuple implementation for my CUDA code. According to the CUDA C Programming Guide (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#global-function-argument-processing), arguments passed to a CUDA kernel must be trivially copyable, so std::tuple is not an option. However, this library currently does not support CUDA because its member functions lack __device__ annotations. Would you consider adding CUDA support? Specifically, when the __CUDACC__ macro is defined, could __host__ __device__ be added to each of the member functions?