Conversation
lockshaw
left a comment
There was a problem hiding this comment.
This is mostly correct except for the interaction with kernels (in particular, init_kernel). Any ff<Thing>_t (like ffTensorDescriptor_t is a cuda/hip thing and should be initialized in init_kernel based on the operator attributes and potentially the tensor shapes, not passed in--initializing these cuda/hip objects from the non-cuda/hip operator description is the essentially the whole point of init_kernel
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @KateUnger)
lib/kernels/include/kernels/conv_2d_kernels.h line 19 at r1 (raw file):
ffConvolutionBwdFilterAlgo_t bwdFilterAlgo; ffConvolutionBwdDataAlgo_t bwdDataAlgo; req<optional<Activation>> activation;
Only the last field needs req
Suggestion:
optional<Activation> activation;lib/kernels/include/kernels/conv_2d_kernels.h line 50 at r1 (raw file):
ffConvolutionBwdFilterAlgo_t bwdFilterAlgo, ffConvolutionBwdDataAlgo_t bwdDataAlgo, req<optional<Activation>> relu,
req should only be a struct field and should not be passed around
Suggestion:
Conv2DPerDeviceState init_kernel(PerDeviceFFHandle handle,
ffTensorDescriptor_t inputTensor,
ffTensorDescriptor_t biasTensor,
ffTensorDescriptor_t outputTensor,
ffFilterDescriptor_t filterDesc,
ffActivationDescriptor_t actiDesc,
ffConvolutionDescriptor_t convDesc,
ffConvolutionFwdAlgo_t fwdAlgo,
ffConvolutionBwdFilterAlgo_t bwdFilterAlgo,
ffConvolutionBwdDataAlgo_t bwdDataAlgo,
optional<Activation> relu,lib/kernels/include/kernels/conv_2d_kernels.h line 50 at r1 (raw file):
ffConvolutionBwdFilterAlgo_t bwdFilterAlgo, ffConvolutionBwdDataAlgo_t bwdDataAlgo, req<optional<Activation>> relu,
No cuda objects can be passed in--they should be created in init_kernel
Suggestion:
Conv2DPerDeviceState init_kernel(PerDeviceFFHandle handle,
req<optional<Activation>> relu)lib/runtime/src/ops/conv_2d.h line 137 at r1 (raw file):
#endif
Delete
lib/runtime/src/ops/conv_2d.cc line 62 at r1 (raw file):
PerDeviceFFHandle handle = acc.get_argument<PerDeviceFFHandle>(HANDLE); auto const &attrs = acc.get_argument<Conv2DAttrs>(ATTRS);
Honestly just copy, the ref isn't worth it
Suggestion:
auto attrs = acc.get_argument<Conv2DAttrs>(ATTRS);lib/runtime/src/ops/conv_2d.cc line 72 at r1 (raw file):
ffConvolutionFwdAlgo_t fwdAlgo; ffConvolutionBwdFilterAlgo_t bwdFilterAlgo; ffConvolutionBwdDataAlgo_t bwdDataAlgo;
All of these should be create in init_kernel, not passed in
Code quote:
ffTensorDescriptor_t inputTensor;
ffTensorDescriptor_t biasTensor;
ffTensorDescriptor_t outputTensor;
ffFilterDescriptor_t filterDesc;
ffActivationDescriptor_t actiDesc;
ffConvolutionDescriptor_t convDesc;
ffConvolutionFwdAlgo_t fwdAlgo;
ffConvolutionBwdFilterAlgo_t bwdFilterAlgo;
ffConvolutionBwdDataAlgo_t bwdDataAlgo;lib/runtime/src/ops/conv_2d.cc line 87 at r1 (raw file):
bwdDataAlgo, attrs.activation, attrs.use_bias));
init_kernel should just be a function of the handle and the attrs (and any shape info from parallel_tensor_shape etc)
Code quote:
init_kernel(handle,
inputTensor,
biasTensor,
outputTensor,
filterDesc,
actiDesc,
convDesc,
fwdAlgo,
bwdFilterAlgo,
bwdDataAlgo,
attrs.activation,
attrs.use_bias));lib/runtime/src/ops/conv_2d.cc line 114 at r1 (raw file):
profiling, "[Conv2d] forward_time = %.2lfms\n", &per_device_state,
Change forward_kernel to take the state by reference instead of by pointer
lib/runtime/src/ops/conv_2d.cc line 146 at r1 (raw file):
profiling, "[Conv2d] backward_time = %.2lfms\n", &per_device_state,
Change backward_kernel to take the per device state as const &, not as a pointer
KateUnger
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 9 files reviewed, 9 unresolved discussions (waiting on @KateUnger and @lockshaw)
lib/kernels/include/kernels/conv_2d_kernels.h line 19 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Only the last field needs
req
Done.
lib/kernels/include/kernels/conv_2d_kernels.h line 50 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
reqshould only be a struct field and should not be passed around
Done.
lib/kernels/include/kernels/conv_2d_kernels.h line 50 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
No cuda objects can be passed in--they should be created in
init_kernel
Done.
lib/runtime/src/ops/conv_2d.h line 137 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Delete
Done.
lib/runtime/src/ops/conv_2d.cc line 62 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Honestly just copy, the ref isn't worth it
Done.
lib/runtime/src/ops/conv_2d.cc line 72 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
All of these should be create in
init_kernel, not passed in
Done.
lib/runtime/src/ops/conv_2d.cc line 87 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
init_kernelshould just be a function of the handle and the attrs (and any shape info fromparallel_tensor_shapeetc)
Done.
lib/runtime/src/ops/conv_2d.cc line 114 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Change
forward_kernelto take the state by reference instead of by pointer
Done.
lib/runtime/src/ops/conv_2d.cc line 146 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Change
backward_kernelto take the per device state asconst &, not as a pointer
Done.
lockshaw
left a comment
There was a problem hiding this comment.
Reviewed 9 of 9 files at r2, all commit messages.
Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @KateUnger)
lib/kernels/include/kernels/conv_2d_kernels.h line 14 at r2 (raw file):
struct Conv2DPerDeviceState { PerDeviceFFHandle handle; optional<Activation> activation;
Why does this even have an activation field? It should probably be removed, along with Conv2DPerDeviceState::use_bias as both of them are available from the Conv2DAttrs
lib/kernels/include/kernels/conv_2d_kernels.h line 25 at r2 (raw file):
ffConvolutionFwdAlgo_t fwdAlgo; ffConvolutionBwdFilterAlgo_t bwdFilterAlgo; ffConvolutionBwdDataAlgo_t bwdDataAlgo;
Suggestion:
req<ffConvolutionBwdDataAlgo_t> bwdDataAlgo;lib/kernels/src/cuda/conv_2d_kernels.cu line 46 at r2 (raw file):
} return perfResults[0].algo; }
Suggestion:
cudnnConvolutionBwdDataAlgo_t selectConvolutionBackwardDataAlgorithm(
cudnnHandle_t handle,
const cudnnFilterDescriptor_t wDesc,
void const *w,
const cudnnTensorDescriptor_t dyDesc,
void const *dy,
const cudnnConvolutionDescriptor_t convDesc,
void *workSpace,
size_t workSpaceSize,
const cudnnTensorDescriptor_t dxDesc,
void *dx,
float *time) {
int const reqAlgCnt = 8;
int cnt = 0;
cudnnConvolutionBwdDataAlgoPerf_t perfResults[reqAlgCnt];
checkCUDNN(cudnnFindConvolutionBackwardDataAlgorithmEx(handle,
wDesc,
w,
dyDesc,
dy,
convDesc,
dxDesc,
dx,
reqAlgCnt,
&cnt,
perfResults,
workSpace,
workSpaceSize));
assert(cnt > 0);
checkCUDNN(perfResults[0].status);
if (time != nullptr) {
*time = perfResults[0].time;
}
return perfResults[0].algo;
}lib/kernels/src/cuda/conv_2d_kernels.cu line 85 at r2 (raw file):
} return perfResults[0].algo; }
Suggestion:
assert(cnt > 0);
checkCUDNN(perfResults[0].status);
if (time != nullptr) {
*time = perfResults[0].time;
}
return perfResults[0].algo;
}lib/kernels/src/cuda/conv_2d_kernels.cu line 124 at r2 (raw file):
} return perfResults[0].algo; }
Suggestion:
cudnnConvolutionBwdFilterAlgo_t selectConvolutionBackwardFilterAlgorithm(
cudnnHandle_t handle,
const cudnnTensorDescriptor_t xDesc,
void const *x,
const cudnnTensorDescriptor_t dyDesc,
void const *dy,
const cudnnConvolutionDescriptor_t convDesc,
void *workSpace,
size_t workSpaceSize,
const cudnnFilterDescriptor_t dwDesc,
void *dw,
float *time) {
int const reqAlgCnt = 8;
int cnt = 0;
cudnnConvolutionBwdFilterAlgoPerf_t perfResults[reqAlgCnt];
checkCUDNN(cudnnFindConvolutionBackwardFilterAlgorithmEx(handle,
xDesc,
x,
dyDesc,
dy,
convDesc,
dwDesc,
dw,
reqAlgCnt,
&cnt,
perfResults,
workSpace,
workSpaceSize));
assert(cnt > 0);
checkCUDNN(perfResults[0].status);
if (time != nullptr) {
*time = perfResults[0].time;
}
return perfResults[0].algo;
}lib/kernels/src/cuda/conv_2d_kernels.cu line 185 at r2 (raw file):
kernel_w, input_c / groups, output_c);
lib/kernels/src/cuda/conv_2d_kernels.cu line 202 at r2 (raw file):
1 /*upscale_y*/, CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT));
Suggestion:
checkCUDNN(cudnnSetConvolution2dDescriptor(convDesc,
pad_h,
pad_w,
stride_h,
stride_w,
1 /*upscale_x*/,
1 /*upscale_y*/,
CUDNN_CROSS_CORRELATION,
CUDNN_DATA_FLOAT));lib/kernels/src/cuda/conv_2d_kernels.cu line 238 at r2 (raw file):
outputTensor, output.get_float_ptr(), &time);
Suggestion:
// select forward algorithm
fwdAlgo = selectConvolutionForwardAlgorithm(handle.dnn,
inputTensor,
input.get_float_ptr(),
filterDesc,
filter_ptr,
convDesc,
handle.workSpace,
handle.workSpaceSize,
outputTensor,
output.get_float_ptr(),
nullptr);lib/kernels/src/cuda/conv_2d_kernels.cu line 252 at r2 (raw file):
filterDesc, filter_grad_ptr, &time);
Suggestion:
bwdFilterAlgo =
selectConvolutionBackwardFilterAlgorithm(handle.dnn,
inputTensor,
input.get_float_ptr(),
outputTensor,
output.get_float_ptr(),
convDesc,
handle.workSpace,
handle.workSpaceSize,
filterDesc,
filter_grad_ptr,
nullptr);lib/kernels/src/cuda/conv_2d_kernels.cu line 265 at r2 (raw file):
inputTensor, input.get_float_ptr(), &time);
Suggestion:
bwdDataAlgo = selectConvolutionBackwardDataAlgorithm(handle.dnn,
filterDesc,
filter_ptr,
outputTensor,
output.get_float_ptr(),
convDesc,
handle.workSpace,
handle.workSpaceSize,
inputTensor,
input.get_float_ptr(),
nullptr);lib/kernels/src/cuda/conv_2d_kernels.cu line 282 at r2 (raw file):
fwdAlgo, bwdFilterAlgo, bwdDataAlgo};
Suggestion:
Conv2DPerDeviceState per_device_state = {handle,
inputTensor,
biasTensor,
outputTensor,
filterDesc,
actiDesc,
convDesc,
fwdAlgo,
bwdFilterAlgo,
bwdDataAlgo};lib/kernels/src/cuda/conv_2d_kernels.cu line 291 at r2 (raw file):
float *output_ptr, float const *filter_ptr, float const *bias_ptr) {
Suggestion:
void forward_kernel(cudaStream_t stream,
Conv2DPerDeviceState const &m,
optional<Activation> const &activation,
bool use_bias,
float const *input_ptr,
float *output_ptr,
float const *filter_ptr,
float const *bias_ptr) {lib/kernels/src/cuda/conv_2d_kernels.cu line 292 at r2 (raw file):
float const *filter_ptr, float const *bias_ptr) { checkCUDNN(cudnnSetStream(m->handle.dnn, stream));
Suggestion:
checkCUDNN(cudnnSetStream(m.handle.dnn, stream));lib/kernels/src/cuda/conv_2d_kernels.cu line 307 at r2 (raw file):
&beta, m->outputTensor, output_ptr));
Suggestion:
float alpha = 1.0f, beta = 0.0f;
checkCUDNN(cudnnConvolutionForward(m.handle.dnn,
&alpha,
m.inputTensor,
input_ptr,
m.filterDesc,
filter_ptr,
m.convDesc,
m.fwdAlgo,
m.handle.workSpace,
m.handle.workSpaceSize,
&beta,
m.outputTensor,
output_ptr));lib/kernels/src/cuda/conv_2d_kernels.cu line 317 at r2 (raw file):
&alpha, m->outputTensor, output_ptr));
Suggestion:
// use_bias == True
if (bias_ptr != NULL) {
checkCUDNN(cudnnAddTensor(m.handle.dnn,
&alpha,
m.biasTensor,
bias_ptr,
&alpha,
m.outputTensor,
output_ptr));lib/kernels/src/cuda/conv_2d_kernels.cu line 322 at r2 (raw file):
checkCUDNN(cudnnActivationForward(m->handle.dnn, m->actiDesc, &alpha,
Suggestion:
if (activation.has_value()) {
checkCUDNN(cudnnActivationForward(m.handle.dnn,
m.actiDesc,
&alpha,lib/kernels/src/cuda/conv_2d_kernels.cu line 344 at r2 (raw file):
float alpha = 1.0f; // float beta = 0.0f; if (m->activation) {
Suggestion:
if (m.activation) {lib/kernels/src/hip/conv_2d_kernels.cpp line 0 at r2 (raw file):
See comments in conv_2d_kernels.cu
lib/runtime/src/ops/conv_2d.cc line 115 at r2 (raw file):
input.get_float_ptr(), output.get_float_ptr(), filter.get_float_ptr(),
Suggestion:
return profile(forward_kernel,
profiling,
"[Conv2d] forward_time = %.2lfms\n",
per_device_state,
input.get_float_ptr(),
output.get_float_ptr(),
filter.get_float_ptr(),lib/runtime/src/ops/conv_2d.cc line 147 at r2 (raw file):
input.get_float_ptr(), input_grad.get_float_ptr(), output.get_float_ptr(),
Suggestion:
return profile(backward_kernel,
profiling,
"[Conv2d] backward_time = %.2lfms\n",
per_device_state,
input.get_float_ptr(),
input_grad.get_float_ptr(),
output.get_float_ptr(),
KateUnger
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @lockshaw)
lib/kernels/include/kernels/conv_2d_kernels.h line 14 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why does this even have an
activationfield? It should probably be removed, along withConv2DPerDeviceState::use_biasas both of them are available from theConv2DAttrs
Done.
lib/kernels/include/kernels/conv_2d_kernels.h line 25 at r2 (raw file):
ffConvolutionFwdAlgo_t fwdAlgo; ffConvolutionBwdFilterAlgo_t bwdFilterAlgo; ffConvolutionBwdDataAlgo_t bwdDataAlgo;
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 46 at r2 (raw file):
} return perfResults[0].algo; }
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 85 at r2 (raw file):
} return perfResults[0].algo; }
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 124 at r2 (raw file):
} return perfResults[0].algo; }
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 185 at r2 (raw file):
kernel_w, input_c / groups, output_c);
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 202 at r2 (raw file):
1 /*upscale_y*/, CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT));
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 238 at r2 (raw file):
outputTensor, output.get_float_ptr(), &time);
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 252 at r2 (raw file):
filterDesc, filter_grad_ptr, &time);
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 265 at r2 (raw file):
inputTensor, input.get_float_ptr(), &time);
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 282 at r2 (raw file):
fwdAlgo, bwdFilterAlgo, bwdDataAlgo};
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 291 at r2 (raw file):
float *output_ptr, float const *filter_ptr, float const *bias_ptr) {
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 292 at r2 (raw file):
float const *filter_ptr, float const *bias_ptr) { checkCUDNN(cudnnSetStream(m->handle.dnn, stream));
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 307 at r2 (raw file):
&beta, m->outputTensor, output_ptr));
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 317 at r2 (raw file):
&alpha, m->outputTensor, output_ptr));
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 322 at r2 (raw file):
checkCUDNN(cudnnActivationForward(m->handle.dnn, m->actiDesc, &alpha,
Done.
lib/kernels/src/cuda/conv_2d_kernels.cu line 344 at r2 (raw file):
float alpha = 1.0f; // float beta = 0.0f; if (m->activation) {
Done.
lib/kernels/src/hip/conv_2d_kernels.cpp line at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
See comments in conv_2d_kernels.cu
I'll update the function parameters, but you told me not to worry about the .cpp file for now. Lmk if that changes!
lib/runtime/src/ops/conv_2d.cc line 115 at r2 (raw file):
input.get_float_ptr(), output.get_float_ptr(), filter.get_float_ptr(),
Done.
lib/runtime/src/ops/conv_2d.cc line 147 at r2 (raw file):
input.get_float_ptr(), input_grad.get_float_ptr(), output.get_float_ptr(),
Done.
lockshaw
left a comment
There was a problem hiding this comment.
Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @KateUnger)
Description of changes:
TODO: Change
conv2d_kernels.cppandconv2d_kernels.cuRelated Issues:
Linked Issues:
Issues closed by this PR:
This change is