Skip to content

Conv 2D Op#1112

Merged
lockshaw merged 5 commits intoflexflow:repo-refactorfrom
KateUnger:conv2d_new
Sep 13, 2023
Merged

Conv 2D Op#1112
lockshaw merged 5 commits intoflexflow:repo-refactorfrom
KateUnger:conv2d_new

Conversation

@KateUnger
Copy link
Contributor

@KateUnger KateUnger commented Sep 7, 2023

Description of changes:

  • Update Conv_2d operator

TODO: Change conv2d_kernels.cpp and conv2d_kernels.cu

Related Issues:

Linked Issues:

Issues closed by this PR:


This change is Reviewable

@KateUnger KateUnger self-assigned this Sep 7, 2023
@KateUnger KateUnger requested a review from lockshaw September 7, 2023 21:13
@KateUnger KateUnger marked this pull request as draft September 7, 2023 21:13
@KateUnger KateUnger changed the base branch from inference to repo-refactor September 7, 2023 21:13
Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly correct except for the interaction with kernels (in particular, init_kernel). Any ff<Thing>_t (like ffTensorDescriptor_t is a cuda/hip thing and should be initialized in init_kernel based on the operator attributes and potentially the tensor shapes, not passed in--initializing these cuda/hip objects from the non-cuda/hip operator description is the essentially the whole point of init_kernel

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @KateUnger)


lib/kernels/include/kernels/conv_2d_kernels.h line 19 at r1 (raw file):

  ffConvolutionBwdFilterAlgo_t bwdFilterAlgo;
  ffConvolutionBwdDataAlgo_t bwdDataAlgo;
  req<optional<Activation>> activation;

Only the last field needs req

Suggestion:

optional<Activation> activation;

lib/kernels/include/kernels/conv_2d_kernels.h line 50 at r1 (raw file):

                                 ffConvolutionBwdFilterAlgo_t bwdFilterAlgo,
                                 ffConvolutionBwdDataAlgo_t bwdDataAlgo,
                                 req<optional<Activation>> relu,

req should only be a struct field and should not be passed around

Suggestion:

Conv2DPerDeviceState init_kernel(PerDeviceFFHandle handle,
                                 ffTensorDescriptor_t inputTensor,
                                 ffTensorDescriptor_t biasTensor,
                                 ffTensorDescriptor_t outputTensor,
                                 ffFilterDescriptor_t filterDesc,
                                 ffActivationDescriptor_t actiDesc,
                                 ffConvolutionDescriptor_t convDesc,
                                 ffConvolutionFwdAlgo_t fwdAlgo,
                                 ffConvolutionBwdFilterAlgo_t bwdFilterAlgo,
                                 ffConvolutionBwdDataAlgo_t bwdDataAlgo,
                                 optional<Activation> relu,

lib/kernels/include/kernels/conv_2d_kernels.h line 50 at r1 (raw file):

                                 ffConvolutionBwdFilterAlgo_t bwdFilterAlgo,
                                 ffConvolutionBwdDataAlgo_t bwdDataAlgo,
                                 req<optional<Activation>> relu,

No cuda objects can be passed in--they should be created in init_kernel

Suggestion:

Conv2DPerDeviceState init_kernel(PerDeviceFFHandle handle,
                                 req<optional<Activation>> relu)

lib/runtime/src/ops/conv_2d.h line 137 at r1 (raw file):

#endif

Delete


lib/runtime/src/ops/conv_2d.cc line 62 at r1 (raw file):

  PerDeviceFFHandle handle = acc.get_argument<PerDeviceFFHandle>(HANDLE);
  auto const &attrs = acc.get_argument<Conv2DAttrs>(ATTRS);

Honestly just copy, the ref isn't worth it

Suggestion:

  auto attrs = acc.get_argument<Conv2DAttrs>(ATTRS);

lib/runtime/src/ops/conv_2d.cc line 72 at r1 (raw file):

  ffConvolutionFwdAlgo_t fwdAlgo;
  ffConvolutionBwdFilterAlgo_t bwdFilterAlgo;
  ffConvolutionBwdDataAlgo_t bwdDataAlgo;

All of these should be create in init_kernel, not passed in

Code quote:

  ffTensorDescriptor_t inputTensor;
  ffTensorDescriptor_t biasTensor;
  ffTensorDescriptor_t outputTensor;
  ffFilterDescriptor_t filterDesc;
  ffActivationDescriptor_t actiDesc;
  ffConvolutionDescriptor_t convDesc;
  ffConvolutionFwdAlgo_t fwdAlgo;
  ffConvolutionBwdFilterAlgo_t bwdFilterAlgo;
  ffConvolutionBwdDataAlgo_t bwdDataAlgo;

lib/runtime/src/ops/conv_2d.cc line 87 at r1 (raw file):

                      bwdDataAlgo,
                      attrs.activation,
                      attrs.use_bias));

init_kernel should just be a function of the handle and the attrs (and any shape info from parallel_tensor_shape etc)

Code quote:

          init_kernel(handle,
                      inputTensor,
                      biasTensor,
                      outputTensor,
                      filterDesc,
                      actiDesc,
                      convDesc,
                      fwdAlgo,
                      bwdFilterAlgo,
                      bwdDataAlgo,
                      attrs.activation,
                      attrs.use_bias));

lib/runtime/src/ops/conv_2d.cc line 114 at r1 (raw file):

                 profiling,
                 "[Conv2d] forward_time = %.2lfms\n",
                 &per_device_state,

Change forward_kernel to take the state by reference instead of by pointer


lib/runtime/src/ops/conv_2d.cc line 146 at r1 (raw file):

                 profiling,
                 "[Conv2d] backward_time = %.2lfms\n",
                 &per_device_state,

Change backward_kernel to take the per device state as const &, not as a pointer

Copy link
Contributor Author

@KateUnger KateUnger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 9 files reviewed, 9 unresolved discussions (waiting on @KateUnger and @lockshaw)


lib/kernels/include/kernels/conv_2d_kernels.h line 19 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Only the last field needs req

Done.


lib/kernels/include/kernels/conv_2d_kernels.h line 50 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

req should only be a struct field and should not be passed around

Done.


lib/kernels/include/kernels/conv_2d_kernels.h line 50 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

No cuda objects can be passed in--they should be created in init_kernel

Done.


lib/runtime/src/ops/conv_2d.h line 137 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Delete

Done.


lib/runtime/src/ops/conv_2d.cc line 62 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Honestly just copy, the ref isn't worth it

Done.


lib/runtime/src/ops/conv_2d.cc line 72 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

All of these should be create in init_kernel, not passed in

Done.


lib/runtime/src/ops/conv_2d.cc line 87 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

init_kernel should just be a function of the handle and the attrs (and any shape info from parallel_tensor_shape etc)

Done.


lib/runtime/src/ops/conv_2d.cc line 114 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Change forward_kernel to take the state by reference instead of by pointer

Done.


lib/runtime/src/ops/conv_2d.cc line 146 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Change backward_kernel to take the per device state as const &, not as a pointer

Done.

@KateUnger KateUnger marked this pull request as ready for review September 11, 2023 19:33
@KateUnger KateUnger requested a review from lockshaw September 11, 2023 22:12
Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r2, all commit messages.
Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @KateUnger)


lib/kernels/include/kernels/conv_2d_kernels.h line 14 at r2 (raw file):

struct Conv2DPerDeviceState {
  PerDeviceFFHandle handle;
  optional<Activation> activation;

Why does this even have an activation field? It should probably be removed, along with Conv2DPerDeviceState::use_bias as both of them are available from the Conv2DAttrs


lib/kernels/include/kernels/conv_2d_kernels.h line 25 at r2 (raw file):

  ffConvolutionFwdAlgo_t fwdAlgo;
  ffConvolutionBwdFilterAlgo_t bwdFilterAlgo;
  ffConvolutionBwdDataAlgo_t bwdDataAlgo;

Suggestion:

  req<ffConvolutionBwdDataAlgo_t> bwdDataAlgo;

lib/kernels/src/cuda/conv_2d_kernels.cu line 46 at r2 (raw file):

  }
  return perfResults[0].algo;
}

Suggestion:

cudnnConvolutionBwdDataAlgo_t selectConvolutionBackwardDataAlgorithm(
    cudnnHandle_t handle,
    const cudnnFilterDescriptor_t wDesc,
    void const *w,
    const cudnnTensorDescriptor_t dyDesc,
    void const *dy,
    const cudnnConvolutionDescriptor_t convDesc,
    void *workSpace,
    size_t workSpaceSize,
    const cudnnTensorDescriptor_t dxDesc,
    void *dx,
    float *time) {
  int const reqAlgCnt = 8;
  int cnt = 0;
  cudnnConvolutionBwdDataAlgoPerf_t perfResults[reqAlgCnt];
  checkCUDNN(cudnnFindConvolutionBackwardDataAlgorithmEx(handle,
                                                         wDesc,
                                                         w,
                                                         dyDesc,
                                                         dy,
                                                         convDesc,
                                                         dxDesc,
                                                         dx,
                                                         reqAlgCnt,
                                                         &cnt,
                                                         perfResults,
                                                         workSpace,
                                                         workSpaceSize));
  assert(cnt > 0);
  checkCUDNN(perfResults[0].status);
  if (time != nullptr) {
    *time = perfResults[0].time;
  }
  return perfResults[0].algo;
}

lib/kernels/src/cuda/conv_2d_kernels.cu line 85 at r2 (raw file):

  }
  return perfResults[0].algo;
}

Suggestion:

  assert(cnt > 0);
  checkCUDNN(perfResults[0].status);
  if (time != nullptr) {
    *time = perfResults[0].time;
  }
  return perfResults[0].algo;
}

lib/kernels/src/cuda/conv_2d_kernels.cu line 124 at r2 (raw file):

  }
  return perfResults[0].algo;
}

Suggestion:

cudnnConvolutionBwdFilterAlgo_t selectConvolutionBackwardFilterAlgorithm(
    cudnnHandle_t handle,
    const cudnnTensorDescriptor_t xDesc,
    void const *x,
    const cudnnTensorDescriptor_t dyDesc,
    void const *dy,
    const cudnnConvolutionDescriptor_t convDesc,
    void *workSpace,
    size_t workSpaceSize,
    const cudnnFilterDescriptor_t dwDesc,
    void *dw,
    float *time) {
  int const reqAlgCnt = 8;
  int cnt = 0;
  cudnnConvolutionBwdFilterAlgoPerf_t perfResults[reqAlgCnt];
  checkCUDNN(cudnnFindConvolutionBackwardFilterAlgorithmEx(handle,
                                                           xDesc,
                                                           x,
                                                           dyDesc,
                                                           dy,
                                                           convDesc,
                                                           dwDesc,
                                                           dw,
                                                           reqAlgCnt,
                                                           &cnt,
                                                           perfResults,
                                                           workSpace,
                                                           workSpaceSize));
  assert(cnt > 0);
  checkCUDNN(perfResults[0].status);
  if (time != nullptr) {
    *time = perfResults[0].time;
  }
  return perfResults[0].algo;
}

lib/kernels/src/cuda/conv_2d_kernels.cu line 185 at r2 (raw file):

         kernel_w,
         input_c / groups,
         output_c);

lib/kernels/src/cuda/conv_2d_kernels.cu line 202 at r2 (raw file):

                                             1 /*upscale_y*/,
                                             CUDNN_CROSS_CORRELATION,
                                             CUDNN_DATA_FLOAT));

Suggestion:

  checkCUDNN(cudnnSetConvolution2dDescriptor(convDesc,
                                             pad_h,
                                             pad_w,
                                             stride_h,
                                             stride_w,
                                             1 /*upscale_x*/,
                                             1 /*upscale_y*/,
                                             CUDNN_CROSS_CORRELATION,
                                             CUDNN_DATA_FLOAT));

lib/kernels/src/cuda/conv_2d_kernels.cu line 238 at r2 (raw file):

                                              outputTensor,
                                              output.get_float_ptr(),
                                              &time);

Suggestion:

  // select forward algorithm
  fwdAlgo = selectConvolutionForwardAlgorithm(handle.dnn,
                                              inputTensor,
                                              input.get_float_ptr(),
                                              filterDesc,
                                              filter_ptr,
                                              convDesc,
                                              handle.workSpace,
                                              handle.workSpaceSize,
                                              outputTensor,
                                              output.get_float_ptr(),
                                              nullptr);

lib/kernels/src/cuda/conv_2d_kernels.cu line 252 at r2 (raw file):

                                               filterDesc,
                                               filter_grad_ptr,
                                               &time);

Suggestion:

  bwdFilterAlgo =
      selectConvolutionBackwardFilterAlgorithm(handle.dnn,
                                               inputTensor,
                                               input.get_float_ptr(),
                                               outputTensor,
                                               output.get_float_ptr(),
                                               convDesc,
                                               handle.workSpace,
                                               handle.workSpaceSize,
                                               filterDesc,
                                               filter_grad_ptr,
                                               nullptr);

lib/kernels/src/cuda/conv_2d_kernels.cu line 265 at r2 (raw file):

                                                       inputTensor,
                                                       input.get_float_ptr(),
                                                       &time);

Suggestion:

  bwdDataAlgo = selectConvolutionBackwardDataAlgorithm(handle.dnn,
                                                       filterDesc,
                                                       filter_ptr,
                                                       outputTensor,
                                                       output.get_float_ptr(),
                                                       convDesc,
                                                       handle.workSpace,
                                                       handle.workSpaceSize,
                                                       inputTensor,
                                                       input.get_float_ptr(),
                                                       nullptr);

lib/kernels/src/cuda/conv_2d_kernels.cu line 282 at r2 (raw file):

                                           fwdAlgo,
                                           bwdFilterAlgo,
                                           bwdDataAlgo};

Suggestion:

  Conv2DPerDeviceState per_device_state = {handle,
                                           inputTensor,
                                           biasTensor,
                                           outputTensor,
                                           filterDesc,
                                           actiDesc,
                                           convDesc,
                                           fwdAlgo,
                                           bwdFilterAlgo,
                                           bwdDataAlgo};

lib/kernels/src/cuda/conv_2d_kernels.cu line 291 at r2 (raw file):

                    float *output_ptr,
                    float const *filter_ptr,
                    float const *bias_ptr) {

Suggestion:

void forward_kernel(cudaStream_t stream,
                    Conv2DPerDeviceState const &m,
                    optional<Activation> const &activation,
                    bool use_bias,
                    float const *input_ptr,
                    float *output_ptr,
                    float const *filter_ptr,
                    float const *bias_ptr) {

lib/kernels/src/cuda/conv_2d_kernels.cu line 292 at r2 (raw file):

                    float const *filter_ptr,
                    float const *bias_ptr) {
  checkCUDNN(cudnnSetStream(m->handle.dnn, stream));

Suggestion:

  checkCUDNN(cudnnSetStream(m.handle.dnn, stream));

lib/kernels/src/cuda/conv_2d_kernels.cu line 307 at r2 (raw file):

                                     &beta,
                                     m->outputTensor,
                                     output_ptr));

Suggestion:

  float alpha = 1.0f, beta = 0.0f;
  checkCUDNN(cudnnConvolutionForward(m.handle.dnn,
                                     &alpha,
                                     m.inputTensor,
                                     input_ptr,
                                     m.filterDesc,
                                     filter_ptr,
                                     m.convDesc,
                                     m.fwdAlgo,
                                     m.handle.workSpace,
                                     m.handle.workSpaceSize,
                                     &beta,
                                     m.outputTensor,
                                     output_ptr));

lib/kernels/src/cuda/conv_2d_kernels.cu line 317 at r2 (raw file):

                              &alpha,
                              m->outputTensor,
                              output_ptr));

Suggestion:

  // use_bias == True
  if (bias_ptr != NULL) {
    checkCUDNN(cudnnAddTensor(m.handle.dnn,
                              &alpha,
                              m.biasTensor,
                              bias_ptr,
                              &alpha,
                              m.outputTensor,
                              output_ptr));

lib/kernels/src/cuda/conv_2d_kernels.cu line 322 at r2 (raw file):

    checkCUDNN(cudnnActivationForward(m->handle.dnn,
                                      m->actiDesc,
                                      &alpha,

Suggestion:

  if (activation.has_value()) {
    checkCUDNN(cudnnActivationForward(m.handle.dnn,
                                      m.actiDesc,
                                      &alpha,

lib/kernels/src/cuda/conv_2d_kernels.cu line 344 at r2 (raw file):

  float alpha = 1.0f;
  // float beta = 0.0f;
  if (m->activation) {

Suggestion:

  if (m.activation) {

lib/kernels/src/hip/conv_2d_kernels.cpp line 0 at r2 (raw file):
See comments in conv_2d_kernels.cu


lib/runtime/src/ops/conv_2d.cc line 115 at r2 (raw file):

                 input.get_float_ptr(),
                 output.get_float_ptr(),
                 filter.get_float_ptr(),

Suggestion:

  return profile(forward_kernel,
                 profiling,
                 "[Conv2d] forward_time = %.2lfms\n",
                 per_device_state,
                 input.get_float_ptr(),
                 output.get_float_ptr(),
                 filter.get_float_ptr(),

lib/runtime/src/ops/conv_2d.cc line 147 at r2 (raw file):

                 input.get_float_ptr(),
                 input_grad.get_float_ptr(),
                 output.get_float_ptr(),

Suggestion:

  return profile(backward_kernel,
                 profiling,
                 "[Conv2d] backward_time = %.2lfms\n",
                 per_device_state,
                 input.get_float_ptr(),
                 input_grad.get_float_ptr(),
                 output.get_float_ptr(),

Copy link
Contributor Author

@KateUnger KateUnger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @lockshaw)


lib/kernels/include/kernels/conv_2d_kernels.h line 14 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why does this even have an activation field? It should probably be removed, along with Conv2DPerDeviceState::use_bias as both of them are available from the Conv2DAttrs

Done.


lib/kernels/include/kernels/conv_2d_kernels.h line 25 at r2 (raw file):

  ffConvolutionFwdAlgo_t fwdAlgo;
  ffConvolutionBwdFilterAlgo_t bwdFilterAlgo;
  ffConvolutionBwdDataAlgo_t bwdDataAlgo;

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 46 at r2 (raw file):

  }
  return perfResults[0].algo;
}

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 85 at r2 (raw file):

  }
  return perfResults[0].algo;
}

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 124 at r2 (raw file):

  }
  return perfResults[0].algo;
}

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 185 at r2 (raw file):

         kernel_w,
         input_c / groups,
         output_c);

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 202 at r2 (raw file):

                                             1 /*upscale_y*/,
                                             CUDNN_CROSS_CORRELATION,
                                             CUDNN_DATA_FLOAT));

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 238 at r2 (raw file):

                                              outputTensor,
                                              output.get_float_ptr(),
                                              &time);

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 252 at r2 (raw file):

                                               filterDesc,
                                               filter_grad_ptr,
                                               &time);

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 265 at r2 (raw file):

                                                       inputTensor,
                                                       input.get_float_ptr(),
                                                       &time);

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 282 at r2 (raw file):

                                           fwdAlgo,
                                           bwdFilterAlgo,
                                           bwdDataAlgo};

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 291 at r2 (raw file):

                    float *output_ptr,
                    float const *filter_ptr,
                    float const *bias_ptr) {

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 292 at r2 (raw file):

                    float const *filter_ptr,
                    float const *bias_ptr) {
  checkCUDNN(cudnnSetStream(m->handle.dnn, stream));

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 307 at r2 (raw file):

                                     &beta,
                                     m->outputTensor,
                                     output_ptr));

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 317 at r2 (raw file):

                              &alpha,
                              m->outputTensor,
                              output_ptr));

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 322 at r2 (raw file):

    checkCUDNN(cudnnActivationForward(m->handle.dnn,
                                      m->actiDesc,
                                      &alpha,

Done.


lib/kernels/src/cuda/conv_2d_kernels.cu line 344 at r2 (raw file):

  float alpha = 1.0f;
  // float beta = 0.0f;
  if (m->activation) {

Done.


lib/kernels/src/hip/conv_2d_kernels.cpp line at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

See comments in conv_2d_kernels.cu

I'll update the function parameters, but you told me not to worry about the .cpp file for now. Lmk if that changes!


lib/runtime/src/ops/conv_2d.cc line 115 at r2 (raw file):

                 input.get_float_ptr(),
                 output.get_float_ptr(),
                 filter.get_float_ptr(),

Done.


lib/runtime/src/ops/conv_2d.cc line 147 at r2 (raw file):

                 input.get_float_ptr(),
                 input_grad.get_float_ptr(),
                 output.get_float_ptr(),

Done.

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @KateUnger)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update Conv2D operator

2 participants