synchronize in the beginning of all CUDA functions #2661
Merged
wanghan-iapcm merged 1 commit into deepmodeling:devel on Jul 9, 2023
Conversation
Fix deepmodeling#2660. TensorFlow made streams non-blocking in tensorflow/tensorflow@9d12620. Our own CUDA functions use the default stream, which is different from TensorFlow's, so we need to synchronize at the beginning of all functions. In the future, it might be worth using the same stream as TensorFlow's to improve performance.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
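For illustration, here is a minimal sketch of the pattern this PR applies. The function and kernel names below are hypothetical; the real change adds a synchronization call at the top of each CUDA launcher in `source/lib/src/cuda`:

```cpp
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only.
template <typename FPTYPE>
__global__ void example_kernel(FPTYPE* out, const FPTYPE* in, const int size) {
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < size) {
    out[idx] = in[idx] * in[idx];
  }
}

// Hypothetical launcher showing the pattern: TensorFlow now enqueues its
// work on a non-blocking stream, while this launcher uses the default
// stream, so synchronize first to make sure TensorFlow's writes to `in`
// have completed before our kernel reads them.
template <typename FPTYPE>
void example_op_gpu_cuda(FPTYPE* out, const FPTYPE* in, const int size) {
  cudaDeviceSynchronize();  // wait for TensorFlow's stream to finish
  const int block = 256;
  const int grid = (size + block - 1) / block;
  example_kernel<<<grid, block>>>(out, in, size);
  cudaDeviceSynchronize();                     // wait for our kernel
  const cudaError_t err = cudaGetLastError();  // surface launch errors
  if (err != cudaSuccess) {
    // real code would report cudaGetErrorString(err) and abort
  }
}
```

`cudaDeviceSynchronize()` is a blunt instrument, since it waits on every stream; as the description notes, sharing TensorFlow's stream would avoid the stall entirely.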
Codecov Report: Patch coverage has no change and project coverage change: -0.01%.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##            devel    #2661      +/-   ##
==========================================
- Coverage   78.35%   78.35%   -0.01%
==========================================
  Files         235      235
  Lines       24473    24473
  Branches     1469     1469
==========================================
- Hits        19177    19176       -1
- Misses       4906     4907       +1
  Partials      390      390
```
wanghan-iapcm approved these changes on Jul 9, 2023
wanghan-iapcm pushed a commit that referenced this pull request on Sep 22, 2023:
Merge `source/lib/src/cuda` and `source/lib/src/rocm` into `source/lib/src/gpu`.

- Define macros `gpuGetLastError`, `gpuDeviceSynchronize`, `gpuMemcpy`, `gpuMemcpyDeviceToHost`, `gpuMemcpyHostToDevice`, and `gpuMemset` to make them available for both CUDA and ROCm (a hedged sketch of such a layer follows below).
- Use `<<< >>>` syntax for both CUDA and ROCm. Per ROCm/hip@cf78d85, it has been supported in HIP since 2018.
- Fix several int constants that should be double or float.
- For tabulate:
  - Fix `WARP_SIZE` for ROCm. Per pytorch/pytorch#64302, `WARP_SIZE` can be 32 or 64, so it should not be hardcoded to 64.
  - Add `GpuShuffleSync`. Per ROCm/hip#1491, `__shfl_sync` is not supported by HIP.
  - After merging the code, #1274 should also work for ROCm.
  - Use the same `ii` for #830 and #2357. Although both of them work, `ii` has different meanings in these two PRs, but now it should be the same.
  - However, `ii` in `tabulate_fusion_se_a_fifth_order_polynomial` (ROCm) added by #2532 is wrong. After merging the code, it should be corrected.
  - The optimization in #830 was not applied to ROCm.
  - `__syncwarp` is not supported by ROCm.
- After merging the code, #2661 will be applied to ROCm. Although the TF ROCm stream is still blocking (https://github.com/tensorflow/tensorflow/blob/9d1262082e761cd85d6726bcbdfdef331d6d72c6/tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc#L566), we don't know whether it will change to non-blocking.
- There are several other differences between CUDA and ROCm.

---------

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
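For context, a hedged sketch of what such a CUDA/ROCm compatibility layer might look like. The macro names match the commit message above, but the header name, exact layout, and the `GpuShuffleSync` signature are assumptions, not the actual contents of `source/lib/src/gpu`:

```cpp
// gpu_compat.h -- illustrative sketch only; the real header in
// source/lib/src/gpu may be organized differently.
#pragma once

#ifdef __HIPCC__  // building with the ROCm/HIP toolchain
#include <hip/hip_runtime.h>
#define gpuGetLastError hipGetLastError
#define gpuDeviceSynchronize hipDeviceSynchronize
#define gpuMemcpy hipMemcpy
#define gpuMemcpyDeviceToHost hipMemcpyDeviceToHost
#define gpuMemcpyHostToDevice hipMemcpyHostToDevice
#define gpuMemset hipMemset
#else  // plain CUDA
#include <cuda_runtime.h>
#define gpuGetLastError cudaGetLastError
#define gpuDeviceSynchronize cudaDeviceSynchronize
#define gpuMemcpy cudaMemcpy
#define gpuMemcpyDeviceToHost cudaMemcpyDeviceToHost
#define gpuMemcpyHostToDevice cudaMemcpyHostToDevice
#define gpuMemset cudaMemset
#endif

// Portable warp shuffle: HIP does not implement __shfl_sync (ROCm/hip#1491),
// so fall back to __shfl there; the mask argument has no HIP counterpart
// and is ignored.
template <typename T>
__device__ inline T GpuShuffleSync(unsigned mask, T val, int src_lane) {
#ifdef __HIPCC__
  (void)mask;
  return __shfl(val, src_lane);
#else
  return __shfl_sync(mask, val, src_lane);
#endif
}
```

Callers can then write, e.g., `gpuMemcpy(dst, src, n, gpuMemcpyDeviceToHost)` once and compile it with either toolchain.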