replace GPU 1./sqrt with rsqrt #1741

Merged
wanghan-iapcm merged 3 commits into deepmodeling:devel from njzjz:cuda-rsqrt on Jun 7, 2022
njzjz (Member) commented Jun 4, 2022

Per the NVIDIA CUDA C++ Best Practices Guide:

> 11.1.3. Reciprocal Square Root
> The reciprocal square root should always be invoked explicitly as rsqrtf() for single precision and rsqrt() for double precision. The compiler optimizes 1.0f/sqrtf(x) into rsqrtf() only when this does not violate IEEE-754 semantics.

See https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#reciprocal-square-root
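
A minimal sketch of the substitution the guide recommends, not the code from this PR. The kernel `inv_sqrt_kernel`, the overloaded helper `inv_sqrt`, and the host function `max_rel_error` are hypothetical names chosen for illustration; only `rsqrtf()` and `rsqrt()` come from the CUDA math API, and `FPTYPE` is assumed to be `float` or `double`.

```cuda
#include <cuda_runtime.h>

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Overloads let a kernel templated on FPTYPE pick the matching intrinsic:
// rsqrtf() for single precision, rsqrt() for double precision.
__device__ inline float inv_sqrt(float x) { return rsqrtf(x); }
__device__ inline double inv_sqrt(double x) { return rsqrt(x); }

// Hypothetical kernel (not taken from the PR): out[i] = 1 / sqrt(in[i]).
template <typename FPTYPE>
__global__ void inv_sqrt_kernel(const FPTYPE* in, FPTYPE* out, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = inv_sqrt(in[i]);  // explicit call instead of 1. / sqrt(in[i])
  }
}

// Host helper: runs the kernel on the inputs 1..n and returns the largest
// relative deviation from a double-precision host reference.
// Returns -1.0 if any CUDA call fails (for example, when no GPU is present).
template <typename FPTYPE>
double max_rel_error(int n) {
  std::vector<FPTYPE> h_in(n), h_out(n);
  for (int i = 0; i < n; ++i) h_in[i] = static_cast<FPTYPE>(i + 1);

  const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(FPTYPE);
  FPTYPE* d_in = nullptr;
  FPTYPE* d_out = nullptr;
  if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1.0;
  if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
    cudaFree(d_in);
    return -1.0;
  }
  cudaError_t err =
      cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

  if (err == cudaSuccess) {
    const int block = 256;
    const int grid = (n + block - 1) / block;
    inv_sqrt_kernel<FPTYPE><<<grid, block>>>(d_in, d_out, n);
    err = cudaGetLastError();  // catches launch-configuration errors
  }
  if (err == cudaSuccess) {
    // This blocking copy also reports errors raised while the kernel ran.
    err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
  }
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return -1.0;

  double worst = 0.0;
  for (int i = 0; i < n; ++i) {
    const double ref = 1.0 / std::sqrt(static_cast<double>(h_in[i]));
    const double rel = std::fabs(static_cast<double>(h_out[i]) - ref) / ref;
    worst = std::max(worst, rel);
  }
  return worst;
}
```

Calling `max_rel_error<float>(1024)` or `max_rel_error<double>(1024)` from a host `main` gives a quick sanity check that the explicit intrinsic stays within a few units in the last place of the reference. The overloads are one possible way to keep a single templated kernel for both precisions.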

codecov-commenter commented Jun 4, 2022

Codecov Report

Merging #1741 (486fb8c) into devel (41af8e0) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##            devel    #1741   +/-   ##
=======================================
  Coverage   76.14%   76.14%           
=======================================
  Files          96       96           
  Lines        7939     7939           
=======================================
  Hits         6045     6045           
  Misses       1894     1894           


njzjz (Member, Author) commented Jun 4, 2022

The unit test with CUDA passed as expected.

github-actions bot added the ROCM label Jun 4, 2022
njzjz changed the title from "replace CUDA 1./sqrt with rsqrt" to "replace GPU 1./sqrt with rsqrt" Jun 5, 2022

denghuilu (Member) left a comment

The unit test with ROCm also passed as expected.

@wanghan-iapcm wanghan-iapcm merged commit ec1e816 into deepmodeling:devel Jun 7, 2022
mingzhong15 pushed a commit to mingzhong15/deepmd-kit that referenced this pull request Jan 15, 2023
* replace 1./sqrt with rsqrt

Per NVIDIA doc:
> 11.1.3. Reciprocal Square Root
> The reciprocal square root should always be invoked explicitly as rsqrtf() for single precision and rsqrt() for double precision. The compiler optimizes 1.0f/sqrtf(x) into rsqrtf() only when this does not violate IEEE-754 semantics.

See https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#reciprocal-square-root

* remove FPTYPE as it has been FPTYPE

* apply the same opt for ROCM

4 participants