Skip to content

use float constants and functions in float functions#1647

Merged
wanghan-iapcm merged 11 commits intodeepmodeling:develfrom
njzjz:float
Apr 28, 2022
Merged

use float constants and functions in float functions#1647
wanghan-iapcm merged 11 commits intodeepmodeling:develfrom
njzjz:float

Conversation

@njzjz
Copy link
Member

@njzjz njzjz commented Apr 19, 2022

Before this commit, almost all functions use double constants even under float precision. We know that a float will cast to double when adding/multiplying double.
This PR also uses sqrtf and fmodf for GPU device float functions, instead of double sqrt and fmod.

@njzjz njzjz marked this pull request as draft April 19, 2022 01:10
@codecov-commenter
Copy link

codecov-commenter commented Apr 19, 2022

Codecov Report

Merging #1647 (90cb99f) into devel (de7ba72) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##            devel    #1647   +/-   ##
=======================================
  Coverage   76.15%   76.15%           
=======================================
  Files          94       94           
  Lines        7850     7850           
=======================================
  Hits         5978     5978           
  Misses       1872     1872           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de7ba72...90cb99f. Read the comment docs.

@njzjz njzjz marked this pull request as ready for review April 19, 2022 02:17
@njzjz
Copy link
Member Author

njzjz commented Apr 19, 2022

CI failure also happens on the devel... Not sure what happened.

@wanghan-iapcm
Copy link
Collaborator

Does your revision pass the UTs on GPU?

@njzjz
Copy link
Member Author

njzjz commented Apr 20, 2022

Does your revision pass the UTs on GPU?

I cannot build the UTs on GPU even on the devel branch. Here is the error message:

/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc: In member function 'virtual void TestTabulateSeR_tabulate_fusion_se_r_grad_gpu_cuda_Test::TestBody()':
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:5: error: 'dy_dem_dev' was not declared in this scope
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
     ^~~~~~~~~~
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:5: note: suggested alternative: 'dy_dem'
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
     ^~~~~~~~~~
     dy_dem
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:26: error: 'table_dev' was not declared in this scope
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
                          ^~~~~~~~~
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:26: note: suggested alternative: 'table'
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
                          ^~~~~~~~~
                          table
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:46: error: 'em_dev' was not declared in this scope
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
                                              ^~~~~~
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:46: note: suggested alternative: 'dy_dem'
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
                                              ^~~~~~
                                              dy_dem
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:63: error: 'dy_dev' was not declared in this scope
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
                                                               ^~~~~~
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:63: note: suggested alternative: 'dy_dem'
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
                                                               ^~~~~~
                                                               dy_dem
make[2]: *** [CMakeFiles/runUnitTests.dir/build.make:426: CMakeFiles/runUnitTests.dir/test_tabulate_se_r.cc.o] Error 1

@wanghan-iapcm
Copy link
Collaborator

Does your revision pass the UTs on GPU?

I cannot build the UTs on GPU even on the devel branch. Here is the error message:

/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc: In member function 'virtual void TestTabulateSeR_tabulate_fusion_se_r_grad_gpu_cuda_Test::TestBody()':
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:5: error: 'dy_dem_dev' was not declared in this scope
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
     ^~~~~~~~~~
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:5: note: suggested alternative: 'dy_dem'
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
     ^~~~~~~~~~
     dy_dem
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:26: error: 'table_dev' was not declared in this scope
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
                          ^~~~~~~~~
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:26: note: suggested alternative: 'table'
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
                          ^~~~~~~~~
                          table
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:46: error: 'em_dev' was not declared in this scope
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
                                              ^~~~~~
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:46: note: suggested alternative: 'dy_dem'
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
                                              ^~~~~~
                                              dy_dem
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:63: error: 'dy_dev' was not declared in this scope
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
                                                               ^~~~~~
/home/jz748/codes/deepmd-kit/source/lib/tests/test_tabulate_se_r.cc:125:63: note: suggested alternative: 'dy_dem'
   * dy_dem_dev = NULL, * table_dev = NULL, * em_dev = NULL, * dy_dev = NULL;
                                                               ^~~~~~
                                                               dy_dem
make[2]: *** [CMakeFiles/runUnitTests.dir/build.make:426: CMakeFiles/runUnitTests.dir/test_tabulate_se_r.cc.o] Error 1

Should be fixed by #1651

@njzjz
Copy link
Member Author

njzjz commented Apr 22, 2022

Does your revision pass the UTs on GPU?

Yes.

Co-authored-by: Denghui Lu <denghuilu@pku.edu.cn>
Copy link
Member

@denghuilu denghuilu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All UTs within the api_cc and lib folders have passed in an ROCm GPU environment, except this one:

Not found: Op type not registered 'MatrixDiagV3' in binary running on j17r2n14. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
unknown file: Failure
C++ exception with description "DeePMD-kit Error: TensorFlow Error: Not found: Op type not registered 'MatrixDiagV3' in binary running on j17r2n14. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed." thrown in SetUp().
[  FAILED  ] TestInferDeepPolarNew.cpu_build_nlist (378 ms)

It's a TF library problem obviously, so it should have little effect on the correctness of the test program.

@wanghan-iapcm wanghan-iapcm merged commit 18ac81f into deepmodeling:devel Apr 28, 2022
@njzjz njzjz deleted the float branch October 11, 2022 22:41
njzjz added a commit to njzjz/deepmd-kit that referenced this pull request Oct 11, 2022
Same as deepmodeling#1647, but a function was missing.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
wanghan-iapcm pushed a commit that referenced this pull request Oct 12, 2022
Same as #1647, but a function was missing.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
mingzhong15 pushed a commit to mingzhong15/deepmd-kit that referenced this pull request Jan 15, 2023
Same as deepmodeling#1647, but a function was missing.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants