do some small optimization to ops #943
Conversation
1. Avoid `concat` or `add` in loops. Instead, append tensors to a list, and `concat` or `accumulate_n` after the loop.
2. Remove a duplicated `reshape`.
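The first point can be sketched with NumPy standing in for the TensorFlow ops (`tf.concat` / `tf.accumulate_n`); the function names below are illustrative, not the PR's actual code:

```python
import numpy as np

def concat_in_loop(chunks):
    # Anti-pattern: repeated concatenation copies the growing buffer
    # on every iteration, giving quadratic total work.
    out = np.empty((0,), dtype=np.float64)
    for c in chunks:
        out = np.concatenate([out, c])
    return out

def concat_after_loop(chunks):
    # Preferred: collect the pieces in a list, concatenate once at the
    # end (the TensorFlow analogue is a single tf.concat after the loop).
    parts = []
    for c in chunks:
        parts.append(c)
    return np.concatenate(parts)

def accumulate_after_loop(chunks):
    # For summation, the analogue of tf.accumulate_n: one fused sum
    # instead of a chain of pairwise adds inside the loop.
    return np.sum(np.stack(chunks), axis=0)

chunks = [np.full(3, float(i)) for i in range(4)]
print(concat_in_loop(chunks).shape)        # (12,)
print(accumulate_after_loop(chunks))       # [6. 6. 6.]
```

Both variants produce identical results; only the number of intermediate buffer copies (and, in TensorFlow, the number of graph ops) differs.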
Codecov Report
```diff
@@            Coverage Diff             @@
##             devel     #943      +/-  ##
==========================================
+ Coverage    75.67%   75.68%   +0.01%
==========================================
  Files           92       92
  Lines         7671     7671
==========================================
+ Hits          5805     5806       +1
+ Misses        1866     1865       -1
```
Continue to review full report at Codecov.
Have you benchmarked these optimizations? Do they improve efficiency?
I just benchmarked it. The answer is no😂

I think these optimizations may matter more on CPUs than on GPUs. I will recheck this PR.
```python
                          bavg = bavg,
                          trainable = trainable,
                          suffix = "_"+str(type_i))
        if type_i == 0:
```
Did we have a bug here? If `type_i == 0` and `(type_input, type_i) in self.exclude_types`, `ret` was still accumulated.
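The suspected control-flow issue can be illustrated with a hypothetical sketch (none of these names or structures are the PR's actual code; this only shows the class of bug the comment describes, where the first-type branch bypasses the exclusion check):

```python
def accumulate_buggy(type_input, types, exclude_types, contributions):
    # Hypothetical: the type_i == 0 branch initializes ret without
    # consulting exclude_types, so an excluded (type_input, 0) pair
    # still contributes.
    ret = 0.0
    for type_i in types:
        if type_i == 0:
            ret = contributions[type_i]          # bug: no exclusion check
        elif (type_input, type_i) not in exclude_types:
            ret += contributions[type_i]
    return ret

def accumulate_fixed(type_input, types, exclude_types, contributions):
    # Check the exclusion list on every branch before accumulating.
    ret = 0.0
    for type_i in types:
        if (type_input, type_i) in exclude_types:
            continue
        ret += contributions[type_i]
    return ret

contributions = {0: 1.0, 1: 2.0, 2: 4.0}
print(accumulate_buggy(0, [0, 1, 2], {(0, 0)}, contributions))  # 7.0: wrongly includes type 0
print(accumulate_fixed(0, [0, 1, 2], {(0, 0)}, contributions))  # 6.0
```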
@denghuilu Would the revised code be faster on GPUs?
I think one cannot see any difference if there are only one or two elements. A system with at least 10 atom types should be tested.
There is a slight performance penalty on a V100 GPU with the water benchmark system ([benchmark screenshots: optimize-ops branch vs. devel branch]). Maybe the GPU implementation did not use stream parallelization.
Why is the testing time of optimize-ops so long?
It was fixed by #1419 -- this branch is behind devel. |
merge from devel
It did have some benefits:


