[BUG] Custom activation function 'gelu' on CPU is too much slower than 'tanh' and 'gelu_tf'  #2373

@iProzd

Bug summary

Using the example water training script with the se_e2_a descriptor on a V100 GPU, the training time per 100 steps is roughly 1.85 s, 3.13 s, and 32 s for tanh, gelu_tf, and gelu, respectively, when the custom ops run on CPU. With the custom ops on GPU, the times are 1.52 s, 2.75 s, and 1.66 s, respectively.

Should we replace gelu with gelu_tf on CPU?

DeePMD-kit Version

2.1.5

TensorFlow Version

2.6.0

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

cd example/water/se_e2_a
dp train input.json

I only modified the activation_function in model/descriptor from tanh to gelu and gelu_tf.
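
For anyone reproducing this, the edit described above can be scripted. The following is a minimal sketch, assuming the key path model/descriptor/activation_function from the sentence above. The helper name `set_descriptor_activation`, the output file names, and the stand-in config are hypothetical and are not part of DeePMD-kit; point `base` at the real input.json to use it.

```python
import json
import tempfile
from pathlib import Path


def set_descriptor_activation(src, dst, activation):
    """Write a copy of the JSON config at src to dst with a new descriptor activation."""
    config = json.loads(Path(src).read_text())
    # Key path taken from the report: model/descriptor/activation_function.
    config["model"]["descriptor"]["activation_function"] = activation
    Path(dst).write_text(json.dumps(config, indent=4))
    return config


# Demo on a minimal stand-in config (NOT the real example/water/se_e2_a/input.json).
workdir = Path(tempfile.mkdtemp())
base = workdir / "input.json"
base.write_text(json.dumps({"model": {"descriptor": {"type": "se_e2_a"}}}))

variants = {}
for name in ("tanh", "gelu_tf", "gelu"):
    out = workdir / f"input_{name}.json"  # hypothetical file naming
    set_descriptor_activation(base, out, name)
    variants[name] = json.loads(out.read_text())
```

Each generated file can then be passed to `dp train` in place of input.json, one run per activation function.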

Custom ops on CPU:
tanh:

DEEPMD INFO    batch     100 training time 2.94 s, testing time 0.03 s
DEEPMD INFO    batch     200 training time 1.85 s, testing time 0.03 s
DEEPMD INFO    batch     300 training time 1.87 s, testing time 0.04 s
DEEPMD INFO    batch     400 training time 1.84 s, testing time 0.04 s
DEEPMD INFO    batch     500 training time 1.86 s, testing time 0.05 s

gelu_tf:

DEEPMD INFO    batch     100 training time 5.93 s, testing time 0.05 s
DEEPMD INFO    batch     200 training time 3.12 s, testing time 0.05 s
DEEPMD INFO    batch     300 training time 3.13 s, testing time 0.04 s
DEEPMD INFO    batch     400 training time 3.14 s, testing time 0.04 s
DEEPMD INFO    batch     500 training time 3.17 s, testing time 0.04 s

gelu:

DEEPMD INFO    batch     100 training time 33.86 s, testing time 0.59 s
DEEPMD INFO    batch     200 training time 32.72 s, testing time 0.59 s
DEEPMD INFO    batch     300 training time 32.80 s, testing time 0.59 s
DEEPMD INFO    batch     400 training time 32.95 s, testing time 0.57 s
DEEPMD INFO    batch     500 training time 33.14 s, testing time 0.58 s

Custom ops on GPU:
tanh:

DEEPMD INFO    batch     100 training time 2.57 s, testing time 0.02 s
DEEPMD INFO    batch     200 training time 1.48 s, testing time 0.02 s
DEEPMD INFO    batch     300 training time 1.49 s, testing time 0.02 s
DEEPMD INFO    batch     400 training time 1.52 s, testing time 0.02 s
DEEPMD INFO    batch     500 training time 1.49 s, testing time 0.02 s

gelu_tf:

DEEPMD INFO    batch     100 training time 5.35 s, testing time 0.03 s
DEEPMD INFO    batch     200 training time 2.73 s, testing time 0.03 s
DEEPMD INFO    batch     300 training time 2.74 s, testing time 0.03 s
DEEPMD INFO    batch     400 training time 2.75 s, testing time 0.03 s
DEEPMD INFO    batch     500 training time 2.76 s, testing time 0.03 s

gelu:

DEEPMD INFO    batch     100 training time 2.85 s, testing time 0.03 s
DEEPMD INFO    batch     200 training time 1.64 s, testing time 0.02 s
DEEPMD INFO    batch     300 training time 1.68 s, testing time 0.02 s
DEEPMD INFO    batch     400 training time 1.67 s, testing time 0.02 s
DEEPMD INFO    batch     500 training time 1.68 s, testing time 0.03 s
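
To put a number on the slowdown, the logs above can be averaged with a short script. This is a rough sketch, not a DeePMD-kit tool: the regular expression is written against the log lines shown here, the helper name `steady_state_time` is hypothetical, and skipping the first reported interval is an assumption made because that interval is visibly longer than the rest in every run.

```python
import re
from statistics import mean

# Matches lines such as:
# DEEPMD INFO    batch     200 training time 1.85 s, testing time 0.03 s
LINE = re.compile(r"batch\s+(\d+)\s+training time\s+([\d.]+)\s*s")


def steady_state_time(log_text, skip=1):
    """Mean training time per logged interval, ignoring the first `skip` intervals."""
    times = [float(m.group(2)) for m in LINE.finditer(log_text)]
    return mean(times[skip:])


# Log excerpts copied from the "Custom ops on CPU" runs above.
CPU_TANH = """
DEEPMD INFO    batch     100 training time 2.94 s, testing time 0.03 s
DEEPMD INFO    batch     200 training time 1.85 s, testing time 0.03 s
DEEPMD INFO    batch     300 training time 1.87 s, testing time 0.04 s
DEEPMD INFO    batch     400 training time 1.84 s, testing time 0.04 s
DEEPMD INFO    batch     500 training time 1.86 s, testing time 0.05 s
"""

CPU_GELU = """
DEEPMD INFO    batch     100 training time 33.86 s, testing time 0.59 s
DEEPMD INFO    batch     200 training time 32.72 s, testing time 0.59 s
DEEPMD INFO    batch     300 training time 32.80 s, testing time 0.59 s
DEEPMD INFO    batch     400 training time 32.95 s, testing time 0.57 s
DEEPMD INFO    batch     500 training time 33.14 s, testing time 0.58 s
"""

tanh_time = steady_state_time(CPU_TANH)
gelu_time = steady_state_time(CPU_GELU)
slowdown = gelu_time / tanh_time
print(f"tanh: {tanh_time:.2f} s, gelu: {gelu_time:.2f} s, slowdown: {slowdown:.1f}x")
```

On the CPU logs above this gives a slowdown of about 17.7x for gelu relative to tanh.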

Steps to Reproduce

See above.

Further Information, Files, and Links

No response
