Description
Summary
I am confused: training on multiple GPUs is much slower than training on a single GPU.
Deepmd-kit version, installation way, input file, running commands, error log, etc.
How to install: conda install deepmd-kit=2.0.3=*cpu libdeepmd=2.0.3=*cpu lammps-dp=2.0.0 horovod -c https://conda.deepmodeling.org
GPU: NVIDIA Tesla V100 16 GB
In my system, each node has four GPUs.
Test results
The same input file was used in both cases.
With one GPU:
dp train --mpi-log=master input.json 1>> train.log 2>> train.log
The training speed was:

With two GPUs (Horovod):
horovodrun -np 2 dp train --mpi-log=master input.json 1>> train.log 2>> train.log
The training speed was:

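When comparing the two runs, the raw per-step speed may not be directly comparable. Assuming synchronous data-parallel training in which each Horovod worker processes its own batch every step, one two-GPU step covers twice as many batches as one single-GPU step. The following is a minimal sketch for turning per-step wall times into a speedup and a scaling efficiency; the function name and the example timings are hypothetical, not values measured in this issue.

```python
def scaling_efficiency(t_single: float, t_multi: float, n_workers: int) -> dict:
    """Compare the per-step wall time of a single-GPU run with an n-worker run.

    Assumption: synchronous data parallelism, where each worker processes one
    local batch per step, so one multi-GPU step covers n_workers batches.
    """
    if t_single <= 0 or t_multi <= 0 or n_workers < 1:
        raise ValueError("times must be positive and n_workers must be >= 1")
    # Batches processed per second, summed over all workers.
    throughput_single = 1.0 / t_single
    throughput_multi = n_workers / t_multi
    speedup = throughput_multi / throughput_single
    return {
        "speedup": speedup,
        # 1.0 means perfect linear scaling; values well below 1.0 suggest
        # communication or data-loading overhead dominates.
        "efficiency": speedup / n_workers,
    }


# Hypothetical timings in seconds per training step (NOT measured values).
print(scaling_efficiency(t_single=0.10, t_multi=0.12, n_workers=2))
```

Under this assumption, a multi-GPU run whose per-step time is somewhat longer than the single-GPU run can still have higher total throughput; only when the efficiency drops far below 1.0 is the parallel run genuinely slower.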
Another problem
When training the model with version 2.0.2 or 2.0.3, loading the data took more than an hour, whereas with version 2.0.0.b2 it took only a few minutes. Why?