[BUG] Multi-GPU version is much slower than the single GPU

**Summary**
Training on multiple GPUs with Horovod is much slower than training on a single GPU, and I do not understand why.

**DeePMD-kit version, installation method, input file, running commands, error log, etc.**
Installation: `conda install deepmd-kit=2.0.3=*cpu libdeepmd=2.0.3=*cpu lammps-dp=2.0.0 horovod -c https://conda.deepmodeling.org`
GPU: NVIDIA Tesla V100 16 GB
On my system, one node has four GPUs.

### Test results
The same input file was used in both cases.
With one GPU:
`dp train --mpi-log=master input.json   1>> train.log 2>> train.log`
the training speed is:
![image](https://user-images.githubusercontent.com/50021753/141460838-b930af0a-5959-4f89-9449-c83889878d19.png)

With two processes launched through Horovod:
`horovodrun -np 2  dp train --mpi-log=master input.json   1>> train.log 2>> train.log`
the training speed is:
![image](https://user-images.githubusercontent.com/50021753/141461241-343c3c63-3071-49c6-8c60-37df76d2f543.png)
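When comparing the two timings, note that in data-parallel training each worker normally processes its own batch per step, so the time per step alone does not give the full picture. The sketch below shows one way to turn two per-step timings into a throughput speedup and a parallel efficiency. It assumes the per-worker batch size is the same in both runs; the function names and the example timings are hypothetical and are not read from the screenshots above.

```python
def throughput_speedup(t_single: float, t_multi: float, n_workers: int) -> float:
    """Ratio of samples/second (multi-worker run vs. single-worker run).

    t_single  : wall time per training step with one worker, in seconds
    t_multi   : wall time per training step with n_workers workers, in seconds
    Assumption: every worker uses the same per-worker batch size as the
    single-worker run, so one multi-worker step covers n_workers batches.
    """
    if t_single <= 0 or t_multi <= 0:
        raise ValueError("step times must be positive")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return n_workers * t_single / t_multi


def parallel_efficiency(t_single: float, t_multi: float, n_workers: int) -> float:
    """Speedup divided by worker count; 1.0 means ideal linear scaling."""
    return throughput_speedup(t_single, t_multi, n_workers) / n_workers


if __name__ == "__main__":
    # Hypothetical timings for illustration only (seconds per step).
    t1, t2, n = 0.10, 0.25, 2
    print(f"speedup:    {throughput_speedup(t1, t2, n):.2f}")
    print(f"efficiency: {parallel_efficiency(t1, t2, n):.2f}")
```

A speedup below 1.0 under this assumption means the multi-GPU run is slower in samples per second as well, not only in time per step, which is the situation this issue describes.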

### Another problem
When training the model with version 2.0.2 or 2.0.3, loading the data took more than 1 hour, but it took only a few minutes with version 2.0.0.b2. Why?



[BUG] _Multi-GPU version is much slower than the single GPU #1284
