
[BUG] Training wall time is abnormally long when sets contain many systems #2229

@Vibsteamer

Description

Bug summary

Summary:
- effectively the same data sets (~80000 frames)
- the same other parameters in the input
- single GPU

1) ~80000 frames spread over ~50000 systems: the task takes 52 hours.
2) ~80000 frames in ~17 systems: the task takes 18 hours (type_mixed is used to collect the data).

DeePMD-kit Version

DeePMD-kit v2.1.5

TensorFlow Version

2.9.0

How did you download the software?

Offline packages

Input Files, Running Commands, Error Log, etc.

This was discussed with @iProzd, and the data sets of case 1) were sent to him previously;
it was said that I/O should not influence the training time after the data statistics stage.

It took ~4 hours before the training actually started (i.e., before data statistics finished and lcurve.out started being written).
The "training time" reported in the logs of both cases is effectively the same; note that disp_freq is 100 times larger for case 1).

training time for 1)
train_origin.log

...
DEEPMD INFO    batch 7800000 training time 1580.50 s, testing time 0.00 s
DEEPMD INFO    batch 8000000 training time 1569.11 s, testing time 0.00 s
...
DEEPMD INFO    wall time: 188106.747 s

training time for 2)
train_typeSel.log

...
DEEPMD INFO    batch 7998000 training time 15.41 s, testing time 0.00 s
DEEPMD INFO    batch 8000000 training time 15.60 s, testing time 0.00 s
...
DEEPMD INFO    wall time: 65437.235 s
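
A back-of-the-envelope check of the numbers above (a sketch only; the effective disp_freq values are inferred from the consecutive batch counters in the logs, not taken from the actual input files) shows that the per-batch training time is essentially identical in both cases, so the extra ~34 hours of wall time in case 1) must be spent outside the training steps:

```python
# Per-batch training time implied by the log lines above.
# disp_freq is inferred from consecutive batch counters (an assumption):
#   case 1: 8000000 - 7800000 = 200000 batches per log line
#   case 2: 8000000 - 7998000 = 2000 batches per log line
per_batch_1 = 1569.11 / 200000   # ~7.85 ms per batch
per_batch_2 = 15.60 / 2000       # ~7.80 ms per batch

# Time spent in actual training steps over the 8e6 batches,
# compared with the reported wall times:
train_h_1 = per_batch_1 * 8_000_000 / 3600
train_h_2 = per_batch_2 * 8_000_000 / 3600
print(f"case 1: {per_batch_1 * 1e3:.2f} ms/batch, "
      f"{train_h_1:.1f} h training vs {188106.747 / 3600:.1f} h wall")
print(f"case 2: {per_batch_2 * 1e3:.2f} ms/batch, "
      f"{train_h_2:.1f} h training vs {65437.235 / 3600:.1f} h wall")
```

Both cases spend roughly 17.3-17.4 h in training steps, but case 1) has a ~52.3 h wall time against case 2)'s ~18.2 h, which points at per-system overhead (e.g., data loading/bookkeeping across ~50000 system directories) rather than the training steps themselves.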

Steps to Reproduce

dp train

Further Information, Files, and Links

No response
