
[BUG] Nbor list sorting error in lammps with the compressed model #773

@zezhong-zhang

Description

Summary

Using the compressed model in LAMMPS with multiple GPUs leads to an "illegal nbor list sorting" error; a single GPU does not have this issue.

Deepmd-kit version, installation way, input file, running commands, error log, etc.
System: CentOS Linux 7 (Core) with slurm
deepmd-kit: 2.0.0.b0 py39_0_cuda10.1_gpu deepmodeling/label/dev
lammps-dp: 2.0.0.b0 0_cuda10.1_gpu deepmodeling/label/dev
python: 3.9.4 hdb3f193_0
installation: conda 4.10.1
command: srun -n 16 lmp -in in.lammps
Input & output files, including:
in.lammps
graph.pb (model not compressed)
graph-compress.pb (after compression)
log for single GPU
log for multiple GPU with srun
log for multiple GPU with mpirun
the model training parameters
g6_sub.lammps -- this is a small test structure
hex_loop_2_new.lammps -- this is a large structure

Archive.zip

Steps to Reproduce

  1. srun -n 16 lmp -in in.lammps with the compressed model yields "illegal nbor list sorting"; the same happens with mpirun.
  2. lmp -in in.lammps with the compressed model on a single GPU runs fine.
  3. srun -n 16 lmp -in in.lammps with the uncompressed model on multiple GPUs also runs.
  4. However, in all cases the thermo output (both mc and md) does not update in the log, even though the dump is written correctly.

Further Information, Files, and Links
For the large structure, I have 58673 atoms in the box and run with 16 V100 GPUs. Running with the uncompressed model gives a CUDA out-of-memory error. What would be a good estimate of how many GPUs are needed for a given number of atoms?
