Error when running LAMMPS in the devel branch #4161

@wujing81

### Summary

I created a container node from the image `registry.dp.tech/dptech/deepmd-kit:3.0.0b3-cuda12.1` on the Bohrium platform, then installed the devel branch of DeePMD-kit with:

```sh
conda create -n deepmd-dev python=3.10
source activate deepmd-dev
pip install git+https://github.com/deepmodeling/deepmd-kit.git@devel
rsync -a --ignore-existing /opt/deepmd-kit-3.0.0b3/envs/deepmd-dev/ /opt/deepmd-kit-3.0.0b3/
```

`/opt/deepmd-kit-3.0.0b3/bin/dp --version` reports `DeePMD-kit v3.0.0b4.dev56+g0b72dae3`.

I trained a model with this version of `dp` (the training input file is attached) and froze it to a `.pth` file with `dp --pt freeze`. I then used the frozen model to run MD simulations with `/opt/deepmd-kit-3.0.0b3/bin/lmp -i lammps.in`; the `input.lammps` and `conf.lmp` files are attached.
An error occurs:

```
[bohrium-11849-1195151:01982] mca_base_component_repository_open: unable to open mca_btl_openib: librdmacm.so.1: cannot open shared object file: No such file or directory (ignored)
LAMMPS (2 Aug 2023)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
DeePMD-kit: Successfully load libcudart.so.11.0
2024-09-24 15:37:29.837816: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-24 15:37:29.837871: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-24 15:37:29.837882: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Loaded 1 plugins from /opt/deepmd-kit-3.0.0b3/lib/deepmd_lmp
Reading data file ...
triclinic box = (0 0 0) to (12.4447 12.4447 12.4447) with tilt (0 0 0)
1 by 1 by 1 MPI processor grid
reading atoms ...
192 atoms
read_data CPU = 0.003 seconds
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Summary of lammps deepmd module ...

Info of deepmd-kit:
installed to: /opt/deepmd-kit-3.0.0b3
source:
source branch: HEAD
source commit: cbf2de6
source commit at: 2024-07-27 05:11:58 +0000
support model ver.: 1.1
build variant: cuda
build with tf inc: /opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/tensorflow/include;/opt/deepmd-kit-3.0.0b3/include
build with tf lib: /opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/tensorflow/libtensorflow_cc.so.2
build with pt lib: torch;torch_library;/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/torch/lib/libc10.so;/usr/local/cuda/lib64/stubs/libcuda.so;/usr/local/cuda/lib64/libnvrtc.so;/usr/local/cuda/lib64/libnvToolsExt.so;/usr/local/cuda/lib64/libcudart.so;/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/torch/lib/libc10_cuda.so
set tf intra_op_parallelism_threads: 0
set tf inter_op_parallelism_threads: 0
Info of lammps module:
use deepmd-kit at: /opt/deepmd-kit-3.0.0b3
load model from: model.pth to cpu
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Info of model(s):
using 1 model(s): model.pth
rcut in model: 4.5
ntypes in model: 118

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:

  • USER-DEEPMD package:
    The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
update: every = 10 steps, delay = 0 steps, check = no
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 6.5
ghost atom cutoff = 6.5
binsize = 3.25, bins = 4 4 4
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair deepmd, perpetual
attributes: full, newton on
pair build: full/bin/atomonly
stencil: full/bin/3d
bin: standard
Setting up Verlet run ...
Unit style : metal
Current step : 0
Time step : 0.0005
ERROR on proc 0: DeePMD-kit C API Error: DeePMD-kit Error: DeePMD-kit PyTorch backend JIT error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/deepmd/pt/model/model/ener_model.py", line 56, in forward_lower
comm_dict: Optional[Dict[str, Tensor]]=None) -> Dict[str, Tensor]:
_5 = (self).need_sorted_nlist_for_lower()
model_ret = (self).forward_common_lower(extended_coord, extended_atype, nlist, mapping, fparam, aparam, do_atomic_virial, comm_dict, _5, )
~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_6 = (self).get_fitting_net()
model_predict = annotate(Dict[str, Tensor], {})
File "code/torch/deepmd/pt/model/model/ener_model.py", line 213, in forward_common_lower
cc_ext, _36, fp, ap, input_prec, = _35
atomic_model = self.atomic_model
atomic_ret = (atomic_model).forward_common_atomic(cc_ext, extended_atype, nlist0, mapping, fp, ap, comm_dict, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_37 = (self).atomic_output_def()
training = self.training
File "code/torch/deepmd/pt/model/atomic_model/energy_atomic_model.py", line 50, in forward_common_atomic
ext_atom_mask = (self).make_atom_mask(extended_atype, )
_3 = torch.where(ext_atom_mask, extended_atype, 0)
ret_dict = (self).forward_atomic(extended_coord, _3, nlist, mapping, fparam, aparam, comm_dict, )
~~~~~~~~~~~~~~~~~~~~ <--- HERE
ret_dict0 = (self).apply_out_stat(ret_dict, atype, )
_4 = torch.slice(torch.slice(ext_atom_mask), 1, None, nloc)
File "code/torch/deepmd/pt/model/atomic_model/energy_atomic_model.py", line 93, in forward_atomic
pass
descriptor = self.descriptor
_16 = (descriptor).forward(extended_coord, extended_atype, nlist, mapping, comm_dict, )
~~~~~~~~~~~~~~~~~~~ <--- HERE
descriptor0, rot_mat, g2, h2, sw, = _16
fitting_net = self.fitting_net
File "code/torch/deepmd/pt/model/descriptor/dpa2.py", line 98, in forward
repformers1 = self.repformers
_17 = nlist_dict[_1(_16, (repformers1).get_nsel(), )]
_18 = (repformers).forward(_17, extended_coord, extended_atype, g13, mapping0, comm_dict0, )
~~~~~~~~~~~~~~~~~~~ <--- HERE
g14, g2, h2, rot_mat, sw, = _18
concat_output_tebd = self.concat_output_tebd
File "code/torch/deepmd/pt/model/descriptor/repformers.py", line 364, in forward
_65 = "border_op is not available since customized PyTorch OP library is not built when freezing the model."
_66 = uninitialized(Tensor)
ops.prim.RaiseException(_65, "builtins.NotImplementedError")

return _66

Traceback of TorchScript, original code (most recent call last):
File "/opt/deepmd-kit-3.0.0b3/envs/deepmd-dev/lib/python3.10/site-packages/deepmd/pt/model/model/ener_model.py", line 109, in forward_lower
      comm_dict: Optional[Dict[str, torch.Tensor]] = None,
  ):
      model_ret = self.forward_common_lower(
                  ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
          extended_coord,
          extended_atype,
File "/opt/deepmd-kit-3.0.0b3/envs/deepmd-dev/lib/python3.10/site-packages/deepmd/pt/model/model/make_model.py", line 261, in forward_common_lower
          )
          del extended_coord, fparam, aparam
          atomic_ret = self.atomic_model.forward_common_atomic(
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
              cc_ext,
              extended_atype,
File "/opt/deepmd-kit-3.0.0b3/envs/deepmd-dev/lib/python3.10/site-packages/deepmd/pt/model/atomic_model/base_atomic_model.py", line 242, in forward_common_atomic
  
      ext_atom_mask = self.make_atom_mask(extended_atype)
      ret_dict = self.forward_atomic(
                 ~~~~~~~~~~~~~~~~~~~ <--- HERE
          extended_coord,
          torch.where(ext_atom_mask, extended_atype, 0),
File "/opt/deepmd-kit-3.0.0b3/envs/deepmd-dev/lib/python3.10/site-packages/deepmd/pt/model/atomic_model/dp_atomic_model.py", line 189, in forward_atomic
      if self.do_grad_r() or self.do_grad_c():
          extended_coord.requires_grad_(True)
      descriptor, rot_mat, g2, h2, sw = self.descriptor(
                                        ~~~~~~~~~~~~~~~ <--- HERE
          extended_coord,
          extended_atype,
File "/opt/deepmd-kit-3.0.0b3/envs/deepmd-dev/lib/python3.10/site-packages/deepmd/pt/model/descriptor/dpa2.py", line 799, in forward
          g1 = g1_ext
      # repformer
      g1, g2, h2, rot_mat, sw = self.repformers(
                                ~~~~~~~~~~~~~~~ <--- HERE
          nlist_dict[
              get_multiple_nlist_key(
File "/opt/deepmd-kit-3.0.0b3/envs/deepmd-dev/lib/python3.10/site-packages/deepmd/pt/model/descriptor/repformers.py", line 62, in forward
      argument8,
  ) -> torch.Tensor:
      raise NotImplementedError(
      ~~~~~~~~~~~~~~~~~~~~~~~~~~
          "border_op is not available since customized PyTorch OP library is not built when freezing the model."
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
      )
builtins.NotImplementedError: border_op is not available since customized PyTorch OP library is not built when freezing the model.
(/home/conda/feedstock_root/build_artifacts/deepmd-kit_1722057353391/work/source/lmp/pair_deepmd.cpp:586)
Last command: run             1000
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
```
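For context on the failure mode: the traceback ends in a `forward` method of `repformers.py` whose only body is `raise NotImplementedError(...)`, i.e. a placeholder that was serialized into the frozen `.pth` at `dp --pt freeze` time because the compiled `border_op` extension was not available in the freezing environment. A minimal sketch of this stub pattern (module and function names here are hypothetical, not DeePMD-kit's actual import paths):

```python
# Sketch of the fallback-stub pattern behind the error above (assumed
# structure, not DeePMD-kit's actual code): if the optional compiled
# extension is missing at import time, a Python stub is installed in
# its place. Freezing a model captures the stub, so the error surfaces
# only later, when the frozen model is actually evaluated (here, by
# LAMMPS via the C API).
try:
    from deepmd_op_ext import border_op  # hypothetical compiled OP library
except ImportError:
    def border_op(*args):
        # Raised lazily, on first call, mirroring the message in the log.
        raise NotImplementedError(
            "border_op is not available since customized PyTorch OP "
            "library is not built when freezing the model."
        )

def descriptor_forward(nlist):
    # A model frozen with the stub loads fine but fails at inference time.
    return border_op(nlist)
```

This is consistent with the log: model loading and setup succeed, and the exception is only raised at step 0 of the `run` command, when the descriptor's forward pass first executes.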


### DeePMD-kit Version

DeePMD-kit v3.0.0b4.dev56+g0b72dae3

### Backend and its version

PyTorch v2.4.1+cu121-g38b96d3399a

### Python Version, CUDA Version, GCC Version, LAMMPS Version, etc

_No response_

### Details

[input.zip](https://github.com/user-attachments/files/17110378/input.zip)
