Summary
Dear DeePMD developers,
I am running machine-learned MD simulations using the DeePMD-kit package from Docker. I run out of GPU memory when executing the LAMMPS simulation (provided with the DeePMD package) on a GPU server.
I get the following error:
2024-10-04 10:22:31.041258: W tensorflow/core/common_runtime/bfc_allocator.cc:474] _____***********************
2024-10-04 10:22:31.041306: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:681 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[28800000,50] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
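For a sense of scale, here is a quick sketch of how much memory the tensor named in the error would need on its own. The shape and dtype are taken from the log line above; 8 bytes per double is the standard float64 size, and the variable names are only illustrative.

```python
# Shape and dtype come from the RESOURCE_EXHAUSTED message in the log.
rows, cols = 28_800_000, 50
bytes_per_double = 8  # float64

tensor_bytes = rows * cols * bytes_per_double
tensor_gb = tensor_bytes / 1e9     # decimal gigabytes
tensor_gib = tensor_bytes / 2**30  # binary gibibytes

print(f"{tensor_bytes} bytes = {tensor_gb:.2f} GB = {tensor_gib:.2f} GiB")
```

That is about 11.52 GB (10.73 GiB) for this single allocation, before counting anything else already held on the GPU.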
Please take a look at the attached script file and advise.
DeePMD-kit Version
2.1.1
Backend and its version
TensorFlow Version 2.8.0
Python Version, CUDA Version, GCC Version, LAMMPS Version, etc
No response