GPU memory issue while executing lmp #4182

@mayankaditya

Summary

Dear DeePMD developers,

I am running machine-learned molecular dynamics (MD) with the DeePMD package installed from Docker. When I run the LAMMPS simulation (using the LAMMPS build provided with the DeePMD package) on a GPU server, it runs out of GPU memory.
I am getting the following error:
2024-10-04 10:22:31.041258: W tensorflow/core/common_runtime/bfc_allocator.cc:474] _____***********************
2024-10-04 10:22:31.041306: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at matmul_op_impl.h:681 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[28800000,50] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
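
For scale, the size of the single tensor named in the error can be estimated from its shape and dtype alone. This is a rough sketch, not a diagnosis: it assumes `double` means 8 bytes per element, the helper name `tensor_bytes` is hypothetical, and it counts only this one allocation, not the rest of the model's memory use.

```python
from math import prod

def tensor_bytes(shape, bytes_per_element):
    """Return the memory needed for a dense tensor of the given shape."""
    return prod(shape) * bytes_per_element

# Shape and dtype copied from the RESOURCE_EXHAUSTED message above.
shape = (28800000, 50)
DOUBLE_BYTES = 8  # assumption: IEEE 754 double precision

size_bytes = tensor_bytes(shape, DOUBLE_BYTES)
size_gib = size_bytes / 2**30

print(f"{size_bytes} bytes = {size_gib:.2f} GiB")
```

Under these assumptions the one matmul output needs about 10.73 GiB, so the request alone is comparable to or larger than the total memory of many GPUs.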

Please take a look at the attached script file and advise.

DeePMD-kit Version

2.1.1

Backend and its version

TensorFlow Version 2.8.0

Python Version, CUDA Version, GCC Version, LAMMPS Version, etc

No response

