
[BUG] Training with activation function 'gelu' on CPU results in nan #915

@liangadam

Description


Summary

In the SE_A descriptor example, setting the descriptor's activation function to 'gelu' in input.json and training on CPU produces 'nan' in lcurve and in the results of dp test.
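For reference, the tanh-approximate form of GELU is itself numerically well behaved over a wide input range, which suggests the nan comes from the CPU custom-op implementation rather than from the function's math. A minimal sketch in NumPy (not DeePMD-kit code; the formula is the standard tanh approximation):

```python
import numpy as np

def gelu_tanh(x):
    # Standard tanh approximation of GELU:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Evaluate over a wide range; every value stays finite.
x = np.linspace(-100.0, 100.0, 10001)
print(np.isfinite(gelu_tanh(x)).all())
```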

Deepmd-kit version, installation way, input file, running commands, error log, etc.

version: v2.0.0-beta.4 (devel branch)
installation way: built the Python interface (TensorFlow) and installed deepmd-kit with pip in a conda environment
input file: example/water/se_e2_a/input.json, with the descriptor's activation function set to gelu
running command: dp train input.json
error log: nan in lcurve
detail: run on CPU, without `export DP_VARIANT=cuda`

platform: Lebesgue base image

Steps to Reproduce
In example/water/se_e2_a, open input.json and set the descriptor's activation function to gelu, then run `dp train input.json` on CPU.
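The edit to input.json amounts to one field in the descriptor block. A trimmed fragment for illustration (the `activation_function` key follows the standard DeePMD-kit input format; the other values shown are assumed to match the water example and are not copied from it):

```json
{
  "model": {
    "descriptor": {
      "type": "se_e2_a",
      "activation_function": "gelu"
    }
  }
}
```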

Further Information, Files, and Links
