[BUG] RuntimeError when doing dp --pt change-bias  #4036

@njzjz

Description

Discussed in #4033

Originally posted by BianTieyuan July 29, 2024
Hi developers,
Glad to see the release of the new versions of DPA-2 and deepmd-kit. I tried to perform zero-shot testing following the steps at https://www.aissquare.com/models/detail?pageType=models&name=DPA-2.2.0-v3.0.0b3&id=272, but an error occurred. Here are my steps:

  1. Install deepmd-kit v3.0.0b3 from the offline package.
  2. Run zero-step training on the GST_GAP_22 dataset using dp --pt change-bias OpenLAM_2.2.0_27heads_beta3.pt -s GST_GAP_22/ --model-branch Domains_SemiCond
  3. The following error occurs:
(base) [polyucmp@localhost DPA-2-2024Q2]$ dp --pt change-bias OpenLAM_2.2.0_27heads_beta3.pt -s GST_GAP_22 --model-branch Domains_SemiCond
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-07-30 11:17:30,839] DEEPMD INFO    DeePMD version: 3.0.0b3
[2024-07-30 11:17:32,824] DEEPMD INFO    Changing out bias for model Domains_SemiCond.
[2024-07-30 11:17:34,832] DEEPMD INFO    Packing data for statistics from 89 systems
[2024-07-30 11:17:38,850] DEEPMD INFO    If you encounter the error 'an illegal memory access was encountered', this may be due to a TensorFlow issue. To avoid this, set the environment variable DP_INFER_BATCH_SIZE to a smaller value than the last adjusted batch size. The environment variable DP_INFER_BATCH_SIZE controls the inference batch size (nframes * natoms).
[2024-07-30 11:17:39,318] DEEPMD INFO    Adjust batch size from 1024 to 2048
[2024-07-30 11:17:39,960] DEEPMD INFO    Adjust batch size from 2048 to 4096
[2024-07-30 11:17:42,621] DEEPMD INFO    Adjust batch size from 4096 to 8192
[2024-07-30 11:17:44,142] DEEPMD INFO    Adjust batch size from 8192 to 16384
[2024-07-30 11:17:53,832] DEEPMD INFO    Adjust batch size from 16384 to 8192
Traceback (most recent call last):
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/bin/dp", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/deepmd/main.py", line 923, in main
    deepmd_main(args)
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/deepmd/pt/entrypoints/main.py", line 575, in main
    change_bias(FLAGS)
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/deepmd/pt/entrypoints/main.py", line 511, in change_bias
    updated_model = training.model_change_out_bias(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/deepmd/pt/train/training.py", line 1277, in model_change_out_bias
    _model.change_out_bias(
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/deepmd/pt/model/model/make_model.py", line 203, in change_out_bias
    self.atomic_model.change_out_bias(
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/deepmd/pt/model/atomic_model/base_atomic_model.py", line 453, in change_out_bias
    delta_bias, out_std = compute_output_stats(
                          ^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/deepmd/pt/utils/stat.py", line 282, in compute_output_stats
    model_pred = _compute_model_predict(sampled, keys, model_forward)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/deepmd/pt/utils/stat.py", line 173, in _compute_model_predict
    sample_predict = model_forward_auto_batch_size(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/deepmd/pt/utils/stat.py", line 165, in model_forward_auto_batch_size
    return auto_batch_size.execute_all(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/deepmd/pt/utils/auto_batch_size.py", line 153, in execute_all
    r_list = [concate_result(r) for r in zip(*results)]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/deepmd/pt/utils/auto_batch_size.py", line 153, in <listcomp>
    r_list = [concate_result(r) for r in zip(*results)]
              ^^^^^^^^^^^^^^^^^
  File "/run/media/polyucmp/hdd1/BIAN_Tieyuan/software/deepmd-v3.0.0b3/build/lib/python3.11/site-packages/deepmd/pt/utils/auto_batch_size.py", line 149, in concate_result
    raise RuntimeError(f"Unexpected result type {type(r[0])}")
RuntimeError: Unexpected result type <class 'dict'>

Does this version of deepmd-kit have a bug?
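
The last frames of the traceback show a helper that regroups per-batch results with zip(*results) and raises when the first element of a group is a dict. The sketch below illustrates that failure mode and one possible way to handle it, merging dict outputs key by key. It is an assumption-laden illustration, not the deepmd-kit implementation: the names concat_result and merge_batches are hypothetical, and plain Python lists stand in for the arrays or tensors a real model would return.

```python
# Hypothetical sketch: NOT the deepmd-kit implementation.
# Plain Python lists stand in for the arrays / tensors a real model returns.


def concat_result(parts):
    """Merge the per-batch pieces of one output into a single result."""
    first = parts[0]
    if first is None:
        return None
    if isinstance(first, list):
        # Stand-in for concatenating arrays along the frame axis.
        merged = []
        for part in parts:
            merged.extend(part)
        return merged
    if isinstance(first, dict):
        # Assumed handling: merge dict outputs key by key, recursively.
        return {key: concat_result([part[key] for part in parts]) for key in first}
    # Same kind of error as in the traceback, kept for unsupported types.
    raise RuntimeError(f"Unexpected result type {type(first)}")


def merge_batches(results):
    """results holds one tuple of outputs per batch; regroup by output, then merge."""
    return [concat_result(parts) for parts in zip(*results)]


# Two batches, each returning a single dict-valued output.
batches = [
    ({"energy": [1.0, 2.0]},),
    ({"energy": [3.0]},),
]
merged = merge_batches(batches)
print(merged)
```

Without the dict branch, the same input would fall through to the RuntimeError, which matches the symptom reported here.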

Labels: bug, reproduced (This bug has been reproduced by developers)

Status: Done
