[BUG] PT parallel training prints the summary on each rank #4595

@njzjz

Bug summary

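In PyTorch parallel training, the startup summary (banner, citation, and build information) is printed by every process. It should be printed only once, i.e. on rank 0.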
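The expected behavior could be sketched roughly as follows. This is only an illustration, not DeePMD-kit's actual code: the names `current_rank` and `print_summary_once` and the logger name are hypothetical. It assumes the processes are launched with torchrun, which sets the `RANK` environment variable for each worker.

```python
import logging
import os

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("summary_sketch")


def current_rank() -> int:
    """Return the global rank set by torchrun, or 0 for a serial run."""
    return int(os.environ.get("RANK", "0"))


def print_summary_once(lines, rank=None) -> bool:
    """Log the summary lines on rank 0 only.

    Returns True if the lines were logged, False if this rank skipped them.
    """
    if rank is None:
        rank = current_rank()
    if rank != 0:
        # Every other rank stays silent so the summary appears once.
        return False
    for line in lines:
        log.info(line)
    return True
```

With a guard like this, only the process whose rank is 0 would emit the banner; the other ranks would return without logging anything.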

DeePMD-kit Version

v3.0.1

Backend and its version

PyTorch v2.4.1.post302

How did you download the software?

Offline packages

Input Files, Running Commands, Error Log, etc.

[2025-02-10 17:58:17,472] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,472] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,472] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,472] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,472] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,472] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,473] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,473] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,473] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,473] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,473] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,473] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,473] DEEPMD INFO    source:
[2025-02-10 17:58:17,473] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,473] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,473] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,473] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,473] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,473] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,473] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,473] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,473] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,473] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,473] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,473] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,473] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,473] DEEPMD INFO    computing device:      cuda:1
[2025-02-10 17:58:17,473] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,473] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,473] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,473] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,473] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,502] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,503] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,503] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,503] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,503] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,503] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,503] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,503] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,503] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,503] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,503] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,503] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,503] DEEPMD INFO    source:
[2025-02-10 17:58:17,503] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,503] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,503] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,503] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,503] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,503] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,503] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,503] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,503] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,503] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,503] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,503] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,503] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,503] DEEPMD INFO    computing device:      cuda:2
[2025-02-10 17:58:17,503] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,503] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,503] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,503] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,503] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,510] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,510] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,510] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,510] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,510] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,510] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,510] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,510] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,510] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,510] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,510] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,510] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,510] DEEPMD INFO    source:
[2025-02-10 17:58:17,510] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,510] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,510] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,510] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,510] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,511] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,511] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,511] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,511] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,511] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,511] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,511] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,511] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,511] DEEPMD INFO    computing device:      cuda:5
[2025-02-10 17:58:17,511] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,511] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,511] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,511] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,511] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,519] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,519] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,519] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,519] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,519] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,519] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,519] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,519] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,519] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,519] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,519] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,519] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,519] DEEPMD INFO    source:
[2025-02-10 17:58:17,519] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,519] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,519] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,519] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,519] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,519] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,519] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,519] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,519] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,519] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,519] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,519] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,519] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,519] DEEPMD INFO    computing device:      cuda:3
[2025-02-10 17:58:17,520] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,520] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,520] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,520] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,520] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,540] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,540] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,540] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,540] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,540] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,540] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,540] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,540] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,540] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,540] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,540] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,540] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,540] DEEPMD INFO    source:
[2025-02-10 17:58:17,541] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,541] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,541] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,541] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,541] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,541] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,541] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,541] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,541] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,541] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,541] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,541] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,541] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,541] DEEPMD INFO    computing device:      cuda:7
[2025-02-10 17:58:17,541] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,541] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,541] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,541] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,541] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,580] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,580] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,580] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,580] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,580] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,580] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,580] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,580] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,580] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,580] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,580] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,580] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,580] DEEPMD INFO    source:
[2025-02-10 17:58:17,580] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,581] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,581] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,581] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,581] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,581] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,581] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,581] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,581] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,581] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,581] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,581] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,581] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,581] DEEPMD INFO    computing device:      cuda:4
[2025-02-10 17:58:17,581] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,581] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,581] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,581] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,581] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[bohrium-156-1256408:01441] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1256408.0/jf.0/1753350144/shared_mem_cuda_pool.bohrium-156-1256408 could be created.
[bohrium-156-1256408:01441] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728
[bohrium-156-1256408:01435] shmem: mmap: an error occurred while determining whether or not /tmp/ompi.bohrium-156-1256408.0/jf.0/4004642816/shared_mem_cuda_pool.bohrium-156-1256408 could be created.
[bohrium-156-1256408:01435] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728
[2025-02-10 17:58:17,708] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,708] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,708] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,708] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,708] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,708] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,708] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,708] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,708] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,708] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,708] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,708] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,708] DEEPMD INFO    source:
[2025-02-10 17:58:17,708] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,708] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,708] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,708] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,708] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,708] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,708] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,708] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,708] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,708] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,708] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,708] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,708] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,708] DEEPMD INFO    computing device:      cuda:6
[2025-02-10 17:58:17,708] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,708] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,709] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,709] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,709] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,806] DEEPMD INFO     _____               _____   __  __  _____           _     _  _
[2025-02-10 17:58:17,806] DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |
[2025-02-10 17:58:17,806] DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_
[2025-02-10 17:58:17,806] DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
[2025-02-10 17:58:17,806] DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_
[2025-02-10 17:58:17,806] DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
[2025-02-10 17:58:17,806] DEEPMD INFO    Please read and cite:
[2025-02-10 17:58:17,806] DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2025-02-10 17:58:17,806] DEEPMD INFO    Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2025-02-10 17:58:17,806] DEEPMD INFO    See https://deepmd.rtfd.io/credits/ for details.
[2025-02-10 17:58:17,806] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------
[2025-02-10 17:58:17,806] DEEPMD INFO    installed to:          /root/deepmd-kit/lib/python3.12/site-packages/deepmd
[2025-02-10 17:58:17,806] DEEPMD INFO    source:
[2025-02-10 17:58:17,806] DEEPMD INFO    source branch:         HEAD
[2025-02-10 17:58:17,806] DEEPMD INFO    source commit:         c314f1b
[2025-02-10 17:58:17,806] DEEPMD INFO    source commit at:      2024-12-23 16:45:06 -0800
[2025-02-10 17:58:17,806] DEEPMD INFO    use float prec:        double
[2025-02-10 17:58:17,806] DEEPMD INFO    build variant:         cuda
[2025-02-10 17:58:17,806] DEEPMD INFO    Backend:               PyTorch
[2025-02-10 17:58:17,806] DEEPMD INFO    PT ver:                v2.4.1.post302-gUnknown
[2025-02-10 17:58:17,806] DEEPMD INFO    Enable custom OP:      True
[2025-02-10 17:58:17,806] DEEPMD INFO    build with PT ver:     2.4.1
[2025-02-10 17:58:17,806] DEEPMD INFO    build with PT inc:     /root/deepmd-kit/lib/python3.12/site-packages/torch/include
[2025-02-10 17:58:17,806] DEEPMD INFO                           /root/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2025-02-10 17:58:17,807] DEEPMD INFO    build with PT lib:     /root/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2025-02-10 17:58:17,807] DEEPMD INFO    running on:            bohrium-156-1256408
[2025-02-10 17:58:17,807] DEEPMD INFO    computing device:      cuda:0
[2025-02-10 17:58:17,807] DEEPMD INFO    CUDA_VISIBLE_DEVICES:  unset
[2025-02-10 17:58:17,807] DEEPMD INFO    Count of visible GPUs: 8
[2025-02-10 17:58:17,807] DEEPMD INFO    num_intra_threads:     0
[2025-02-10 17:58:17,807] DEEPMD INFO    num_inter_threads:     0
[2025-02-10 17:58:17,807] DEEPMD INFO    ---------------------------------------------------------------------------------------------------------

Steps to Reproduce

cd examples/water/se_atten
torchrun --nproc_per_node=4 --no-python dp --pt train input.json

Further Information, Files, and Links

No response
