[BUG] pt: setting batch_size to mixed:N throws errors #3474

@njzjz

Bug summary

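With the PyTorch backend (dp --pt), setting batch_size to "mixed:N" makes training fail at startup: the data loader logs "Unsupported batch size type" and then raises a TypeError, as shown in the error log below.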

DeePMD-kit Version

v3.0.0a0-28-ged831c88

TensorFlow Version

PT v2.2.0+cu121-g8ac9b20d4b0

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

[2024-03-17 00:59:58,348] DEEPMD ERROR   Unsupported batch size type
Traceback (most recent call last):
  File "/home/jz748/anaconda3/bin/dp", line 8, in <module>
    sys.exit(main())
  File "/home/jz748/codes/deepmd-kit/deepmd/main.py", line 807, in main
    deepmd_main(args)
  File "/home/jz748/anaconda3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 306, in main
    train(FLAGS)
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 270, in train
    trainer = get_trainer(
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 166, in get_trainer
    ) = prepare_trainer_input_single(
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 134, in prepare_trainer_input_single
    DpLoaderSet(
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/utils/dataloader.py", line 123, in __init__
    self.batch_size = rule // system._natoms
TypeError: unsupported operand type(s) for //: 'NoneType' and 'int'
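
The last frame suggests that rule was still None when the floor division ran, presumably because the "Unsupported batch size type" error was only logged and not raised. The following is a minimal sketch of that failure pattern, not the actual DeePMD-kit source: resolve_rule, per_system_batch_size, the handling of "auto" and "auto:N", the default of 32, and the atom count of 192 are all assumptions made for illustration.

```python
import logging

log = logging.getLogger("sketch")


def resolve_rule(batch_size):
    """Hypothetical parser: return an atoms-per-batch rule, or None if unrecognized."""
    if batch_size == "auto":
        return 32  # assumed default, for illustration only
    if batch_size.startswith("auto:"):
        return int(batch_size.split(":")[1])
    # Logging without raising lets None flow on to the caller.
    log.error("Unsupported batch size type")
    return None


def per_system_batch_size(batch_size, natoms):
    """Mimic the failing line from the traceback: rule // natoms."""
    rule = resolve_rule(batch_size)
    return rule // natoms


try:
    per_system_batch_size("mixed:2", 192)  # 192 atoms is an arbitrary example value
except TypeError as exc:
    message = str(exc)
    print(message)
```

If the loader does follow this pattern, raising a ValueError where the unsupported value is detected would report the real cause instead of the downstream TypeError.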

Steps to Reproduce

cd examples/water/se_atten

Apply the following modification:

diff --git a/examples/water/se_atten/input_torch.json b/examples/water/se_atten/input_torch.json
index 7e9cf06f..501b5a77 100644
--- a/examples/water/se_atten/input_torch.json
+++ b/examples/water/se_atten/input_torch.json
@@ -75,7 +75,7 @@
       "systems": [
         "../data/data_3"
       ],
-      "batch_size": 1,
+      "batch_size": "mixed:2",
       "numb_btch": 3,
       "_comment": "that's all"
     },

Then run

dp --pt train input_torch.json
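
The same one-line change can be applied from a script instead of by hand. This is only a sketch: set_batch_size is a hypothetical helper, and the sample dictionary is a stand-in for the relevant part of input_torch.json (the "training" and "some_data_block" keys are placeholders, since the diff does not show which section it touches). The helper targets any block that has both "batch_size" and "numb_btch", which is the shape of the block in the diff.

```python
import json


def set_batch_size(node, value):
    """Set "batch_size" in every dict that also has "numb_btch"; return the count."""
    changed = 0
    if isinstance(node, dict):
        if "batch_size" in node and "numb_btch" in node:
            node["batch_size"] = value
            changed += 1
        for child in node.values():
            changed += set_batch_size(child, value)
    elif isinstance(node, list):
        for child in node:
            changed += set_batch_size(child, value)
    return changed


# Hypothetical stand-in for the relevant part of input_torch.json.
sample = json.loads("""
{
  "training": {
    "some_data_block": {
      "systems": ["../data/data_3"],
      "batch_size": 1,
      "numb_btch": 3,
      "_comment": "that's all"
    }
  }
}
""")

n_changed = set_batch_size(sample, "mixed:2")
print(json.dumps(sample, indent=2))
```

To use it on the real file, load input_torch.json with json.load, call set_batch_size on the result, and write it back with json.dump before running the command above.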

Further Information, Files, and Links

If this cannot be resolved before the stable release, the documentation needs to be updated:
https://docs.deepmodeling.com/projects/deepmd/en/latest/train/train-input.html

Status

Done
