Devel update by iProzd · Pull Request #17 · iProzd/deepmd-kit

iProzd · 2021-08-09T03:37:43Z

No description provided.

* build low and high precision at the same time
  We can then provide a single package containing both precisions.
  BREAKING CHANGES:
  Python: the Python package will build both precisions, and DP_FLOAT_PREC is now a runtime environment variable
  C++: CMake will build both libraries, which will be called something like libdeepmd and libdeepmd_low
  LAMMPS: generates two directories, USER-DEEPMD and USER-DEEPMD_low
  ipi: generates two executables, dp_ipi and dp_ipi_low
* fix LAMMPS build script
* fix lammps cmake file
* install LIB_DEEPMD_OP_VARIANT
* remove FLOAT_PREC argument
* change DP_FLOAT_PREC to DP_INTERFACE_PREC
* revert some libraries as they do not need to build twice
* update error message
* change the implementation of LAMMPS variant
  Now `env.sh` and `env_low.sh` will be generated in the same directory. Users can easily `mv env_low.sh env.sh` if they need low precision.
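
To illustrate what a runtime precision switch might look like on the Python side, here is a minimal sketch. The helper name `interface_precision`, the accepted values `high` and `low`, the default of `high`, and the mapping to `float64`/`float32` are assumptions for illustration; the commit message only states that `DP_INTERFACE_PREC` is read at runtime.

```python
import os
from typing import Mapping

# Assumed mapping: "high" selects double precision, "low" selects single precision.
_PRECISIONS = {"high": "float64", "low": "float32"}


def interface_precision(environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: pick the interface precision from the environment.

    Reads DP_INTERFACE_PREC when called (not at build time) and falls back
    to "high" when the variable is unset.
    """
    value = environ.get("DP_INTERFACE_PREC", "high").lower()
    try:
        return _PRECISIONS[value]
    except KeyError:
        raise RuntimeError(
            f"Unsupported DP_INTERFACE_PREC value: {value!r}"
        ) from None
```

Because the variable is looked up at call time, the same installed package can serve both precisions, e.g. `DP_INTERFACE_PREC=low python my_script.py` (the script name is a placeholder).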

* Replace PS-Worker mode with multi-worker one.
* Remove deprecated `try_distrib` argument in tests.
* Limit reference of mpi4py to logger.py.
* Add tutorial on parallel training.
* Refine words & tokens used.
* Only limit sub sessions to CPU when distributed training.
* Add description of `mpi4py` in tutorial.
* Explain linear relationship between batch size and learning rate.
* Refine documents & comments.
* Let TensorFlow choose device when CUDA_VISIBLE_DEVICES is unset.
Co-authored-by: Han Wang <amcadmus@gmail.com>
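
The tutorial text itself is not shown in this PR page, so the following is only a sketch of the commonly used linear scaling rule that the "batch size and learning rate" item appears to refer to. It assumes that each worker keeps its own batch size, so the effective global batch size grows in proportion to the number of workers; the function name `scaled_learning_rate` is hypothetical.

```python
def scaled_learning_rate(base_lr: float, n_workers: int) -> float:
    """Hypothetical helper: scale a single-worker learning rate linearly.

    Assumes the global batch size is per_worker_batch * n_workers, so the
    learning rate is multiplied by the same factor.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return base_lr * n_workers
```

For example, a single-worker rate of `0.001` would become `0.004` on four workers under this assumption.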

* Find available GPUs in an elegant way.
* Clean codes of preparing parallel context.
* Fix code style and typo.
* Use a subprocess to detect GPU.
* Use Popen as a context manager.
* Do not use `tf.test.built_with_gpu_support`.
Co-authored-by: Han Wang <amcadmus@gmail.com>
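
The detection code itself is not part of this page. As a rough sketch of the pattern named in the list above (a child process queried through `Popen` used as a context manager), the helper below runs a probe command and parses its output. The name `probe_devices`, the comma-separated output format, and the stand-in probe command are all assumptions for illustration.

```python
import subprocess
import sys
from typing import List, Sequence


def probe_devices(command: Sequence[str]) -> List[str]:
    """Hypothetical helper: run a probe command and parse device ids.

    The probe is assumed to print comma-separated device ids on stdout.
    Running it in a child process keeps the probe isolated from the
    calling process.
    """
    # Popen as a context manager closes the pipes and waits for the child.
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    ) as proc:
        stdout, _ = proc.communicate()
        if proc.returncode != 0:
            # Treat a failing probe as "no devices found".
            return []
    return [dev for dev in stdout.strip().split(",") if dev]


# Stand-in probe that just prints two ids; a real probe would query the
# deep learning framework instead.
fake_probe = [sys.executable, "-c", "print('0,1')"]
```

Calling `probe_devices(fake_probe)` returns the ids printed by the stand-in probe.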

Starting from v2.0.0.b4, `libdeepmd` and `lammps-dp` will be decoupled. Both the C++ API and LAMMPS are usually stable, so we do not need to build LAMMPS in every release. Also, the CPU and GPU versions share the same API, and LAMMPS itself does not need CUDA, so we do not need to build LAMMPS twice.

* Passing error to TF instead of exit
  This commit does three little things:
  (1) create an exception called `deepmd::deepmd_exception` (based on `std::runtime_error`);
  (2) throw this exception instead of calling `exit` or throwing `std::runtime_error`;
  (3) catch this exception in the op, and pass it to TF using `OP_REQUIRES_OK`.
  In addition, the OOM error will raise ResourceExhausted, the same as TF ops.
  The benefit of doing so is that the TF side and the Python side can process other things, catch the error, and print the traceback. This commit can also fix #802, where Python didn't save the buffer to the file before exit.
* define try catch function
* replace std::runtime_error
* add headers
* clean useless line
* add custom_op.cc to api_cc tests and rename save_compute to safe_compute

* remove dependences on training script and data from model compression
* reset function update_one_sel in train.py
* update the doc of model compression
* fix bug in UT
* optimize code for reviewer's comments
* undo changes to constant variables
* Update common.py
* update code structure of DPTrainer
* fix lint warnings in common.py
* fix duplicated lines within trainer.py
* Update trainer.py
* rm default values with False optional in argcheck.py

njzjz and others added 18 commits July 28, 2021 20:36

adapt changes to auditwheel directory in manylinux (#889)

c4b9c9e

* adapt changes to auditwheel directory in manylinux
  See pypa/manylinux#1143.
* find auditwheel path via `auditwheel --version`
* use custom image instead
* Update .github/workflows/build_wheel.yml

Update getting-started.md (#898)

953621f

Update the compiling arguments in c++ interface example.

fix InvalidArgumentError caused by zero sel and optimize zero matrix (#900)

70508a5

* fix `InvalidArgumentError` caused by zero `sel`
  Fix #899. See comments in the code for details.
* directly return zero matrix for exclude_types
* also optimize for se_r

enhance the cli to generate doc json file (#891)

043ac86

* enhance the cli to generate doc json file
* bump dargs version; add argument to tests
* correct the type hint of `out_type`

fix 'NoneType' has no len() in auto_sel (#911)

b5b15fa

The default value of `type_map` is `None`, so when you don't set `type_map`, you'll get this error. https://github.com/deepmodeling/deepmd-kit/blob/043ac869bfcdc7f3a20aa24d04bb7c7b88abcc0b/deepmd/entrypoints/train.py#L225
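
The failure mode described here is simply `len(None)` raising `TypeError`. A minimal sketch of a guard is shown below; the helper name `count_types` is hypothetical and is not the function changed in the linked file.

```python
from typing import Optional, Sequence


def count_types(type_map: Optional[Sequence[str]]) -> int:
    """Hypothetical helper: number of entries in type_map.

    Treats an unset (None) type_map as empty instead of calling len(None),
    which would raise "object of type 'NoneType' has no len()".
    """
    return 0 if type_map is None else len(type_map)
```

With this guard, `count_types(None)` yields `0` while `count_types(["O", "H"])` yields `2`.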

raise warning before training if sel is not enough (#914)

689ffa4

Currently I can't see any warning during the training if sel is not enough, so it's a good idea to check it before training, and tell the user what to do. Also fix #874.
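
As a sketch of the kind of pre-training check described above, the function below compares the largest neighbor count found per atom type against the configured `sel` and warns when `sel` is too small. The name `check_sel`, its arguments, and the message wording are assumptions; the actual check in the commit is not shown on this page.

```python
import warnings
from typing import Sequence


def check_sel(max_neighbors: Sequence[int], sel: Sequence[int]) -> bool:
    """Hypothetical check: warn if sel is smaller than the neighbors found.

    max_neighbors[i] is the largest number of type-i neighbors seen in the
    data, and sel[i] is the configured limit for type i. Returns True when
    every limit is large enough.
    """
    if len(max_neighbors) != len(sel):
        raise ValueError("max_neighbors and sel must have the same length")
    enough = True
    for itype, (found, allowed) in enumerate(zip(max_neighbors, sel)):
        if found > allowed:
            warnings.warn(
                f"sel for type {itype} is {allowed}, but up to {found} "
                f"neighbors were found; consider increasing it to at "
                f"least {found}."
            )
            enough = False
    return enough
```

Running such a check once before training starts lets the user fix the input before spending compute on a model that silently drops neighbors.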

Fix the expanding logic of SLURM_JOB_NODELIST and add unit tests for parallel training. (#913)

ee0ed99

* Add unit tests of `cluster` and `env`.
* Fix the expanding logic of `SLURM_JOB_NODELIST`.
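
For readers unfamiliar with the compact host list format that SLURM places in `SLURM_JOB_NODELIST`, here is a simplified sketch of expanding it. The function `expand_nodelist` is hypothetical and is not the implementation fixed in this commit; it assumes at most one bracket group per host pattern, with the bracket at the end of the name.

```python
import re
from typing import List


def expand_nodelist(nodelist: str) -> List[str]:
    """Hypothetical helper: expand e.g. 'gpu[01-03,07],login1' to host names.

    Simplified: supports one trailing bracket group per pattern and keeps
    the zero padding of the lower bound of each range.
    """
    hosts: List[str] = []
    # Split on commas that are outside brackets.
    for part in re.findall(r"[^,\[]+(?:\[[^\]]*\])?", nodelist):
        match = re.fullmatch(r"([^\[]+)\[([^\]]+)\]", part)
        if match is None:
            hosts.append(part)  # plain host name, nothing to expand
            continue
        prefix, ranges = match.groups()
        for item in ranges.split(","):
            if "-" in item:
                low, high = item.split("-")
                width = len(low)  # preserve zero padding such as "01"
                hosts.extend(
                    f"{prefix}{index:0{width}d}"
                    for index in range(int(low), int(high) + 1)
                )
            else:
                hosts.append(prefix + item)
    return hosts
```

On a real cluster, `scontrol show hostnames` performs this expansion and handles the cases this sketch leaves out.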

Fix member declaration of deepmd and deepmd.entrypoints. (#922)

3ae80b3

set input DeepmdData.type_map to input type_map (#924)

4ced020

Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>

add lammps compute style for atomic deep tensor (#927)

b30a75e

* add lammps compute style for deep tensor
* support the choice of floating point precision
* update doc for deeptensor/atom
Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>

add aliases to Arguments (#933)

994cc11

* add aliases to Arguments
  Fix #846.
  move for cherry-pick
  add aliases to Arguments (#932)
  fix #846.
  move back
* fix typo

rm load_ckpt (#935)

382e92a

Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>

iProzd merged commit 7072344 into iProzd:devel Aug 9, 2021

iProzd pushed a commit that referenced this pull request Sep 18, 2021

Merge pull request #17 from deepmodeling/api

147732e

api branch update

iProzd merged 18 commits into iProzd:devel from deepmodeling:devel

5 participants