DPRc Tutorial

This tutorial walks through a simple example of training a DPRc model and running simulations with it.

This tutorial aims to show readers how to use DPRc; it is not intended for production use. For simplicity, it uses only one window from the ethylene phosphate reaction. The reaction is described in detail in J. Phys. Chem. A 2022, 126, 45, 8519–8533.

This tutorial assumes basic knowledge of AMBER. If you are new to AMBER, first follow the AMBER tutorials, especially the sections on setting up a system, running umbrella sampling, and running QM/MM simulations.

Software

This tutorial assumes you have two separate machines, a local one and a remote one. Install the following software on them:

Software	Machine	Version	Documentation	Additional notes
DP-GEN	local	>= 0.12.0	DPRc Paper
dpamber	local and remote	>= 0.3.0	README
DeePMD-kit	remote	>= 2.2.8	DPRc Paper	Python interface; C++ interface to Ambertools
AmberTools	remote	>= 2024	Manual AmberDPRc	Enable DeePMD-kit and QUICK interface

Please give proper credit to all of the software above.

Initial data

Prepare initial systems

First, use tleap to generate a parm7 file, minimize the structure, and then use a semi-empirical QM/MM method to generate the starting structures for the umbrella windows. This process is covered in the AMBER tutorials, so it is not repeated here.

For convenience, the parm7 file and the starting structure have been added to this repository, located in parm7/ETP_ETH.parm7 and rst7/init_-1.50.rst7.

Generate initial training data

This step assumes you already have ETP_ETH.parm7 and init_-1.50.rst7. Use sander to run several fast, semi-empirical QM/MM simulations with different random seeds, printing both energies and forces. The mdin file is provided in mdin/low_level_md.mdin.

sander -O -p ETP_ETH.parm7 -c init_-1.50.rst7 -i low_level_md.mdin -o rc.mdout -r rc.rst7 -x mndod.nc -inf rc.mdinfo -ref init_-1.50.rst7 -frc mndod.mdfrc -e mndod.mden
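To repeat this run with several random seeds, something like the following sketch could be used. It rests on assumptions that are not confirmed by this tutorial: that low_level_md.mdin sets the seed through AMBER's `ig` variable, and that per-seed file names such as low_level_md_1.mdin and mndod_1.nc (hypothetical names) are acceptable. The helper functions are illustrative only. The script prints the sander commands instead of running them, so it can be inspected first.

```shell
#!/bin/sh
# Rewrite the random seed in an mdin file read from standard input.
# Assumption: the seed is set as "ig=<integer>" in the &cntrl namelist.
set_seed() {
    sed "s/ig *= *-\{0,1\}[0-9][0-9]*/ig=$1/"
}

# Print one sander command per seed (dry run; nothing is executed).
# The per-seed file names are hypothetical and follow the command above.
low_level_commands() {
    for seed in "$@"; do
        echo "sander -O -p ETP_ETH.parm7 -c init_-1.50.rst7" \
             "-i low_level_md_${seed}.mdin -o rc_${seed}.mdout" \
             "-r rc_${seed}.rst7 -x mndod_${seed}.nc" \
             "-inf rc_${seed}.mdinfo -ref init_-1.50.rst7" \
             "-frc mndod_${seed}.mdfrc -e mndod_${seed}.mden"
    done
}

low_level_commands 1 2 3
```

A per-seed input file could then be created with, for example, `set_seed 1 < low_level_md.mdin > low_level_md_1.mdin`, and the printed commands can be piped to `sh` once they look right.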

Next, use sander to run ab initio QM/MM calculations on the given trajectory mndod.nc with imin = 6 (note: imin = 6 may not be supported in older AMBER versions). An example mdin file is provided in mdin/high_level.mdin, but you need to modify it to match your DFT software.

sander -O -p ETP_ETH.parm7 -c init_-1.50.rst7 -i high_level_relabel.mdin -o high_level.mdout -r high_level.rst7 -x high_level.nc -y mndod.nc -frc high_level.mdfrc -inf high_level.mdinfo -e high_level.mden

You now have both high-level and low-level data. Use dpamber corr to generate the initial training data:

dpamber corr --cutoff 6. --qm_region ":1-2" --parm7_file ETP_ETH.parm7 --nc mndod.nc --hl pbe0 --ll mndod --out init_data.hdf5

For convenience, we provide an example of the initial data in init_data/init_data.hdf5.tar.bz2, and you can extract it and jump to the next step.

cd init_data
tar vxjf init_data.hdf5.tar.bz2

Setup initial files

Here we have five directories:

init_data: initial training data.
parm7: contains parm7 files.
mdin: contains mdin files for simulations and relabeling.
- ml.mdin is the template for DPRc QM/MM simulation;
- high_level.mdin is the template for high-level ab initio QM/MM calculation;
- low_level.mdin is the template for low-level semi-empirical QM/MM calculation.
rst7: contains starting structures.
disang: contains distance and angle restraints for umbrella sampling.

These files have been prepared in advance.
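Before starting a run, it can help to confirm that the layout described above is in place. The following is a minimal sketch; the function name `missing_dirs` is hypothetical, and it only checks that the five directories exist, not their contents.

```shell
#!/bin/sh
# Print the name of each expected directory that is missing under the
# given root (defaults to the current directory).
missing_dirs() {
    root=${1:-.}
    for d in init_data parm7 mdin rst7 disang; do
        [ -d "$root/$d" ] || echo "$d"
    done
}

missing_dirs .
```

When run from the top of the repository, no output means all five directories were found.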

Running DP-GEN

Prepare two JSON files: one for the run parameters (param.json) and one for the machine configuration (machine.json). Modify the machine file to match your own machines. See the DP-GEN documentation for a detailed explanation of both files.

The DeePMD-kit training parameters are explained in the DeePMD-kit documentation.
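Since both files are edited by hand, a quick syntax check can catch a stray comma or bracket early. The sketch below assumes `python3` is available on the local machine (DP-GEN itself is a Python package) and uses Python's standard `json.tool` module; the function name `check_json` is hypothetical. It checks JSON syntax only, not whether the keys are valid for DP-GEN.

```shell
#!/bin/sh
# Report whether each given file parses as JSON.
# Returns non-zero at the first file that does not parse.
check_json() {
    for f in "$@"; do
        if python3 -m json.tool "$f" > /dev/null 2>&1; then
            echo "ok: $f"
        else
            echo "invalid JSON: $f" >&2
            return 1
        fi
    done
}
```

For this tutorial the call would be `check_json param.json machine.json`.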

After these files are ready, run DP-GEN on the local machine:

dpgen run param.json machine.json

For simplicity, we run only two iterations. The output will look like this:

INFO:dpgen:-------------------------iter.000000 task 06--------------------------
INFO:dpgen:system 000 candidate :   1112 in   2000  55.60 %
INFO:dpgen:system 000 failed    :     28 in   2000   1.40 %
INFO:dpgen:system 000 accurate  :    860 in   2000  43.00 %
INFO:dpgen:system 000 accurate_ratio:   0.4300    thresholds: 1.0000 and 1.0000   eff. task min and max   -1 1000   number of fp tasks:   1000
INFO:dpgen:-------------------------iter.000001 task 06--------------------------
INFO:dpgen:system 000 candidate :    619 in   2000  30.95 %
INFO:dpgen:system 000 failed    :      0 in   2000   0.00 %
INFO:dpgen:system 000 accurate  :   1381 in   2000  69.05 %
INFO:dpgen:system 000 accurate_ratio:   0.6905    thresholds: 1.0000 and 1.0000   eff. task min and max   -1 1000   number of fp tasks:    619

The ratio of accurate frames increases in the second iteration. If the active learning cycle continues, the accurate ratio should converge toward 100%.
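To follow this trend over many iterations, the accurate ratio can be pulled out of the log. The sketch below assumes the log lines have the same layout as the output shown above; the function name `accurate_ratios` is hypothetical. It reads a log from standard input and prints one line per iteration with the iteration number and the accurate ratio. The sample input is copied from the output above.

```shell
#!/bin/sh
# Print "<iteration> <accurate_ratio>" for every summary line of a
# DP-GEN log read from standard input.
accurate_ratios() {
    awk '
        # Remember the iteration number from lines like "...iter.000001 task 06...".
        match($0, /iter\.[0-9]+ task/) {
            iter = substr($0, RSTART + 5, RLENGTH - 10)
        }
        # The ratio is the field that follows the "accurate_ratio:" label.
        /accurate_ratio:/ {
            for (i = 1; i < NF; i++)
                if ($i == "accurate_ratio:") print iter, $(i + 1)
        }
    '
}

accurate_ratios <<'EOF'
INFO:dpgen:-------------------------iter.000000 task 06--------------------------
INFO:dpgen:system 000 accurate_ratio:   0.4300    thresholds: 1.0000 and 1.0000   eff. task min and max   -1 1000   number of fp tasks:   1000
INFO:dpgen:-------------------------iter.000001 task 06--------------------------
INFO:dpgen:system 000 accurate_ratio:   0.6905    thresholds: 1.0000 and 1.0000   eff. task min and max   -1 1000   number of fp tasks:    619
EOF
```

With a saved log, the call would be `accurate_ratios < dpgen.log`, where dpgen.log is an assumed file name for wherever the DP-GEN output was written.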
