This tutorial walks through a simple example of training a DPRc model and running simulations with it.
It is meant to show readers how to use DPRc, not to produce production-quality results. For simplicity, it uses only one window from the ethylene phosphate reaction. The reaction is described in detail in J. Phys. Chem. A 2022, 126, 45, 8519–8533.
This tutorial assumes basic knowledge of AMBER. If you are new to AMBER, work through the AMBER tutorials first, especially the sections on setting up a system, running umbrella sampling, and running QM/MM simulations.
The tutorial also assumes two separate machines, a local one and a remote one, with the following software installed:
| Software | Machine | Version | Documentation | Additional notes |
|---|---|---|---|---|
| DP-GEN | local | >= 0.12.0 | DPRc Paper | |
| dpamber | local and remote | >= 0.3.0 | README | |
| DeePMD-kit | remote | >= 2.2.8 | DPRc Paper | Python interface; C++ interface to AmberTools |
| AmberTools | remote | >= 2024 | Manual AmberDPRc | Enable DeePMD-kit and QUICK interface |
Please give proper credit to all of the software listed above.
First, use tleap to generate a parm7 file, minimize the structure, and run a semi-empirical QM/MM method to generate the starting structures for the umbrella windows. The AMBER tutorials already cover this process, so it is not repeated here.
For convenience, the parm7 file and a starting structure are included in this repository as parm7/ETP_ETH.parm7 and rst7/init_-1.50.rst7.
The steps below assume that you have ETP_ETH.parm7 and init_-1.50.rst7 at hand.
Next, use sander to run several fast, semi-empirical QM/MM simulations with different random seeds, printing both energies and forces. The mdin file is provided in mdin/low_level_md.mdin.
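Running several seeds by hand is tedious, so one option is to generate one input file per seed first. The sketch below is only an illustration: the seed values are hypothetical, the template written inside the script is a stand-in rather than the real mdin/low_level_md.mdin, and it assumes the real file sets the random seed through an `ig = ...` entry in `&cntrl` (`ig` is the sander seed variable, but check your own file).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing in the repository is modified.
workdir=$(mktemp -d)
template="$workdir/low_level_md.mdin"

# Stand-in template for illustration only (NOT the real mdin file).
cat > "$template" <<'EOF'
&cntrl
  imin = 0, nstlim = 1000, dt = 0.001,
  ig = -1,
/
EOF

# Hypothetical seed values; choose your own.
seeds=(11 22 33)

for seed in "${seeds[@]}"; do
  out="$workdir/low_level_md_seed${seed}.mdin"
  # Replace the value of the standalone `ig` entry, leaving other variables alone.
  sed -E "s/(^|[^[:alnum:]_])ig[[:space:]]*=[[:space:]]*-?[0-9]+/\1ig = ${seed}/" \
    "$template" > "$out"
done

# Show the seed line of every generated file.
grep -H "ig" "$workdir"/low_level_md_seed*.mdin
```

Each generated file can then be passed through `-i` to the sander command that follows, with the output names (`-o`, `-r`, `-x`, `-frc`, `-e`) made unique per seed so that the runs do not overwrite each other.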

    sander -O -p ETP_ETH.parm7 -c init_-1.50.rst7 -i low_level_md.mdin -o rc.mdout -r rc.rst7 -x mndod.nc -inf rc.mdinfo -ref init_-1.50.rst7 -frc mndod.mdfrc -e mndod.mden

Then use sander to run ab initio QM/MM calculations on the frames of the trajectory mndod.nc with imin = 6 (note: imin = 6 may not be supported in older AMBER versions). An example mdin file is provided in mdin/high_level.mdin, but you need to modify it to match your DFT software.

    sander -O -p ETP_ETH.parm7 -c init_-1.50.rst7 -i high_level_relabel.mdin -o high_level.mdout -r high_level.rst7 -x high_level.nc -y mndod.nc -frc high_level.mdfrc -inf high_level.mdinfo -e high_level.mden

Now you have both high-level and low-level data. Use dpamber corr to generate the initial training data:

    dpamber corr --cutoff 6. --qm_region ":1-2" --parm7_file ETP_ETH.parm7 --nc mndod.nc --hl pbe0 --ll mndod --out init_data.hdf5

For convenience, an example of the initial data is provided in init_data/init_data.hdf5.tar.bz2; you can extract it and skip to the next step.

    cd init_data
    tar vxjf init_data.hdf5.tar.bz2

Here we have five directories:

- init_data: initial training data.
- parm7: contains parm7 files.
- mdin: contains mdin files for simulations and relabeling. ml.mdin is the template for DPRc QM/MM simulation; high_level.mdin is the template for high-level ab initio QM/MM calculation; low_level.mdin is the template for low-level semi-empirical QM/MM calculation.
- rst7: contains starting structures.
- disang: contains distance and angle restraints for umbrella sampling.

These files have been prepared in advance.
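Before starting DP-GEN, it can help to confirm that the working directory really has this layout. The following is a minimal sketch, not part of the tutorial repository: `check_layout` is a hypothetical helper, and the demo builds a deliberately incomplete scratch tree to show what a failure looks like. The directory and file names are the ones listed above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report expected entries that are missing under a root directory.
check_layout() {
  local root=$1 missing=0 path
  for path in init_data parm7/ETP_ETH.parm7 mdin rst7/init_-1.50.rst7 disang; do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}

# Demo on a scratch tree that lacks the disang directory.
demo=$(mktemp -d)
mkdir -p "$demo/init_data" "$demo/parm7" "$demo/mdin" "$demo/rst7"
touch "$demo/parm7/ETP_ETH.parm7" "$demo/rst7/init_-1.50.rst7"

# `|| true` keeps `set -e` from aborting on the expected failure.
report=$(check_layout "$demo" || true)
echo "$report"
```

In a real checkout, calling `check_layout .` from the repository root prints nothing and returns 0 when every entry is present.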
You need to prepare two JSON files: one for the parameters and one for the machine settings. Modify the machine file to match your own machines. See the DP-GEN documentation for a detailed explanation of both files.
The DeePMD-kit training parameters are explained in the DeePMD-kit documentation.
After these files are ready, run DP-GEN on the local machine:

    dpgen run param.json machine.json

For simplicity, we only run two iterations, and the output will look like this:

    INFO:dpgen:-------------------------iter.000000 task 06--------------------------
    INFO:dpgen:system 000 candidate : 1112 in 2000 55.60 %
    INFO:dpgen:system 000 failed : 28 in 2000 1.40 %
    INFO:dpgen:system 000 accurate : 860 in 2000 43.00 %
    INFO:dpgen:system 000 accurate_ratio: 0.4300 thresholds: 1.0000 and 1.0000 eff. task min and max -1 1000 number of fp tasks: 1000
    INFO:dpgen:-------------------------iter.000001 task 06--------------------------
    INFO:dpgen:system 000 candidate : 619 in 2000 30.95 %
    INFO:dpgen:system 000 failed : 0 in 2000 0.00 %
    INFO:dpgen:system 000 accurate : 1381 in 2000 69.05 %
    INFO:dpgen:system 000 accurate_ratio: 0.6905 thresholds: 1.0000 and 1.0000 eff. task min and max -1 1000 number of fp tasks: 619

The ratio of accurate frames increases in the second iteration, from 43.00% to 69.05%. If the active learning cycle continues, the accurate ratio will converge to 100%.
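To follow this convergence over many iterations, the accurate percentage can be pulled out of the log with a short script. The sketch below embeds a few lines copied from the output above so that it runs on its own; for a real run, point `log` at the DP-GEN log file instead (assumed here to be `dpgen.log` in the working directory, which you should verify). It also assumes the log lines are spaced exactly as shown in this tutorial.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sample lines copied from the tutorial output; replace with your own log file.
log=$(mktemp)
cat > "$log" <<'EOF'
INFO:dpgen:-------------------------iter.000000 task 06--------------------------
INFO:dpgen:system 000 candidate : 1112 in 2000 55.60 %
INFO:dpgen:system 000 failed : 28 in 2000 1.40 %
INFO:dpgen:system 000 accurate : 860 in 2000 43.00 %
INFO:dpgen:-------------------------iter.000001 task 06--------------------------
INFO:dpgen:system 000 candidate : 619 in 2000 30.95 %
INFO:dpgen:system 000 failed : 0 in 2000 0.00 %
INFO:dpgen:system 000 accurate : 1381 in 2000 69.05 %
EOF

# Remember the most recent iteration label, then print it next to the
# percentage (the second-to-last field) of each "accurate" line.
summary=$(awk '
  match($0, /iter\.[0-9]+/) { iter = substr($0, RSTART, RLENGTH) }
  / accurate : /            { print iter, $(NF-1) "%" }
' "$log")

echo "$summary"
```

For the sample lines this prints `iter.000000 43.00%` and `iter.000001 69.05%`, one per line. The `accurate_ratio` lines are skipped on purpose because they repeat the same value as a fraction.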