njzjz/dprc-tutorial

DPRc Tutorial

This tutorial walks through a simple example of training a DPRc model and running simulations with it.

This tutorial is meant to show readers how to use DPRc; it is not a production workflow. For simplicity, it uses only one umbrella window from the ethylene phosphate reaction. The reaction is described in detail in J. Phys. Chem. A 2022, 126, 45, 8519–8533.

This tutorial assumes some basic knowledge of AMBER. If you do not have it, follow the AMBER tutorials first, especially the sections on setting up a system, running umbrella sampling, and running QM/MM simulations.

Software

The tutorial assumes you have two separate machines, a local one and a remote one. Install the following software on them.

| Software | Machine | Version | Documentation | Additional notes |
| --- | --- | --- | --- | --- |
| DP-GEN | local | >= 0.12.0 | DPRc, Paper | |
| dpamber | local and remote | >= 0.3.0 | README | |
| DeePMD-kit | remote | >= 2.2.8 | DPRc, Paper | Python interface; C++ interface to AmberTools |
| AmberTools | remote | >= 2024 | Manual, AmberDPRc | Enable DeePMD-kit and QUICK interface |

Please give proper credit to all of the software above.

Initial data

Prepare initial systems

First, use tleap to generate a parm7 file, minimize the structure, and then use a semi-empirical QM/MM method to generate the starting structures for the umbrella windows. This process is covered in the AMBER tutorials, so it is not repeated here.

For convenience, the parm7 file and the starting structure have been added to this repository, located in parm7/ETP_ETH.parm7 and rst7/init_-1.50.rst7.

Generate initial training data

This step assumes you already have ETP_ETH.parm7 and init_-1.50.rst7. Use sander to run several fast, semi-empirical QM/MM simulations with different random seeds, printing both energies and forces. The mdin file is provided in mdin/low_level_md.mdin.

sander -O -p ETP_ETH.parm7 -c init_-1.50.rst7 -i low_level_md.mdin -o rc.mdout -r rc.rst7 -x mndod.nc -inf rc.mdinfo -ref init_-1.50.rst7 -frc mndod.mdfrc -e mndod.mden
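The repeated runs with different random seeds can be scripted. The sketch below is only an illustration under stated assumptions: it assumes the mdin template sets the seed through an `ig=<integer>` entry (the AMBER random-seed keyword), and the helper names, the seed values, and the `seed_<n>` directory layout are hypothetical. It builds the same sander command as above for each seed and prints it instead of running it.

```python
import re
import shlex
from pathlib import Path


def set_seed(mdin_text, seed):
    """Return the mdin text with the random-seed keyword `ig` set to `seed`.

    Assumption: the template already contains an `ig=<integer>` entry.
    """
    new_text, count = re.subn(r"\big\s*=\s*-?\d+", f"ig={seed}", mdin_text)
    if count == 0:
        raise ValueError("no 'ig=' entry found in the mdin template")
    return new_text


def sander_command(run_dir):
    """Build the sander command shown above, writing outputs into `run_dir`."""
    d = Path(run_dir)
    return [
        "sander", "-O",
        "-p", "ETP_ETH.parm7",
        "-c", "init_-1.50.rst7",
        "-i", str(d / "low_level_md.mdin"),
        "-o", str(d / "rc.mdout"),
        "-r", str(d / "rc.rst7"),
        "-x", str(d / "mndod.nc"),
        "-inf", str(d / "rc.mdinfo"),
        "-ref", "init_-1.50.rst7",
        "-frc", str(d / "mndod.mdfrc"),
        "-e", str(d / "mndod.mden"),
    ]


if __name__ == "__main__":
    # Stand-in template for demonstration; the real file is mdin/low_level_md.mdin.
    template = "&cntrl\n  imin=0, ig=-1,\n/\n"
    for seed in (1, 2, 3):  # hypothetical seed values
        # In a real run this text would be written to seed_<n>/low_level_md.mdin.
        mdin_text = set_seed(template, seed)
        print(shlex.join(sander_command(f"seed_{seed}")))
```

To actually launch the runs, each printed command could be passed to `subprocess.run` after the per-seed directory and mdin file have been written.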

Next, use sander to run ab initio QM/MM calculations on the frames of the trajectory mndod.nc with imin = 6 (note: imin = 6 may not be supported in old AMBER versions). An example mdin file is provided in mdin/high_level.mdin, but you need to modify it to match your DFT software.

sander -O -p ETP_ETH.parm7 -c init_-1.50.rst7 -i high_level_relabel.mdin -o high_level.mdout -r high_level.rst7 -x high_level.nc -y mndod.nc -frc high_level.mdfrc -inf high_level.mdinfo -e high_level.mden

You now have both high-level and low-level data. Use dpamber corr to generate the initial training data from them:

dpamber corr --cutoff 6. --qm_region ":1-2" --parm7_file ETP_ETH.parm7 --nc mndod.nc --hl pbe0 --ll mndod --out init_data.hdf5
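The name of the `corr` subcommand reflects that the training data describe a correction to the low-level method, not absolute high-level values. As a rough illustration only, the labels can be thought of as per-frame differences between the high-level and low-level results. This is not dpamber's code: the function name and the numbers are made up, and unit conversion and the selection of atoms within the cutoff are ignored.

```python
def correction(high_level, low_level):
    """Return the per-frame difference high_level - low_level.

    Both arguments are sequences of numbers (for example, one energy per frame).
    Hypothetical helper for illustration; not part of dpamber.
    """
    if len(high_level) != len(low_level):
        raise ValueError("high-level and low-level data must have the same length")
    return [h - l for h, l in zip(high_level, low_level)]


if __name__ == "__main__":
    # Made-up energies for two frames, in arbitrary units.
    e_high = [-10.5, -11.0]
    e_low = [-10.0, -10.25]
    print(correction(e_high, e_low))
```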

For convenience, an example of the initial data is provided in init_data/init_data.hdf5.tar.bz2; you can extract it and skip to the next step.

cd init_data
tar vxjf init_data.hdf5.tar.bz2

Setup initial files

Here we have five directories:

  • init_data: initial training data.
  • parm7: contains parm7 files.
  • mdin: contains mdin files for simulations and relabeling.
    • ml.mdin is the template for DPRc QM/MM simulation;
    • high_level.mdin is the template for high-level ab initio QM/MM calculation;
    • low_level.mdin is the template for low-level semi-empirical QM/MM calculation.
  • rst7: contains starting structures.
  • disang: contains distance and angle restraints for umbrella sampling.

These files have been prepared in advance.
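Before starting DP-GEN, it can be worth confirming that the five directories listed above are present in the working directory. The following is a minimal sketch; the function name is hypothetical and it only checks that the directories exist, not what they contain.

```python
from pathlib import Path

# Directory names taken from the list above.
EXPECTED_DIRS = ("init_data", "parm7", "mdin", "rst7", "disang")


def missing_dirs(root="."):
    """Return the expected directory names that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("missing directories: " + ", ".join(missing))
    else:
        print("all five directories found")
```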

Running DP-GEN

Prepare two JSON files: one for the parameters (param.json) and one for the machine settings (machine.json). Modify the machine file to match your own machines; the DP-GEN documentation explains the settings in detail.

The DeePMD-kit training parameters are explained in the DeePMD-kit documentation.
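A hand-edited JSON file can easily end up with a stray comma or a missing brace, so a quick syntax check before launching may save a failed start. The sketch below assumes the two files are named param.json and machine.json, and the helper name is hypothetical. It only verifies that each file parses as JSON; it does not validate any DP-GEN keys.

```python
import json


def json_error(path):
    """Return None if `path` parses as JSON, otherwise a short error message."""
    try:
        with open(path, encoding="utf-8") as fh:
            json.load(fh)
    except (OSError, json.JSONDecodeError) as exc:
        return f"{path}: {exc}"
    return None


if __name__ == "__main__":
    for name in ("param.json", "machine.json"):
        problem = json_error(name)
        print(problem if problem else f"{name}: OK")
```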

After these files are ready, run DP-GEN on the local machine:

dpgen run param.json machine.json

For simplicity, we run only two iterations. The output looks like this:

INFO:dpgen:-------------------------iter.000000 task 06--------------------------
INFO:dpgen:system 000 candidate :   1112 in   2000  55.60 %
INFO:dpgen:system 000 failed    :     28 in   2000   1.40 %
INFO:dpgen:system 000 accurate  :    860 in   2000  43.00 %
INFO:dpgen:system 000 accurate_ratio:   0.4300    thresholds: 1.0000 and 1.0000   eff. task min and max   -1 1000   number of fp tasks:   1000
INFO:dpgen:-------------------------iter.000001 task 06--------------------------
INFO:dpgen:system 000 candidate :    619 in   2000  30.95 %
INFO:dpgen:system 000 failed    :      0 in   2000   0.00 %
INFO:dpgen:system 000 accurate  :   1381 in   2000  69.05 %
INFO:dpgen:system 000 accurate_ratio:   0.6905    thresholds: 1.0000 and 1.0000   eff. task min and max   -1 1000   number of fp tasks:    619

The ratio of accurate frames increases in the second iteration, from 43.00 % to 69.05 %. If the active learning cycle continues, the accurate ratio is expected to converge toward 100 %.
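To follow the accurate ratio over many iterations, the log lines can be parsed with a short script. The sketch below assumes the log format shown above; the function name is hypothetical and it is not part of DP-GEN. It recomputes the ratio as accurate frames divided by total frames for each iteration and system.

```python
import re

ITER_RE = re.compile(r"iter\.(\d+)")
ACCURATE_RE = re.compile(r"system\s+(\d+)\s+accurate\s*:\s*(\d+)\s+in\s+(\d+)")


def accurate_ratios(log_text):
    """Map (iteration, system) -> accurate / total, read from DP-GEN log text."""
    ratios = {}
    iteration = None
    for line in log_text.splitlines():
        header = ITER_RE.search(line)
        if header:
            iteration = int(header.group(1))
            continue
        match = ACCURATE_RE.search(line)
        if match and iteration is not None:
            system = int(match.group(1))
            ratios[(iteration, system)] = int(match.group(2)) / int(match.group(3))
    return ratios


if __name__ == "__main__":
    # Lines copied from the output above.
    log = """\
INFO:dpgen:-------------------------iter.000000 task 06--------------------------
INFO:dpgen:system 000 accurate  :    860 in   2000  43.00 %
INFO:dpgen:-------------------------iter.000001 task 06--------------------------
INFO:dpgen:system 000 accurate  :   1381 in   2000  69.05 %
"""
    for (iteration, system), ratio in sorted(accurate_ratios(log).items()):
        print(f"iter {iteration} system {system}: {ratio:.4f}")
```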
