
add multi task #929

Merged
amcadmus merged 12 commits into deepmodeling:multi-task from yhliu918:multi-task on Aug 19, 2021

Conversation

@yhliu918
Contributor

@yhliu918 yhliu918 commented Aug 6, 2021

No description provided.

@njzjz
Member

njzjz commented Aug 6, 2021

Please merge with devel.

@codecov-commenter

codecov-commenter commented Aug 7, 2021

Codecov Report

Merging #929 (0fd3ca7) into multi-task (4ced020) will decrease coverage by 11.13%.
The diff coverage is n/a.

❗ Current head 0fd3ca7 differs from pull request most recent head 2c2f651. Consider uploading reports for the commit 2c2f651 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##           multi-task     #929       +/-   ##
===============================================
- Coverage       75.41%   64.28%   -11.14%     
===============================================
  Files              85        5       -80     
  Lines            6729       14     -6715     
===============================================
- Hits             5075        9     -5066     
+ Misses           1654        5     -1649     
Impacted Files Coverage Δ
deepmd/common.py
deepmd/cluster/slurm.py
source/op/_tabulate_grad.py
deepmd/infer/deep_dipole.py
deepmd/env.py
deepmd/__about__.py
deepmd/descriptor/hybrid.py
deepmd/entrypoints/freeze.py
deepmd/utils/convert.py
source/op/_prod_force_se_a_grad.py
... and 55 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ced020...2c2f651. Read the comment docs.

@njzjz
Member

njzjz commented Aug 8, 2021

err... We also need to update the target branch.

@njzjz njzjz mentioned this pull request Aug 8, 2021
@njzjz njzjz changed the base branch from multi-task to devel August 8, 2021 21:29
@njzjz njzjz changed the base branch from devel to multi-task August 8, 2021 21:29



class DPTrainer_mt (object):
Member

I think this class can inherit DPTrainer and remove duplicate methods.
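For illustration, the inheritance the reviewer asks for might look like this minimal sketch (class and method names here are hypothetical stand-ins, not the actual deepmd-kit API):

```python
# Hypothetical sketch of the suggestion: the multi-task trainer inherits
# the single-task one and overrides only the multi-task-specific steps,
# so duplicated methods can be deleted. Names are illustrative only.
class DPTrainer:
    def __init__(self, jdata):
        self.jdata = jdata
        self._init_param(jdata)

    def _init_param(self, jdata):
        # shared parameter setup used by both trainers
        self.params = dict(jdata)

    def build(self):
        return "single-task graph"


class DPTrainerMT(DPTrainer):
    def _init_param(self, jdata):
        # reuse the shared setup, then add only the multi-task pieces
        super()._init_param(jdata)
        self.tasks = jdata.get("tasks", [])

    def build(self):
        return "multi-task graph with %d tasks" % len(self.tasks)
```

With this shape, `__init__` and every unchanged method exist only once, in the base class.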

Member

And I still keep this advice: this class can inherit DPTrainer and remove duplicate methods. Or you can consider creating a new class DPTrainerBase to merge the shared methods.
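The DPTrainerBase alternative mentioned here could be sketched as follows (hypothetical names, with shared methods shown as placeholders):

```python
# Hypothetical sketch of the DPTrainerBase alternative: methods shared by
# the single-task and multi-task trainers move into one base class.
class DPTrainerBase:
    def __init__(self, jdata):
        self.jdata = jdata

    def save_checkpoint(self):
        # shared logic written once in the base class
        return "checkpoint saved"


class DPTrainer(DPTrainerBase):
    def train(self):
        return "single-task training"


class DPTrainerMT(DPTrainerBase):
    def train(self):
        return "multi-task training"
```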

@amcadmus amcadmus requested a review from njzjz August 11, 2021 02:46
Member

@njzjz njzjz left a comment

Not all previous comments are resolved.

@amcadmus amcadmus requested a review from njzjz August 11, 2021 11:07
Comment on lines +57 to +63
doc_descrpt_type = f'The type of the descriptor. See explanation below. \n\n\
- `loc_frame`: Defines a local frame at each atom, and then compute the descriptor as local coordinates under this frame.\n\n\
- `se_e2_a`: Used by the smooth edition of Deep Potential. The full relative coordinates are used to construct the descriptor.\n\n\
- `se_e2_r`: Used by the smooth edition of Deep Potential. Only the distance between atoms is used to construct the descriptor.\n\n\
- `se_e3`: Used by the smooth edition of Deep Potential. The full relative coordinates are used to construct the descriptor. Three-body embedding will be used by this descriptor.\n\n\
- `se_a_tpe`: Used by the smooth edition of Deep Potential. The full relative coordinates are used to construct the descriptor. Type embedding will be used by this descriptor.\n\n\
- `hybrid`: Concatenate of a list of descriptors as a new descriptor.'
Member

se_conv1d and se_conv_geo are not added here.

Contributor Author

I think I can deal with the argcheck problem.
As for inheriting DPTrainer, I agree it must be done. However, there are not many duplicate methods we can remove: each build and training step is modified, and DPTrainer doesn't split its big functions into smaller ones. That means I can't reuse them directly; instead I need to go through the whole _init_param to change some details.





@amcadmus
Member

It is definitely not a good practice to write commit messages like "add multi-task v4". There are many instructions on how to write a good commit message, for example.

Please describe what you have done in the description of this PR. Good examples are #945 and #921; please follow them to update your PR. Thanks!

Member

@amcadmus amcadmus left a comment

  1. Other fit and loss classes (deep tensor) should be added with an attribute name.
  2. I am not sure the data in source/tests/multi-task/data/ is necessary for your test cases; please check.

  log_level=log_level,
- mpi_log=mpi_log
+ mpi_log=mpi_log,
+ try_distrib=jdata.get("with_distrib", False),
Member

This would not pass; RunOptions does not take try_distrib anymore.

Comment on lines +103 to +104


Member

Do not add blank lines to the code.

  all_stat['box'],
  all_stat['type'],
- all_stat['natoms_vec'],
+ all_stat['natoms_vec'], # this should be the global one (dim 14) or local one (dim 11)?
Member

Why do you make this comment?

Contributor Author

I'm thinking about how to organize the data more reasonably, because right now we use the global type map everywhere, while each descriptor deals with its own atom types.

  log_level: int = 0,
- mpi_log: str = "master"
+ mpi_log: str = "master",
+ try_distrib: bool = False,
Member

Why add try_distrib?

Contributor Author

This was not added by me; I merged devel (maybe) and there is some inconsistency here in the run options.

Member

I see; that may be because you didn't resolve the conflict correctly.



def valid_on_the_fly(self,
fp,
Member

The fp parameter is not used anymore; please remove it.

Contributor Author

I checked out the devel branch, but fp is not deprecated?

Comment on lines -60 to +66
self.type_map = type_map
ntypes = len(type_map)
self.type_map = type_map[:ntypes]

Member

You are reverting bug fixes!!

Comment on lines +156 to +166
- def get_data_dict(self) -> dict:
+ def get_data_dict(self) -> (dict, str):
      """
      Get the `data_dict`
      """
-     return self.data_dict
+     return self.data_dict, self.name
Member

You already provide the get_name method; I think it is not necessary to return self.name from get_data_dict.
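The separation the reviewer asks for might look like this minimal sketch (an illustrative class, not the actual deepmd-kit implementation):

```python
# Sketch of the reviewer's point: get_data_dict returns only the dict,
# and the name keeps its own accessor instead of being smuggled out as
# part of a (dict, name) tuple. Class name is hypothetical.
class DataSystemSketch:
    def __init__(self, name: str, data_dict: dict):
        self.name = name
        self.data_dict = data_dict

    def get_data_dict(self) -> dict:
        """Get the `data_dict`."""
        return self.data_dict

    def get_name(self) -> str:
        """Get the name via its own accessor."""
        return self.name
```

Callers that need both simply call both accessors, and callers that need only the data are not forced to unpack a tuple.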

Comment on lines +129 to +130
#print("self.type_map",self.type_map)
#print("type_map",type_map)
Member

Please remove the debug code.

Comment on lines +336 to +347
return b_data

return b_data
Member

Please revert this change.

return sys_probs


class DeepmdDataDocker() :
Member

Should be implemented in a separate file.

for model_name in model.model_dict.keys():
sub_model = model.model_dict[model_name]
rcut_list.append(sub_model.get_rcut())
type_map = sub_model.get_type_map()
Contributor Author

they are the same for now

},
"descriptor" :[
{
"type": "se_e2_a",
Collaborator

Please take care of indents.

@yhliu918
Contributor Author

Sorry, to call the multi-task training it still uses 'dp train -mt True input.json'; however, train and train_mt are separated.

@yhliu918 yhliu918 requested a review from amcadmus August 17, 2021 10:12
@amcadmus
Member

amcadmus commented Aug 18, 2021

Sorry, to call the multi-task training it still uses 'dp train -mt True input.json'; however, train and train_mt are separated.

I see

Comment on lines +169 to +170
type=bool,
default=False,
Member

I suggest using action="store_true" instead.

Suggested change
- type=bool,
- default=False,
+ action="store_true",
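For context, here is why type=bool misbehaves with argparse and what store_true does instead (a standalone sketch; the real deepmd flag spelling may differ):

```python
import argparse

# With type=bool, argparse calls bool() on the raw string, and any
# non-empty string -- including "False" -- is truthy.
broken = argparse.ArgumentParser()
broken.add_argument("-mt", "--multi-task", type=bool, default=False)
assert broken.parse_args(["-mt", "False"]).multi_task is True  # surprising!

# With action="store_true" the flag takes no value:
# absent -> False, present -> True.
fixed = argparse.ArgumentParser()
fixed.add_argument("-mt", "--multi-task", action="store_true")
assert fixed.parse_args([]).multi_task is False
assert fixed.parse_args(["-mt"]).multi_task is True
```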

Member

@amcadmus amcadmus left a comment

Please revert all changes that only add or remove blank lines.

mpi_log: str,
log_level: int,
log_path: Optional[str],
multi_task: bool,
Member

Why do we still have the option in train?

Comment on lines +18 to +23
from deepmd.train.trainer_mt import DPMultitaskTrainer
from deepmd.utils.argcheck import normalize
from deepmd.utils.argcheck_mt import normalize_mt
from deepmd.utils.compat import updata_deepmd_input
from deepmd.utils.data_system import DeepmdDataSystem
from deepmd.utils.data_docker import DeepmdDataDocker
Member

These imports should not be necessary

Comment on lines -213 to +229
- def get_rcut(jdata):
-     descrpt_data = jdata['model']['descriptor']
+ def parse_rcut(descrpt_data):
+     rcut_list = []
+     if descrpt_data['type'] == 'hybrid':
+         for ii in descrpt_data['list']:
+             rcut_list.append(ii['rcut'])
+     else:
+         rcut_list.append(descrpt_data['rcut'])
+     return rcut_list
+
+ def get_rcut(jdata):
+     descrpt_data = jdata['model']['descriptor']
+     rcut_list = []
+     rcut_list.extend(parse_rcut(descrpt_data))
Member

I do not think you need to revise the get_rcut.

Contributor Author

If I don't split get_rcut into two functions, I can't reuse it in train_mt.
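The split in the diff can be exercised standalone (a sketch reproducing the two functions from the diff, with a hypothetical hybrid-descriptor input):

```python
# parse_rcut handles one descriptor dict (including the 'hybrid' case),
# so a multi-task trainer can call it once per sub-model; get_rcut keeps
# its original single-model interface on top of it.
def parse_rcut(descrpt_data):
    rcut_list = []
    if descrpt_data['type'] == 'hybrid':
        for ii in descrpt_data['list']:
            rcut_list.append(ii['rcut'])
    else:
        rcut_list.append(descrpt_data['rcut'])
    return rcut_list

def get_rcut(jdata):
    descrpt_data = jdata['model']['descriptor']
    return parse_rcut(descrpt_data)

# Hypothetical input mimicking a hybrid-descriptor config:
jdata = {'model': {'descriptor': {
    'type': 'hybrid',
    'list': [{'type': 'se_e2_a', 'rcut': 6.0},
             {'type': 'se_e2_r', 'rcut': 8.0}]}}}
print(get_rcut(jdata))  # -> [6.0, 8.0]
```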

all_stat['box'],
all_stat['type'],
all_stat['natoms_vec'],
all_stat['natoms_vec'],
Member

Please avoid introducing such revisions.

b_data["natoms_vec"] = self.natoms_vec[self.pick_idx]
b_data["default_mesh"] = self.default_mesh[self.pick_idx]
return b_data
return b_data
Member

Avoid such changes.

Comment on lines -511 to -512


Member

Avoid such changes.

  mesh,
  input_dict,
- suffix = '',
+ suffix, # a dict of suffix, including type_embed, descrpt and fitting.
Member

I think we should preserve the default value ''
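The point about preserving the default could be sketched like this (hypothetical build signature; only the suffix handling is shown):

```python
# Hypothetical sketch: keep suffix='' as the default so existing
# single-suffix callers keep working, while also accepting a dict of
# suffixes (type_embed / descrpt / fitting) as in the multi-task diff.
def build(mesh, input_dict, suffix=''):
    if isinstance(suffix, dict):
        # multi-task case: one suffix per network component
        return {key: 'net' + s for key, s in suffix.items()}
    # original single-suffix behaviour is preserved by the default
    return 'net' + suffix
```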

Comment on lines +45 to +52
dd = dd.reshape([3, -1])
ref_loss = [24.8, 19.6,
            3.79, 1.81,
            2110.0, 2110.0]

for ii in range(3):
    for jj in range(2):
        self.assertAlmostEqual(dd[ii][jj+1], ref_loss[ii*2+jj], places=8)
Member

Why compare only the total loss?

@amcadmus amcadmus merged commit af2169c into deepmodeling:multi-task Aug 19, 2021
njzjz pushed a commit to njzjz/deepmd-kit that referenced this pull request Sep 21, 2023