
add multi task #929

Merged
amcadmus merged 12 commits into deepmodeling:multi-task from yhliu918:multi-task on Aug 19, 2021

Conversation

@yhliu918
Contributor

@yhliu918 yhliu918 commented Aug 6, 2021

No description provided.

@njzjz
Member

njzjz commented Aug 6, 2021

Please merge with devel.

@codecov-commenter

codecov-commenter commented Aug 7, 2021

Codecov Report

Merging #929 (0fd3ca7) into multi-task (4ced020) will decrease coverage by 11.13%.
The diff coverage is n/a.

❗ Current head 0fd3ca7 differs from pull request most recent head 2c2f651. Consider uploading reports for the commit 2c2f651 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##           multi-task     #929       +/-   ##
===============================================
- Coverage       75.41%   64.28%   -11.14%     
===============================================
  Files              85        5       -80     
  Lines            6729       14     -6715     
===============================================
- Hits             5075        9     -5066     
+ Misses           1654        5     -1649     
Impacted Files Coverage Δ
deepmd/common.py
deepmd/cluster/slurm.py
source/op/_tabulate_grad.py
deepmd/infer/deep_dipole.py
deepmd/env.py
deepmd/__about__.py
deepmd/descriptor/hybrid.py
deepmd/entrypoints/freeze.py
deepmd/utils/convert.py
source/op/_prod_force_se_a_grad.py
... and 55 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ced020...2c2f651. Read the comment docs.

@njzjz
Member

njzjz commented Aug 8, 2021

err... We also need to update the target branch.

@njzjz njzjz mentioned this pull request Aug 8, 2021
@njzjz njzjz changed the base branch from multi-task to devel August 8, 2021 21:29
@njzjz njzjz changed the base branch from devel to multi-task August 8, 2021 21:29



class DPTrainer_mt (object):
Member

I think this class can inherit DPTrainer and remove duplicate methods.
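For illustration, the inheritance the reviewer asks for might look like this minimal sketch (class and method names here are hypothetical stand-ins, not the actual deepmd-kit API):

```python
# Hypothetical sketch of the suggestion: the multi-task trainer inherits
# the single-task one and overrides only the multi-task-specific steps,
# so duplicated methods can be deleted. Names are illustrative only.
class DPTrainer:
    def __init__(self, jdata):
        self.jdata = jdata
        self._init_param(jdata)

    def _init_param(self, jdata):
        # shared parameter setup used by both trainers
        self.params = dict(jdata)

    def build(self):
        return "single-task graph"


class DPTrainerMT(DPTrainer):
    def _init_param(self, jdata):
        # reuse the shared setup, then add only the multi-task pieces
        super()._init_param(jdata)
        self.tasks = jdata.get("tasks", [])

    def build(self):
        return "multi-task graph with %d tasks" % len(self.tasks)
```

With this shape, `__init__` and every unchanged method exist only once, in the base class.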

Member

And I still keep this advice: this class can inherit DPTrainer and remove duplicate methods. Or you can consider creating a new class DPTrainerBase to merge the shared methods.
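The DPTrainerBase alternative mentioned here could be sketched as follows (hypothetical names, with shared methods shown as placeholders):

```python
# Hypothetical sketch of the DPTrainerBase alternative: methods shared by
# the single-task and multi-task trainers move into one base class.
class DPTrainerBase:
    def __init__(self, jdata):
        self.jdata = jdata

    def save_checkpoint(self):
        # shared logic written once in the base class
        return "checkpoint saved"


class DPTrainer(DPTrainerBase):
    def train(self):
        return "single-task training"


class DPTrainerMT(DPTrainerBase):
    def train(self):
        return "multi-task training"
```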

@amcadmus amcadmus requested a review from njzjz August 11, 2021 02:46
Member

@njzjz njzjz left a comment

Not all previous comments are resolved.

@amcadmus amcadmus requested a review from njzjz August 11, 2021 11:07
Comment on lines +57 to +63
doc_descrpt_type = f'The type of the descriptor. See explanation below. \n\n\
- `loc_frame`: Defines a local frame at each atom, and then compute the descriptor as local coordinates under this frame.\n\n\
- `se_e2_a`: Used by the smooth edition of Deep Potential. The full relative coordinates are used to construct the descriptor.\n\n\
- `se_e2_r`: Used by the smooth edition of Deep Potential. Only the distance between atoms is used to construct the descriptor.\n\n\
- `se_e3`: Used by the smooth edition of Deep Potential. The full relative coordinates are used to construct the descriptor. Three-body embedding will be used by this descriptor.\n\n\
- `se_a_tpe`: Used by the smooth edition of Deep Potential. The full relative coordinates are used to construct the descriptor. Type embedding will be used by this descriptor.\n\n\
- `hybrid`: Concatenate of a list of descriptors as a new descriptor.'
Member

se_conv1d and se_conv_geo are not added here.

Contributor Author

I think I can deal with the argcheck problem.
As for inheriting DPTrainer, I agree it must be done. However, there are not many duplicate methods we can remove: each build and training step is modified, and DPTrainer doesn't split its big functions into smaller ones. That means I can't reuse them directly; instead I need to go through the whole _init_param to change some details.





@amcadmus
Member

It is definitely not a good practice to write commit messages like "add multi-task v4". There are many instructions on how to write a good commit message, for example.

Please describe what you have done in the description of this PR. Good examples are #945 and #921; please follow them to update your PR. Thanks!

Member

@amcadmus amcadmus left a comment

  1. Other fit and loss classes (deep tensor) should be added with an attribute name.
  2. I am not sure the data in source/tests/multi-task/data/ is necessary for your test cases; please check.

  log_level=log_level,
- mpi_log=mpi_log
+ mpi_log=mpi_log,
+ try_distrib=jdata.get("with_distrib", False),
Member

This would not pass; RunOptions does not take try_distrib anymore.

Comment on lines +103 to +104


Member

Do not add blank lines to the code.

  all_stat['box'],
  all_stat['type'],
- all_stat['natoms_vec'],
+ all_stat['natoms_vec'], # this should be the global one (dim 14) or local one (dim 11)?
Member

Why do you make this comment?

Contributor Author

I'm thinking about how to organize the data more reasonably, because right now we use the global type map everywhere, while each descriptor deals with its own atom types.

  log_level: int = 0,
- mpi_log: str = "master"
+ mpi_log: str = "master",
+ try_distrib: bool = False,
Member

Why add try_distrib?

Contributor Author

This was not added by me; I merged devel (maybe) and there is some inconsistency here in the run options.

Member

I see; that may be because you didn't resolve the conflict correctly.



def valid_on_the_fly(self,
fp,
Member

The fp parameter is not used anymore; please remove it.

Contributor Author

I checked out the devel branch, but fp is not deprecated?

Comment on lines -60 to +66
self.type_map = type_map
ntypes = len(type_map)
self.type_map = type_map[:ntypes]

Member

You are reverting bug fixes!!

Comment on lines +156 to +166
- def get_data_dict(self) -> dict:
+ def get_data_dict(self) -> (dict, str):
      """
      Get the `data_dict`
      """
-     return self.data_dict
+     return self.data_dict, self.name
Member

You already provide the get_name method; I think it is not necessary to return self.name from get_data_dict.
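The separation the reviewer asks for might look like this minimal sketch (an illustrative class, not the actual deepmd-kit implementation):

```python
# Sketch of the reviewer's point: get_data_dict returns only the dict,
# and the name keeps its own accessor instead of being smuggled out as
# part of a (dict, name) tuple. Class name is hypothetical.
class DataSystemSketch:
    def __init__(self, name: str, data_dict: dict):
        self.name = name
        self.data_dict = data_dict

    def get_data_dict(self) -> dict:
        """Get the `data_dict`."""
        return self.data_dict

    def get_name(self) -> str:
        """Get the name via its own accessor."""
        return self.name
```

Callers that need both simply call both accessors, and callers that need only the data are not forced to unpack a tuple.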

Comment on lines +129 to +130
#print("self.type_map",self.type_map)
#print("type_map",type_map)
Member

Please remove the debug code.

Comment on lines +336 to +347
return b_data

return b_data
Member

Please revert this change.

return sys_probs


class DeepmdDataDocker() :
Member

Should be implemented in a separate file.

for model_name in model.model_dict.keys():
sub_model = model.model_dict[model_name]
rcut_list.append(sub_model.get_rcut())
type_map = sub_model.get_type_map()
Contributor Author

they are the same for now

},
"descriptor" :[
{
"type": "se_e2_a",
Collaborator

Please take care of indents.

@yhliu918
Contributor Author

Sorry, to call the multi-task training it still uses 'dp train -mt True input.json'; however, train and train_mt are separated.

@yhliu918 yhliu918 requested a review from amcadmus August 17, 2021 10:12
@amcadmus
Member

amcadmus commented Aug 18, 2021

Sorry, to call the multi-task training it still uses 'dp train -mt True input.json'; however, train and train_mt are separated.

I see

Comment on lines +169 to +170
type=bool,
default=False,
Member

I suggest using action="store_true" instead.

Suggested change
- type=bool,
- default=False,
+ action="store_true",
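For context, here is why type=bool misbehaves with argparse and what store_true does instead (a standalone sketch; the real deepmd flag spelling may differ):

```python
import argparse

# With type=bool, argparse calls bool() on the raw string, and any
# non-empty string -- including "False" -- is truthy.
broken = argparse.ArgumentParser()
broken.add_argument("-mt", "--multi-task", type=bool, default=False)
assert broken.parse_args(["-mt", "False"]).multi_task is True  # surprising!

# With action="store_true" the flag takes no value:
# absent -> False, present -> True.
fixed = argparse.ArgumentParser()
fixed.add_argument("-mt", "--multi-task", action="store_true")
assert fixed.parse_args([]).multi_task is False
assert fixed.parse_args(["-mt"]).multi_task is True
```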

Member

@amcadmus amcadmus left a comment

Please revert all changes that only add or remove blank lines.

mpi_log: str,
log_level: int,
log_path: Optional[str],
multi_task: bool,
Member

Why do we still have the option in train?

Comment on lines +18 to +23
from deepmd.train.trainer_mt import DPMultitaskTrainer
from deepmd.utils.argcheck import normalize
from deepmd.utils.argcheck_mt import normalize_mt
from deepmd.utils.compat import updata_deepmd_input
from deepmd.utils.data_system import DeepmdDataSystem
from deepmd.utils.data_docker import DeepmdDataDocker
Member

These imports should not be necessary

Comment on lines -213 to +229
- def get_rcut(jdata):
-     descrpt_data = jdata['model']['descriptor']
+ def parse_rcut(descrpt_data):
+     rcut_list = []
+     if descrpt_data['type'] == 'hybrid':
+         for ii in descrpt_data['list']:
+             rcut_list.append(ii['rcut'])
+     else:
+         rcut_list.append(descrpt_data['rcut'])
+     return rcut_list
+
+ def get_rcut(jdata):
+     descrpt_data = jdata['model']['descriptor']
+     rcut_list = []
+     rcut_list.extend(parse_rcut(descrpt_data))
Member

I do not think you need to revise the get_rcut.

Contributor Author

If I don't split get_rcut into two functions, I can't reuse it in train_mt.
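The split in the diff can be exercised standalone (a sketch reproducing the two functions from the diff, with a hypothetical hybrid-descriptor input):

```python
# parse_rcut handles one descriptor dict (including the 'hybrid' case),
# so a multi-task trainer can call it once per sub-model; get_rcut keeps
# its original single-model interface on top of it.
def parse_rcut(descrpt_data):
    rcut_list = []
    if descrpt_data['type'] == 'hybrid':
        for ii in descrpt_data['list']:
            rcut_list.append(ii['rcut'])
    else:
        rcut_list.append(descrpt_data['rcut'])
    return rcut_list

def get_rcut(jdata):
    descrpt_data = jdata['model']['descriptor']
    return parse_rcut(descrpt_data)

# Hypothetical input mimicking a hybrid-descriptor config:
jdata = {'model': {'descriptor': {
    'type': 'hybrid',
    'list': [{'type': 'se_e2_a', 'rcut': 6.0},
             {'type': 'se_e2_r', 'rcut': 8.0}]}}}
print(get_rcut(jdata))  # -> [6.0, 8.0]
```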

all_stat['box'],
all_stat['type'],
all_stat['natoms_vec'],
all_stat['natoms_vec'],
Member

Please avoid introducing such revisions.

b_data["natoms_vec"] = self.natoms_vec[self.pick_idx]
b_data["default_mesh"] = self.default_mesh[self.pick_idx]
return b_data
return b_data
Member

Avoid such changes.

Comment on lines -511 to -512


Member

Avoid such changes.

  mesh,
  input_dict,
- suffix = '',
+ suffix, # a dict of suffix, including type_embed, descrpt and fitting.
Member

I think we should preserve the default value ''
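The point about preserving the default could be sketched like this (hypothetical build signature; only the suffix handling is shown):

```python
# Hypothetical sketch: keep suffix='' as the default so existing
# single-suffix callers keep working, while also accepting a dict of
# suffixes (type_embed / descrpt / fitting) as in the multi-task diff.
def build(mesh, input_dict, suffix=''):
    if isinstance(suffix, dict):
        # multi-task case: one suffix per network component
        return {key: 'net' + s for key, s in suffix.items()}
    # original single-suffix behaviour is preserved by the default
    return 'net' + suffix
```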

Comment on lines +45 to +52
dd = dd.reshape([3, -1])
ref_loss = [24.8, 19.6,
            3.79, 1.81,
            2110.0, 2110.0]

for ii in range(3):
    for jj in range(2):
        self.assertAlmostEqual(dd[ii][jj+1], ref_loss[ii*2+jj], places=8)
Member

Why compare only the total loss?

@amcadmus amcadmus merged commit af2169c into deepmodeling:multi-task Aug 19, 2021
njzjz pushed a commit to njzjz/deepmd-kit that referenced this pull request Sep 21, 2023