Skip to content

Adding the elements attribute to TPRParser.py (#2553)#2628

Closed
AmeyaHarmalkar wants to merge 1 commit intoMDAnalysis:developfrom
AmeyaHarmalkar:tpr_attr
Closed

Adding the elements attribute to TPRParser.py (#2553)#2628
AmeyaHarmalkar wants to merge 1 commit intoMDAnalysis:developfrom
AmeyaHarmalkar:tpr_attr

Conversation

@AmeyaHarmalkar
Copy link
Contributor

@AmeyaHarmalkar AmeyaHarmalkar commented Mar 15, 2020

Fixes #2553

Changes made in this Pull Request:

  • Added two guess functions in the guesser.py to guess elements from atomic mass
  • Added an element attribute to the TPRParser.py to provide the elements attribute

Need suggestions on adding Tests for tpr files.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@AmeyaHarmalkar AmeyaHarmalkar changed the title Adding the elements attribute to TPRParser.py Adding the elements attribute to TPRParser.py (#2553) Mar 15, 2020
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An initial review. There are tests failing, so these will probably need to be addressed first.

atom_names = tpr_top.names.values.copy()
atom_mass = tpr_top.masses.values.copy()

if any(atom_names):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, a guess is being made therefore a warning should be thrown to users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can add a :
warnings.warn() in the atomic mass guess conditional


if any(atom_names):
elements = guess_types(atom_names)
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be a silly question, but is there a defined case where atom_names will not be returned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, I am actually unsure. I was of the understanding that TPR always has coordinates, structure and mass inputs and velocities. Hence, I created a if else around atomnames and atomic-mass. I might be completely wrong though. Would really look forward to your suggestions as my experience with TPR files has been limited

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're returned. The place to look for element-related stuff like atomic numbers is probably in atomkind somewhere:

for j in range(mb.molb_nmol):
mt = mtop.moltypes[mb.molb_type] # mt: molecule type
for atomkind in mt.atomkinds:
atomids.append(atomkind.id + atom_start_ndx)
segids.append(segid)
resids.append(atomkind.resid + res_start_ndx)
resnames.append(atomkind.resname.decode())
atomnames.append(atomkind.name.decode())
atomtypes.append(atomkind.type.decode())
moltypes.append(molblock)
molnums.append(molnum)
charges.append(atomkind.charge)
masses.append(atomkind.mass)
molnum += 1

@jbarnoud would know more.

atom_mass = tpr_top.masses.values.copy()

if any(atom_names):
elements = guess_types(atom_names)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that guess_types is in more accurate as say an atomic_number guesser here. This probably warrants further discussion, as I don't think a concensus was agreed to in #2553. Tagging @lilyminium here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess_types just passes it down to guess_atom_elements from the name, which futzes around with strings until one of them maybe matches an element, and returns just returns the (non-numeric) input if nothing is found. Using atomic numbers would be way better.

elements = guess_types(atom_names)
else:
warnings.warn("Guessing elements from atomic mass.")
elements = guess_types(guess_elements(masses))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As previously discussed, #2553 (comment), mass guessers tend to be rather messy for things like HMR. Whilst there would still be quite a few failure cases, in light of #2553 (comment) an atomic_number guesser might be more appropriate (at least it would allow us to capture Martini CG atoms properly).


if any(atom_names):
elements = guess_types(atom_names)
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're returned. The place to look for element-related stuff like atomic numbers is probably in atomkind somewhere:

for j in range(mb.molb_nmol):
mt = mtop.moltypes[mb.molb_type] # mt: molecule type
for atomkind in mt.atomkinds:
atomids.append(atomkind.id + atom_start_ndx)
segids.append(segid)
resids.append(atomkind.resid + res_start_ndx)
resnames.append(atomkind.resname.decode())
atomnames.append(atomkind.name.decode())
atomtypes.append(atomkind.type.decode())
moltypes.append(molblock)
molnums.append(molnum)
charges.append(atomkind.charge)
masses.append(atomkind.mass)
molnum += 1

@jbarnoud would know more.

except:
# Approximating to the closest element atomic mass
for k,v in tables.masses.items():
if round(atommass) == round(v):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about this one, I don't really want my united-atom CH2 group to become nitrogen. Plus all the general concerns about mass guessing that @IAlibay has already raised.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the united atom CH2 would already be identified as a non-physical element from the atomnames and get an empty string '' right?(as per the changes in the guess_atom_element from the PDBParser.py)
Should I move to atomic number completely instead if mass is too messy as I seem it is from the discussion?

for k,v in tables.masses.items():
if atommass == v:
return k
except:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you wrapping this in a try/except? Which error are you expecting? It's better to specifically guard against your expected error than generally catch all of them.

atom_mass = tpr_top.masses.values.copy()

if any(atom_names):
elements = guess_types(atom_names)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess_types just passes it down to guess_atom_elements from the name, which futzes around with strings until one of them maybe matches an element, and returns just returns the (non-numeric) input if nothing is found. Using atomic numbers would be way better.

@jbarnoud
Copy link
Contributor

Thank you for the contribution, however the TPR format is an exact description of a simulation. There should be no guessing involved in its parsing.

The masses are always present in the file, otherwise the simulation engine cannot work. The names are also always there.

In many many cases, the elements are not pertinent and therefore ommited. Though they could be read directly from the file when appropriate. That would be a much better approach than guessing.

@AmeyaHarmalkar
Copy link
Contributor Author

AmeyaHarmalkar commented Mar 16, 2020

@IAlibay @lilyminium thanks a lot for the comments and discussion. It clears a lot of my questions about the code hierarchy in topology.
@jbarnoud that really clears a few questions in my mind about the TPR format.
So, just an element parser based on names would suffice is what your opinion is? I can add the functionality to check on the basis of names and decide whether : 1. It's a valid element and physically present. 2. It's physically present but a non-physical element (like the CG atoms) or 3. The element info is not present or can't be deduced from the Atom name?

@jbarnoud
Copy link
Contributor

jbarnoud commented Mar 16, 2020

The object in the following line does contain the atomic number of the atoms, read from the file itself, if they are relevant:

atoms_obj = do_atoms(data, symtab, fver)
I do not know what they default to, but it should not be too difficult to figure out. When the value here is not an atomic number, then it means we should not provide an element as it would be meaningless.

@orbeckst
Copy link
Member

@AmeyaHarmalkar – is there an existing issue that this PR and PR #2627 are addressing? If so, please reference it.

If there isn't an issue yet then it would be great if you raised one in the issue tracker and then reference your new issue. In this way we have a much better record of problems and the problem definition is neatly separated from the solution.

@lilyminium
Copy link
Member

@AmeyaHarmalkar I know it's in your title but if you put the issue number #2553 after "Fixes" in the first comment, GitHub magically links it to the issue.

Also, according to #2553 (comment) the atomic number for a non-element is -1.

jbarnoud added a commit that referenced this pull request Jul 19, 2020
Elements are read from the TPR. TPR files contain the atomic number for
each atoms, when an atom does not match an element (some virtual sites,
coarse-grained particles), the atomic number is set to -1.

The TPR parser adds the 'elements' attribute only if at least 1 atom has
its atomic number correspomding to an element. In that case, atoms that
do not match an element have theire element set to an empty string.

See #2553
Closes #2628
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parsers should try and provide an Elements attribute

5 participants