Adding the elements attribute to TPRParser.py (#2553)#2628
Adding the elements attribute to TPRParser.py (#2553)#2628AmeyaHarmalkar wants to merge 1 commit intoMDAnalysis:developfrom
Conversation
IAlibay
left a comment
There was a problem hiding this comment.
An initial review. There are tests failing, so these will probably need to be addressed first.
| atom_names = tpr_top.names.values.copy() | ||
| atom_mass = tpr_top.masses.values.copy() | ||
|
|
||
| if any(atom_names): |
There was a problem hiding this comment.
Either way, a guess is being made therefore a warning should be thrown to users.
There was a problem hiding this comment.
Yes, I can add a :
warnings.warn() in the atomic mass guess conditional
|
|
||
| if any(atom_names): | ||
| elements = guess_types(atom_names) | ||
| else: |
There was a problem hiding this comment.
May be a silly question, but is there a defined case where atom_names will not be returned?
There was a problem hiding this comment.
Umm, I am actually unsure. I was of the understanding that TPR always has coordinates, structure and mass inputs and velocities. Hence, I created a if else around atomnames and atomic-mass. I might be completely wrong though. Would really look forward to your suggestions as my experience with TPR files has been limited
There was a problem hiding this comment.
They're returned. The place to look for element-related stuff like atomic numbers is probably in atomkind somewhere:
mdanalysis/package/MDAnalysis/topology/tpr/utils.py
Lines 341 to 354 in 95d1909
@jbarnoud would know more.
| atom_mass = tpr_top.masses.values.copy() | ||
|
|
||
| if any(atom_names): | ||
| elements = guess_types(atom_names) |
There was a problem hiding this comment.
I'm not sure that guess_types is in more accurate as say an atomic_number guesser here. This probably warrants further discussion, as I don't think a concensus was agreed to in #2553. Tagging @lilyminium here.
There was a problem hiding this comment.
guess_types just passes it down to guess_atom_elements from the name, which futzes around with strings until one of them maybe matches an element, and returns just returns the (non-numeric) input if nothing is found. Using atomic numbers would be way better.
| elements = guess_types(atom_names) | ||
| else: | ||
| warnings.warn("Guessing elements from atomic mass.") | ||
| elements = guess_types(guess_elements(masses)) |
There was a problem hiding this comment.
As previously discussed, #2553 (comment), mass guessers tend to be rather messy for things like HMR. Whilst there would still be quite a few failure cases, in light of #2553 (comment) an atomic_number guesser might be more appropriate (at least it would allow us to capture Martini CG atoms properly).
|
|
||
| if any(atom_names): | ||
| elements = guess_types(atom_names) | ||
| else: |
There was a problem hiding this comment.
They're returned. The place to look for element-related stuff like atomic numbers is probably in atomkind somewhere:
mdanalysis/package/MDAnalysis/topology/tpr/utils.py
Lines 341 to 354 in 95d1909
@jbarnoud would know more.
| except: | ||
| # Approximating to the closest element atomic mass | ||
| for k,v in tables.masses.items(): | ||
| if round(atommass) == round(v): |
There was a problem hiding this comment.
I don't know about this one, I don't really want my united-atom CH2 group to become nitrogen. Plus all the general concerns about mass guessing that @IAlibay has already raised.
There was a problem hiding this comment.
But the united atom CH2 would already be identified as a non-physical element from the atomnames and get an empty string '' right?(as per the changes in the guess_atom_element from the PDBParser.py)
Should I move to atomic number completely instead if mass is too messy as I seem it is from the discussion?
| for k,v in tables.masses.items(): | ||
| if atommass == v: | ||
| return k | ||
| except: |
There was a problem hiding this comment.
Why are you wrapping this in a try/except? Which error are you expecting? It's better to specifically guard against your expected error than generally catch all of them.
| atom_mass = tpr_top.masses.values.copy() | ||
|
|
||
| if any(atom_names): | ||
| elements = guess_types(atom_names) |
There was a problem hiding this comment.
guess_types just passes it down to guess_atom_elements from the name, which futzes around with strings until one of them maybe matches an element, and returns just returns the (non-numeric) input if nothing is found. Using atomic numbers would be way better.
|
Thank you for the contribution, however the TPR format is an exact description of a simulation. There should be no guessing involved in its parsing. The masses are always present in the file, otherwise the simulation engine cannot work. The names are also always there. In many many cases, the elements are not pertinent and therefore ommited. Though they could be read directly from the file when appropriate. That would be a much better approach than guessing. |
|
@IAlibay @lilyminium thanks a lot for the comments and discussion. It clears a lot of my questions about the code hierarchy in topology. |
|
The object in the following line does contain the atomic number of the atoms, read from the file itself, if they are relevant: I do not know what they default to, but it should not be too difficult to figure out. When the value here is not an atomic number, then it means we should not provide an element as it would be meaningless. |
|
@AmeyaHarmalkar – is there an existing issue that this PR and PR #2627 are addressing? If so, please reference it. If there isn't an issue yet then it would be great if you raised one in the issue tracker and then reference your new issue. In this way we have a much better record of problems and the problem definition is neatly separated from the solution. |
|
@AmeyaHarmalkar I know it's in your title but if you put the issue number #2553 after "Fixes" in the first comment, GitHub magically links it to the issue. Also, according to #2553 (comment) the atomic number for a non-element is -1. |
Elements are read from the TPR. TPR files contain the atomic number for each atoms, when an atom does not match an element (some virtual sites, coarse-grained particles), the atomic number is set to -1. The TPR parser adds the 'elements' attribute only if at least 1 atom has its atomic number correspomding to an element. In that case, atoms that do not match an element have theire element set to an empty string. See #2553 Closes #2628
Fixes #2553
Changes made in this Pull Request:
Need suggestions on adding Tests for tpr files.
PR Checklist