Skip to content

Removes element guessing for TOPParser#2714

Merged
lilyminium merged 10 commits intoMDAnalysis:developfrom
IAlibay:top-noguess
Jun 12, 2020
Merged

Removes element guessing for TOPParser#2714
lilyminium merged 10 commits intoMDAnalysis:developfrom
IAlibay:top-noguess

Conversation

@IAlibay
Copy link
Member

@IAlibay IAlibay commented Jun 6, 2020

Fixes #2449, Partly fixes #2651

Changes made in this Pull Request:

  • Removes the element guessing default behaviour of the TOPParser

Other changes:

  • Cleans up the test_top.py to remove duplicate code
  • Adds new test datafile PRM19SBOPC which contains EPW atoms and CMAP entries (the latter may be useful in the future).

Note: Considering we are likely to go to 1.0.0 soon, I haven't updated the CHANGELOG and left the version as X.0.0. This PR is not necessary to go through before 1.0.0 is finalised, so I'll update this once it is reviewed or 1.0.0 happens (whichever happens first).

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending green CI -- it would be nice to have warnings and docs that point to the element guesser and/or give an example of how to use it (or to a tutorial for it), for users who have been relying on guessing elements.

IIRC they also don't actually add in the topology attribute for you, so I think users might have to add_TopologyAttr('elements', xxx) which is not immediately obvious. Prmtops are a pretty major format, lots of people don't mess around with dummy atoms/virtual sites/hard things, so I think some users who don't religiously track issues here could be unhappily surprised by this change.

# Warn user if elements not in topology
if 'elements' not in attrs:
msg = ("ATOMIC_NUMBER record not found, elements attribute will "
"not be populated")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add something about guess_elements (or whatever the method's called?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warnings now direct users to MDAnalysis.topology.guessers in 95cb49e


TODO:
Add support for Chamber-style topologies
Add support for atomnum topology attributes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this atomic number style attributes? Not really relevant here but we might want to avoid calling them atomnum to a) avoid confusion with resnum and b) avoid confusion with whatever Desmond calls atomnum

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might have misread #2362 (comment), I was under the impression that atomnum was our topology attribute for atomic numbers. As you mentioned in #2651 (comment) it probably would be an idea to have an atomic number attribute. I can probably just change this to "Add support for storing atomic numbers", and we can decide what we should call this topology attribute at a later date?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! We have an atomnum, but it's only used in the Desmond parser and is all 0s; not sure if that's a bug or real information.

>>> from MDAnalysis.tests.datafiles import DMS
>>> dms = mda.Universe(DMS)
>>> dms.atoms.names
array(['N', 'HT1', 'HT2', ..., 'C', 'OT1', 'OT2'], dtype=object)
>>> dms.atoms.atomnums
array([0, 0, 0, ..., 0, 0, 0], dtype=int32)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, the desmond thing is a sqlite database if I remember right, so not sure how we're not getting that right. I'd sooner coerce atomic numbers into elements for some uniformity rather than literalism in what the file provided.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd sooner coerce atomic numbers into elements for some uniformity rather than literalism in what the file provided.

So my thought there was that as @lilyminium mentioned #2651 (comment), whilst we have to trust that the information provided in topology files is accurate, there might be some use cases where (i.e. dummy atoms) where one wants to have a copy of the atomic number record to do some kind of further processing. Additionally, I think @cbouy mentioned that it might be good to have this available for RDKIT parsing. So it might be a good idea to have both attributes available.

That being said, there's an argument that having both is redundant and maybe it would be easier to assume atomic number -> element conversion is a "guess" and drop populating elements in the TOPParser (I think at the moment, it's the only one that does this kind of automatic conversion)?

My suggested way forward here would be that we pin this PR until after v1.0 is out, then we go ahead with this PR (my view is that if anything not guessing from atom names here is going to reduce user errors), and then we have a longer discussion on atomic numbers vs elements as a separate issue aimed at v2.0? This way if we happen to change our mind, it won't be reflected too much on users.


As of version X.0.0, elements are no longer guessed if ATOMIC_NUMBER records
are missing. In those scenarios, if elements are necessary, users will have
to invoke the element guessers after parsing the topology file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give an example or link to the relevant method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a link to MDAnalysis.topology.guessers, and then added detailed example uses of element guessing there. If this is reasonable, I'll try to add a more detailed tutorial to the user guide in the long term?

@codecov
Copy link

codecov bot commented Jun 6, 2020

Codecov Report

Merging #2714 into develop will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2714      +/-   ##
===========================================
- Coverage    91.31%   91.30%   -0.01%     
===========================================
  Files          176      176              
  Lines        24018    24009       -9     
  Branches      3160     3159       -1     
===========================================
- Hits         21931    21922       -9     
  Misses        1459     1459              
  Partials       628      628              
Impacted Files Coverage Δ
package/MDAnalysis/topology/guessers.py 100.00% <ø> (ø)
package/MDAnalysis/topology/TOPParser.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3da643f...a964695. Read the comment docs.

@IAlibay IAlibay requested a review from lilyminium June 10, 2020 17:40
@IAlibay
Copy link
Member Author

IAlibay commented Jun 10, 2020

All comments should have been addressed, and I've change the version number to 2.0. I've also gone ahead and taken "Add support for atomnum topology attributes" off the TODO list. I'll raise a separate issue on atomic numbers & the TOPParser.

Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just that version note should be changed. Thank you :-)


@pytest.mark.parametrize("parm, errmsgs", (
[PRM, [ATOMIC_NUMBER_MSG, COORDINATE_READER_MSG]],
[PRM7, [ATOMIC_NUMBER_MSG, COORDINATE_READER_MSG]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, much more sustainable way to check warnings than we've been doing before

- X.0.0 -> 2.0.0
- Missing ","

Co-authored-by: Lily Wang <31115101+lilyminium@users.noreply.github.com>
@lilyminium lilyminium merged commit 8710b5c into MDAnalysis:develop Jun 12, 2020
@IAlibay IAlibay deleted the top-noguess branch June 12, 2020 11:13
@orbeckst orbeckst added this to the 2.0 milestone Jun 26, 2020
@orbeckst
Copy link
Member

@IAlibay – should this PR be backported for 1.0.1 #2768 ?

I am currently assembling PR #2798 with the backport fixes for 1.0.1. If you want to add it to the backport, please do so. Any fix may go in.

@orbeckst
Copy link
Member

Also, feel free to add anything for the backport to the 1.0.x milestone for keeping track.

@IAlibay
Copy link
Member Author

IAlibay commented Jun 26, 2020

@orbeckst it makes things safer for users, so I'm not against backporting it. It might break a few workflows if they relied on automatically getting elements on reading prmtop files though.

I can make the PR and we can judge if it's too disruptive?

@orbeckst
Copy link
Member

orbeckst commented Jun 26, 2020

I don't think we need a PR (edit: to judge if it should be included). The question is if it is a fix that is needed for correctness (then it has a place in 1.0.1) or we consider it a better choice (and we're making a design decision – then it's more a change than a fix). The latter is perfectly suited for 2.0.0.

Edit: You know best what the impact is – decide based on the consideration if it's a fix or a change. (Disrupting workflows is only ok if it fixes incorrect behavior – at least that's my interpretation of semantic versioning.)

@IAlibay
Copy link
Member Author

IAlibay commented Jun 26, 2020

It's a bit of a tough call, technically MDA's own example files were leading to incorrect elements being guessed, however the decision to use the guesser to generate elements from atom types was intentional and we were warning users that this was happening. Since we're not exactly fixing the guessers but preventing users from automatically making what could be a bad decision, I'm going to says it's not "unintended behaviour" (if that makes sense) and therefore doesn't need to be backported to 1.0.1.

@orbeckst
Copy link
Member

Ok, thanks!

PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
- Removes the element guessing default behaviour of the TOPParser
- Cleans up the test_top.py to remove duplicate code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Removing guess by default in TOPParser EP handling in parm7 files

5 participants