Removes element guessing for TOPParser#2714
Conversation
There was a problem hiding this comment.
LGTM pending green CI -- it would be nice to have warnings and docs that point to the element guesser and/or give an example of how to use it (or to a tutorial for it), for users who have been relying on guessing elements.
IIRC they also don't actually add in the topology attribute for you, so I think users might have to add_TopologyAttr('elements', xxx) which is not immediately obvious. Prmtops are a pretty major format, lots of people don't mess around with dummy atoms/virtual sites/hard things, so I think some users who don't religiously track issues here could be unhappily surprised by this change.
| # Warn user if elements not in topology | ||
| if 'elements' not in attrs: | ||
| msg = ("ATOMIC_NUMBER record not found, elements attribute will " | ||
| "not be populated") |
There was a problem hiding this comment.
Could you add something about guess_elements (or whatever the method's called?)
There was a problem hiding this comment.
Warnings now direct users to MDAnalysis.topology.guessers in 95cb49e
|
|
||
| TODO: | ||
| Add support for Chamber-style topologies | ||
| Add support for atomnum topology attributes |
There was a problem hiding this comment.
Is this atomic number style attributes? Not really relevant here but we might want to avoid calling them atomnum to a) avoid confusion with resnum and b) avoid confusion with whatever Desmond calls atomnum
There was a problem hiding this comment.
I think I might have misread #2362 (comment), I was under the impression that atomnum was our topology attribute for atomic numbers. As you mentioned in #2651 (comment) it probably would be an idea to have an atomic number attribute. I can probably just change this to "Add support for storing atomic numbers", and we can decide what we should call this topology attribute at a later date?
There was a problem hiding this comment.
Sounds good! We have an atomnum, but it's only used in the Desmond parser and is all 0s; not sure if that's a bug or real information.
>>> from MDAnalysis.tests.datafiles import DMS
>>> dms = mda.Universe(DMS)
>>> dms.atoms.names
array(['N', 'HT1', 'HT2', ..., 'C', 'OT1', 'OT2'], dtype=object)
>>> dms.atoms.atomnums
array([0, 0, 0, ..., 0, 0, 0], dtype=int32)There was a problem hiding this comment.
Hm, the desmond thing is a sqlite database if I remember right, so not sure how we're not getting that right. I'd sooner coerce atomic numbers into elements for some uniformity rather than literalism in what the file provided.
There was a problem hiding this comment.
I'd sooner coerce atomic numbers into elements for some uniformity rather than literalism in what the file provided.
So my thought there was that as @lilyminium mentioned #2651 (comment), whilst we have to trust that the information provided in topology files is accurate, there might be some use cases where (i.e. dummy atoms) where one wants to have a copy of the atomic number record to do some kind of further processing. Additionally, I think @cbouy mentioned that it might be good to have this available for RDKIT parsing. So it might be a good idea to have both attributes available.
That being said, there's an argument that having both is redundant and maybe it would be easier to assume atomic number -> element conversion is a "guess" and drop populating elements in the TOPParser (I think at the moment, it's the only one that does this kind of automatic conversion)?
My suggested way forward here would be that we pin this PR until after v1.0 is out, then we go ahead with this PR (my view is that if anything not guessing from atom names here is going to reduce user errors), and then we have a longer discussion on atomic numbers vs elements as a separate issue aimed at v2.0? This way if we happen to change our mind, it won't be reflected too much on users.
|
|
||
| As of version X.0.0, elements are no longer guessed if ATOMIC_NUMBER records | ||
| are missing. In those scenarios, if elements are necessary, users will have | ||
| to invoke the element guessers after parsing the topology file. |
There was a problem hiding this comment.
Could you give an example or link to the relevant method?
There was a problem hiding this comment.
I've added a link to MDAnalysis.topology.guessers, and then added detailed example uses of element guessing there. If this is reasonable, I'll try to add a more detailed tutorial to the user guide in the long term?
Codecov Report
@@ Coverage Diff @@
## develop #2714 +/- ##
===========================================
- Coverage 91.31% 91.30% -0.01%
===========================================
Files 176 176
Lines 24018 24009 -9
Branches 3160 3159 -1
===========================================
- Hits 21931 21922 -9
Misses 1459 1459
Partials 628 628
Continue to review full report at Codecov.
|
Adds example uses of how to guess elements
|
All comments should have been addressed, and I've change the version number to 2.0. I've also gone ahead and taken "Add support for atomnum topology attributes" off the TODO list. I'll raise a separate issue on atomic numbers & the TOPParser. |
lilyminium
left a comment
There was a problem hiding this comment.
LGTM, just that version note should be changed. Thank you :-)
|
|
||
| @pytest.mark.parametrize("parm, errmsgs", ( | ||
| [PRM, [ATOMIC_NUMBER_MSG, COORDINATE_READER_MSG]], | ||
| [PRM7, [ATOMIC_NUMBER_MSG, COORDINATE_READER_MSG]], |
There was a problem hiding this comment.
Nice, much more sustainable way to check warnings than we've been doing before
- X.0.0 -> 2.0.0 - Missing "," Co-authored-by: Lily Wang <31115101+lilyminium@users.noreply.github.com>
|
Also, feel free to add anything for the backport to the 1.0.x milestone for keeping track. |
|
@orbeckst it makes things safer for users, so I'm not against backporting it. It might break a few workflows if they relied on automatically getting elements on reading prmtop files though. I can make the PR and we can judge if it's too disruptive? |
|
I don't think we need a PR (edit: to judge if it should be included). The question is if it is a fix that is needed for correctness (then it has a place in 1.0.1) or we consider it a better choice (and we're making a design decision – then it's more a change than a fix). The latter is perfectly suited for 2.0.0. Edit: You know best what the impact is – decide based on the consideration if it's a fix or a change. (Disrupting workflows is only ok if it fixes incorrect behavior – at least that's my interpretation of semantic versioning.) |
|
It's a bit of a tough call, technically MDA's own example files were leading to incorrect elements being guessed, however the decision to use the guesser to generate elements from atom types was intentional and we were warning users that this was happening. Since we're not exactly fixing the guessers but preventing users from automatically making what could be a bad decision, I'm going to says it's not "unintended behaviour" (if that makes sense) and therefore doesn't need to be backported to 1.0.1. |
|
Ok, thanks! |
- Removes the element guessing default behaviour of the TOPParser - Cleans up the test_top.py to remove duplicate code
Fixes #2449, Partly fixes #2651
Changes made in this Pull Request:
Other changes:
Note: Considering we are likely to go to 1.0.0 soon, I haven't updated the CHANGELOG and left the version as X.0.0. This PR is not necessary to go through before 1.0.0 is finalised, so I'll update this once it is reviewed or 1.0.0 happens (whichever happens first).
PR Checklist