Removes element guessing for TOPParser by IAlibay · Pull Request #2714 · MDAnalysis/mdanalysis

IAlibay · 2020-06-06T21:25:31Z

Fixes #2449, Partly fixes #2651

Changes made in this Pull Request:

Removes the element guessing default behaviour of the TOPParser

Other changes:

Cleans up the test_top.py to remove duplicate code
Adds new test datafile PRM19SBOPC which contains EPW atoms and CMAP entries (the latter may be useful in the future).

Note: Since we are likely to release 1.0.0 soon, I haven't updated the CHANGELOG and have left the version as X.0.0. This PR does not need to be merged before 1.0.0 is finalised, so I'll update both once it is reviewed or 1.0.0 is released (whichever happens first).

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

lilyminium

LGTM pending green CI -- it would be nice to have warnings and docs that point to the element guesser and/or give an example of how to use it (or to a tutorial for it), for users who have been relying on guessing elements.

IIRC they also don't actually add in the topology attribute for you, so I think users might have to add_TopologyAttr('elements', xxx) which is not immediately obvious. Prmtops are a pretty major format, lots of people don't mess around with dummy atoms/virtual sites/hard things, so I think some users who don't religiously track issues here could be unhappily surprised by this change.
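
The two-step pattern described above (guess, then attach the attribute) might look like the sketch below. It assumes `guess_types` in `MDAnalysis.topology.guessers` and `Universe.add_TopologyAttr` as they existed around MDAnalysis 2.0. The helper name `add_guessed_elements` and its `guesser` parameter are hypothetical, added only so the function can be exercised without MDAnalysis installed.

```python
def add_guessed_elements(universe, guesser=None):
    """Guess elements from atom names and attach them to ``universe``.

    ``guesser`` is any callable mapping a sequence of atom names to a
    sequence of element symbols. If omitted, MDAnalysis' ``guess_types``
    is imported lazily (assumed available, roughly the 2.0-era API).
    """
    if guesser is None:
        # Imported here so this module loads even without MDAnalysis.
        from MDAnalysis.topology.guessers import guess_types as guesser

    elements = guesser(universe.atoms.names)
    # The parser no longer does this step: the attribute has to be added
    # explicitly before ``universe.atoms.elements`` can be used.
    universe.add_TopologyAttr("elements", elements)
    return elements


# Typical use (requires MDAnalysis and a prmtop lacking ATOMIC_NUMBER):
#   import MDAnalysis as mda
#   u = mda.Universe("system.prmtop")   # hypothetical file name
#   add_guessed_elements(u)
```

Guessed values are only as good as the atom names, so it is worth inspecting the result for dummy atoms or virtual sites before relying on it.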

lilyminium · 2020-06-06T22:20:35Z

package/MDAnalysis/topology/TOPParser.py

+        # Warn user if elements not in topology
+        if 'elements' not in attrs:
+            msg = ("ATOMIC_NUMBER record not found, elements attribute will "
+                   "not be populated")


Could you add something about guess_elements (or whatever the method's called?)

Warnings now direct users to MDAnalysis.topology.guessers in 95cb49e
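
Since the diff excerpt above is cut off after the message is built, here is a minimal standalone sketch of the idea: warn when no `elements` attribute was parsed and point the user at the guessers. The function name `warn_if_no_elements`, the dict standing in for the parsed attributes, and the exact wording after the first sentence are assumptions for illustration, not the parser's actual code.

```python
import warnings


def warn_if_no_elements(attrs):
    """Emit a UserWarning when ``attrs`` has no 'elements' entry.

    ``attrs`` stands in for the mapping of parsed topology attributes.
    Returns True if a warning was issued, False otherwise.
    """
    if "elements" in attrs:
        return False
    msg = ("ATOMIC_NUMBER record not found, elements attribute will "
           "not be populated. If needed these can be guessed using "
           "MDAnalysis.topology.guessers.")
    warnings.warn(msg)
    return True
```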

lilyminium · 2020-06-06T22:23:39Z

package/MDAnalysis/topology/TOPParser.py


 TODO:
  Add support for Chamber-style topologies
+  Add support for atomnum topology attributes


Is this for atomic-number-style attributes? Not really relevant here, but we might want to avoid calling them atomnum to a) avoid confusion with resnum and b) avoid confusion with whatever Desmond calls atomnum.

I think I might have misread #2362 (comment); I was under the impression that atomnum was our topology attribute for atomic numbers. As you mentioned in #2651 (comment), it would probably be a good idea to have an atomic number attribute. I can just change this to "Add support for storing atomic numbers", and we can decide what to call this topology attribute at a later date?

Sounds good! We have an atomnum, but it's only used in the Desmond parser and is all 0s; not sure if that's a bug or real information.

>>> from MDAnalysis.tests.datafiles import DMS
>>> dms = mda.Universe(DMS)
>>> dms.atoms.names
array(['N', 'HT1', 'HT2', ..., 'C', 'OT1', 'OT2'], dtype=object)
>>> dms.atoms.atomnums
array([0, 0, 0, ..., 0, 0, 0], dtype=int32)

Hm, the desmond thing is a sqlite database if I remember right, so not sure how we're not getting that right. I'd sooner coerce atomic numbers into elements for some uniformity rather than literalism in what the file provided.

> I'd sooner coerce atomic numbers into elements for some uniformity rather than literalism in what the file provided.

So my thought there was that, as @lilyminium mentioned in #2651 (comment), whilst we have to trust that the information provided in topology files is accurate, there might be some use cases (e.g. dummy atoms) where one wants to have a copy of the atomic number record to do some kind of further processing. Additionally, I think @cbouy mentioned that it might be good to have this available for RDKit parsing. So it might be a good idea to have both attributes available.

That being said, there's an argument that having both is redundant and maybe it would be easier to assume atomic number -> element conversion is a "guess" and drop populating elements in the TOPParser (I think at the moment, it's the only one that does this kind of automatic conversion)?

My suggested way forward here would be that we pin this PR until after v1.0 is out, then we go ahead with this PR (my view is that if anything not guessing from atom names here is going to reduce user errors), and then we have a longer discussion on atomic numbers vs elements as a separate issue aimed at v2.0? This way if we happen to change our mind, it won't be reflected too much on users.
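
To make the atomic number versus element trade-off in this thread concrete, here is a small sketch of converting an atomic-number record into element symbols while keeping track of values that have no element. The lookup table is deliberately tiny, and both names (`Z_TO_SYMBOL`, `elements_from_atomic_numbers`) are hypothetical. Treating values such as 0 or -1 as "no element" is an assumption about how dummy atoms or virtual sites might be encoded, not a statement about any particular file format.

```python
# Illustrative subset only; a real table would cover the periodic table.
Z_TO_SYMBOL = {1: "H", 6: "C", 7: "N", 8: "O",
               11: "Na", 15: "P", 16: "S", 17: "Cl"}


def elements_from_atomic_numbers(atomic_numbers, table=Z_TO_SYMBOL):
    """Return ``(elements, unknown)`` for a sequence of atomic numbers.

    Values missing from ``table`` map to an empty string instead of
    being guessed from the atom name; ``unknown`` lists them, sorted.
    """
    elements = []
    unknown = set()
    for z in atomic_numbers:
        symbol = table.get(z, "")
        if not symbol:
            unknown.add(z)
        elements.append(symbol)
    return elements, sorted(unknown)
```

For a four-site water with one virtual site encoded as 0, `elements_from_atomic_numbers([8, 1, 1, 0])` returns `(['O', 'H', 'H', ''], [0])`. Keeping the raw atomic numbers alongside the symbols is what would let downstream code tell "unknown" apart from "deliberately not an atom".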

lilyminium · 2020-06-06T22:24:24Z

package/MDAnalysis/topology/TOPParser.py

+
+   As of version X.0.0, elements are no longer guessed if ATOMIC_NUMBER records
+   are missing. In those scenarios, if elements are necessary, users will have
+   to invoke the element guessers after parsing the topology file.


Could you give an example or link to the relevant method?

I've added a link to MDAnalysis.topology.guessers, and then added detailed example uses of element guessing there. If this is reasonable, I'll try to add a more detailed tutorial to the user guide in the long term?

codecov · 2020-06-06T23:01:07Z

Codecov Report

Merging #2714 into develop will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #2714      +/-   ##
===========================================
- Coverage    91.31%   91.30%   -0.01%     
===========================================
  Files          176      176              
  Lines        24018    24009       -9     
  Branches      3160     3159       -1     
===========================================
- Hits         21931    21922       -9     
  Misses        1459     1459              
  Partials       628      628

Impacted Files	Coverage Δ
package/MDAnalysis/topology/guessers.py	`100.00% <ø> (ø)`
package/MDAnalysis/topology/TOPParser.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3da643f...a964695. Read the comment docs.

IAlibay · 2020-06-10T17:42:38Z

All comments should have been addressed, and I've changed the version number to 2.0. I've also gone ahead and taken "Add support for atomnum topology attributes" off the TODO list. I'll raise a separate issue on atomic numbers & the TOPParser.

lilyminium

LGTM, just that version note should be changed. Thank you :-)

package/MDAnalysis/topology/TOPParser.py

testsuite/MDAnalysisTests/topology/test_top.py

lilyminium · 2020-06-10T23:04:31Z

testsuite/MDAnalysisTests/topology/test_top.py

+
+    @pytest.mark.parametrize("parm, errmsgs", (
+        [PRM, [ATOMIC_NUMBER_MSG, COORDINATE_READER_MSG]],
+        [PRM7, [ATOMIC_NUMBER_MSG, COORDINATE_READER_MSG]],


Nice, much more sustainable way to check warnings than we've been doing before
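
The test body is not shown in the excerpt above, so the following is only a sketch of the pattern it suggests: pair each input with the list of warning messages it is expected to produce, then compare. It uses only the standard-library `warnings` module rather than pytest, and both helper names are hypothetical.

```python
import warnings


def collect_warning_messages(func, *args, **kwargs):
    """Call ``func`` and return the text of every warning it issued."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even repeats from the same location.
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return [str(w.message) for w in caught]


def assert_warnings_match(func, expected_msgs):
    """Check warnings against ``expected_msgs`` one-to-one, in order.

    Each expected entry only needs to be a substring of the matching
    warning message.
    """
    messages = collect_warning_messages(func)
    assert len(messages) == len(expected_msgs), messages
    for got, expected in zip(messages, expected_msgs):
        assert expected in got, (expected, got)
```

In a pytest suite the same pairs could be fed through `pytest.mark.parametrize`, as the diff does, with the expected list differing per topology file.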

orbeckst · 2020-06-26T18:26:29Z

@IAlibay – should this PR be backported for 1.0.1 #2768 ?

I am currently assembling PR #2798 with the backport fixes for 1.0.1. If you want to add it to the backport, please do so. Any fix may go in.

orbeckst · 2020-06-26T18:27:27Z

Also, feel free to add anything for the backport to the 1.0.x milestone for keeping track.

IAlibay · 2020-06-26T18:31:36Z

@orbeckst it makes things safer for users, so I'm not against backporting it. It might break a few workflows if they relied on automatically getting elements on reading prmtop files though.

I can make the PR and we can judge if it's too disruptive?

orbeckst · 2020-06-26T18:34:23Z

I don't think we need a PR (edit: to judge if it should be included). The question is if it is a fix that is needed for correctness (then it has a place in 1.0.1) or we consider it a better choice (and we're making a design decision – then it's more a change than a fix). The latter is perfectly suited for 2.0.0.

Edit: You know best what the impact is – decide based on the consideration if it's a fix or a change. (Disrupting workflows is only ok if it fixes incorrect behavior – at least that's my interpretation of semantic versioning.)

IAlibay · 2020-06-26T20:53:21Z

It's a bit of a tough call. Technically, MDA's own example files were leading to incorrect elements being guessed; however, the decision to use the guesser to generate elements from atom types was intentional, and we were warning users that this was happening. Since we're not exactly fixing the guessers but preventing users from automatically making what could be a bad decision, I'm going to say it's not "unintended behaviour" (if that makes sense) and therefore doesn't need to be backported to 1.0.1.

orbeckst · 2020-06-26T22:02:43Z

Ok, thanks!

- Removes the element guessing default behaviour of the TOPParser - Cleans up the test_top.py to remove duplicate code

IAlibay and others added 3 commits June 6, 2020 18:42

remove TOPParser elements guessing

fa18234

Adds tests for Issue 2449 and 2651

574091a

PEP8 fixes

2e21c49

lilyminium requested changes Jun 6, 2020

IAlibay and others added 5 commits June 7, 2020 01:19

guess elements examples documentation

95cb49e

Adds example uses of how to guess elements

Changes :mod: to :func:

dec0221

Example code typos

2a4e852

Merge branch 'develop' into top-noguess

5234de5

Updates for version 2.0

8204c28

IAlibay requested a review from lilyminium June 10, 2020 17:40

lilyminium requested changes Jun 10, 2020

Apply suggestions from code review

203c78b

- X.0.0 -> 2.0.0 - Missing "," Co-authored-by: Lily Wang <31115101+lilyminium@users.noreply.github.com>

lilyminium approved these changes Jun 10, 2020

Merge branch 'develop' into top-noguess

a964695

lilyminium merged commit 8710b5c into MDAnalysis:develop Jun 12, 2020

IAlibay deleted the top-noguess branch June 12, 2020 11:13

IAlibay mentioned this pull request Jun 15, 2020

Adding atomic number attributes to topology readers #2758

Open

orbeckst added this to the 2.0 milestone Jun 26, 2020

PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021

Removes element guessing for TOPParser (MDAnalysis#2714)

a6b34d5

- Removes the element guessing default behaviour of the TOPParser - Cleans up the test_top.py to remove duplicate code

fiona-naughton added defect Component-Topology labels Sep 26, 2023
