Skip to content

Adds PDB element writing and fixes reading#3001

Merged
lilyminium merged 18 commits intoMDAnalysis:developfrom
IAlibay:new-pdb-elems
Nov 16, 2020
Merged

Adds PDB element writing and fixes reading#3001
lilyminium merged 18 commits intoMDAnalysis:developfrom
IAlibay:new-pdb-elems

Conversation

@IAlibay
Copy link
Member

@IAlibay IAlibay commented Oct 20, 2020

Completes/supersedes #2442 Fixes #2422 #2423

Changes made in this Pull Request:

  • PDB parser now allows for partial element parsing, setting an empty record if the element is not recognised.
  • PDB writer now uses the elements attribute instead of guessing.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks
Copy link

pep8speaks commented Oct 20, 2020

Hello @IAlibay! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-15 23:07:08 UTC

@IAlibay IAlibay requested a review from jbarnoud October 20, 2020 20:09
@IAlibay
Copy link
Member Author

IAlibay commented Oct 20, 2020

There was quite a lot of discussion on #2422 about what to do when there's empty/unknown elements records.

The approach taken here reflects the discussions in #2422, and other places where we've taken the conservative stance that unknown elements are not assigned an element record on reading, and therefore would not have an elements record written down either.

I'm sure many will have strong opinions here, including but not limited to; @lilyminium, @RMeli, and @richardjgowers.

@codecov
Copy link

codecov bot commented Oct 20, 2020

Codecov Report

Merging #3001 (97bad92) into develop (51014e4) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #3001   +/-   ##
========================================
  Coverage    93.09%   93.09%           
========================================
  Files          186      186           
  Lines        24663    24665    +2     
  Branches      3197     3195    -2     
========================================
+ Hits         22959    22961    +2     
  Misses        1656     1656           
  Partials        48       48           
Impacted Files Coverage Δ
package/MDAnalysis/coordinates/PDB.py 94.45% <100.00%> (+0.01%) ⬆️
package/MDAnalysis/topology/PDBParser.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51014e4...97bad92. Read the comment docs.

vals['tempFactor'] = tempfactors[i]
vals['segID'] = segids[i][:4]
vals['element'] = guess_atom_element(atomnames[i].strip())[:2]
vals['element'] = elements[i][:2]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have an upper() call? I've seen PDB files with both mixed case or just all upper case. We enforce UpperLower within MDA, but it would be best to write out in the manner that's consistent, thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like .capitalize(). That being said, given that we already validate and capitalize elements on input, if the user reeeeeeally wants bespoke capitalisation then maybe we should let them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we normalize all the inputs to be capitalize(), which is where this gets complicated, I prefer then converting back to all caps/upper() elements because I think we see these a bit more often in PDBs, but if users had a specific preference then we'd be overriding that :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that all-caps used to be the spec, and here is an example of capitalized symbols causing issues in NGLView -- I vote for upper case unless/until someone specifically raises an issue about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's convincing enough for me, I've changed it to upper in bd220dc :)

@RMeli
Copy link
Member

RMeli commented Oct 20, 2020

IMHO, not assigning unknown elements is very sensible. I'd rather have the code fail cleanly because the elements attribute is missing instead of getting erroneous results because of an incorrect guess.

Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a couple nitpicks. So the Atomtypes attribute remains unchecked or guessed elements, and the Elements attribute is the same information run through a sensibility filter?

vals['tempFactor'] = tempfactors[i]
vals['segID'] = segids[i][:4]
vals['element'] = guess_atom_element(atomnames[i].strip())[:2]
vals['element'] = elements[i][:2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like .capitalize(). That being said, given that we already validate and capitalize elements on input, if the user reeeeeeally wants bespoke capitalisation then maybe we should let them.

Co-authored-by: Lily Wang <31115101+lilyminium@users.noreply.github.com>
@IAlibay
Copy link
Member Author

IAlibay commented Oct 21, 2020

Looks good, a couple nitpicks. So the Atomtypes attribute remains unchecked or guessed elements, and the Elements attribute is the same information run through a sensibility filter?

So this is an interesting one that links back to #2918, personally I wouldn't ever guess on read, but we have this default need for atomtypes, and what an atom type isn't properly defined. Given that atom types can be "atom name, atom element, or force field atom type" maybe the answer here is to assign atom types to the raw input atom name?

My only worry here is how much would break if we did this (I have no idea for how widely used atomtype is downstream).

@lilyminium
Copy link
Member

I would also be concerned that changing atomtype to name would break something, especially as elements weren't in our PDB format until recently iirc. Let's just keep as is until we figure out what to do with the Atomtype attribute as a whole?



@pytest.fixture
def dummy_universe_without_elements():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you look at minimizing the number of warnings that gets issued by your new tests? I used 2 different approaches for that in #2886: filling all the required attributes in the fixture or filtering the warnings in the tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3d7857d should have done the trick, only warnings left are parmed's ABCs from collections import.

@IAlibay IAlibay requested a review from lilyminium October 28, 2020 16:25
Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @IAlibay and sorry for the delay! 😶

@lilyminium lilyminium merged commit e9d0e88 into MDAnalysis:develop Nov 16, 2020
@IAlibay IAlibay deleted the new-pdb-elems branch November 19, 2020 00:00
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
- Fixes MDAnalysis#2423 and MDAnalysis#2422 
- PDB parser now allows for partial element parsing, setting an empty record if the element is not recognised.
- PDB writer now uses the elements attribute instead of guessing.
@IAlibay IAlibay mentioned this pull request Feb 8, 2022
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PDB parser ignores the element column

6 participants