Adds PDB element writing and fixes reading by IAlibay · Pull Request #3001 · MDAnalysis/mdanalysis

IAlibay · 2020-10-20T20:08:48Z

Completes/supersedes #2442. Fixes #2422 and #2423.

Changes made in this Pull Request:

PDB parser now allows for partial element parsing, setting an empty record if the element is not recognised.
PDB writer now uses the elements attribute instead of guessing.
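
A minimal sketch of the reading behaviour described above, not the MDAnalysis implementation. The lookup set `KNOWN_ELEMENTS` and the helper `parse_element` are hypothetical names introduced for illustration; the real parser validates against its own element tables. The sketch reads the element field from columns 77-78 of a record, normalises the case, and returns an empty string instead of guessing when the symbol is not recognised.

```python
# Hypothetical subset of valid symbols, for illustration only.
KNOWN_ELEMENTS = {"H", "C", "N", "O", "S", "P", "Na", "Ca", "Fe", "Zn"}


def parse_element(line: str) -> str:
    """Return the validated element symbol from a PDB ATOM/HETATM line.

    The element field occupies columns 77-78 (1-based) of the record.
    Unrecognised or missing symbols give an empty string, not a guess.
    """
    raw = line[76:78].strip()
    symbol = raw.capitalize()  # "CA" -> "Ca", "n" -> "N"
    return symbol if symbol in KNOWN_ELEMENTS else ""


# Build fixed-width records: pad to 76 characters, then add the element field.
record_ok = "ATOM".ljust(76) + " N"
record_bad = "ATOM".ljust(76) + "XX"
record_short = "ATOM      1  N   ALA A   1"  # line ends before column 77

elements = [parse_element(r) for r in (record_ok, record_bad, record_short)]
print(elements)
```

Returning an empty record keeps "unknown" distinguishable from a wrong guess, which is the conservative stance discussed later in this thread.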

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

See MDAnalysis#2422

…db-elems

pep8speaks · 2020-10-20T20:08:57Z

Hello @IAlibay! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-15 23:07:08 UTC

IAlibay · 2020-10-20T20:18:32Z

There was quite a lot of discussion on #2422 about what to do when there are empty/unknown element records.

The approach taken here reflects the discussions in #2422 and elsewhere, where we've taken the conservative stance that unknown elements are not assigned an element record on reading, and therefore do not have an element record written out either.

I'm sure many will have strong opinions here, including but not limited to @lilyminium, @RMeli, and @richardjgowers.

codecov · 2020-10-20T21:42:24Z

Codecov Report

Merging #3001 (97bad92) into develop (51014e4) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           develop    #3001   +/-   ##
========================================
  Coverage    93.09%   93.09%           
========================================
  Files          186      186           
  Lines        24663    24665    +2     
  Branches      3197     3195    -2     
========================================
+ Hits         22959    22961    +2     
  Misses        1656     1656           
  Partials        48       48

| Impacted Files | Coverage Δ |
|---|---|
| package/MDAnalysis/coordinates/PDB.py | `94.45% <100.00%> (+0.01%)` ⬆️ |
| package/MDAnalysis/topology/PDBParser.py | `100.00% <100.00%> (ø)` |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51014e4...97bad92. Read the comment docs.

IAlibay · 2020-10-20T21:50:45Z

package/MDAnalysis/coordinates/PDB.py

            vals['tempFactor'] = tempfactors[i]
            vals['segID'] = segids[i][:4]
-            vals['element'] = guess_atom_element(atomnames[i].strip())[:2]
+            vals['element'] = elements[i][:2]


Should this have an upper() call? I've seen PDB files with either mixed case or all upper case. We enforce UpperLower within MDA, but it would be best to write out in a consistent manner. Thoughts?

Yeah I like .capitalize(). That being said, given that we already validate and capitalize elements on input, if the user reeeeeeally wants bespoke capitalisation then maybe we should let them.

So we normalize all the inputs with capitalize(), which is where this gets complicated. I'd prefer converting back to all-caps (upper()) elements on writing, because I think we see these a bit more often in PDBs, but if users had a specific preference then we'd be overriding that :/

It seems that all-caps used to be the spec, and here is an example of capitalized symbols causing issues in NGLView -- I vote for upper case unless/until someone specifically raises an issue about it.

That's convincing enough for me, I've changed it to upper in bd220dc :)
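
To make the outcome of this review thread concrete, here is a small sketch of writing the element field in upper case. The helper name `format_element_field` is hypothetical and this is not the code from bd220dc; it only assumes the PDB convention that the element symbol sits right-justified in the two-character field at columns 77-78.

```python
def format_element_field(symbol: str) -> str:
    """Format an element symbol for columns 77-78 of a PDB record.

    The symbol is upper-cased, truncated to two characters and
    right-justified; an empty symbol yields two blanks.
    """
    return f"{symbol.strip().upper()[:2]:>2}"


for symbol in ("C", "Ca", "fe", ""):
    print(repr(format_element_field(symbol)))
```

An empty element record stays blank on output, so nothing is invented for atoms whose element was not recognised on reading.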

RMeli · 2020-10-20T22:00:02Z

IMHO, not assigning unknown elements is very sensible. I'd rather have the code fail cleanly because the elements attribute is missing instead of getting erroneous results because of an incorrect guess.
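
As an illustration of the kind of erroneous guess mentioned here, consider a deliberately naive guesser that derives the element from the atom name alone. `naive_guess` and `KNOWN_ELEMENTS` are hypothetical and do not reproduce MDAnalysis's `guess_atom_element`; the point is only that a name such as `CA` (the conventional name of a protein alpha carbon) is ambiguous without an explicit element record.

```python
KNOWN_ELEMENTS = {"H", "C", "N", "O", "S", "Na", "Ca"}  # hypothetical subset


def naive_guess(atom_name: str) -> str:
    """Guess an element from an atom name by trying the first two letters,
    then the first letter. Deliberately simplistic."""
    letters = "".join(ch for ch in atom_name if ch.isalpha())
    for candidate in (letters[:2].capitalize(), letters[:1].upper()):
        if candidate in KNOWN_ELEMENTS:
            return candidate
    return ""


# "CA" names an alpha carbon, but the two-letter match wins here.
guessed = naive_guess("CA")
print(guessed)
```

The sketch labels the alpha carbon as calcium, which is a silent error rather than a clean failure.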

lilyminium

Looks good, a couple nitpicks. So the Atomtypes attribute remains unchecked or guessed elements, and the Elements attribute is the same information run through a sensibility filter?

package/MDAnalysis/coordinates/PDB.py

lilyminium · 2020-10-21T22:48:08Z

package/MDAnalysis/coordinates/PDB.py

            vals['tempFactor'] = tempfactors[i]
            vals['segID'] = segids[i][:4]
-            vals['element'] = guess_atom_element(atomnames[i].strip())[:2]
+            vals['element'] = elements[i][:2]


Yeah I like .capitalize(). That being said, given that we already validate and capitalize elements on input, if the user reeeeeeally wants bespoke capitalisation then maybe we should let them.

package/MDAnalysis/topology/PDBParser.py

Co-authored-by: Lily Wang <31115101+lilyminium@users.noreply.github.com>

IAlibay · 2020-10-21T23:25:03Z

> Looks good, a couple nitpicks. So the Atomtypes attribute remains unchecked or guessed elements, and the Elements attribute is the same information run through a sensibility filter?

So this is an interesting one that links back to #2918. Personally I wouldn't ever guess on read, but we have this default need for atomtypes, and what an atom type is isn't properly defined. Given that atom types can be "atom name, atom element, or force field atom type", maybe the answer here is to assign atom types to the raw input atom name?

My only worry here is how much would break if we did this (I have no idea how widely atomtype is used downstream).

lilyminium · 2020-10-22T05:47:28Z

I would also be concerned that changing atomtype to name would break something, especially as elements weren't in our PDB format until recently iirc. Let's just keep as is until we figure out what to do with the Atomtype attribute as a whole?

jbarnoud · 2020-10-22T08:33:08Z

testsuite/MDAnalysisTests/coordinates/test_pdb.py



+@pytest.fixture
+def dummy_universe_without_elements():


Could you look at minimizing the number of warnings that gets issued by your new tests? I used 2 different approaches for that in #2886: filling all the required attributes in the fixture or filtering the warnings in the tests.

3d7857d should have done the trick, only warnings left are parmed's ABCs from collections import.
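
The two approaches mentioned above (filling the required attributes, or filtering the expected warnings) can be sketched with the standard library alone. Everything here is hypothetical: `write_records` stands in for a writer that warns about missing data, and the warning text is invented for the example rather than copied from MDAnalysis.

```python
import warnings


def write_records(atoms):
    """Hypothetical writer: warns when an atom dict lacks an 'element' key."""
    lines = []
    for atom in atoms:
        if "element" not in atom:
            warnings.warn("Found no information for attr: 'element'", UserWarning)
        lines.append(f"{atom['name']:<4}{atom.get('element', ''):>2}")
    return lines


# Approach 1: fill every required attribute so nothing warns.
complete = [{"name": "CA", "element": "C"}]
with warnings.catch_warnings(record=True) as caught_complete:
    warnings.simplefilter("always")
    write_records(complete)

# Approach 2: keep the sparse input but filter the expected warning.
sparse = [{"name": "CA"}]
with warnings.catch_warnings(record=True) as caught_filtered:
    warnings.simplefilter("always")
    warnings.filterwarnings("ignore", message="Found no information")
    write_records(sparse)

print(len(caught_complete), len(caught_filtered))
```

In a pytest suite the second approach is usually expressed with the `pytest.mark.filterwarnings` marker on the test instead of an explicit context manager.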

…w-pdb-elems

lilyminium

LGTM, thank you @IAlibay and sorry for the delay! 😶

- Fixes MDAnalysis#2423 and MDAnalysis#2422
- PDB parser now allows for partial element parsing, setting an empty record if the element is not recognised.
- PDB writer now uses the elements attribute instead of guessing.

jbarnoud and others added 8 commits January 6, 2020 18:54

Add tests for elements attribute from PDB

6fa9bbc

See MDAnalysis#2422

Populate the 'elements' topology attribute when reading PDB

daf6af2

Add tests for MDAnalysis#2423

2cc3a7e

Use elements topology attr in PDB writer when available

60088bd

Small simplification

c8717c5

Merge branch 'develop' of github.com:MDAnalysis/mdanalysis into new-p…

4347a81

…db-elems

Updates PDB element parsing to avoid guessing

59e2607

Updates PDB elements docs and changelog

8e38a9a

IAlibay requested a review from jbarnoud October 20, 2020 20:09

PEP8 fixes

f174321

IAlibay added Component-Topology Component-Writers Format-PDB labels Oct 20, 2020

Fixes failing hole2 warning test

3d97d56

IAlibay commented Oct 20, 2020

Merge branch 'develop' into new-pdb-elems

943308b

lilyminium requested changes Oct 21, 2020

Apply suggestions from code review

7223867

Co-authored-by: Lily Wang <31115101+lilyminium@users.noreply.github.com>

jbarnoud reviewed Oct 22, 2020

IAlibay added 5 commits October 22, 2020 22:13

Merge branch 'develop' into new-pdb-elems

ae6e4b2

get rid of non-necessary warnings test_pdb coordinates

357fda1

fix pdb topology tests and quick PEP8 fixes

3d7857d

Merge branch 'new-pdb-elems' of github.com:IAlibay/mdanalysis into ne…

8789ce5

…w-pdb-elems

Enfore upper elements in PDB files

bd220dc

IAlibay requested a review from lilyminium October 28, 2020 16:25

lilyminium approved these changes Nov 15, 2020

Merge branch 'develop' into new-pdb-elems

97bad92

lilyminium merged commit e9d0e88 into MDAnalysis:develop Nov 16, 2020

IAlibay deleted the new-pdb-elems branch November 19, 2020 00:00

This was referenced Nov 19, 2020

PDBWriter writes some C as Ca and some N as Na #2732

Closed

[WIP] PDB parser and writer use the "elements" topology attribute #2442

Closed

IAlibay mentioned this pull request Mar 14, 2021

Write out chainID to PDB instead of segID #3157

Merged

4 tasks

IAlibay mentioned this pull request Feb 8, 2022

Update tables.py #1808

Closed

4 tasks

fiona-naughton added the defect label Sep 26, 2023



