[WIP] PDB parser and writer use the "elements" topology attribute by jbarnoud · Pull Request #2442 · MDAnalysis/mdanalysis

jbarnoud · 2020-01-09T18:36:33Z

Changes made in this Pull Request:

The PDB parser populates the elements topology attribute if the appropriate column is filled in the input. The content of elements is the same as atomtypes.
The PDB reader uses the elements topology attribute if available.

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

jbarnoud · 2020-01-09T18:51:02Z

The changes cause a difference in the behaviour of the writer which I do not know how to handle. The default for a missing element symbol is an empty string. When the element symbols are available to the writer, then the writer omits the missing one, leaving the field empty. This is different from the current behaviour that guesses all the element symbols.

This difference is made visible by a test where the initial PDB input fills some, but not all, the element symbols, and the output PDB is expected to have the missing ones.

My feeling is that guessing the missing symbols should be an active decision of the user, and I would be tempted to adapt the failing test. Though, I would like to hear what others would think about this.

if a PDB file specifies some, but not all, element symbols, then the missing symbols default to an empty string. If you then write a new PDB file

richardjgowers · 2020-01-16T10:27:03Z

I think it makes most sense to either have the column or not. Can we guess elements ahead of time and only write if they are all defined?

RMeli · 2020-06-10T21:22:17Z

My feeling is that guessing the missing symbols should be an active decision of the user, and I would be tempted to adapt the failing test.

@jbarnoud I think this sounds very reasonable, especially because the guessers can have problems with some atom names found in the wild (see CAL->Ca and NAN->Na in #2732).

If the elements attribute is present chances are that it comes from a file (especially after #2627), so I think that if such record is incomplete the writer should leave the field empty (as it is in your current implementation) instead of guessing. IMO, it is nice to have a file unchanged by a read immediately followed by a write. Guessing atom elements from the atom name should be decided/done by the user.

haroldgrosjean · 2020-06-16T07:44:20Z

Hello,

I am writing this message because I am experiencing similar issues. I am using MDAnalysis to separate protein-ligand pairs to, later, parametrize for MD where correct atom typing and valencies are essential. It essentially transforms some Bromides (Br) into boron (B)

my code goes as follow:

import MDAnalysis
print(MDAnalysis.__version__)

0.20.1

An excerpt from this input file is shown below.

##This is ./pdb.pdb
#more protein atoms above
ANISOU 1016  NH2 ARG A1434     2607   3648   2290   -197    323   -307       N  
TER    1017      ARG A1434                                                      
HETATM 1018  C4  HH8 A1501     -20.776  12.574  25.611  0.59 11.56           C  
ANISOU 1018  C4  HH8 A1501     1524   1603   1266    -34    163      3       C  
HETATM 1019  C5  HH8 A1501     -18.698  13.529  26.059  0.59 13.32           C  
ANISOU 1019  C5  HH8 A1501     1742   1861   1459    -49    143    -23       C  
HETATM 1020  C6  HH8 A1501     -19.171  14.104  27.289  0.59 13.28           C  
ANISOU 1020  C6  HH8 A1501     1752   1860   1436    -66    163    -39       C  
HETATM 1021  C7  HH8 A1501     -20.476  13.883  27.651  0.59 12.78           C  
ANISOU 1021  C7  HH8 A1501     1698   1776   1381    -62    184    -31       C  
HETATM 1022  C8  HH8 A1501     -21.313  13.106  26.800  0.59 11.94           C  
ANISOU 1022  C8  HH8 A1501     1585   1651   1299    -46    184     -9       C  
HETATM 1023  N1  HH8 A1501     -21.537  11.824  24.697  0.59 10.48           N  
ANISOU 1023  N1  HH8 A1501     1379   1453   1151    -26    163     18       N  
HETATM 1024  N2  HH8 A1501     -19.448  12.773  25.252  0.59 12.38           N  
ANISOU 1024  N2  HH8 A1501     1620   1722   1363    -34    142     -3       N  
HETATM 1025  C3  HH8 A1501     -22.795  11.625  24.992  0.59 11.02           C  
ANISOU 1025  C3  HH8 A1501     1449   1514   1225    -30    183     21       C  
HETATM 1026 BR1  HH8 A1501     -23.498  13.338  28.667  0.59 17.51           BR  
ANISOU 1026 BR1  HH8 A1501     2312   2347   1994    -60    237    -17       BR  
HETATM 1027  C1  HH8 A1501     -22.672  12.805  27.033  0.59 13.02           C  
ANISOU 1027  C1  HH8 A1501     1725   1776   1447    -45    205     -2       C  
HETATM 1028  C2  HH8 A1501     -23.434  12.102  26.169  0.59 11.96           C  
ANISOU 1028  C2  HH8 A1501     1579   1633   1333    -38    205     13       C  
HETATM 1040  O   HOH A1601      -6.356  19.116  22.312  1.00 38.64           O 
#more water atoms bellow

u = mda.Universe('./pdb.pdb')    
protein = u.select_atoms('protein or resname HOH')
LIG = u.select_atoms('all') - protein
LIG.write("./LIG.pdb")

The code returns for ./LIG.pdb

ATOM      1  C4  HH8 A1501     -20.776  12.574  25.611  0.59 11.56      A    C
ATOM      2  C5  HH8 A1501     -18.698  13.529  26.059  0.59 13.32      A    C
ATOM      3  C6  HH8 A1501     -19.171  14.104  27.289  0.59 13.28      A    C
ATOM      4  C7  HH8 A1501     -20.476  13.883  27.651  0.59 12.78      A    C
ATOM      5  C8  HH8 A1501     -21.313  13.106  26.800  0.59 11.94      A    C
ATOM      6  N1  HH8 A1501     -21.537  11.824  24.697  0.59 10.48      A    N
ATOM      7  N2  HH8 A1501     -19.448  12.773  25.252  0.59 12.38      A    N
ATOM      8  C3  HH8 A1501     -22.795  11.625  24.992  0.59 11.02      A    C
ATOM      9 BR1  HH8 A1501     -23.498  13.338  28.667  0.59 17.51      A    B
ATOM     10  C1  HH8 A1501     -22.672  12.805  27.033  0.59 13.02      A    C
ATOM     11  C2  HH8 A1501     -23.434  12.102  26.169  0.59 11.96      A    C
CONECT    1    5    6    7
CONECT    2    3    7
CONECT    3    2    4
CONECT    4    3    5
CONECT    5    1    4   10
CONECT    6    1    8
CONECT    7    1    2
CONECT    8    6   11
CONECT    9   10
CONECT   10    5    9   11
CONECT   11    8   10
END
``

IAlibay · 2020-10-19T21:09:04Z

@jbarnoud sorry to be digging up an old PR. It'd be really great to get this fixed (it's an ongoing issue for a few things I deal with downstream). Is this something you're still willing to work on?

Otherwise, given the large changes to the PDB parser since this PR started, I'd be willing to create a superseding PR if that was ok with you?

jbarnoud · 2020-10-20T08:01:58Z

@IAlibay This is still on my radar and bumped up in my TODO recently, but I cannot guarantee a timeframe. Please go ahead if you think you can send some time on it soon. Ping me when you need your code reviewed.

IAlibay · 2020-11-19T00:08:19Z

Closing this now that superseding PR #3001 has been merged.

jbarnoud added 4 commits January 6, 2020 18:54

Add tests for elements attribute from PDB

6fa9bbc

See #2422

Populate the 'elements' topology attribute when reading PDB

daf6af2

Add tests for #2423

2cc3a7e

Use elements topology attr in PDB writer when available

60088bd

jbarnoud added Format-PDB Component-Writers Component-Topology labels Jan 9, 2020

Small simplification

c8717c5

RMeli mentioned this pull request Jan 17, 2020

XYZ parser guesses the elements #2420

Closed

orbeckst added the Work in progress label Jan 22, 2020

jbarnoud mentioned this pull request Mar 20, 2020

Providing Elements attribute to parsers (#2553) #2627

Merged

4 tasks

RMeli mentioned this pull request Jun 10, 2020

PDBWriter writes some C as Ca and some N as Na #2732

Closed

IAlibay mentioned this pull request Oct 20, 2020

Adds PDB element writing and fixes reading #3001

Merged

4 tasks

IAlibay closed this Nov 19, 2020

RMeli deleted the pdb-elements branch December 23, 2023 21:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] PDB parser and writer use the "elements" topology attribute#2442

[WIP] PDB parser and writer use the "elements" topology attribute#2442
jbarnoud wants to merge 5 commits intodevelopfrom
pdb-elements

jbarnoud commented Jan 9, 2020

Uh oh!

jbarnoud commented Jan 9, 2020

Uh oh!

richardjgowers commented Jan 16, 2020

Uh oh!

RMeli commented Jun 10, 2020

Uh oh!

haroldgrosjean commented Jun 16, 2020

Uh oh!

IAlibay commented Oct 19, 2020

Uh oh!

jbarnoud commented Oct 20, 2020

Uh oh!

IAlibay commented Nov 19, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

jbarnoud commented Jan 9, 2020

PR Checklist

Uh oh!

jbarnoud commented Jan 9, 2020

Uh oh!

richardjgowers commented Jan 16, 2020

Uh oh!

RMeli commented Jun 10, 2020

Uh oh!

haroldgrosjean commented Jun 16, 2020

Uh oh!

IAlibay commented Oct 19, 2020

Uh oh!

jbarnoud commented Oct 20, 2020

Uh oh!

IAlibay commented Nov 19, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants