Added elements attributes to the PDBParser.py #2624
AmeyaHarmalkar wants to merge 1 commit into MDAnalysis:develop from
@richardjgowers I added the elements attribute to the PDBParser.py. I am wondering if you could review it for me?
lilyminium left a comment
This is a good start @AmeyaHarmalkar, thanks! Just a note -- the tests check for expected and guessed attributes, so you'll need to change those for this and any other classes you add attributes to.
class PDBBase(ParserBase):
    expected_attrs = ['ids', 'names', 'record_types', 'resids',
                      'resnames', 'altLocs', 'icodes', 'occupancies',
                      'bonds', 'tempfactors', 'chainIDs']
    guessed_attrs = ['types', 'masses']
You should also add a test that the resulting elements are correct.
resnames.append(line[17:21].strip())
chainids.append(line[21:22].strip())
# Saving the element types in a list
elements.append(line[76:78].strip())
This column isn't guaranteed in PDB files -- MDAnalysis itself doesn't write it out (fixed in #2609 because it was breaking DSSP). You need to check the line length to avoid IndexErrors.
Python lets you slice a string beyond its end, i.e. 'cat'[100:110] == '', so these will just be blank strings if the column isn't there.
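To illustrate the slicing point, here is a minimal sketch. The ATOM record is made up and deliberately truncated before the element columns; it is not taken from any file in this PR.

```python
# A hypothetical ATOM record that stops after the temperature factor,
# so the element columns (77-78, i.e. line[76:78]) are absent.
short_line = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67"

# Slicing past the end of a string never raises; it returns ''.
element = short_line[76:78].strip()
print(repr(element))

# Indexing a single position past the end, by contrast, does raise.
try:
    short_line[76]
    raised = False
except IndexError:
    raised = True
print(raised)
```

So the parser does not need a line-length check to stay safe, but it does need to handle the resulting empty strings.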
Yes, I missed that. So, an alternative can be:
if not any(elements):
    do (element parsing)
elif not any(atomnames):
    # Guess element types from atom names
    do (same task of parsing to get elements)
else:
    return False for element type
Does this sound reasonable?
# Similar to the check for atomtypes function
if not any(elements):
    # Can further add a check_cg_atom(elements) function in guessers.py to check for CG atom.
    elements = guess_atom_element(elements)
guess_atom_element guesses from names -- this will return a list of empty strings. It also checks one value at a time. You could use guess_types here instead, which just passes it down to guess_atom_element anyway.
Just adding to @lilyminium's comment. As per #2222, it would be good to throw a user warning when guessing here. That way users are aware of what MDA is doing behind the scenes.
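A rough sketch of what "warn when guessing" could look like. Both function names are hypothetical, and `guess_element_from_name` is a naive stand-in, not the MDAnalysis guesser (it would, for example, mislabel a calcium ion named CA as carbon).

```python
import warnings


def guess_element_from_name(name):
    """Hypothetical stand-in guesser: first alphabetic character of the atom name."""
    letters = "".join(c for c in name if c.isalpha())
    return letters[:1].upper()


def elements_or_guess(elements, names):
    """Return (elements, guessed); guess from names, with a warning, if the column was empty."""
    if any(elements):
        return list(elements), False
    warnings.warn("Element column is empty; guessing elements from atom names.",
                  UserWarning)
    return [guess_element_from_name(n) for n in names], True


result, guessed = elements_or_guess(["", ""], ["CA", "N1"])
print(result, guessed)
```

Returning the `guessed` flag alongside the values lets the caller decide whether to mark the attribute as guessed.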
Oh, gotcha. I will pass it via guess_types.
Also, I added a warning in the following if conditional. Should I move that up one level?
if not any(elements):
    # Can further add a check_cg_atom(elements) function in guessers.py to check for CG atom.
    elements = guess_atom_element(elements)
    if elements == '':
Yeah I think you'll want something like any(not e for e in elements) to detect missing elements
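For reference, a small sketch of how the three checks differ on a list with some blanks. The list contents are made up.

```python
elements = ["C", "", "N", ""]

# Comparing the whole list to '' is always False, so this branch never fires.
list_equals_empty = (elements == "")

# True if at least one entry is empty (falsy).
has_missing = any(not e for e in elements)

# True only if every entry is empty.
all_missing = not any(elements)

print(list_equals_empty, has_missing, all_missing)
```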
if elements == '':
    warnings.warn("Element record found to be non-physical.")
    # Nomenclature that X will denote any non-physical element.
    elements = 'X'
From #2553 I think the consensus is that element should be a False-y value if it's non-physical, not a special name.
I thought of it as 3 cases: 1. The element exists and is valid. 2. The element exists but is not valid, i.e. non-physical, maybe because of the limitations of the tables or because it is a CG type. 3. It does not exist.
Should I just have 2 instead?
    attrs.append(Elements(elements, guessed=True))
else:
    attrs.append(Elements(guessers.guess_types(elements),
                          guessed=True))
Why are you guessing the elements again?
I did it as a double check. I was thinking of providing a None/False in the else branch.
@AmeyaHarmalkar I just noticed you're making changes on your Create a new development branch and check out all new branches from there
or Create a new PR and delete this one.
The user guide describes the typical workflow for contributing to MDAnalysis.
richardjgowers left a comment
This looks like a good start. In addition to @lilyminium's comments, you'll need to write some tests that capture what this change does. You'll want to test that:
- a pdb file with elements correctly provides them
- a pdb file with bizarre elements correctly fails & warns
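The two cases above might be sketched roughly as follows. This is self-contained and does not use the MDAnalysis test suite: `validate_elements` and `KNOWN_ELEMENTS` are hypothetical stand-ins for whatever the parser ends up doing, and "fails" is interpreted here as "replaced by an empty string with a warning", in line with the False-y convention mentioned from #2553.

```python
import warnings

# Hypothetical, deliberately tiny table of valid symbols; a real parser
# would consult a full periodic table.
KNOWN_ELEMENTS = {"H", "C", "N", "O", "S", "P"}


def validate_elements(elements):
    """Replace unrecognised symbols with '' and warn once if any were replaced."""
    cleaned = [e if e in KNOWN_ELEMENTS else "" for e in elements]
    if cleaned != list(elements):
        warnings.warn("Unrecognised element symbols were set to ''.", UserWarning)
    return cleaned


def test_valid_elements_pass_through():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        assert validate_elements(["N", "C", "O"]) == ["N", "C", "O"]
    assert not caught


def test_bizarre_elements_warn():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        assert validate_elements(["N", "XX", "O"]) == ["N", "", "O"]
    assert len(caught) == 1
```

In the real test suite these would load actual PDB files and would more likely use `pytest.warns` than `warnings.catch_warnings`.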
Thanks @lilyminium and @richardjgowers for the review. I will implement your suggestions right away and create a different branch as well.
@lilyminium I am closing this Pull request and submitting a different one like you suggested. Sorry for the inconvenience!
Fixes #
Changes made in this Pull Request:
- Added the elements attribute to PDBParser.py
- Non-physical atom types will be denoted by an X
- Can make it more specific, discriminating between CG and non-physical atoms, by adding a check_CG_atom function in guessers.py