Skip to content

Added elements attributes to the PDBParser.py#2624

Closed
AmeyaHarmalkar wants to merge 1 commit intoMDAnalysis:developfrom
AmeyaHarmalkar:develop
Closed

Added elements attributes to the PDBParser.py#2624
AmeyaHarmalkar wants to merge 1 commit intoMDAnalysis:developfrom
AmeyaHarmalkar:develop

Conversation

@AmeyaHarmalkar
Copy link
Contributor

Fixes #

Changes made in this Pull Request:
-Added the elements attributes to the PDBParser.py
-Non-physical atoms types will be denoted by an X
-Can make it more specific to discriminate between CG and non-physical by adding a check_CG_atom function in guesser.py

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@AmeyaHarmalkar
Copy link
Contributor Author

@richardjgowers I added the elements attribute to the PDBParser.py. I am wondering if you could review it for me?

Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start @AmeyaHarmalkar, thanks! Just a note -- the tests check for expected and guessed attributes, so you'll need to change those for this and any other classes you add attributes to.

class PDBBase(ParserBase):
    expected_attrs = ['ids', 'names', 'record_types', 'resids',
                      'resnames', 'altLocs', 'icodes', 'occupancies',
                      'bonds', 'tempfactors', 'chainIDs']
    guessed_attrs = ['types', 'masses']

You should also add a test that the resulting elements are correct.

resnames.append(line[17:21].strip())
chainids.append(line[21:22].strip())
#Saving the elements type in a list
elements.append(line[76:78].strip())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This column isn't guaranteed in PDB files -- MDAnalysis itself doesn't write it out (fixed in #2609 because it was breaking DSSP). You need to check the line length to avoid IndexErrors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python lets you slice a string beyond it's end ie 'cat'[100:110] == '', so these will just be blank strings if the column isn't there

Copy link
Contributor Author

@AmeyaHarmalkar AmeyaHarmalkar Mar 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I missed that. So, an alternative can be:
if not any(elements):
do (element parsing)
else if not any(atomnames):
# Guess elements types from Atom names
do (same task of parsing to get elements)
else:
return False for element type

Does this sound reasonable?

# Similar to the check for atomtypes function
if not any(elements):
# Can further add a check_cg_atom(elements) function in guessers.py to check for CG atom.
elements = guess_atom_element(elements)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess_atom_element guesses from names -- this will return a list of empty strings. It also checks one value at a time. You could use guess_types here instead, which just passes it down to guess_atom_element anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just adding to @lilyminium's comment. As per #2222, it would be good to throw a user warning when guessing here. That way users are aware of what MDA is doing being the scenes.

Copy link
Contributor Author

@AmeyaHarmalkar AmeyaHarmalkar Mar 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, gotcha. I will pass it via guess_types.
Also, I added a warning in the following if conditional. Should I shift that a loop above?

if not any(elements):
# Can further add a check_cg_atom(elements) function in guessers.py to check for CG atom.
elements = guess_atom_element(elements)
if elements == '':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't elements a list?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think you'll want something like any(not e for e in elements) to detect missing elements

if elements == '':
warnings.warn("Element record found to be non-physical.")
#Nomenclature that X will denote any non-physical element.
elements = 'X'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From #2553 I think the consensus is that element should be a False-y value if it's non-physical, not a special name.

Copy link
Contributor Author

@AmeyaHarmalkar AmeyaHarmalkar Mar 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of it as 3 conditionals. 1. In case it is an existing and valid element. 2. The element exists, but is not valid. .i.e. non-physical maybe because of the limitations of the tables or being a CG type? 3. It does not exist.
Should I just have 2 instead?

attrs.append(Elements(elements, guessed=True))
else:
attrs.append(Elements(guessers.guess_types(elements),
guessed=True))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you guessing the elements again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it as a double check. I was think of providing a None/False in the else loop

@lilyminium
Copy link
Member

lilyminium commented Mar 14, 2020

@AmeyaHarmalkar I just noticed you're making changes on your develop branch. This means that any further branches you checkout will contain these commits. Two ways to go about this:

Create a new development branch and check out all new branches from there

  1. Make a new branch from develop:
git checkout -b real-develop
  1. Roll back changes:
git reset --hard e416aa7
  1. Check out all new branches from there.

or

Create a new PR and delete this one.

  1. Make a new branch from develop:
git checkout -b my-new-branch
git push origin my-new-branch
  1. Roll back changes on develop:
git checkout develop
git reset --hard e416aa7
  1. Open a new PR from my-new-branch and delete this one. Link to this PR number and note that it replaces this one.

The user guide describes the typical workflow for contributing to MDAnalysis.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good start. In addition to @lilyminium 's comments, you'll need to write some tests that capture what this change does. You'll want to test that:

  • a pdb file with elements correctly provides them
  • a pdb file with bizarre elements correctly fails & warns

@richardjgowers richardjgowers self-assigned this Mar 14, 2020
@AmeyaHarmalkar
Copy link
Contributor Author

Thanks @lilyminium and @richardjgowers for the review. I will implement your suggestions right away and create a different branch as well.

@AmeyaHarmalkar
Copy link
Contributor Author

@lilyminium I am closing this Pull request and submitting a different one like you suggested. Sorry for the inconvenience!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants