RDKitReader and RDKitParser #2707

Merged
richardjgowers merged 43 commits into MDAnalysis:develop from cbouy:develop
Jun 19, 2020

@cbouy
Member

cbouy commented Jun 1, 2020

Part of the fixes for #2468

Changes made in this Pull Request:

  • added the RDKitParser which creates a core.topology.Topology object from an rdkit.Chem.rdchem.Mol object.
  • added the RDKitReader (based on the MemoryReader) to read coordinates from RDKit conformers
  • added the Aromaticities topology attributes, and the aromatic selection token (for now, only usable when the Universe was created from an RDKit molecule)
  • added the from_smiles classmethod to the Universe (can add hydrogens and generate multiple conformers)

This was a minimal version, complemented iteratively by adding new attributes, tests, and docs. The code is heavily inspired by the ParmEdParser.
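
A minimal usage sketch of the features listed above, assuming a version of MDAnalysis that includes this PR and an RDKit installation. The keyword names `addHs` and `numConfs` are assumptions based on the description of `from_smiles` (adding hydrogens, generating multiple conformers), and `summarize_smiles` is a hypothetical helper, not part of the PR.

```python
try:
    import MDAnalysis as mda
    from rdkit import Chem  # noqa: F401  (imported only to check availability)
    HAVE_DEPS = True
except ImportError:
    HAVE_DEPS = False


def summarize_smiles(smiles, n_conformers=3):
    """Build a Universe from a SMILES string and report basic sizes.

    Hypothetical helper; keyword names are assumed, see the text above.
    """
    u = mda.Universe.from_smiles(smiles, addHs=True, numConfs=n_conformers)
    # The "aromatic" selection token relies on the Aromaticities attribute.
    aromatic = u.select_atoms("aromatic")
    return {
        "n_atoms": u.atoms.n_atoms,
        "n_frames": u.trajectory.n_frames,
        "n_aromatic": aromatic.n_atoms,
    }


if HAVE_DEPS:
    print(summarize_smiles("c1ccccc1O"))  # phenol
else:
    print("MDAnalysis and/or RDKit not installed; skipping the example")
```

Each generated conformer is expected to show up as one trajectory frame, since the RDKitReader is described as being based on the MemoryReader.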

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@codecov

codecov bot commented Jun 1, 2020

Codecov Report

Merging #2707 into develop will increase coverage by 0.03%.
The diff coverage is 96.02%.

@@             Coverage Diff             @@
##           develop    #2707      +/-   ##
===========================================
+ Coverage    91.08%   91.11%   +0.03%     
===========================================
  Files          177      179       +2     
  Lines        23655    23833     +178     
  Branches      3122     3143      +21     
===========================================
+ Hits         21545    21716     +171     
- Misses        1490     1496       +6     
- Partials       620      621       +1     
Impacted Files Coverage Δ
package/MDAnalysis/core/topologyattrs.py 95.23% <87.50%> (-0.07%) ⬇️
package/MDAnalysis/core/universe.py 95.60% <90.00%> (-0.26%) ⬇️
package/MDAnalysis/topology/RDKitParser.py 96.52% <96.52%> (ø)
package/MDAnalysis/coordinates/RDKit.py 100.00% <100.00%> (ø)
package/MDAnalysis/coordinates/__init__.py 100.00% <100.00%> (ø)
package/MDAnalysis/core/selection.py 99.48% <100.00%> (+<0.01%) ⬆️
package/MDAnalysis/topology/__init__.py 100.00% <100.00%> (ø)
topology/__init__.py 100.00% <0.00%> (ø)
coordinates/__init__.py 100.00% <0.00%> (ø)
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 788f294...13168ba.

@cbouy cbouy marked this pull request as ready for review June 2, 2020 18:18
@tylerjereddy
Member

Unrelated, but I wonder why Azure pipelines is not showing up in CI.

@tylerjereddy
Member

/azp run

@azure-pipelines

No pipelines are associated with this pull request.

Member

richardjgowers left a comment

@cbouy this is looking like a good start!

@cbouy
Member Author

cbouy commented Jun 3, 2020

Can I add an SDF file to the test data to check that it's read correctly as well?

@IAlibay
Member

IAlibay commented Jun 3, 2020

> Can I add an SDF file to the test data to check that it's read correctly as well?

Please do! The more tests the better :)

@richardjgowers
Member

richardjgowers commented Jun 3, 2020 via email

@cbouy
Member Author

cbouy commented Jun 3, 2020

Looks like there are a few more formats that RDKit supports but not MDA:

  • SDF
  • MOL
  • MAE (from Maestro)
  • TPL (from Catalyst)
  • HELM (from Pistoia Alliance)
  • Fasta
  • sequence of amino-acids

But I guess just SDF is enough. I'll try to find a file with charges, stereochemistry, custom data and whatnot...

@IAlibay
Member

IAlibay commented Jun 3, 2020

> Looks like there are a few more formats that RDKit supports but not MDA:
>
>   • SDF
>   • MOL
>   • MAE (from Maestro)
>   • TPL (from Catalyst)
>   • HELM (from Pistoia Alliance)
>   • Fasta
>   • sequence of amino-acids
>
> But I guess just SDF is enough. I'll try to find a file with charges, stereochemistry, custom data and whatnot...

Note here, to save on space, we could probably store things in gzip format and feed rdkit with file-like objects.
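
The gzip suggestion above can be sketched with the standard library alone. The SDF record below is hand-written for illustration and is not one of the PR's test files; `gzip_roundtrip` and `count_sdf_records` are hypothetical helper names.

```python
import gzip
import io

# A tiny, hand-written SDF record for illustration only.
SDF_TEXT = """\
methane
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
$$$$
"""


def gzip_roundtrip(text):
    """Compress text into an in-memory gzip buffer and reopen it as a file-like object."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as handle:
        handle.write(text.encode("ascii"))
    buffer.seek(0)
    return gzip.GzipFile(fileobj=buffer, mode="rb")


def count_sdf_records(fileobj):
    """Count SDF records by their '$$$$' terminator lines."""
    return sum(1 for line in fileobj if line.strip() == b"$$$$")


compressed = gzip_roundtrip(SDF_TEXT * 2)
print(count_sdf_records(compressed))
```

With RDKit available, the same kind of file-like object could presumably be handed to `Chem.ForwardSDMolSupplier`, which accepts file-like objects, so the compressed file would never need to be unpacked on disk.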

@cbouy
Member Author

cbouy commented Jun 4, 2020

Is the Charges attribute supposed to be used for formal or partial charges?

@IAlibay
Member

IAlibay commented Jun 4, 2020

> Is the Charges attribute supposed to be used for formal or partial charges?

I believe it's the partial charges of each atom in electron charge units.

Comment on lines +36 to +49
def mol2_mol():
    return Chem.MolFromMol2File(mol2_molecule, removeHs=False)

def smiles_mol():
    mol = Chem.MolFromSmiles("CCO")
    mol = Chem.AddHs(mol)
    cids = AllChem.EmbedMultipleConfs(mol, numConfs=3)
    return mol

class TestRDKitReader(object):
    @pytest.mark.parametrize("rdmol, n_frames", [
        (mol2_mol(), 1),
        (smiles_mol(), 3),
    ])
Member Author

Not sure if this is a very pythonic way of testing things, but since parametrize won't take fixtures as args I decided to do it this way

Member

Yeah pytest can't parametrize over fixtures, it's annoying. Not sure there's much better you can do here.
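
One possible variation on the workaround discussed here, shown as a sketch rather than what the PR does: pass the factory functions themselves to `parametrize` and call them inside the test, so the molecules are built when the test runs instead of at collection time. The factories below are hypothetical stand-ins that return plain lists rather than RDKit molecules.

```python
import pytest


def single_frame():
    """Hypothetical stand-in for a molecule with one conformer."""
    return [[(0.0, 0.0, 0.0)]]


def three_frames():
    """Hypothetical stand-in for a molecule with three conformers."""
    return [[(float(i), 0.0, 0.0)] for i in range(3)]


class TestFrames(object):
    # Pass the factory functions, not their results; each factory is only
    # called when the test body runs, not while pytest collects the tests.
    @pytest.mark.parametrize("factory, n_frames", [
        (single_frame, 1),
        (three_frames, 3),
    ])
    def test_n_frames(self, factory, n_frames):
        frames = factory()
        assert len(frames) == n_frames
```

pytest also provides `request.getfixturevalue`, which looks up a fixture by name inside a test; parametrizing over fixture names and resolving them that way is another option if the objects really need to be fixtures.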

Member

IAlibay left a comment

Just a couple more things, mainly docstrings (Travis is failing somewhere). Have you had a chance to build the docs and have a look at them? If you haven't, I'd recommend doing so; there's always a tiny thing that just doesn't build the way you expect it to.

Overall looks good to me, should be about ready to merge.

coordinates = np.array([
    conf.GetPositions() for conf in filename.GetConformers()],
    dtype=np.float32)
if coordinates.size == 0:
Member

Do we have a feeling for what the standard behaviour across the MDA readers is? I believe the netcdf reader will just throw an error if it can't find a reasonable dimension (my thought here is that we would want to have a similar behaviour to how other things work).
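
The two behaviours under discussion (raise an error versus fall back to a placeholder frame) can be sketched as follows. This is an illustration of the trade-off, not the code that was merged; `stack_conformers` and its `strict` flag are hypothetical.

```python
import warnings

import numpy as np


def stack_conformers(conformer_positions, n_atoms, strict=False):
    """Stack per-conformer (n_atoms, 3) arrays into (n_frames, n_atoms, 3).

    Hypothetical helper: strict=True raises when there are no conformers,
    strict=False warns and returns a single frame filled with NaN.
    """
    coordinates = np.array(list(conformer_positions), dtype=np.float32)
    if coordinates.size == 0:
        if strict:
            raise ValueError("no conformers found, cannot build a trajectory")
        warnings.warn("no conformers found, returning a single NaN frame")
        coordinates = np.full((1, n_atoms, 3), np.nan, dtype=np.float32)
    return coordinates


frames = stack_conformers([np.zeros((4, 3)), np.ones((4, 3))], n_atoms=4)
print(frames.shape)
```

Raising keeps the reader consistent with readers that refuse input without usable dimensions, while the NaN fallback still allows a topology-only Universe to be created.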

cbouy changed the title from "Basic topology from RDKitParser" to "RDKitReader and RDKitParser" Jun 18, 2020
@cbouy
Member Author

cbouy commented Jun 18, 2020

> Have you had a chance to build the docs and have a look at them? If you haven't, I'd recommend doing so; there's always a tiny thing that just doesn't build the way you expect it to.

I have to admit I skipped building the doc locally until now. I think I installed everything now but when I run make html in /package/doc/sphinx I get the following:

Warning, treated as error:
/mnt/e/Projets/mdanalysis/package/MDAnalysis/analysis/encore/similarity.py:docstring of MDAnalysis.analysis.encore.similarity.ces:1:duplicate object description of MDAnalysis.analysis.encore.similarity.ces, other instance in documentation_pages/analysis/encore/similarity, use :noindex: for one of them
Makefile:42: recipe for target 'html' failed
make: *** [html] Error 2

Also, is make html the right command to use?

@IAlibay
Member

IAlibay commented Jun 18, 2020

> > Have you had a chance to build the docs and have a look at them? If you haven't, I'd recommend doing so; there's always a tiny thing that just doesn't build the way you expect it to.
>
> I have to admit I skipped building the doc locally until now. I think I installed everything now but when I run make html in /package/doc/sphinx I get the following:
>
> Warning, treated as error:
> /mnt/e/Projets/mdanalysis/package/MDAnalysis/analysis/encore/similarity.py:docstring of MDAnalysis.analysis.encore.similarity.ces:1:duplicate object description of MDAnalysis.analysis.encore.similarity.ces, other instance in documentation_pages/analysis/encore/similarity, use :noindex: for one of them
> Makefile:42: recipe for target 'html' failed
> make: *** [html] Error 2
>
> Also, is make html the right command to use?

I'm not sure where you got to in the process, @lilyminium made a very good guide on building the documentation: https://userguide.mdanalysis.org/contributing_code.html#working-with-the-code-documentation

You should just need to do python setup.py build_sphinx -E in package.

@cbouy
Member Author

cbouy commented Jun 18, 2020

> You should just need to do python setup.py build_sphinx -E in package

Got it, thanks! Still the same error though

@orbeckst
Member

orbeckst commented Jun 18, 2020

Sphinx must be < 3. See #2667.

For

 python setup.py build_sphinx

You can also temporarily disable the warnings = error by changing

warning-is-error = 1

This will always build all docs and for historical reasons they will be built in package/doc/html/html/index.html.

If you want faster turn-around while working on docs then use make html; edit the Makefile https://github.com/MDAnalysis/mdanalysis/blob/develop/package/doc/sphinx/Makefile#L6
and temporarily remove -W (warn -> errors) then run

cd package/doc/sphinx
make html

These docs will show up at package/doc/html/index.html (sigh... it's ugly).

The Makefile does not have -E set so it will just rebuild what you changed. This is nicer for debugging reST.
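
Since the build depends on the Sphinx < 3 constraint mentioned above, a quick pre-flight check can save a failed build. The sketch below uses only the standard library; both function names are hypothetical, and the check is a convenience, not part of the MDAnalysis build system.

```python
def sphinx_version_supported(version, max_major=3):
    """Return True if the major version is below max_major (hypothetical helper)."""
    major = int(version.split(".")[0])
    return major < max_major


def installed_sphinx_version():
    """Return the installed Sphinx version string, or None if it cannot be found."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python < 3.8 has no importlib.metadata
        return None
    try:
        return version("sphinx")
    except PackageNotFoundError:
        return None


found = installed_sphinx_version()
if found is None:
    print("Sphinx not found")
elif sphinx_version_supported(found):
    print(found, "should work for this build")
else:
    print(found, "is too new for this build; install Sphinx < 3")
```

For the versions mentioned in this thread, `sphinx_version_supported("1.8.5")` is true and `sphinx_version_supported("3.0.4")` is false.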

@IAlibay
Member

IAlibay commented Jun 18, 2020

Edit: oops, I didn't see @orbeckst's comment.

> > You should just need to do python setup.py build_sphinx -E in package
>
> Got it, thanks! Still the same error though

Interesting, what version of sphinx are you running? I know that 2+ have issues with our docs (I think there's an issue open about it). With 1.8.5, I get the following:

Warning, treated as error:
/home/bioc1523/github/mdanalysis/package/MDAnalysis/core/universe.py:docstring of MDAnalysis.core.universe.Universe.from_smiles:34:Unexpected indentation.

Have you had any luck with compiling the docs from the standard develop branch?

(I have just noticed that your PR is coming from your fork's develop branch rather than a separate branch, you probably want to avoid doing that in the future).

@cbouy
Member Author

cbouy commented Jun 18, 2020

Downgrading from 3.0.4 to 1.8.5 fixed the issues I mentioned earlier, thanks! And thanks for the tips @orbeckst !

@IAlibay I'll create a new branch next time, my bad.

@cbouy
Member Author

cbouy commented Jun 19, 2020

Is this still for 1.0.0? I see that it's been timestamped to last week in the changelog. Or should I mark it for 1.0.1 in the docs and the changelog?

@richardjgowers
Member

richardjgowers commented Jun 19, 2020 via email

Member

IAlibay left a comment

I don't think I have anything else to say on this one, lgtm :)
I'll let @fiona-naughton and @richardjgowers approve it before we merge.

        pass

    def apply(self, group):
        return group[group.aromaticities].unique
Member

I've changed this so that it will fail if there aren't aromaticities available, rather than silently putting everything as false
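
The difference between failing loudly and silently selecting nothing can be sketched with a small stand-in class. `GroupSketch` and `select_aromatic` are hypothetical and only mimic boolean-mask indexing; they are not the MDAnalysis classes.

```python
import numpy as np


class GroupSketch(object):
    """Hypothetical stand-in for an atom group; not the MDAnalysis class."""

    def __init__(self, names, aromaticities=None):
        self.names = np.asarray(names)
        if aromaticities is not None:
            # Only define the attribute when the data actually exists.
            self.aromaticities = np.asarray(aromaticities, dtype=bool)

    def __getitem__(self, mask):
        return self.names[mask]


def select_aromatic(group):
    # Raises AttributeError if the group carries no aromaticity data,
    # instead of silently treating every atom as non-aromatic.
    return group[group.aromaticities]


ring = GroupSketch(["C1", "C2", "O1"], aromaticities=[True, True, False])
print(select_aromatic(ring))

no_data = GroupSketch(["C1", "C2", "O1"])
try:
    select_aromatic(no_data)
except AttributeError as err:
    print("selection failed:", err)
```

Failing here makes it obvious that the topology lacks the attribute, whereas an all-false default would be indistinguishable from a molecule with no aromatic atoms.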

@IAlibay IAlibay requested a review from fiona-naughton June 19, 2020 14:45
n_residues = 1

# Segment
if any(segids) and not any(val is None for val in segids):
Member

@cbouy Was just checking codecov to double check that there were enough tests and it looks like this branch of the code isn't tested at all (might be looking at an old commit). Could you double check that this is the case & add a test if so? https://codecov.io/gh/MDAnalysis/mdanalysis/src/dc482401df0d5c5839b2b37343b49a778987c2b1/package/MDAnalysis/topology/RDKitParser.py
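
To make clear which inputs a test for this branch would need, here is the condition from the snippet above applied to a few sample lists. The helper name `has_usable_segids` and the sample values are hypothetical.

```python
def has_usable_segids(segids):
    """Same condition as in the snippet above, wrapped in a hypothetical helper."""
    return any(segids) and not any(val is None for val in segids)


examples = {
    "all set": ["A", "A", "B"],
    "all empty strings": ["", "", ""],
    "one missing": ["A", None, "B"],
    "empty list": [],
}
for label, segids in examples.items():
    print(label, has_usable_segids(segids))
```

Only the first case enters the branch, so covering it requires a molecule in which every atom carries a non-empty segment identifier.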

@richardjgowers richardjgowers merged commit 50cd6e7 into MDAnalysis:develop Jun 19, 2020
@richardjgowers
Member

I'll merge this so it's done, we can look at coverage + corner cases in future issues.

@cbouy first step done!
