RDKitReader and RDKitParser by cbouy · Pull Request #2707 · MDAnalysis/mdanalysis

cbouy · 2020-06-01T18:05:31Z

Part of the fixes for #2468

Changes made in this Pull Request:

added the RDKitParser, which creates a core.topology.Topology object from an rdkit.Chem.rdchem.Mol object
added the RDKitReader (based on the MemoryReader) to read coordinates from RDKit conformers
added the Aromaticities topology attribute and the aromatic selection token (for now, only usable when the Universe was created from an RDKit molecule)
added the from_smiles classmethod to the Universe (can add hydrogens and generate multiple conformers)

This was a minimal version, complemented iteratively by adding new attributes, tests and the docs. The code is heavily inspired by the ParmEdParser.

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

codecov · 2020-06-01T19:47:14Z

Codecov Report

Merging #2707 into develop will increase coverage by 0.03%.
The diff coverage is 96.02%.

@@             Coverage Diff             @@
##           develop    #2707      +/-   ##
===========================================
+ Coverage    91.08%   91.11%   +0.03%     
===========================================
  Files          177      179       +2     
  Lines        23655    23833     +178     
  Branches      3122     3143      +21     
===========================================
+ Hits         21545    21716     +171     
- Misses        1490     1496       +6     
- Partials       620      621       +1

Impacted Files	Coverage Δ
package/MDAnalysis/core/topologyattrs.py	`95.23% <87.50%> (-0.07%)`	⬇️
package/MDAnalysis/core/universe.py	`95.60% <90.00%> (-0.26%)`	⬇️
package/MDAnalysis/topology/RDKitParser.py	`96.52% <96.52%> (ø)`
package/MDAnalysis/coordinates/RDKit.py	`100.00% <100.00%> (ø)`
package/MDAnalysis/coordinates/__init__.py	`100.00% <100.00%> (ø)`
package/MDAnalysis/core/selection.py	`99.48% <100.00%> (+<0.01%)`	⬆️
package/MDAnalysis/topology/__init__.py	`100.00% <100.00%> (ø)`
topology/__init__.py	`100.00% <0.00%> (ø)`
coordinates/__init__.py	`100.00% <0.00%> (ø)`
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 788f294...13168ba.

tylerjereddy · 2020-06-03T02:32:53Z

Unrelated, but I wonder why Azure pipelines is not showing up in CI.

tylerjereddy · 2020-06-03T02:33:28Z

/azp run

azure-pipelines · 2020-06-03T02:33:33Z

No pipelines are associated with this pull request.

testsuite/MDAnalysisTests/topology/test_rdkit.py

richardjgowers

@cbouy this is looking like a good start!

cbouy · 2020-06-03T17:38:19Z

Can I add an SDF file to the test data to check that it's read correctly as well?

IAlibay · 2020-06-03T17:41:24Z

> Can I add an SDF file to the test data to check that it's read correctly as well?

Please do! The more tests the better :)

richardjgowers · 2020-06-03T17:42:14Z

Yup a small one but with corner cases preferably


testsuite/MDAnalysisTests/topology/test_rdkit.py

cbouy · 2020-06-03T17:57:10Z

Looks like there are a few more formats that RDKit supports but not MDA:

SDF
MOL
MAE (from Maestro)
TPL (from Catalyst)
HELM (from Pistoia Alliance)
Fasta
Sequence (amino acids)

But I guess just SDF is enough. I'll try to find a file with charges, stereochemistry, custom data and whatnot...

IAlibay · 2020-06-03T18:12:22Z


To save space, we could probably store these files gzipped and feed RDKit file-like objects.
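A minimal sketch of the gzip idea using only the standard library: the SDF text is kept compressed and read back through a file-like object, without writing a decompressed copy to disk. The record content below is a placeholder, not a real test file, and the step of handing the stream to RDKit is only described in the note after the block.

```python
import gzip
import io

# Placeholder SDF-like content: two records, each terminated by "$$$$".
# These are NOT valid molecules, just enough text to show the round trip.
sdf_text = "mol_a\n  placeholder\n\n$$$$\nmol_b\n  placeholder\n\n$$$$\n"

# Compress into an in-memory buffer (on disk this would be an *.sdf.gz file).
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(sdf_text.encode("utf-8"))
compressed = buffer.getvalue()

# Reopen it as a binary file-like object that decompresses on the fly.
with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as stream:
    decoded = stream.read().decode("utf-8")

# Count records by their "$$$$" terminators.
n_records = decoded.count("$$$$")
print(n_records)
```

With RDKit installed, a binary stream like this (or gzip.open("file.sdf.gz")) can be passed to Chem.ForwardSDMolSupplier, which is the pattern the RDKit documentation uses for reading compressed SD files. For a file this small the compressed bytes may not actually be smaller; the savings only show up on larger test files.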

cbouy · 2020-06-04T17:13:01Z

Is the Charges attribute supposed to be used for formal or partial charges?

IAlibay · 2020-06-04T17:15:36Z

> Is the Charges attribute supposed to be used for formal or partial charges?

I believe it's the partial charges of each atom in electron charge units.

cbouy · 2020-06-16T09:26:14Z

testsuite/MDAnalysisTests/coordinates/test_rdkit.py

+def mol2_mol():
+    return Chem.MolFromMol2File(mol2_molecule, removeHs=False)
+
+def smiles_mol():
+    mol = Chem.MolFromSmiles("CCO")
+    mol = Chem.AddHs(mol)
+    cids = AllChem.EmbedMultipleConfs(mol, numConfs=3)
+    return mol
+
+class TestRDKitReader(object):
+    @pytest.mark.parametrize("rdmol, n_frames", [
+        (mol2_mol(), 1),
+        (smiles_mol(), 3),
+    ])


Not sure if this is a very pythonic way of testing things, but since parametrize won't take fixtures as arguments, I decided to do it this way.

richardjgowers · 2020-06-18

Yeah, pytest can't parametrize over fixtures; it's annoying. Not sure there's much better you can do here.
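One common workaround, sketched under assumptions: parametrize over the factory functions themselves rather than their return values, so each molecule is built when its test runs instead of when the module is imported (the snippet above calls mol2_mol() inside the decorator). FakeMol is a hypothetical stand-in so the sketch runs without RDKit; in the real test the factories would return RDKit molecules.

```python
import pytest


class FakeMol:
    """Hypothetical stand-in for an RDKit Mol; it only knows its conformer count."""

    def __init__(self, n_conformers):
        self.n_conformers = n_conformers

    def GetNumConformers(self):
        return self.n_conformers


def mol2_mol():
    # The real factory would call Chem.MolFromMol2File(..., removeHs=False).
    return FakeMol(1)


def smiles_mol():
    # The real factory would embed 3 conformers with AllChem.EmbedMultipleConfs.
    return FakeMol(3)


@pytest.mark.parametrize("make_mol, n_frames", [
    (mol2_mol, 1),    # pass the factory itself, not mol2_mol()
    (smiles_mol, 3),
])
def test_n_frames(make_mol, n_frames):
    rdmol = make_mol()  # the molecule is only built when the test runs
    assert rdmol.GetNumConformers() == n_frames
```

If the molecules already exist as fixtures, pytest's request.getfixturevalue(name) is another option: parametrize over the fixture names as strings and look each one up inside the test.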

IAlibay

Just a couple more things, mainly docstrings (Travis is failing somewhere). Have you had a chance to build the docs and have a look at them? If you haven't, I'd recommend doing so; there's always a tiny thing that just doesn't build the way you expect it to.

Overall looks good to me, should be about ready to merge.

package/MDAnalysis/core/universe.py

package/MDAnalysis/topology/RDKitParser.py

testsuite/MDAnalysisTests/topology/test_rdkit.py

testsuite/MDAnalysisTests/core/test_universe.py

IAlibay · 2020-06-17T19:51:57Z

package/MDAnalysis/coordinates/RDKit.py

+        coordinates = np.array([
+            conf.GetPositions() for conf in filename.GetConformers()], 
+            dtype=np.float32)
+        if coordinates.size == 0:


Do we have a feeling for what the standard behaviour across the MDA readers is? I believe the NetCDF reader will just throw an error if it can't find a reasonable dimension (my thinking is that we want behaviour consistent with how the other readers work).
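A sketch contrasting the two possible policies for the empty case. FakeConformer is a hypothetical stand-in so the code runs without RDKit, and the on_empty switch is illustrative only: it contrasts failing loudly (as described for the NetCDF reader) with warning and filling a single frame with NaN. Neither option is claimed to be what the reader in this PR does.

```python
import warnings

import numpy as np


class FakeConformer:
    """Hypothetical stand-in for an RDKit conformer."""

    def __init__(self, positions):
        self._positions = np.asarray(positions, dtype=float)

    def GetPositions(self):
        return self._positions


def stack_conformers(conformers, n_atoms, on_empty="raise"):
    """Stack conformer positions into a float32 (n_frames, n_atoms, 3) array."""
    if on_empty not in ("raise", "nan"):
        raise ValueError(f"unknown on_empty policy: {on_empty!r}")
    coordinates = np.array([conf.GetPositions() for conf in conformers],
                           dtype=np.float32)
    if coordinates.size == 0:
        if on_empty == "raise":
            # Option 1: fail loudly when there is nothing to read.
            raise ValueError("the molecule has no conformers to read coordinates from")
        # Option 2: warn and return a single all-NaN frame.
        warnings.warn("No coordinates found; filling one frame with NaN")
        coordinates = np.full((1, n_atoms, 3), np.nan, dtype=np.float32)
    return coordinates
```

Checking coordinates.size rather than the shape matters here: with no conformers, np.array([]) has shape (0,), so there is no atom axis to inspect.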

cbouy · 2020-06-18T17:03:16Z

> Have you had a chance to build the docs and have a look at them? If you haven't, I'd recommend doing so; there's always a tiny thing that just doesn't build the way you expect it to.

I have to admit I hadn't built the docs locally until now. I think I've installed everything, but when I run make html in /package/doc/sphinx I get the following:

Warning, treated as error:
/mnt/e/Projets/mdanalysis/package/MDAnalysis/analysis/encore/similarity.py:docstring of MDAnalysis.analysis.encore.similarity.ces:1:duplicate object description of MDAnalysis.analysis.encore.similarity.ces, other instance in documentation_pages/analysis/encore/similarity, use :noindex: for one of them
Makefile:42: recipe for target 'html' failed
make: *** [html] Error 2

Also, is make html the right command to use?

IAlibay · 2020-06-18T17:06:39Z

> Also, is make html the right command to use?

I'm not sure how far you got in the process; @lilyminium wrote a very good guide on building the documentation: https://userguide.mdanalysis.org/contributing_code.html#working-with-the-code-documentation

You should just need to run python setup.py build_sphinx -E in package.

cbouy · 2020-06-18T17:20:53Z

> You should just need to do python setup.py build_sphinx -E in package

Got it, thanks! Still the same error though

orbeckst · 2020-06-18T17:29:01Z

Sphinx must be < 3. See #2667.

For python setup.py build_sphinx, you can also temporarily disable warnings-as-errors by changing this line (line 18 of mdanalysis/package/setup.cfg in 788f294):

warning-is-error = 1

This will always build all docs and for historical reasons they will be built in package/doc/html/html/index.html.

If you want faster turnaround while working on the docs, use make html instead: edit the Makefile https://github.com/MDAnalysis/mdanalysis/blob/develop/package/doc/sphinx/Makefile#L6
to temporarily remove -W (warnings -> errors), then run

cd package/doc/sphinx
make html

These docs will show up at package/doc/html/index.html (sigh... it's ugly).

The Makefile does not have -E set so it will just rebuild what you changed. This is nicer for debugging reST.

IAlibay · 2020-06-18T17:31:06Z

Edit: oops, I didn't see @orbeckst's comment.

> > You should just need to do python setup.py build_sphinx -E in package

> Got it, thanks! Still the same error though

Interesting, what version of Sphinx are you running? I know that versions 2+ have issues with our docs (I think there's an issue open about it). With 1.8.5, I get the following:

Warning, treated as error:
/home/bioc1523/github/mdanalysis/package/MDAnalysis/core/universe.py:docstring of MDAnalysis.core.universe.Universe.from_smiles:34:Unexpected indentation.

Have you had any luck with compiling the docs from the standard develop branch?

(I have just noticed that your PR is coming from your fork's develop branch rather than a separate branch; you probably want to avoid that in the future.)

cbouy · 2020-06-18T18:22:27Z

Downgrading from 3.0.4 to 1.8.5 fixed the issues I mentioned earlier, thanks! And thanks for the tips @orbeckst !

@IAlibay I'll create a new branch next time, my bad.

cbouy · 2020-06-19T09:30:50Z

Is this still for 1.0.0? I see that it's been timestamped to last week in the changelog. Or should I mark it for 1.0.1 in the docs and the changelog?

richardjgowers · 2020-06-19T09:37:53Z

This will be 2.0


IAlibay

I don't think I have anything else to say on this one, lgtm :)
I'll let @fiona-naughton and @richardjgowers approve it before we merge.

richardjgowers · 2020-06-19T14:44:18Z

package/MDAnalysis/core/selection.py

+        pass
+
+    def apply(self, group):
+        return group[group.aromaticities].unique


I've changed this so that it will fail if there aren't aromaticities available, rather than silently setting everything to False.
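A numpy sketch of the behaviour described above: mask the atoms by their aromaticity flags and raise when the attribute is absent instead of defaulting to False. select_aromatic and the attrs dictionary are hypothetical; the real selection works on AtomGroups via group[group.aromaticities].unique, and MDAnalysis has its own NoDataError for missing attributes, for which a plain ValueError stands in here.

```python
import numpy as np


def select_aromatic(indices, attrs):
    """Return the sorted unique indices of atoms flagged as aromatic.

    `attrs` is a hypothetical mapping from topology attribute name to a per-atom array.
    """
    if "aromaticities" not in attrs:
        # Fail loudly rather than treating every atom as non-aromatic.
        raise ValueError("the 'aromatic' selection requires an 'aromaticities' attribute")
    mask = np.asarray(attrs["aromaticities"], dtype=bool)
    return np.unique(np.asarray(indices)[mask])


# Toluene-like example: one methyl carbon followed by six ring carbons.
print(select_aromatic(np.arange(7), {"aromaticities": [False] + [True] * 6}))
```

The design point is the early check: without it, a missing attribute could be papered over with an all-False mask, and the selection would quietly return nothing.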

IAlibay · 2020-06-19T14:51:20Z

package/MDAnalysis/topology/RDKitParser.py

+            n_residues = 1
+
+        # Segment
+        if any(segids) and not any(val is None for val in segids):


@cbouy I was just checking Codecov to confirm there were enough tests, and it looks like this branch of the code isn't tested at all (I might be looking at an old commit). Could you double-check whether that's the case and add a test if so? https://codecov.io/gh/MDAnalysis/mdanalysis/src/dc482401df0d5c5839b2b37343b49a778987c2b1/package/MDAnalysis/topology/RDKitParser.py
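A hedged sketch of the branch in question and of tests that would cover both sides of it. build_segments and the "SYSTEM" default segid are assumptions for illustration, not the parser's code; only the if condition is taken from the diff above.

```python
def build_segments(segids, default="SYSTEM"):
    """Return (per-atom segment index, ordered list of unique segids)."""
    if any(segids) and not any(val is None for val in segids):
        # At least one non-empty segid and no missing (None) values:
        # number the segments in order of first appearance.
        unique = list(dict.fromkeys(segids))
        lookup = {segid: i for i, segid in enumerate(unique)}
        return [lookup[segid] for segid in segids], unique
    # Otherwise, put every atom in a single default segment.
    return [0] * len(segids), [default]


def test_segids_present():
    assert build_segments(["A", "A", "B"]) == ([0, 0, 1], ["A", "B"])


def test_segids_missing_fall_back():
    assert build_segments([None, "A"]) == ([0, 0], ["SYSTEM"])
    assert build_segments(["", ""]) == ([0, 0], ["SYSTEM"])
```

In the real test suite, one input with a segment ID on every atom and one where some are missing would exercise the same two paths through the parser.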

richardjgowers · 2020-06-19T15:12:06Z

I'll merge this so it's done; we can look at coverage and corner cases in future issues.

@cbouy first step done!

barebones topology from RDKitParser

ba1f3e4

test for PDB and MOL2 + fix atomtypes and residues

7e172fc

cbouy marked this pull request as ready for review June 2, 2020 18:18

tylerjereddy added the Component-Topology label Jun 3, 2020

richardjgowers reviewed Jun 3, 2020


testsuite/MDAnalysisTests/topology/test_rdkit.py (outdated)

richardjgowers reviewed Jun 3, 2020


testsuite/MDAnalysisTests/topology/test_rdkit.py (outdated)

richardjgowers reviewed Jun 3, 2020


Cédric Bouysset added 4 commits June 3, 2020 18:33

fix test on bond orders

c436517

add SMILES parser test

b25898f

reorganize code + create name if not present

60af718

added Segids attribute

94f18b5

IAlibay added GSOC Starter and removed GSOC Starter labels Jun 3, 2020

IAlibay reviewed Jun 3, 2020


testsuite/MDAnalysisTests/topology/test_rdkit.py

added rdkit to appveyor and travis

fdb6513

Cédric Bouysset added 4 commits June 4, 2020 17:00

added SDF file for rdkit tests

9caaa5c

forgot hydrogens and 3D on SDFile

88b2130

SDF test

ffbf149

added formal charges + attrs specific to PDB files

8736750

Cédric Bouysset added 2 commits June 15, 2020 19:21

fix using fixtures in parametrize

0aff0f6

actual fix for coordinate tests

f0497bf

cbouy commented Jun 16, 2020


IAlibay requested changes Jun 17, 2020


cbouy changed the title from "Basic topology from RDKitParser" to "RDKitReader and RDKitParser" Jun 18, 2020

fix the docs and style guide

2755853

Cédric Bouysset added 2 commits June 19, 2020 11:58

update docs and changelog for 2.0.0

b611915

Merge branch 'develop' into develop

dc48240

IAlibay approved these changes Jun 19, 2020


Update selection.py

76fb209

richardjgowers approved these changes Jun 19, 2020


Update selection.py

13168ba

richardjgowers reviewed Jun 19, 2020


IAlibay requested a review from fiona-naughton June 19, 2020 14:45

IAlibay reviewed Jun 19, 2020


richardjgowers merged commit 50cd6e7 into MDAnalysis:develop Jun 19, 2020

mattwthompson mentioned this pull request Jun 28, 2020

Feature request: RDKit interoperability mosdef-hub/gmso#424 (open)

cbouy mentioned this pull request Aug 12, 2020

Simple RDKitConverter #2775 (merged, 7 tasks)

PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021

RDKitReader and RDKitParser (MDAnalysis#2707)

1c8f9b2

fiona-naughton added new-feature Component-Converters labels Sep 26, 2023
