Improving the RDKitConverter caching system #2942
IAlibay merged 12 commits into MDAnalysis:develop
Codecov Report
@@           Coverage Diff            @@
##           develop    #2942   +/-   ##
========================================
  Coverage    93.55%   93.56%
========================================
  Files          176      176
  Lines        22837    22837
  Branches      3194     3195    +1
========================================
+ Hits         21366    21368    +2
+ Misses        1421     1418    -3
- Partials        50       51    +1
ping @IAlibay @richardjgowers

orbeckst left a comment
I like the use of the lru cache from the stdlib. Peripheral comments inline.
    conversions in memory. Using ``maxsize=None`` will remove all limits
    to the cache size, i.e. everything is cached.
    """
    global atomgroup_to_mol
This is probably not thread-safe – not a big deal, though, and I don't have a better idea.
(Although, we don't really encourage use of threads for parallelization; multiprocessing should do just fine.)
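For readers following the `global atomgroup_to_mol` line discussed above, here is a minimal, stdlib-only sketch of how a cache size can be changed at runtime by rebinding a module-level name. The names `expensive_convert` and `set_cache_size` are hypothetical stand-ins, not the PR's actual code; the only library features assumed are `functools.lru_cache` and the `__wrapped__` attribute it exposes.

```python
from functools import lru_cache

calls = []  # records every real (non-cached) computation


@lru_cache(maxsize=2)
def expensive_convert(key, no_implicit=True):
    """Hypothetical stand-in for a slow conversion function."""
    calls.append((key, no_implicit))
    return f"mol({key}, no_implicit={no_implicit})"


def set_cache_size(maxsize):
    """Re-wrap the undecorated function in a new cache of the given size.

    Rebinding the module-level name is the step that is not atomic with
    respect to other threads calling the function at the same moment.
    """
    global expensive_convert
    # __wrapped__ is the original function; the old cache is discarded.
    expensive_convert = lru_cache(maxsize=maxsize)(expensive_convert.__wrapped__)


expensive_convert("protein")   # computed
expensive_convert("protein")   # served from the cache
set_cache_size(None)           # unbounded cache, starts empty
expensive_convert("protein")   # computed again, because the cache was replaced
```

One consequence of this design is that resizing also empties the cache, since a new wrapper is created rather than the existing one being modified.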
IAlibay left a comment
Overall lgtm! Just a few comments, mainly to do with tests & docs.

I updated my first post with the new changes.

All tests are passing 💃 Anything else?

Apologies for taking so long here, I'll re-review over the weekend but I think we should be good.

@cbouy if you want to update this against the current develop, it'll finally be on my list for the next thing I review.

Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>

Okay I think I finally managed to run a proper
    def test_single_atom_mol(self, smi):
        u = mda.Universe.from_smiles(smi, addHs=False,
                                     generate_coordinates=False)
        mol = u.atoms.convert_to("RDKIT")
Sorry, I think I'm just being silly and forgetting a very obvious thing. Could you remind me why these are all being switched away from convert_to?
convert_to doesn't pass arguments to the underlying converter. That was in a PR at some point, though (#2882).
Doesn't this behaviour contradict the docstring? I.e. ":func:set_converter_cache_size. However, ag.convert_to("RDKIT")
followed by ag.convert_to("RDKIT", NoImplicit=False) will not use the"
Or was the argument that we would merge #2882 before this PR?
The point is that the converter modules weren't really documented to be instantiated like c = mda.coordinates.RDKit.RDKitConverter(); c.convert(...); conversions usually go through the convert_to AtomGroup method.
So yeah, I assumed #2882 would be merged before v2.0 comes out.
Switched back to using convert_to now that it's merged!
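The docstring point raised in this thread (a call with default arguments followed by a call with `NoImplicit=False` does not reuse the cached result) follows from how `functools.lru_cache` builds its keys from the arguments of each call. A small stdlib-only sketch, using a hypothetical `convert` function rather than the real converter:

```python
from functools import lru_cache


@lru_cache(maxsize=4)
def convert(name, no_implicit=True):
    """Hypothetical stand-in for the cached conversion function."""
    return (name, no_implicit)


convert("ag")                      # miss: first call with these arguments
convert("ag")                      # hit: identical call
convert("ag", no_implicit=False)   # miss: different arguments, separate entry

info = convert.cache_info()        # named tuple with hits, misses, maxsize, currsize
```

This is also why the tests needed a way to forward keyword arguments through `convert_to`: without it, only the default-argument entry could ever be exercised.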

@IAlibay sorry, don't have time today to review; I'll leave it to you.

I think this PR is complete, but I want to hold off on merging before we have a clearer idea of what's going on with #2882.

I'm adding the missing changelog now. For the changelog of this PR though, do I mention the fixes/changes I made, or just the enhancements (set_converter_cache_size(maxsize) and the force parameter)? The RDKit converter isn't released yet so it's a bit weird fixing something that isn't officially out...

If there was an issue for the fixes then I'd still add a CHANGELOG entry even though it's a bit weird. However, for people living on the edge (using develop) it's still helpful.
On 4/24/21 at 09:30, Cédric Bouysset wrote:
I'm adding the missing changelog now. For the changelog of this PR though, do I mention the fixes/changes I made, or just the enhancements (set_converter_cache_size(maxsize) and the force parameter)? The RDKit converter isn't released yet so it's a bit weird fixing something that isn't officially out...

Sorry for the delayed response here @cbouy. I'd add the following entries:
Enhancements:
Changes:
Fixes:
edit: once #2882 is done, if you can add these and then update against develop, I'll merge this.

RDKit crashes are starting to happen too frequently for py3.6 + numpy 1.16 (see: #3287). I'm not sure if this is somehow linked to the new converter API, so I've updated this PR against the current develop to see if it fixes things. @cbouy please do double check that I've not accidentally broken things! edit: best way to check that this is fixed is just by re-running CI I guess. Number of successful CI runs: 3 (that should be enough)

The current "homemade" caching system in the RDKit converter can only store the most recent conversion.

This new version uses functools.lru_cache, which allows users to select how many molecules should be cached, and improves readability/maintainability IMO.

Also, the new caching system retrieves the converted items from the hash of all the arguments passed to the decorated atomgroup_to_mol function, instead of the id of the atomgroup and the arguments, which makes more sense. I didn't know what a hash was until recently so please forgive me for the rookie mistake :D

Now if you successively run u.atoms.convert_to("RDKIT") it will benefit from the caching system.

I needed to convert two different atomgroups (protein and ligand) while iterating over a trajectory, and the previous system would just rebuild the whole topology (which takes quite some time for a protein) for each molecule at every frame, which is why I think this is necessary. Now it works like a breeze.

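The protein/ligand scenario described above can be illustrated with a small stdlib-only sketch. `Selection` and `build_topology` are hypothetical stand-ins (not MDAnalysis classes); the frozen dataclass is assumed only as a convenient way to get content-based `__hash__` and `__eq__`, so that equal objects created on different frames map to the same cache entry, which an `id`-based key would not do.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Selection:
    """Hypothetical hashable stand-in for a group of atoms."""
    name: str
    indices: tuple


def make_counting_builder(maxsize):
    """Return a cached builder plus the list recording real builds."""
    builds = []

    @lru_cache(maxsize=maxsize)
    def build_topology(sel, no_implicit=True):
        builds.append(sel.name)  # only runs on a cache miss
        return f"topology of {sel.name}"

    return build_topology, builds


def run(maxsize, n_frames=5):
    """Alternate between two selections on every frame; count real builds."""
    build, builds = make_counting_builder(maxsize)
    for _ in range(n_frames):
        # Fresh but equal objects on each frame: same hash, same cache key.
        build(Selection("protein", (0, 1, 2)))
        build(Selection("ligand", (3, 4)))
    return len(builds)


single_slot = run(maxsize=1)  # each call evicts the other entry
two_slots = run(maxsize=2)    # both entries stay cached
```

With one slot, alternating between two selections evicts the other entry on every call, which mirrors the "most recent conversion only" behaviour; with two slots, each topology is built once.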
Changes made in this Pull Request:
- functools.lru_cache
- set_converter_cache_size(maxsize) function to modify how many items are retained in the cache
- atomgroup_to_mol outside of the RDKitConverter class (it's not really needed there anyway), otherwise I need to define hash and eq dunders for the caching to work
- AttributeError. The error is not raised when NoImplicit=False
- force parameter to the RDKitConverter to ignore the above AttributeError and continue the conversion, which is mostly useful for inorganic molecules, CO2 and so on.

PR Checklist