Implements cache validation at the Universe level #3135

Merged
mnmelo merged 2 commits into develop from feature-universe-based-cache-validation on Mar 4, 2021
Conversation


@mnmelo mnmelo commented Mar 2, 2021

Objects keep their own caches, but those can now be set to be validated against a centralized registry under object.universe. This simplifies the centralized invalidation of object caches.

This PR implements a mix of the ideas floated in #2376, #3005 and in the mailing list discussion: caches are still stored under an object's own _cache but now, if asked to, the cache retrieval can check universe._cache['_valid'][key] for cache validity. This solution, compared to storing the actual cache under universe, prevents garbage collection from getting blocked when an object references itself in a cache.
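The lookup described above can be sketched as follows. This is a minimal illustration of the idea, not the actual MDAnalysis code: `CachedObject`, `get_cached` and `invalidate` are hypothetical names, and the real implementation lives in the caching decorators.

```python
import weakref

class Universe:
    """Toy stand-in for a Universe holding the validity registry."""
    def __init__(self):
        # '_valid' maps a cache name to a WeakSet of per-object tokens.
        self._cache = {'_valid': {}}

class CachedObject:
    """Toy object whose local _cache is validated against its universe."""
    def __init__(self, universe):
        self.universe = universe
        self._cache = {}
        # A trivial instance serves as a weak-referenceable validity token.
        self._cache_key = type('_CacheKey', (), {})()

    def get_cached(self, key, compute):
        valid = self.universe._cache['_valid'].setdefault(key,
                                                          weakref.WeakSet())
        if self._cache_key in valid and key in self._cache:
            return self._cache[key]            # valid cache hit
        result = self._cache[key] = compute()  # recompute locally...
        valid.add(self._cache_key)             # ...and mark it valid centrally
        return result

def invalidate(universe, key):
    """Central invalidation: clear the registry, not each object's cache."""
    universe._cache['_valid'].pop(key, None)
```

Because the cached values stay on the object and only the lightweight tokens live under the universe, dropping a key from the registry invalidates every object's cache for that key at once.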

Applied to fragment caching and added an asv benchmark (FragmentCaching.time_find_cached_fragments). Already a large speedup in fragment access (Fixes #2376):

=============================== ============ =============
         universe_type           no caching     caching
------------------------------- ------------ -------------
 large_fragment_small_solvents   275±0.9ms    1.56±0.01μs
         large_fragment           167±9ms     1.56±0.01μs
         polymer_chains          47.9±0.2ms   1.58±0.01μs
=============================== ============ =============

Tests include cache invalidation when bonds are added/removed.

Also added asv's benchmarks/results subdir to .gitignore

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?


pep8speaks commented Mar 2, 2021

Hello @mnmelo! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-03-02 11:50:35 UTC


codecov bot commented Mar 2, 2021

Codecov Report

Merging #3135 (caa3ec5) into develop (f542aa4) will decrease coverage by 1.51%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #3135      +/-   ##
===========================================
- Coverage    93.12%   91.61%   -1.52%     
===========================================
  Files          171      166       -5     
  Lines        22706    21914     -792     
  Branches      3209     3206       -3     
===========================================
- Hits         21146    20076    -1070     
+ Misses        1503     1273     -230     
- Partials        57      565     +508     
Impacted Files Coverage Δ
package/MDAnalysis/core/topologyattrs.py 95.53% <100.00%> (-1.16%) ⬇️
package/MDAnalysis/core/universe.py 96.74% <100.00%> (-1.07%) ⬇️
package/MDAnalysis/lib/util.py 88.19% <100.00%> (-1.87%) ⬇️
package/MDAnalysis/coordinates/chemfiles.py 24.27% <0.00%> (-66.07%) ⬇️
package/MDAnalysis/analysis/encore/covariance.py 75.38% <0.00%> (-12.31%) ⬇️
...ckage/MDAnalysis/analysis/encore/confdistmatrix.py 67.41% <0.00%> (-10.12%) ⬇️
package/MDAnalysis/coordinates/H5MD.py 86.15% <0.00%> (-8.41%) ⬇️
package/MDAnalysis/coordinates/DMS.py 88.05% <0.00%> (-7.60%) ⬇️
...age/MDAnalysis/analysis/hbonds/hbond_autocorrel.py 87.58% <0.00%> (-7.30%) ⬇️
package/MDAnalysis/coordinates/GMS.py 85.31% <0.00%> (-7.16%) ⬇️
... and 100 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f542aa4...950cc1a. Read the comment docs.

@richardjgowers (Member) commented:

@mnmelo if I've read this correctly: caches are still stored on the AtomGroup, but there's also a check against u._caches['_valid'] to see whether the cache on a Group is still trustworthy?

WRT the CacheKey class, is there a reason we can't use id(ag) to hash an AtomGroup?

Would it be possible to instead store the cache on Universe, so there'd be something like u._caches['fragments'][id(ag)] for a given ag? Then there wouldn't have to be any weakset or extra classes. This technically "leaks" integers into u._caches but I can't see this becoming a problem.


mnmelo commented Mar 2, 2021

Yup, it is mostly as you understood (but see the 2nd paragraph below).

I would have preferred to use id(self), as you suggest, but weakrefs don't work for ints. The validity registry could be made a regular set, using id(self) for hashing, but would leak ints, as you predict. I tried to avoid that leaking, not only for memory, but to keep the validity registry set small so that lookups are fast. However, the set lookup complexity seems to be O(1), so I may have prematurely optimized. If you're ok with int leakages (sounds wrong, in principle), then I can simplify this to a regular, non-weak, set that uses id(self).
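The weakref limitation mentioned here is easy to demonstrate. A minimal sketch (the `_CacheKey` class below is just a trivial stand-in, and the final assertion relies on CPython's reference counting plus an explicit gc.collect()):

```python
import gc
import weakref

class _CacheKey:
    """Trivial, weak-referenceable stand-in for an object's identity."""

token = _CacheKey()
registry = weakref.WeakSet()
registry.add(token)            # plain instances support weak references
assert token in registry

try:
    weakref.ref(id(token))     # ints cannot be weakly referenced...
except TypeError:
    pass                       # ...so an id()-based WeakSet is impossible

del token                      # drop the last strong reference
gc.collect()
assert len(registry) == 0      # the registry entry vanished with the object
```

This is why the validity registry holds per-object token instances rather than raw id() integers: the WeakSet entries clean themselves up when their objects die.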

As to storing the cache itself under universe: note that that would leak not only ints but also AtomGroups and the like, which are heavier objects (imagine a select_atoms being repeated every frame for a couple thousand frames). Hence my initial strategy of weakrefs to hold the caches under Universe. But even with weakrefs, I ran into possible garbage-collection blocks when caching under universe and the ag itself is part of its own cache (I don't know if that can actually happen, but I'd rather future-proof it now).
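The leak argument can be demonstrated with a toy Universe-level cache dict (illustrative names only: `AtomGroupStub` stands in for a real AtomGroup, and `caches` for a hypothetical Universe-held cache):

```python
import gc
import weakref

class AtomGroupStub:
    """Stand-in for an AtomGroup produced by, e.g., select_atoms."""

caches = {}                    # stand-in for a Universe-level cache dict

ag = AtomGroupStub()
watcher = weakref.ref(ag)
caches[id(ag)] = ag            # the Universe dict holds a strong reference
del ag
gc.collect()
assert watcher() is not None   # the group cannot be collected: it leaks

caches.clear()                 # only explicit clearing frees it
gc.collect()
assert watcher() is None
```

With the cache stored on the object instead, dropping the object drops its cached values with it, and no Universe-side bookkeeping is needed to free them.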

Storing the cache with the object avoids those garbage-collection locks, always allowing a clean collection of the ag. And using the validity registry still allows full flexibility in cache invalidation.


mnmelo commented Mar 2, 2021

Also, note that if what you want is to remove the ugly dummy class _CacheKey, it can be created in-line when first needed on an ag:

self._cache_key = type('_CacheKey', (), dict())()

although I suspect this is less performant when used repeatedly.
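That suspicion is easy to check with a rough micro-benchmark (not MDAnalysis code; absolute numbers vary by machine, but the ordering should hold, since the in-line version builds a brand-new class object on every call):

```python
import timeit

class _CacheKey:
    """Module-level dummy class: defined once, instantiated cheaply."""

# Instantiating an existing class vs. creating a new class per call:
t_class = timeit.timeit(lambda: _CacheKey(), number=100_000)
t_inline = timeit.timeit(lambda: type('_CacheKey', (), dict())(),
                         number=100_000)
print(f"existing class: {t_class:.4f}s  in-line type(): {t_inline:.4f}s")
assert t_inline > t_class      # class creation dominates the in-line version
```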


@richardjgowers richardjgowers left a comment


@mnmelo yes you're right. I'd missed that the values are also AtomGroups, so if we store everything on Universe it's annoying to get things to garbage collect.

The _CacheKey confused me for a bit until I realised it's a bit like a self-destructing int

# objects don't clutter it.
valid_caches = u_cache.setdefault(key, weakref.WeakSet())
try:
    if self._cache_key not in valid_caches:
@richardjgowers (Member):

As a comment, one day it might be worthwhile to put this self._cache_key setting behaviour into Group.__hash__ so that every AtomGroup natively hashes quickly/cleanly/weakly.

@mnmelo (Member Author):

Great idea; that seems super clean! I can implement it right away if you're ok with having both functionalities added in the same PR.

@richardjgowers (Member):

I went over and checked, and we already have a __hash__ defined; it looks like it's actually hashing ix, which isn't going to be fast but is technically necessary for two equivalent AGs to hash identically. For this lookup I'd rather we hash the AG quickly and maybe accept duplicate caches.
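The cost difference under discussion can be illustrated with a stand-in for a large index array (a sketch only: a plain tuple is used here so the content hash is measurable, since the actual cost of hashing a real AtomGroup depends on its __hash__ implementation):

```python
import timeit

ix = tuple(range(100_000))     # stand-in for a large Group's atom indices

# Content-based hash: cost grows with the number of atoms,
# but two equal index sets hash identically.
t_content = timeit.timeit(lambda: hash(ix), number=1_000)

# Identity-based hash: O(1), but equivalent Groups get duplicate caches.
t_identity = timeit.timeit(lambda: hash(id(ix)), number=1_000)

assert t_identity < t_content
```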

@mnmelo (Member Author):

Ah, yea. I hadn't checked __hash__ but I had initially tried to use the ag itself as the key, and got the same slow performance (I just assumed the entire object was being hashed instead of just _ix, which probably boils down to a similar time penalty).

@mnmelo mnmelo merged commit ff7ffa1 into develop Mar 4, 2021
@mnmelo mnmelo deleted the feature-universe-based-cache-validation branch March 4, 2021 00:11
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
* Implements cache validation at the Universe level
Development

Successfully merging this pull request may close these issues.

why is (cached) fragments slow?

4 participants