Implements cache validation at the Universe level #3135
Objects keep their own caches, but those can now be set to be validated against a centralized registry under object.universe. This simplifies the centralized invalidation of object caches. Applied to fragment caching and added asv benchmark. Already large speedup in fragment accession (Fixes #2376). Tests include cache invalidation when bonds are added/removed. Added asv's benchmarks/results subdir to .gitignore
Codecov Report
@@ Coverage Diff @@
## develop #3135 +/- ##
===========================================
- Coverage 93.12% 91.61% -1.52%
===========================================
Files 171 166 -5
Lines 22706 21914 -792
Branches 3209 3206 -3
===========================================
- Hits 21146 20076 -1070
+ Misses 1503 1273 -230
- Partials 57 565 +508
@mnmelo if I've read this correctly.. caches are still stored on the AtomGroup, but there's also a check against WRT the CacheKey class, is there a reason we can't use Would it be possible to instead store the cache on Universe, so there'd be something like
Yup, it is mostly as you understood (but see the 2nd paragraph below). I would have preferred to use As to storing the cache itself under universe, see that that not only leaks ints but also AtomGroups and suchlike, which is something heavier (imagine a Storing the cache with the object solves garbage collection locks, allowing always for a clean collection of the ag. And using the validity registry still allows for full flexibility in cache invalidation.
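The garbage-collection argument above can be illustrated with a small standalone sketch. It does not use MDAnalysis; `Group` and `central_cache` are hypothetical stand-ins, and the only assumption is CPython's usual reference-counting and cycle-collection behaviour. A central store that holds the cached value keeps the group alive, whereas a cache kept on the object only forms a self-reference cycle that the collector can clean up.

```python
import gc
import weakref


class Group:
    """Hypothetical stand-in for an AtomGroup-like object with its own cache."""

    def __init__(self):
        self._cache = {}


# Case 1: the cached value lives in a central store (stand-in for a
# universe-level cache). The store holds a strong reference to the group.
central_cache = {}
a = Group()
central_cache[id(a)] = a
a_ref = weakref.ref(a)
del a
gc.collect()
held_by_central_store = a_ref() is not None

# Case 2: the cached value lives on the object itself. The group references
# itself through its own cache, which is only a cycle, so it is collectable.
b = Group()
b._cache["self"] = b
b_ref = weakref.ref(b)
del b
gc.collect()
collected_with_own_cache = b_ref() is None
```

In the first case the group survives until the central entry is removed explicitly; in the second it goes away as soon as nothing else refers to it.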
Also see that if what you want is to remove the ugly, dummy although I suspect this is less performant when used repeatedly.
richardjgowers left a comment
@mnmelo yes you're right. I'd missed that the values are also AtomGroups, so if we store everything on Universe it's annoying to get things to garbage collect.
The _CacheKey confused me for a bit until I realised it's a bit like a self-destructing int
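A minimal sketch of the "self-destructing int" idea, assuming the key is just an empty object that supports weak references. The `CacheKey` and `Owner` classes here are hypothetical and are not the PR's `_CacheKey` code. A plain int cannot be weakly referenced, while a small token object can, so a `WeakSet` of tokens empties itself when the owning object goes away.

```python
import gc
import weakref


class CacheKey:
    """Hypothetical empty token; the __weakref__ slot makes it weak-referenceable."""

    __slots__ = ("__weakref__",)


class Owner:
    """Hypothetical object that owns exactly one token."""

    def __init__(self):
        self._cache_key = CacheKey()


# A plain int cannot be the target of a weak reference.
try:
    weakref.ref(12345)
    int_is_weakrefable = True
except TypeError:
    int_is_weakrefable = False

registry = weakref.WeakSet()
owner = Owner()
registry.add(owner._cache_key)
size_while_alive = len(registry)

# Dropping the owner drops its token, and the WeakSet forgets the token.
del owner
gc.collect()
size_after_delete = len(registry)
```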
    # objects don't clutter it.
    valid_caches = u_cache.setdefault(key, weakref.WeakSet())
    try:
        if self._cache_key not in valid_caches:
As a comment, one day it might be worthwhile to put this self._cache_key setting behaviour into Group.__hash__ so that every AtomGroup natively hashes quickly/cleanly/weakly.
Great idea; that seems super clean! I can implement it right away if you're ok with having both functionalities added in the same PR.
I went over and checked and we already have a __hash__ defined, and it looks like it's actually hashing ix which isn't going to be fast, but technically necessary for two equivalent AGs to hash identically. For this lookup I'd rather we quickly hash the AG and maybe have duplicate caches
Ah, yea. I hadn't checked __hash__ but I had initially tried to use the ag itself as the key, and got the same slow performance (I just assumed the entire object was being hashed instead of just _ix, which probably boils down to a similar time penalty).
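The trade-off discussed here can be sketched without MDAnalysis. `ContentHashedGroup` is a hypothetical class whose hash covers all of its indices (which is what lets two equivalent groups hash identically), and `CacheKey` is a hypothetical token that relies on Python's default identity-based hash. The content hash has to visit every index at least once, while the token hash does not depend on group size, at the price of equivalent groups getting separate cache entries.

```python
class ContentHashedGroup:
    """Hypothetical group that hashes by content, so equal groups hash equal."""

    def __init__(self, ix):
        self.ix = tuple(ix)

    def __eq__(self, other):
        return isinstance(other, ContentHashedGroup) and self.ix == other.ix

    def __hash__(self):
        # Work grows with the number of indices.
        return hash(self.ix)


class CacheKey:
    """Hypothetical token using the default identity-based __hash__ and __eq__."""

    __slots__ = ("__weakref__",)


g1 = ContentHashedGroup(range(1000))
g2 = ContentHashedGroup(range(1000))
same_content_hash = hash(g1) == hash(g2)

# Two tokens are never equal, even if their owners hold the same indices,
# so equivalent groups would end up with duplicate caches.
k1, k2 = CacheKey(), CacheKey()
keys_are_distinct = k1 != k2
```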
Objects keep their own caches, but those can now be set to be validated against a centralized registry under object.universe. This simplifies the centralized invalidation of object caches.
This PR implements a mix of the ideas floated in #2376, #3005 and in the mailing list discussion: caches are still stored under an object's own _cache but now, if asked to, the cache retrieval can check universe._cache['_valid'][key] for cache validity. Compared to storing the actual cache under universe, this solution prevents garbage collection from getting blocked when an object references itself in a cache.
Applied to fragment caching and added an asv benchmark (FragmentCaching.time_find_cached_fragments). Already a large speedup in fragment accession (Fixes #2376).
Tests include cache invalidation when bonds are added/removed.
Also added asv's benchmarks/results subdir to .gitignore.
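The scheme described in this PR can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: the `Universe`, `Group`, `CacheKey`, `invalidate` and `_compute_fragments` names are hypothetical stand-ins, and only the `_cache`, `_cache_key` and `universe._cache['_valid'][key]` layout is taken from the discussion. Each object keeps its cached value locally, and the universe only records which objects currently hold a valid value for a given key.

```python
import weakref


class CacheKey:
    """Hypothetical weak-referenceable token identifying one object's cache."""

    __slots__ = ("__weakref__",)


class Universe:
    """Hypothetical universe holding only the validity registry."""

    def __init__(self):
        self._cache = {"_valid": {}}

    def invalidate(self, key):
        # Dropping the registry entry invalidates this key for every object.
        self._cache["_valid"].pop(key, None)


class Group:
    """Hypothetical group that stores values locally and validates centrally."""

    def __init__(self, universe):
        self.universe = universe
        self._cache = {}
        self._cache_key = CacheKey()
        self.n_computed = 0

    def _compute_fragments(self):
        # Placeholder for an expensive computation.
        self.n_computed += 1
        return ("fragment-data", self.n_computed)

    @property
    def fragments(self):
        key = "fragments"
        valid = self.universe._cache["_valid"].setdefault(key, weakref.WeakSet())
        if self._cache_key in valid and key in self._cache:
            return self._cache[key]
        value = self._compute_fragments()
        self._cache[key] = value
        valid.add(self._cache_key)
        return value


u = Universe()
g1, g2 = Group(u), Group(u)
g1.fragments
g1.fragments  # served from g1's own cache
g2.fragments
u.invalidate("fragments")  # e.g. after bonds are added or removed
g1.fragments  # recomputed because the registry no longer lists g1
```

A single call on the universe invalidates the key for all groups at once, while the cached values themselves never leave the objects that own them.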