@alimanfoo alimanfoo commented Dec 31, 2017

This PR adds a new zarr.storage.LRUStoreCache class which can be used as a cache layer for slow stores (e.g., stores accessed via network).

While developing this PR, some inconsistencies in how zarr.storage.getsize() was implemented for different store types came to light; these have also been fixed.

TODO:

  • tests with max_size not None
  • test key caching
  • test coverage to 100%
  • API docs
  • document in tutorial
  • document changes

zarr/storage.py Outdated
with self._mutex:
    if self._keys_cache is None:
        self._keys_cache = list(self._store.keys())
return iter(self._keys_cache)
Should probably move this under the mutex lock.
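
One way to read this suggestion, as a minimal stdlib-only sketch and not the code that was merged: populate the cached key list and create the iterator inside the same `with` block, so a concurrent invalidation cannot reset `_keys_cache` to `None` between the two steps. The class name `KeyCachingStore` and its write-invalidation behaviour are hypothetical and only illustrate the locking pattern.

```python
import threading


class KeyCachingStore:
    """Hypothetical wrapper that caches the key listing of a slow store."""

    def __init__(self, store):
        self._store = store
        self._mutex = threading.Lock()
        self._keys_cache = None  # filled lazily on the first keys() call

    def keys(self):
        with self._mutex:
            if self._keys_cache is None:
                self._keys_cache = list(self._store.keys())
            # The iterator is created while the lock is held, so it is bound
            # to the list object even if the cache is invalidated afterwards.
            return iter(self._keys_cache)

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        with self._mutex:
            self._store[key] = value
            self._keys_cache = None  # the key set may have changed


store = KeyCachingStore({"a": b"1"})
first_listing = sorted(store.keys())
store["b"] = b"2"  # invalidates the cached key list
second_listing = sorted(store.keys())
```

With the `return` outside the lock, a writer running between the `with` block and the `return` could leave the reader calling `iter(None)`.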

@alimanfoo alimanfoo added this to the v2.2 milestone Dec 31, 2017
@alimanfoo alimanfoo added the enhancement New features or improvements label Dec 31, 2017
@alimanfoo

cc @jhamman, @mrocklin: this PR provides another strategy for improving performance with cloud stores. It adds a zarr.LRUStoreCache class, which implements a local in-memory cache layer for any backing store. This won't help with write performance, but it should help with read performance for any computation that revisits the same data and/or metadata more than once. Simple usage example here.
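
To make the idea of a least-recently-used cache layer concrete, here is a rough stdlib-only sketch. It is not the zarr implementation: the class name `SimpleLRUStoreCache`, the byte-based `max_size` accounting, and the `hits`/`misses` counters are assumptions chosen for illustration, and the backing store is assumed to be a dict-like object mapping keys to bytes.

```python
import threading
from collections import OrderedDict


class SimpleLRUStoreCache:
    """Illustrative read cache over a dict-like store (not zarr's code)."""

    def __init__(self, store, max_size=None):
        self._store = store
        self._max_size = max_size     # cache budget in bytes; None = unlimited
        self._values = OrderedDict()  # key -> bytes, least recently used first
        self._current_size = 0
        self._mutex = threading.Lock()
        self.hits = 0
        self.misses = 0

    def __getitem__(self, key):
        with self._mutex:
            if key in self._values:
                self._values.move_to_end(key)  # mark as most recently used
                self.hits += 1
                return self._values[key]
        # Slow path: read from the backing store without holding the lock.
        value = self._store[key]
        with self._mutex:
            self.misses += 1
            self._cache_value(key, value)
        return value

    def __setitem__(self, key, value):
        # Writes go straight to the backing store; drop any stale cached copy.
        self._store[key] = value
        with self._mutex:
            self._discard(key)

    def _discard(self, key):
        old = self._values.pop(key, None)
        if old is not None:
            self._current_size -= len(old)

    def _cache_value(self, key, value):
        self._discard(key)
        size = len(value)
        if self._max_size is not None and size > self._max_size:
            return  # larger than the whole budget, so never cached
        self._values[key] = value
        self._current_size += size
        # Evict least recently used entries until back within budget.
        while self._max_size is not None and self._current_size > self._max_size:
            _, evicted = self._values.popitem(last=False)
            self._current_size -= len(evicted)


# A plain dict stands in for a slow (e.g. networked) store.
backing = {"chunk/0": b"x" * 10, "chunk/1": b"y" * 10, "chunk/2": b"z" * 10}
cache = SimpleLRUStoreCache(backing, max_size=20)
cache["chunk/0"]  # first read goes to the backing store
cache["chunk/0"]  # repeat read is served from memory
```

As the comment above notes, a layer like this only pays off for reads that revisit the same keys; writes still go through to the backing store.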

@alimanfoo alimanfoo changed the title from "WIP store cache" to "LRU store cache" Jan 2, 2018
@alimanfoo alimanfoo merged commit c4e2e96 into master Jan 2, 2018
@alimanfoo alimanfoo deleted the store-cache-20171230 branch January 2, 2018 18:20
@jakirkham jakirkham mentioned this pull request Jan 4, 2019