12 changes: 12 additions & 0 deletions docs/api/storage.rst
@@ -21,6 +21,18 @@ Storage (``zarr.storage``)
.. automethod:: close
.. automethod:: flush

.. autoclass:: LRUStoreCache

.. automethod:: invalidate
.. automethod:: invalidate_values
.. automethod:: invalidate_keys

.. autofunction:: init_array
.. autofunction:: init_group
.. autofunction:: contains_array
.. autofunction:: contains_group
.. autofunction:: listdir
.. autofunction:: rmdir
.. autofunction:: getsize
.. autofunction:: rename
.. autofunction:: migrate_1to2
5 changes: 5 additions & 0 deletions docs/release.rst
@@ -127,6 +127,11 @@ Enhancements
* **Added support for ``datetime64`` and ``timedelta64`` data types**;
:issue:`85`, :issue:`215`.

* **New LRUStoreCache class**. The class :class:`zarr.storage.LRUStoreCache` has been
added and provides a means to locally cache data in memory from a store that may be
slow, e.g., a store that retrieves data from a remote server via the network;
:issue:`223`.

* **New copy functions**. The new functions :func:`zarr.convenience.copy` and
:func:`zarr.convenience.copy_all` provide a way to copy groups and/or arrays
between HDF5 and Zarr, or between two Zarr groups. The
34 changes: 34 additions & 0 deletions docs/tutorial.rst
@@ -729,6 +729,9 @@ group (requires `lmdb <http://lmdb.readthedocs.io/>`_ to be installed)::
>>> z[:] = 42
>>> store.close()

Distributed/cloud storage
~~~~~~~~~~~~~~~~~~~~~~~~~

It is also possible to use distributed storage systems. The Dask project has
implementations of the ``MutableMapping`` interface for Amazon S3 (`S3Map
<http://s3fs.readthedocs.io/en/latest/api.html#s3fs.mapping.S3Map>`_), Hadoop
@@ -767,6 +770,37 @@ Here is an example using S3Map to read an array created previously::
>>> z[:].tostring()
b'Hello from the cloud!'

Note that retrieving data from a remote service via the network can be significantly
slower than retrieving data from a local file system, and will depend on network latency
and bandwidth between the client and server systems. If you are experiencing poor
performance, there are several things you can try. One option is to increase the array
chunk size, which will reduce the number of chunks and thus reduce the number of network
round-trips required to retrieve data for an array (and thus reduce the impact of network
latency). Another option is to try to increase the compression ratio by changing
compression options or trying a different compressor (which will reduce the impact of
limited network bandwidth). As of version 2.2, Zarr also provides the
:class:`zarr.storage.LRUStoreCache` which can be used to implement a local in-memory cache
layer over a remote store. E.g.::

>>> s3 = s3fs.S3FileSystem(anon=True, client_kwargs=dict(region_name='eu-west-2'))
>>> store = s3fs.S3Map(root='zarr-demo/store', s3=s3, check=False)
>>> cache = zarr.LRUStoreCache(store, max_size=2**28)
>>> root = zarr.group(store=cache)
>>> z = root['foo/bar/baz']
>>> from timeit import timeit
>>> # first data access is relatively slow, retrieved from store
... timeit('print(z[:].tostring())', number=1, globals=globals()) # doctest: +SKIP
b'Hello from the cloud!'
0.1081731989979744
>>> # second data access is faster, uses cache
... timeit('print(z[:].tostring())', number=1, globals=globals()) # doctest: +SKIP
b'Hello from the cloud!'
0.0009490990014455747
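The caching behaviour demonstrated above can be sketched with plain standard-library code. The following is a simplified, illustrative model of an LRU read cache layered over a ``MutableMapping`` store; ``SimpleLRUCache`` and ``max_items`` are hypothetical names for illustration only, not part of the Zarr API, and this is not the actual ``LRUStoreCache`` implementation::

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class SimpleLRUCache(MutableMapping):
    """Illustrative LRU read cache over a mapping-style store."""

    def __init__(self, store, max_items):
        self._store = store           # the (possibly slow) underlying store
        self._max_items = max_items   # eviction threshold, by item count
        self._cache = OrderedDict()

    def __getitem__(self, key):
        try:
            value = self._cache[key]
            self._cache.move_to_end(key)   # mark as most recently used
        except KeyError:
            value = self._store[key]       # cache miss: hit the slow store
            self._cache[key] = value
            if len(self._cache) > self._max_items:
                self._cache.popitem(last=False)  # evict least recently used
        return value

    def __setitem__(self, key, value):
        self._store[key] = value
        self._cache.pop(key, None)  # keep the cache consistent with the store

    def __delitem__(self, key):
        del self._store[key]
        self._cache.pop(key, None)

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)
```

The real ``LRUStoreCache`` bounds the cache by total size in bytes (``max_size``) rather than by item count, and also caches key-containment and listing results, but the read path follows the same pattern: serve repeated reads from memory and fall back to the wrapped store only on a miss.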

If you are still experiencing poor performance with distributed/cloud storage, please
raise an issue on the GitHub issue tracker with any profiling data you can provide, as
there may be opportunities to optimise further either within Zarr or within the mapping
interface to the storage.
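The chunk-size advice above can be made concrete with some back-of-the-envelope arithmetic. The function names and numbers below are illustrative assumptions, not Zarr APIs; the model assumes one sequential request per chunk::

```python
import math


def full_read_requests(n_elements, chunk_len):
    """Chunk fetches needed to read an entire 1-D array of n_elements."""
    return math.ceil(n_elements / chunk_len)


def round_trip_seconds(n_elements, chunk_len, rtt):
    """Rough lower bound on time spent on network round-trips alone,
    assuming chunks are requested one at a time."""
    return full_read_requests(n_elements, chunk_len) * rtt


# With 10,000,000 elements and a 50 ms round-trip time, growing the
# chunk length from 10,000 to 1,000,000 elements cuts the request
# count from 1000 to 10, lowering the latency floor from 50 s to 0.5 s.
```

In practice concurrent requests, connection reuse and bandwidth limits all shift these numbers, but the basic trade-off holds: fewer, larger chunks amortise per-request latency, at the cost of transferring more data than needed when only small slices are read.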

.. _tutorial_copy:

2 changes: 1 addition & 1 deletion zarr/__init__.py
@@ -7,7 +7,7 @@
 from zarr.creation import (empty, zeros, ones, full, array, empty_like, zeros_like,
                            ones_like, full_like, open_array, open_like, create)
 from zarr.storage import (DictStore, DirectoryStore, ZipStore, TempStore,
-                          NestedDirectoryStore, DBMStore, LMDBStore)
+                          NestedDirectoryStore, DBMStore, LMDBStore, LRUStoreCache)
from zarr.hierarchy import group, open_group, Group
from zarr.sync import ThreadSynchronizer, ProcessSynchronizer
from zarr.codecs import *
7 changes: 7 additions & 0 deletions zarr/compat.py
@@ -16,9 +16,16 @@
    class PermissionError(Exception):
        pass

    def OrderedDict_move_to_end(od, key):
        od[key] = od.pop(key)


else:  # pragma: py2 no cover

    text_type = str
    binary_type = bytes
    from functools import reduce
    PermissionError = PermissionError

    def OrderedDict_move_to_end(od, key):
        od.move_to_end(key)