Merged
Changes from all commits
64 commits
8ef5dc1
Create an SQLite-backed mutable mapping
jakirkham Dec 21, 2018
e3e2c2e
Test SQLiteStore
jakirkham Dec 21, 2018
d60aaab
Export `SQLiteStore` to the top-level namespace
jakirkham Dec 21, 2018
ace251c
Include some SQLiteStore examples
jakirkham Dec 21, 2018
ecf18f7
Demonstrate the `SQLiteStore` in the tutorial
jakirkham Dec 21, 2018
92a4d71
Provide API documentation for `SQLiteStore`
jakirkham Dec 21, 2018
efa9ccd
Make a release note for `SQLiteStore`
jakirkham Dec 21, 2018
6f68451
Use unique extension for `SQLiteStore` files
jakirkham Dec 21, 2018
9bbbde6
Only close SQLite database when requested
jakirkham Dec 21, 2018
20ef384
Update docs to show how to close `SQLiteStore`
jakirkham Dec 21, 2018
a8f31cf
Ensure all SQL commands are capitalized
jakirkham Dec 21, 2018
5ddc193
Simplify `SQLiteStore`'s `__delitem__` using `in`
jakirkham Dec 21, 2018
0abcc1a
Drop no longer needed flake8 error suppression
jakirkham Dec 21, 2018
b339c09
Simplify `close` and use `flush`
jakirkham Dec 21, 2018
8b8d289
Flush before pickling `SQLiteStore`
jakirkham Dec 21, 2018
1cac5eb
Special case in-memory SQLite database
jakirkham Dec 21, 2018
b8e2d23
Drop unneeded empty `return` statement
jakirkham Dec 21, 2018
4db7e14
Update docs/release.rst
alimanfoo Dec 21, 2018
31a9af3
Update docs/release.rst
alimanfoo Dec 21, 2018
9f5d02b
Correct default value for `check_same_thread`
jakirkham Dec 21, 2018
ac6827e
Flush after making any mutation to the database
jakirkham Dec 21, 2018
8b35eb8
Skip flushing data when pickling `SQLiteStore`
jakirkham Dec 21, 2018
f8d3f03
Skip using `flush` in `close`
jakirkham Dec 21, 2018
1abeba7
Implement `update` for `SQLiteStore`
jakirkham Dec 21, 2018
4bbbeba
Rewrite `__setitem__` to use `update`
jakirkham Dec 21, 2018
f9481b8
Disable `check_same_thread` by default again
jakirkham Dec 21, 2018
0188a60
Force some parameters to defaults
jakirkham Dec 21, 2018
1af4446
Drop `flush` calls from `SQLiteStore`
jakirkham Dec 21, 2018
ca6b8a4
Drop the `flush` function from `SQLiteStore`
jakirkham Dec 21, 2018
eb4564b
Implement optimized `clear` for `SQLiteStore`
jakirkham Dec 25, 2018
4f59451
Implement optimized `rmdir` for `SQLiteStore`
jakirkham Dec 25, 2018
5ab54c0
Implement optimized `getsize` for `SQLiteStore`
jakirkham Dec 25, 2018
c386c72
Implement optimized `listdir` for `SQLiteStore`
jakirkham Dec 25, 2018
f9dfc06
Implement `rename` for `SQLiteStore`
jakirkham Dec 25, 2018
8d4a8e2
Allow users to specify the SQLite table name
jakirkham Dec 25, 2018
349a885
Randomize temporary table name
jakirkham Dec 25, 2018
bfaea1c
add mongodb and redis stores
Dec 27, 2018
d4ad363
top level doc strings
Dec 27, 2018
cfcff9d
fix host kwarg
Dec 27, 2018
09869f5
pickle support
Dec 28, 2018
cdc2656
different way of launching dbs on travis
Dec 30, 2018
c97204c
back to default travis configs
Dec 30, 2018
6c295ee
fixes to mapping classes for both redis and mongodb stores
Dec 31, 2018
b47b244
default redis port
Dec 31, 2018
eb1ce7a
pep8
Dec 31, 2018
e0ccee1
decode for py2?
Dec 31, 2018
533326a
no decode for py2
Dec 31, 2018
d549199
address comments
Jan 4, 2019
fbcc86f
cast to binary type in mongo getitem
Jan 7, 2019
88bb03e
more doc strings
Jan 7, 2019
531eeff
more docs
Jan 7, 2019
add3a0d
split release note into two bullets
Jan 13, 2019
9dce652
Merge branch 'master' of github.com:zarr-developers/zarr into mongodb…
Jan 22, 2019
ef5bc7d
whitespace fix in .travis.yml
Jan 22, 2019
dcd79e5
lint after merge
Jan 22, 2019
9be1557
pin mongo/redis versions and a few doc changes
Feb 7, 2019
20dd25a
Merge branch 'master' of github.com:zarr-developers/zarr into mongodb…
Feb 7, 2019
e2988be
use redis client.delete and check for deleted keys
Feb 7, 2019
44e8850
fix typo in requirements
Feb 7, 2019
7ff0f1a
Update docs/release.rst
alimanfoo Feb 7, 2019
06107ce
Update docs/release.rst
alimanfoo Feb 7, 2019
3821e16
skip redis/mongodb tests when unable to connect
Feb 8, 2019
881b051
Merge branch 'mongodb_store' of github.com:jhamman/zarr into mongodb_…
Feb 8, 2019
5de3e05
fix pep8
alimanfoo Feb 8, 2019
7 changes: 7 additions & 0 deletions .travis.yml
@@ -11,6 +11,10 @@ addons:
packages:
- libdb-dev

services:
- redis-server
- mongodb

matrix:
include:
- python: 2.7
@@ -20,6 +24,9 @@ matrix:
dist: xenial
sudo: true

before_script:
- mongo mydb_test --eval 'db.createUser({user:"travis",pwd:"test",roles:["readWrite"]});'

install:
- pip install -U pip setuptools wheel tox-travis coveralls

2 changes: 2 additions & 0 deletions docs/api/storage.rst
@@ -25,6 +25,8 @@ Storage (``zarr.storage``)

.. automethod:: close

.. autoclass:: MongoDBStore
.. autoclass:: RedisStore
.. autoclass:: LRUStoreCache

.. automethod:: invalidate
8 changes: 8 additions & 0 deletions docs/release.rst
@@ -26,6 +26,14 @@ Enhancements
* Efficient iteration over arrays by decompressing chunkwise.
By :user:`Jerome Kelleher <jeromekelleher>`, :issue:`398`, :issue:`399`.

* Adds the Redis-backed :class:`zarr.storage.RedisStore` class enabling a
Redis database to be used as the backing store for an array or group.
By :user:`Joe Hamman <jhamman>`, :issue:`299`, :issue:`372`.

* Adds the MongoDB-backed :class:`zarr.storage.MongoDBStore` class enabling a
MongoDB database to be used as the backing store for an array or group.
By :user:`Joe Hamman <jhamman>`, :issue:`299`, :issue:`372`.

Bug fixes
~~~~~~~~~

7 changes: 7 additions & 0 deletions docs/tutorial.rst
@@ -739,6 +739,13 @@ Python is built with SQLite support)::
>>> z[:] = 42
>>> store.close()

Also added in Zarr version 2.3 are two storage classes for interfacing with client/server
databases. The :class:`zarr.storage.RedisStore` class interfaces with `Redis <https://redis.io/>`_
(an in-memory data structure store), and the :class:`zarr.storage.MongoDBStore` class interfaces
with `MongoDB <https://www.mongodb.com/>`_ (a document-oriented NoSQL database). These stores
require the `redis <https://redis-py.readthedocs.io>`_ and
`pymongo <https://api.mongodb.com/python/current/>`_ packages to be installed, respectively.
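
As a minimal sketch (assuming Redis and MongoDB servers are listening locally on their
default ports, 6379 and 27017, and choosing chunks small enough to stay well under
MongoDB's 16 MB document limit; the ``(1000, 1000)`` float64 chunks below are 8 MB each),
an array can be stored in either database::

    >>> import zarr
    >>> store = zarr.RedisStore(host='localhost', port=6379)
    >>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), store=store, overwrite=True)
    >>> z[:] = 42

    >>> store = zarr.MongoDBStore(host='localhost')
    >>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), store=store, overwrite=True)
    >>> z[:] = 42
    >>> store.close()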

Distributed/cloud storage
~~~~~~~~~~~~~~~~~~~~~~~~~

2 changes: 2 additions & 0 deletions requirements_dev_optional.txt
@@ -1,3 +1,5 @@
# These packages are currently not available on Windows.
bsddb3==6.2.6
lmdb==0.94
redis==3.0.1
pymongo==3.7.1
2 changes: 1 addition & 1 deletion zarr/__init__.py
@@ -8,7 +8,7 @@
ones_like, full_like, open_array, open_like, create)
from zarr.storage import (DictStore, DirectoryStore, ZipStore, TempStore,
NestedDirectoryStore, DBMStore, LMDBStore, SQLiteStore,
LRUStoreCache)
LRUStoreCache, RedisStore, MongoDBStore)
from zarr.hierarchy import group, open_group, Group
from zarr.sync import ThreadSynchronizer, ProcessSynchronizer
from zarr.codecs import *
184 changes: 183 additions & 1 deletion zarr/storage.py
@@ -37,7 +37,7 @@
normalize_storage_path, buffer_size,
normalize_fill_value, nolock, normalize_dtype)
from zarr.meta import encode_array_metadata, encode_group_metadata
from zarr.compat import PY2, OrderedDict_move_to_end
from zarr.compat import PY2, OrderedDict_move_to_end, binary_type
from numcodecs.registry import codec_registry
from numcodecs.compat import ensure_bytes, ensure_contiguous_ndarray
from zarr.errors import (err_contains_group, err_contains_array, err_bad_compressor,
@@ -2084,6 +2084,188 @@ def clear(self):
)


class MongoDBStore(MutableMapping):
"""Storage class using MongoDB.

.. note:: This is an experimental feature.

Requires the `pymongo <https://api.mongodb.com/python/current/>`_
package to be installed.

Parameters
----------
database : string
Name of database
collection : string
Name of collection
**kwargs
Keyword arguments passed through to the `pymongo.MongoClient` constructor.

Examples
--------
Store a single array::

>>> import zarr
>>> store = zarr.MongoDBStore(host='localhost')
>>> z = zarr.zeros((10, 10), chunks=(5, 5), store=store, overwrite=True)
>>> z[...] = 42
>>> store.close()

Store a group::

>>> store = zarr.MongoDBStore(host='localhost')
>>> root = zarr.group(store=store, overwrite=True)
>>> foo = root.create_group('foo')
>>> bar = foo.zeros('bar', shape=(10, 10), chunks=(5, 5))
>>> bar[...] = 42
>>> store.close()

Notes
-----
Chunks are stored as individual MongoDB documents, so the maximum chunk size is 16 MB
(the MongoDB document size limit).

"""

_key = 'key'
_value = 'value'

def __init__(self, database='mongodb_zarr', collection='zarr_collection',
**kwargs):
import pymongo

self._database = database
self._collection = collection
self._kwargs = kwargs

self.client = pymongo.MongoClient(**self._kwargs)
self.db = self.client.get_database(self._database)
self.collection = self.db.get_collection(self._collection)

def __getitem__(self, key):
doc = self.collection.find_one({self._key: key})

if doc is None:
raise KeyError(key)
else:
return binary_type(doc[self._value])
Member (PR author): @alimanfoo - casting this return value to a binary string seems to have corrected the json problem we were discussing last week.

Member: Interesting. Wonder if ensure_bytes would work here as well. 🤔

Member: Out of interest, what type of object is doc[self._value]?

jakirkham (Feb 12, 2019): AFAICT it's a bytes object. Am curious to see if we may have already fixed this with other changes and this can be dropped. Have added PR ( #401 ) to give this a try.

Edit: Turns out it is a bytes subclass object on Python 2 and a bytes object on Python 3. More details in commit ( efa3ee9 ).

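As a minimal illustration of the point discussed above (assuming ``pymongo``/``bson`` is
installed and that ``binary_type`` is the built-in ``bytes`` type on Python 3), the value
comes back as ``bytes`` or a ``bytes`` subclass such as ``bson.Binary``, and the cast
normalises it to the plain built-in type::

    >>> import bson
    >>> raw = bson.Binary(b'chunk-data')  # bytes subclass used by pymongo/bson
    >>> isinstance(raw, bytes)
    True
    >>> type(bytes(raw)) is bytes  # casting yields the plain built-in type
    True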

def __setitem__(self, key, value):
value = ensure_bytes(value)
self.collection.replace_one({self._key: key},
{self._key: key, self._value: value},
upsert=True)

def __delitem__(self, key):
result = self.collection.delete_many({self._key: key})
if result.deleted_count != 1:
raise KeyError(key)

def __iter__(self):
for f in self.collection.find({}):
yield f[self._key]

def __len__(self):
return self.collection.count_documents({})

def __getstate__(self):
return self._database, self._collection, self._kwargs

def __setstate__(self, state):
database, collection, kwargs = state
self.__init__(database=database, collection=collection, **kwargs)

def close(self):
"""Cleanup client resources and disconnect from MongoDB."""
self.client.close()

def clear(self):
"""Remove all items from store."""
self.collection.delete_many({})


class RedisStore(MutableMapping):
"""Storage class using Redis.

.. note:: This is an experimental feature.

Requires the `redis <https://redis-py.readthedocs.io/>`_
package to be installed.

Parameters
----------
prefix : string
Name of prefix for Redis keys
**kwargs
Keyword arguments passed through to the `redis.Redis` constructor.

Examples
--------
Store a single array::

>>> import zarr
>>> store = zarr.RedisStore(port=6379)
>>> z = zarr.zeros((10, 10), chunks=(5, 5), store=store, overwrite=True)
>>> z[...] = 42

Store a group::

>>> store = zarr.RedisStore(port=6379)
>>> root = zarr.group(store=store, overwrite=True)
>>> foo = root.create_group('foo')
>>> bar = foo.zeros('bar', shape=(10, 10), chunks=(5, 5))
>>> bar[...] = 42

"""
def __init__(self, prefix='zarr', **kwargs):
import redis
self._prefix = prefix
self._kwargs = kwargs

self.client = redis.Redis(**kwargs)

def _key(self, key):
return '{prefix}:{key}'.format(prefix=self._prefix, key=key)

def __getitem__(self, key):
return self.client[self._key(key)]

def __setitem__(self, key, value):
value = ensure_bytes(value)
self.client[self._key(key)] = value

def __delitem__(self, key):
count = self.client.delete(self._key(key))
if not count:
raise KeyError(key)

def keylist(self):
offset = len(self._key('')) # length of prefix
return [key[offset:].decode('utf-8')
for key in self.client.keys(self._key('*'))]

def keys(self):
for key in self.keylist():
yield key

def __iter__(self):
for key in self.keys():
yield key

def __len__(self):
return len(self.keylist())

def __getstate__(self):
return self._prefix, self._kwargs

def __setstate__(self, state):
prefix, kwargs = state
self.__init__(prefix=prefix, **kwargs)

def clear(self):
for key in self.keys():
del self[key]


class ConsolidatedMetadataStore(MutableMapping):
"""A layer over other storage, where the metadata has been consolidated into
a single key.
50 changes: 48 additions & 2 deletions zarr/tests/test_storage.py
@@ -20,8 +20,8 @@
DirectoryStore, ZipStore, init_group, group_meta_key,
getsize, migrate_1to2, TempStore, atexit_rmtree,
NestedDirectoryStore, default_compressor, DBMStore,
LMDBStore, SQLiteStore, atexit_rmglob, LRUStoreCache,
ConsolidatedMetadataStore)
LMDBStore, SQLiteStore, MongoDBStore, RedisStore,
atexit_rmglob, LRUStoreCache, ConsolidatedMetadataStore)
from zarr.meta import (decode_array_metadata, encode_array_metadata, ZARR_FORMAT,
decode_group_metadata, encode_group_metadata)
from zarr.compat import PY2
@@ -900,6 +900,29 @@ def test_context_manager(self):
except ImportError: # pragma: no cover
sqlite3 = None

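# Probe for a local MongoDB server; if pymongo is missing or no server responds,
# pymongo is set to None below so that the MongoDB store tests are skipped.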
try:
import pymongo
from pymongo.errors import ConnectionFailure, ServerSelectionTimeoutError
try:
client = pymongo.MongoClient(host='127.0.0.1',
serverSelectionTimeoutMS=1e3)
client.server_info()
except (ConnectionFailure, ServerSelectionTimeoutError): # pragma: no cover
pymongo = None
except ImportError: # pragma: no cover
pymongo = None

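# Probe for a local Redis server; if redis-py is missing or the server does not
# respond to a ping, redis is set to None below so that the Redis store tests are skipped.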
try:
import redis
from redis import ConnectionError
try:
rs = redis.Redis("localhost", port=6379)
rs.ping()
except ConnectionError: # pragma: no cover
redis = None
except ImportError: # pragma: no cover
redis = None


@unittest.skipIf(sqlite3 is None, 'python built without sqlite')
class TestSQLiteStore(StoreTests, unittest.TestCase):
@@ -930,6 +953,29 @@ def test_pickle(self):
pickle.dumps(store)


@unittest.skipIf(pymongo is None, 'test requires pymongo')
class TestMongoDBStore(StoreTests, unittest.TestCase):

def create_store(self):
store = MongoDBStore(host='127.0.0.1', database='zarr_tests',
collection='zarr_tests')
# start with an empty store
store.clear()
return store


@unittest.skipIf(redis is None, 'test requires redis')
class TestRedisStore(StoreTests, unittest.TestCase):

def create_store(self):
# TODO: this is the default host for Redis on Travis,
Member: As above, is this TODO something that needs to be resolved before merge, or can we live with as-is?

# we probably want to generalize this though
store = RedisStore(host='localhost', port=6379)
# start with an empty store
store.clear()
return store


class TestLRUStoreCache(StoreTests, unittest.TestCase):

def create_store(self):