-
Notifications
You must be signed in to change notification settings - Fork 109
Description
This is carrying on from zarr-developers/zarr-python#258
I've tried to come up with a minimal example, but it's tricky to illustrate without showing the context. Here is an interaction with zarr with some instrumentation in the encode/decode methods for json.
z = zarr.empty(2, dtype=object, object_codec=numcodecs.JSON(), chunks=(1,))
z[0] = ["11"]
z[1] = ["1", "1"]
print(z[:]) # Borksoutput:
INPUT: (1,)
INPUT: (1,)
OUTPUT: (1, 1)
OUTPUT: (1, 2)
Traceback (most recent call last):
File "dev.py", line 34, in <module>
print(z[:]) # Borks
File "/home/jk/.local/lib/python3.5/site-packages/zarr/core.py", line 559, in __getitem__
return self.get_basic_selection(selection, fields=fields)
File "/home/jk/.local/lib/python3.5/site-packages/zarr/core.py", line 685, in get_basic_selection
fields=fields)
File "/home/jk/.local/lib/python3.5/site-packages/zarr/core.py", line 727, in _get_basic_selection_nd
return self._get_selection(indexer=indexer, out=out, fields=fields)
File "/home/jk/.local/lib/python3.5/site-packages/zarr/core.py", line 1015, in _get_selection
drop_axes=indexer.drop_axes, fields=fields)
File "/home/jk/.local/lib/python3.5/site-packages/zarr/core.py", line 1608, in _chunk_getitem
chunk = self._decode_chunk(cdata)
File "/home/jk/.local/lib/python3.5/site-packages/zarr/core.py", line 1751, in _decode_chunk
chunk = chunk.reshape(self._chunks, order=self._order)
ValueError: cannot reshape array of size 2 into shape (1,)
The INPUT lines are the shapes of the input arrays to encode and the OUTPUT lines are the corresponding output shapes of the arrays from decode.
Problem description
When calling numpy.array([["s1", "s2"], ["s3, "s4"]], dtype=object) numpy is quite aggressive about reshaping the array to store things more efficiently.
I've played around with this a fair bit, and I think the only options are to
-
Drop the numpy dependency in the encoding and decoding steps for JSON (i.e, don't include the dtype in the JSON encoding), and provide the supplied argument directly to the JSON encoder (and conversely, directly return the value of json.loads() from decode.
-
Also encode the input array shape in the JSON encoding.
Both of these options are ugly because they break backward compatibility. I'll make a PR for demonstrating option 2 in a minute for discussion.