-
-
Notifications
You must be signed in to change notification settings - Fork 373
Use ensure_ndarray in a few more places
#506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use ensure_ndarray in a few more places
#506
Conversation
Instead of checking to see if `chunk` is an `ndarray`, use `ensure_ndarray` to get an `ndarray` viewing the underlying data in `chunk`. This way we know attributes like `dtype` are available and can be checked easily. Also makes this a bit more friendly with other array-like types.
Since we are interested in getting an `ndarray` representing the buffer within `out` for writing into, go ahead and use `ensure_ndarray` to coerce the underlying buffer into an `ndarray`. This way we can avoid a needless check and just write into any array-like value for `out` that is provided.
Appears that `out` can also be a Zarr `Array` or any other array-like that does not expose a buffer, but does allow writing into. In these cases `ensure_ndarray` will fail as there is not an underlying buffer that can be used with an `ndarray`. To also handle this case, catch the `TypeError` that `ensure_ndarray` will raise in this case and use that to indicate whether `out` is now an `ndarray` or not. This allows us to continue to write into arbitrary buffers, but also correctly handle objects that do not expose buffers.
|
@jrbourbeau, if you are around, this could use a review 🙂 |
|
|
||
| out_is_ndarray = True | ||
| try: | ||
| out = ensure_ndarray(out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is already checked by test_get_selection_out. Without this try/except, we get the following TypeError because out is a Zarr Array, which cannot be coerced to an ndarray. This is ok and intentional. So we just carry on without coercing that case and note that we do not have an ndarray when checking later.
Details
_________________________ test_get_selection_out ____________________________
def test_get_selection_out():
# basic selections
a = np.arange(1050)
z = zarr.create(shape=1050, chunks=100, dtype=a.dtype)
z[:] = a
selections = [
slice(50, 150),
slice(0, 1050),
slice(1, 2),
]
for selection in selections:
expect = a[selection]
out = zarr.create(shape=expect.shape, chunks=10, dtype=expect.dtype, fill_value=0)
> z.get_basic_selection(selection, out=out)
zarr/tests/test_indexing.py:1036:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
zarr/core.py:698: in get_basic_selection
fields=fields)
zarr/core.py:740: in _get_basic_selection_nd
return self._get_selection(indexer=indexer, out=out, fields=fields)
zarr/core.py:1028: in _get_selection
drop_axes=indexer.drop_axes, fields=fields)
zarr/core.py:1573: in _chunk_getitem
out = ensure_ndarray(out)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
buf = <zarr.core.Array (100,) int64>
def ensure_ndarray(buf):
"""Convenience function to coerce `buf` to a numpy array, if it is not already a
numpy array.
Parameters
----------
buf : array-like or bytes-like
A numpy array or any object exporting a buffer interface.
Returns
-------
arr : ndarray
A numpy array, sharing memory with `buf`.
Notes
-----
This function will not create a copy under any circumstances, it is guaranteed to
return a view on memory exported by `buf`.
"""
if isinstance(buf, np.ndarray):
# already a numpy array
arr = buf
elif isinstance(buf, array.array) and buf.typecode in 'cu':
# Guard condition, do not support array.array with unicode type, this is
# problematic because numpy does not support it on all platforms. Also do not
# support char as it was removed in Python 3.
raise TypeError('array.array with char or unicode type is not supported')
else:
# N.B., first take a memoryview to make sure that we subsequently create a
# numpy array from a memory buffer with no copy
if PY2: # pragma: py3 no cover
try:
mem = memoryview(buf)
except TypeError:
# on PY2 also check if object exports old-style buffer interface
mem = np.getbuffer(buf)
else: # pragma: py2 no cover
> mem = memoryview(buf)
E TypeError: memoryview: a bytes-like object is required, not 'Array'
.tox/py36/lib/python3.6/site-packages/numcodecs/compat.py:74: TypeErrorref: https://travis-ci.org/zarr-developers/zarr-python/jobs/610533967#L2316
|
Planning on merging EOD tomorrow if no comments. |
jrbourbeau
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @jakirkham! Generally these changes seem fine by me. I have one question about when ensure_ndarray raises an error that I've left below.
|
|
||
| # check object encoding | ||
| if isinstance(chunk, np.ndarray) and chunk.dtype == object: | ||
| if ensure_ndarray(chunk).dtype == object: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that some non-ndarrays would previously skip over this if block but now fail if ensure_ndarray raises an error? For example:
In [28]: from numcodecs.compat import ensure_ndarray
In [29]: import array
In [30]: chunk = array.array('u', 'hello \u2641')
In [31]: if isinstance(chunk, np.ndarray) and chunk.dtype == object:
...: pass
...:
In [32]: if ensure_ndarray(chunk).dtype == object:
...: pass
...:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-32-e1e15cbf81c2> in <module>
----> 1 if ensure_ndarray(chunk).dtype == object:
2 pass
3
~/miniconda/envs/zarr-python-dev/lib/python3.7/site-packages/numcodecs/compat.py in ensure_ndarray(buf)
63 # problematic because numpy does not support it on all platforms. Also do not
64 # support char as it was removed in Python 3.
---> 65 raise TypeError('array.array with char or unicode type is not supported')
66
67 else:
TypeError: array.array with char or unicode type is not supportedTo be clear, I'm not sure how likely this is (or if it's even possible) to come up in practice. Do you have a sense for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That exception would be expected as we have decided not to work with Python builtin arrays that use character or unicode types. There is some more detailed discussion in this thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically the only exception that we need to worry about is the one coming from memoryview, which will be a TypeError if the object doesn't support the buffer protocol. After that point we have a memoryview. So we know the rest of the code will work (unless NumPy gets a bug ;).
Edit: Sorry was looking at the other change for a second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context this function is trying to turn some data into bytes that can be serialized (say to a file on disk).
The filters before this step should be returning something that either is an ndarray or could be coerced to one. This is required by the compressors that follow and the storage layer afterwards. So if ensure_ndarray fails here, then we have invalid data and raising an exception to the user would be appropriate.
jrbourbeau
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thank you for providing all the additional context surrounding this section. I pushed a commit with a changelog entry. Feel free to merge on green. Thanks @jakirkham!
|
Thanks @jrbourbeau for the review and fix! 😄 |
As we can coerce any object that is array-like to an
ndarray, there should be no need for these explicitndarraychecks. So eliminate them by usingensure_ndarrayto get anndarrayobject instead.TODO:
tox -e docs)