Skip to content

Conversation

@jakirkham
Copy link
Member

As we can coerce any object that is array-like to an ndarray, there should be no need for these explicit ndarray checks. So eliminate them by using ensure_ndarray to get an ndarray object instead.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • Docs build locally (e.g., run tox -e docs)
  • AppVeyor and Travis CI passes
  • Test coverage is 100% (Coveralls passes)

Instead of checking to see if `chunk` is an `ndarray`, use
`ensure_ndarray` to get an `ndarray` viewing the underlying data in
`chunk`. This way we know attributes like `dtype` are available and can
be checked easily. Also makes this a bit more friendly with other
array-like types.
Since we are interested in getting an `ndarray` representing the buffer
within `out` for writing into, go ahead and use `ensure_ndarray` to
coerce the underlying buffer into an `ndarray`. This way we can avoid a
needless check and just write into any array-like value for `out` that
is provided.
Appears that `out` can also be a Zarr `Array` or any other array-like
that does not expose a buffer, but does allow writing into. In these
cases `ensure_ndarray` will fail as there is not an underlying buffer
that can be used with an `ndarray`. To also handle this case, catch the
`TypeError` that `ensure_ndarray` will raise in this case and use that
to indicate whether `out` is now an `ndarray` or not. This allows us to
continue to write into arbitrary buffers, but also correctly handle
objects that do not expose buffers.
@jakirkham
Copy link
Member Author

@jrbourbeau, if you are around, this could use a review 🙂


out_is_ndarray = True
try:
out = ensure_ndarray(out)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is already checked by test_get_selection_out. Without this try/except, we get the following TypeError because out is a Zarr Array, which cannot be coerced to an ndarray. This is ok and intentional. So we just carry on without coercing that case and note that we do not have an ndarray when checking later.

Details
_________________________ test_get_selection_out ____________________________

    def test_get_selection_out():
    
        # basic selections
        a = np.arange(1050)
        z = zarr.create(shape=1050, chunks=100, dtype=a.dtype)
        z[:] = a
        selections = [
            slice(50, 150),
            slice(0, 1050),
            slice(1, 2),
        ]
        for selection in selections:
            expect = a[selection]
            out = zarr.create(shape=expect.shape, chunks=10, dtype=expect.dtype, fill_value=0)
>           z.get_basic_selection(selection, out=out)

zarr/tests/test_indexing.py:1036: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
zarr/core.py:698: in get_basic_selection
    fields=fields)
zarr/core.py:740: in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
zarr/core.py:1028: in _get_selection
    drop_axes=indexer.drop_axes, fields=fields)
zarr/core.py:1573: in _chunk_getitem
    out = ensure_ndarray(out)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

buf = <zarr.core.Array (100,) int64>

    def ensure_ndarray(buf):
        """Convenience function to coerce `buf` to a numpy array, if it is not already a
        numpy array.
    
        Parameters
        ----------
        buf : array-like or bytes-like
            A numpy array or any object exporting a buffer interface.
    
        Returns
        -------
        arr : ndarray
            A numpy array, sharing memory with `buf`.
    
        Notes
        -----
        This function will not create a copy under any circumstances, it is guaranteed to
        return a view on memory exported by `buf`.
    
        """
    
        if isinstance(buf, np.ndarray):
            # already a numpy array
            arr = buf
    
        elif isinstance(buf, array.array) and buf.typecode in 'cu':
            # Guard condition, do not support array.array with unicode type, this is
            # problematic because numpy does not support it on all platforms. Also do not
            # support char as it was removed in Python 3.
            raise TypeError('array.array with char or unicode type is not supported')
    
        else:
    
            # N.B., first take a memoryview to make sure that we subsequently create a
            # numpy array from a memory buffer with no copy
    
            if PY2:  # pragma: py3 no cover
                try:
                    mem = memoryview(buf)
                except TypeError:
                    # on PY2 also check if object exports old-style buffer interface
                    mem = np.getbuffer(buf)
    
            else:  # pragma: py2 no cover
>               mem = memoryview(buf)
E               TypeError: memoryview: a bytes-like object is required, not 'Array'

.tox/py36/lib/python3.6/site-packages/numcodecs/compat.py:74: TypeError

ref: https://travis-ci.org/zarr-developers/zarr-python/jobs/610533967#L2316

@jakirkham
Copy link
Member Author

Planning on merging EOD tomorrow if no comments.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @jakirkham! Generally these changes seem fine by me. I have one question about when ensure_ndarray raises an error that I've left below.


# check object encoding
if isinstance(chunk, np.ndarray) and chunk.dtype == object:
if ensure_ndarray(chunk).dtype == object:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that some non-ndarrays would previously skip over this if block but now fail if ensure_ndarray raises an error? For example:

In [28]: from numcodecs.compat import ensure_ndarray

In [29]: import array

In [30]: chunk = array.array('u', 'hello \u2641')

In [31]: if isinstance(chunk, np.ndarray) and chunk.dtype == object:
    ...:     pass
    ...:

In [32]: if ensure_ndarray(chunk).dtype == object:
    ...:     pass
    ...:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-e1e15cbf81c2> in <module>
----> 1 if ensure_ndarray(chunk).dtype == object:
      2     pass
      3

~/miniconda/envs/zarr-python-dev/lib/python3.7/site-packages/numcodecs/compat.py in ensure_ndarray(buf)
     63         # problematic because numpy does not support it on all platforms. Also do not
     64         # support char as it was removed in Python 3.
---> 65         raise TypeError('array.array with char or unicode type is not supported')
     66
     67     else:

TypeError: array.array with char or unicode type is not supported

To be clear, I'm not sure how likely this is (or if it's even possible) to come up in practice. Do you have a sense for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That exception would be expected as we have decided not to work with Python builtin arrays that use character or unicode types. There is some more detailed discussion in this thread.

Copy link
Member Author

@jakirkham jakirkham Nov 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically the only exception that we need to worry about is the one coming from memoryview, which will be a TypeError if the object doesn't support the buffer protocol. After that point we have a memoryview. So we know the rest of the code will work (unless NumPy gets a bug ;).

Edit: Sorry was looking at the other change for a second.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context this function is trying to turn some data into bytes that can be serialized (say to a file on disk).

The filters before this step should be returning something that either is an ndarray or could be coerced to one. This is required by the compressors that follow and the storage layer afterwards. So if ensure_ndarray fails here, then we have invalid data and raising an exception to the user would be appropriate.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you for providing all the additional context surrounding this section. I pushed a commit with a changelog entry. Feel free to merge on green. Thanks @jakirkham!

@jakirkham
Copy link
Member Author

Thanks @jrbourbeau for the review and fix! 😄

@jakirkham jakirkham merged commit 58b1786 into zarr-developers:master Nov 12, 2019
@jakirkham jakirkham deleted the use_ensure_ndarray_more branch November 12, 2019 18:56
@Carreau Carreau added this to the v2.4 milestone Sep 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants