Avoid calling np.asarray on lazy indexing classes #6874
dcherian merged 59 commits into pydata:main
This returns the underlying array type instead of always casting to `np.array`. This is necessary for Zarr stores where the Zarr Array wraps a cupy array (for example `kvikio.zarr.GDSStore`). In that case, we cannot call `np.asarray` because `__array__` is expected to always return a numpy array. We use `get_array` in `Variable.data` to make sure we don't load arrays from such GDSStores.
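The failure mode can be reproduced with NumPy alone. In this minimal sketch, `DeviceArrayStandIn` and `try_asarray` are hypothetical names: the class stands in for a cupy array and, like cupy, its `__array__` refuses implicit conversion to host memory by raising `TypeError`. Any code path that ends in `np.asarray` therefore fails instead of returning the device data.

```python
import numpy as np


class DeviceArrayStandIn:
    """Hypothetical stand-in for a GPU array such as cupy.ndarray.

    Like cupy, it refuses implicit conversion to a NumPy array.
    """

    def __array__(self, dtype=None, copy=None):
        raise TypeError("Implicit conversion to a NumPy array is not allowed.")


def try_asarray(obj):
    """Return (True, array) if np.asarray succeeds, else (False, error)."""
    try:
        return True, np.asarray(obj)
    except TypeError as err:
        return False, err


print(try_asarray(np.arange(3))[0])          # True
print(try_asarray(DeviceArrayStandIn())[0])  # False
```

A device-agnostic code path therefore has to hand back the wrapped object itself rather than route it through `np.asarray`.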
instead of always casting to np.asarray
As I understand it, the main purpose here is to remove Xarray's lazy indexing classes. Maybe call this
Clean up short_array_repr.
```python
# so we need the explicit check for ExplicitlyIndexed
if isinstance(array, ExplicitlyIndexed):
    array = array.get_duck_array()
return _wrap_numpy_scalars(array)
```
Adding `_wrap_numpy_scalars` allows us to handle scalars being returned by the backend. This seems OK to me in that we place fewer restrictions on the backend (and it is backward compatible).
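A minimal sketch of what such scalar wrapping can look like, assuming the helper only needs to turn NumPy (or Python) scalars, such as the `np.float64` a backend may return when a key selects a single element, into 0-d arrays and pass everything else, including non-NumPy duck arrays, through unchanged. `wrap_numpy_scalars` is a hypothetical name mirroring `_wrap_numpy_scalars`; this is an illustration, not necessarily xarray's exact code.

```python
import numpy as np


def wrap_numpy_scalars(array):
    """Wrap NumPy or Python scalars in 0-d ndarrays; pass arrays through."""
    if np.isscalar(array):
        return np.array(array)
    return array


scalar = np.float64(1.5)  # what an indexed backend might return
wrapped = wrap_numpy_scalars(scalar)
print(type(wrapped).__name__, wrapped.ndim)  # ndarray 0

arr = np.arange(3)
print(wrap_numpy_scalars(arr) is arr)  # True: arrays are returned untouched
```

Because arrays are returned as-is, a cupy array coming back from a backend is not touched; only true scalars end up as NumPy 0-d arrays.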
xarray/core/indexing.py, lines 607 to 612 in 3ee7b5a
But now the issue is that we should pass an appropriate `like` argument to `np.array`, and I don't see how to do that from a scalar array.
The good news is that backends can avoid this complication by returning arrays, so we could just ignore this ugly bit for now.
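For context on the `like` argument: since NumPy 1.20, array-creation functions such as `np.asarray` accept `like=` (NEP 35), which lets the library that owns the reference array create the result through its `__array_function__`. The minimal sketch below uses a hypothetical helper, `wrap_scalar_like`, to show why a bare scalar is awkward: the scalar carries no hint about the target library, so a reference array must come from somewhere else, and without one the only option is a NumPy result. Only NumPy is used, so the reference is an `ndarray`; with a reference from another library that implements `__array_function__` for `asarray`, the call would be expected to dispatch to that library instead.

```python
import numpy as np


def wrap_scalar_like(value, like=None):
    """Hypothetical helper: wrap a scalar as a 0-d array.

    With a reference array, NEP 35's ``like=`` lets the reference's library
    create the result. A bare scalar gives no such hint, so fall back to NumPy.
    """
    if like is None:
        return np.asarray(value)
    return np.asarray(value, like=like)


reference = np.empty(0)  # stands in for an array a backend would return
print(type(wrap_scalar_like(2.0, like=reference)).__name__)  # ndarray
print(type(wrap_scalar_like(2.0)).__name__)                  # ndarray
```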
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
Co-authored-by: Stephan Hoyer <shoyer@google.com>
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
@Illviljan feel free to push any typing changes to this PR. I think that would really help clarify the interface. I tried adding a
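As one illustration of what typing could pin down, a structural `typing.Protocol` can describe "anything that can hand back its underlying duck array". This is a hypothetical sketch (`SupportsGetDuckArray`, `InMemoryWrapper` and `load` are invented names), not the typing that was actually added to this PR. The return type is left as `Any` because the result may be a NumPy array, a cupy array, or another duck array.

```python
from typing import Any, Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class SupportsGetDuckArray(Protocol):
    """Hypothetical protocol: objects that can produce their underlying array."""

    def get_duck_array(self) -> Any: ...


class InMemoryWrapper:
    """Hypothetical wrapper that structurally satisfies the protocol."""

    def __init__(self, array: np.ndarray) -> None:
        self.array = array

    def get_duck_array(self) -> np.ndarray:
        return self.array


def load(obj: Any) -> Any:
    """Unwrap objects implementing the protocol; pass anything else through."""
    if isinstance(obj, SupportsGetDuckArray):
        return obj.get_duck_array()
    return obj


print(isinstance(InMemoryWrapper(np.zeros(2)), SupportsGetDuckArray))  # True
print(isinstance(np.zeros(2), SupportsGetDuckArray))                   # False
```

A structural protocol means backend array classes would not need to inherit from any xarray base class to be recognized.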
I don't have a better idea than to do
* main: (40 commits)
  Faq pull request (According to pull request pydata#7604 & issue pydata#1285 (pydata#7638)
  add timeouts for tests (pydata#7657)
  Pull Request Labeler - Undo workaround sync-labels bug (pydata#7667)
  [pre-commit.ci] pre-commit autoupdate (pydata#7651)
  Allow all integer dtypes in `polyval` (pydata#7619)
  [skip-ci] dev whats-new (pydata#7660)
  Redo whats-new for 2023.03.0 (pydata#7659)
  Set copy=False when calling pd.Series (pydata#7642)
  Pin pandas < 2 (pydata#7650)
  Whats-new for release 2023.03.0 (pydata#7643)
  Bump pypa/gh-action-pypi-publish from 1.7.1 to 1.8.1 (pydata#7648)
  Use more descriptive link texts (pydata#7625)
  Fix missing 'dim' argument in _get_nan_block_lengths (pydata#7598)
  Fix `pcolormesh` with str coords (pydata#7612)
  [skip-ci] Fix groupby binary ops benchmarks (pydata#7603)
  Remove incomplete sentence in IO docs (pydata#7631)
  Allow indexing unindexed dimensions using dask arrays (pydata#5873)
  Bump pypa/gh-action-pypi-publish from 1.6.4 to 1.7.1 (pydata#7618)
  [pre-commit.ci] pre-commit autoupdate (pydata#7620)
  add a test for scatter colorbar extend (pydata#7616)
  ...
I'd like to merge this at the end of next week. It now has tests and should be backwards compatible with external backends. A good next step would be to finish up #7020.
This is motivated by https://docs.rapids.ai/api/kvikio/stable/api.html#kvikio.zarr.GDSStore, which loads data directly into GPU memory on read.

Currently we rely on `np.asarray` to convert a `BackendArray` wrapped in a number of lazy indexing classes to a real array, but this breaks for `GDSStore` because the underlying array is a cupy array, so using `np.asarray` raises an error. `np.asarray` will raise if a non-numpy array is returned from `__array__`, so we need to use something else.

Here I added `get_array`, which, like `np.array`, recurses down until it receives a duck array.

Quite a few things are broken, I think, but I'd like feedback on the approach.

I considered `np.asanyarray(..., like=...)`, but that would require the lazy indexing classes to know what they're wrapping, which doesn't seem right.

Ref: xarray-contrib/cupy-xarray#10, which adds a `kvikio` backend entrypoint.
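The recursion described above can be sketched with a few hypothetical, heavily simplified classes (`LazyLayer`, `BackendLayer`, `DeviceArrayStandIn`; no keys, dtypes, or indexer types). The method name `get_duck_array` is borrowed from the review excerpt in this PR; the description itself calls the function `get_array`. Each layer either recurses into the layer it wraps or returns the wrapped object untouched, so the array type produced by the backend comes back as-is instead of being forced through `np.asarray`.

```python
import numpy as np


class LazyLayer:
    """Hypothetical lazy wrapper layer (e.g. caching or copy-on-write)."""

    def __init__(self, array):
        self.array = array

    def get_duck_array(self):
        # If the wrapped object is another lazy layer, recurse into it;
        # otherwise it already is a duck array, so hand it back unchanged.
        if isinstance(self.array, LazyLayer):
            return self.array.get_duck_array()
        return self.array


class BackendLayer(LazyLayer):
    """Hypothetical innermost layer wrapping a backend/store array."""

    def get_duck_array(self):
        # Indexing the store returns whatever array type the store produces
        # (NumPy here; a cupy array for a GPU-backed store). No np.asarray.
        return self.array[...]


class DeviceArrayStandIn:
    """Hypothetical non-NumPy duck array that, like cupy, refuses np.asarray."""

    def __getitem__(self, key):
        return self  # indexing keeps the data "on device"

    def __array__(self, dtype=None, copy=None):
        raise TypeError("Implicit conversion to a NumPy array is not allowed.")


store_data = np.arange(6).reshape(2, 3)
result = LazyLayer(LazyLayer(BackendLayer(store_data))).get_duck_array()
print(type(result).__name__, result.shape)  # ndarray (2, 3)

device_result = LazyLayer(BackendLayer(DeviceArrayStandIn())).get_duck_array()
print(type(device_result).__name__)  # DeviceArrayStandIn
```

No layer needs to know what kind of array it wraps, which is exactly the property that rules out the `np.asanyarray(..., like=...)` alternative.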