Skip to content
This repository was archived by the owner on Nov 17, 2023. It is now read-only.
This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[Python] __getitem__ copies too often #13166

@kohr-h

Description

@kohr-h

Description

While working on #13134 I noticed that NDArray.__getitem__ does not really do a good job of determining whether or not a view instead of a copy can be returned.

For instance, wrapping any indexing object into a 1-element tuple should have no effect at all (at least if one wants to be consistent with NumPy), but the wrapping currently leads to a copy.
Or, indexing a 3D array as array[0] should be equivalent to array[0, :, :], but the latter copies while the former does not.

Here are some examples:

def getitem_returns_view(x, idx):
    tmp = x.copy()
    tmp[:] = 0
    y = tmp[idx]
    y[:] = 1
    return all(yi == xi for yi, xi in zip(y.reshape((-1)), tmp[idx].reshape((-1,))))

x = nd.zeros((3, 4, 5))

## Integers

# Okay
getitem_returns_view(x, 0)  # True
getitem_returns_view(x.asnumpy(), 0)  # True

# NOT okay
getitem_returns_view(x, (0,))  # False
getitem_returns_view(x.asnumpy(), (0,))  # True

# Same with 1 instead of 0

## Slices

# Okay
getitem_returns_view(x, slice(None))  # True
getitem_returns_view(x.asnumpy(), slice(None))  # True

# NOT okay
getitem_returns_view(x, (slice(None),))  # False
getitem_returns_view(x.asnumpy(), (slice(None),))  # True

# NOT okay
getitem_returns_view(x, (0, slice(None)))  # False
getitem_returns_view(x.asnumpy(), (0, slice(None)))  # True

# NOT okay
getitem_returns_view(x, (0, slice(None), slice(None)))  # False
getitem_returns_view(x.asnumpy(), (0, slice(None), slice(None)))  # True

# NOT at all okay
getitem_returns_view(x, (slice(None), slice(None), slice(None)))  # False
getitem_returns_view(x.asnumpy(), (slice(None), slice(None), slice(None)))  # True

In particular the last case array[:, :, :], which is explicitly everything, should definitely not copy.

Suggestion

Instead of the current way of special-casing integers, lists and slice objects, I think it would be more robust to always sanitize the indexing object, in particular:

  • If it's not a tuple, wrap it in a 1-tuple.
  • Expand it to length array.ndim by adding slice(None) to the right.

Then the subsequent logic can look at all the entries in the tuple and determine whether the result will be contiguous (is it correct that the backend does not yet support strided views?). That's not entirely trivial but certainly doable.

Of course, things get a little bit more involved if None (new axis) and Ellipsis (...) are allowed as well. But the code in #13143 already does the above sanitization for that case, it just doesn't always use it to avoid copies.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions