Skip to content

Conversation

@TomAugspurger
Copy link
Contributor

xref #30867

cc @jorisvandenbossche. The alternative is to use a warnings filter inside SingleBlockManger.get_slice to filter the warning, but I think avoiding the warning in the first place is a bit nicer.

@TomAugspurger TomAugspurger added this to the 1.0.0 milestone Jan 27, 2020
@TomAugspurger TomAugspurger added Indexing Related to indexing on series/frames, not to indexes themselves Warnings Warnings that appear or should be added to pandas labels Jan 27, 2020
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good!

@TomAugspurger
Copy link
Contributor Author

Oh, I got who calls what backwards here... Will need to re-think things...

We don't want to change all the index subclasses __getitem__s to do the _getitem_deprecate_nd version that maybe silences warnings.

@TomAugspurger
Copy link
Contributor Author

Changed to using a warnings filter. Note that this has a perf impact for slicing a Series. ~25% slower

Master

In [9]: s = pd.Series(range(10000))

In [10]: %timeit s[:5]
43.5 µs ± 2.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [11]: t = pd.Series([], dtype=float)

In [12]: %timeit t[:]
47.8 µs ± 2.24 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

This PR

In [7]: s = pd.Series(range(10000))

In [8]: %timeit s[:5]
55.6 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: t = pd.Series([], dtype=float)

In [10]: %timeit t[:]
64.2 µs ± 2.27 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Is that slowdown acceptable? Right now, I don't think it is, given that

  1. This is just a DeprecationWarning
  2. We'll be deprecating this in the future anyway.

@jbrockmendel
Copy link
Member

jbrockmendel commented Jan 27, 2020

I think there are 2 ways we get to get_slice with non-slice. One of them is L931, and can be avoided be extracting the single-item list after potentially warning in #31333. The other is on L940 and is explicltly an "mpl hackaround", so we can maybe-unpack there.

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jan 28, 2020

Thanks @jbrockmendel, that did it. We're able to apply the filter only when using the mpl compat path.

master

In [2]: s = pd.Series(range(10000))

In [3]: %timeit s[:, None]
36.2 µs ± 293 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [4]: t = pd.Series([], dtype=float)

In [5]: %timeit t[:]
52 µs ± 3.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: %timeit s[:5]
45 µs ± 2.27 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

PR

In [4]: s = pd.Series(range(10000))

In [5]: %timeit s[:, None]
50.2 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: t = pd.Series([], dtype=float)

In [7]: %timeit t[:]
48.4 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: %timeit s[:5]
44.3 µs ± 2.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So the first case is the only one with a slowdown, which I think is acceptable given that we want to deprecate that behavior anyway.

@TomAugspurger
Copy link
Contributor Author

Had to xfail a test asserting that this produced a warning. Will want to restore that with one from Series.__getitem__ I think, but not for 1.0

raise IndexError("Requested axis not found in manager")

return type(self)(self._block._slice(slobj), self.index[slobj], fastpath=True)
return type(self)(self._block._slice(slobj), self.index[slobj], fastpath=True,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we dont want the trailing comma here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth rerunning CI over?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can change it in my next "assorted cleanups" PR

@jbrockmendel
Copy link
Member

So the first case is the only one with a slowdown, which I think is acceptable given that we want to deprecate that behavior anyway.

Sounds good

@TomAugspurger TomAugspurger merged commit 0575149 into pandas-dev:master Jan 28, 2020
@TomAugspurger TomAugspurger deleted the ndim-indexing-series branch January 28, 2020 20:55
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 28, 2020
TomAugspurger added a commit that referenced this pull request Jan 28, 2020
…31403)

Co-authored-by: Tom Augspurger <TomAugspurger@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Indexing Related to indexing on series/frames, not to indexes themselves Warnings Warnings that appear or should be added to pandas

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants