
@topper-123 (Contributor) commented Aug 11, 2020

Minor performance issue.

By adding a custom `__iter__` method to `RangeIndex`, we partly avoid having to create and cache the expensive `_data` attribute, and partly it's simply faster to iterate over a `range` than over an ndarray:

```python
>>> import pandas as pd
>>> idx = pd.RangeIndex(100_000)
>>> %timeit for _ in idx: pass  # IPython magic
10.9 ms ± 74.7 µs per loop  # master
6.11 ms ± 48.8 µs per loop  # this PR
>>> "_data" in idx._cache
True  # master
False  # this PR
```
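To make the mechanism concrete, here is a heavily simplified sketch (not the actual pandas implementation) of an index-like class that keeps a lazy `range` and only materialises an ndarray on demand. The class name `LazyRangeIndex` is hypothetical, and `functools.cached_property` stands in for pandas' own caching of `_data`: iteration goes straight through `_range`, so the array is never built or cached.

```python
import functools

import numpy as np


class LazyRangeIndex:
    """Hypothetical, simplified stand-in for RangeIndex (illustration only)."""

    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        self._range = range(start, stop, step)

    @functools.cached_property
    def _data(self):
        # Materialising the values is the expensive step; the result is
        # stored in the instance __dict__ after the first access.
        r = self._range
        return np.arange(r.start, r.stop, r.step, dtype=np.int64)

    def __len__(self):
        return len(self._range)

    def __iter__(self):
        # Iterate the lazy range directly: yields plain Python ints
        # and never touches (or caches) self._data.
        yield from self._range


idx = LazyRangeIndex(100_000)
total = sum(idx)                 # 4999950000, computed from the range alone
print("_data" in idx.__dict__)   # False: nothing was materialised
idx._data                        # explicit access builds and caches the array
print("_data" in idx.__dict__)   # True
```

The trade-off this illustrates: iterating a `range` yields plain Python ints directly, whereas iterating an int64 ndarray produces a NumPy scalar per element and requires the array to exist first.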

xref #35432, #26565.

@topper-123 changed the title from "PERF: make RangeIndex.__iter__ iterate over ._range" to "PERF: make RangeIndex iterate over ._range" on Aug 11, 2020
@jreback added the Indexing (Related to indexing on series/frames, not to indexes themselves) and Performance (Memory or execution speed performance) labels on Aug 12, 2020
@jreback added this to the 1.2 milestone on Aug 12, 2020
@jreback (Contributor) commented Aug 12, 2020

Cool, can you add an ASV benchmark that hits this (or do we have enough coverage)?

@topper-123 (Contributor, Author)

OK, I've added ASV benchmarks.
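For reference, an asv benchmark for this path might look roughly like the sketch below. It is hypothetical: the class name `RangeIndexIteration` and the parameter values are illustrative, not the benchmarks actually added in this PR. asv collects methods whose names start with `time_` and runs `setup` outside the timed region.

```python
import pandas as pd


class RangeIndexIteration:
    """Hypothetical asv benchmark for iterating over a RangeIndex."""

    # asv calls setup/time_* once per value, passing it as `n`.
    params = [100_000, 1_000_000]
    param_names = ["n"]

    def setup(self, n):
        # Not timed: build the index before the measured code runs.
        self.idx = pd.RangeIndex(n)

    def time_iter(self, n):
        # Timed: plain Python iteration, the path changed by this PR.
        for _ in self.idx:
            pass
```

With asv installed, something like `asv continuous upstream/master HEAD -b RangeIndexIteration` would compare this benchmark between the two commits.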

@jreback merged commit e530066 into pandas-dev:master on Aug 13, 2020
@jreback (Contributor) commented Aug 13, 2020

thanks @topper-123

@topper-123 deleted the perf_RangeIndex.__iter__ branch on Aug 13, 2020, 18:27