-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
PERF: custom ops for RangeIndex.[all|any|__contains__] #26617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: custom ops for RangeIndex.[all|any|__contains__] #26617
Conversation
pandas/core/indexes/range.py
Outdated
| def __contains__(self, key): | ||
| hash(key) | ||
| try: | ||
| key = com.ensure_python_int(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Python int is needed, else Python iterates over the range, making the operation very slow.
8bca43c to
cfd68e5
Compare
Codecov Report
@@ Coverage Diff @@
## master #26617 +/- ##
==========================================
- Coverage 91.85% 91.85% -0.01%
==========================================
Files 174 174
Lines 50722 50741 +19
==========================================
+ Hits 46593 46606 +13
- Misses 4129 4135 +6
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #26617 +/- ##
==========================================
- Coverage 91.87% 90.41% -1.47%
==========================================
Files 174 174
Lines 50661 50668 +7
==========================================
- Hits 46547 45809 -738
- Misses 4114 4859 +745
Continue to review full report at Codecov.
|
pandas/core/common.py
Outdated
| return f | ||
|
|
||
|
|
||
| def ensure_python_int(value: Union[int, Any]) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put in dtypes.cast but i don’t think u need this as we already have ensure_platform_int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ensure_platform_int returns a numpy int(32|64), so won't work here:
>>> rng = range(1, 1_000_000)
>>> %timeit 999_999 in rng
133 ns ± 2.46 ns per loop
>>> %timeit np.int64(999_999) in rng
306 ms ± 66.3 ms per loopThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok in any event move it to collocate with the ensure _ routines
4826efc to
3ca546e
Compare
pandas/core/dtypes/cast.py
Outdated
| raise ValueError("Trying to coerce float values to integers") | ||
|
|
||
|
|
||
| def ensure_python_int(value: Union[int, Any]) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave the wrong direction, can you put in core/dtypes/common (right below the other ensure_*)
and can you add a doc-string (I know its trivial)
the typing seems odd here, shouldn't this be Union[int, np.integer] ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure. In principle a float or a np.floating and possibly others could for some subvalues pass the test (e.g. if value is 1.0). But ok, passing a float would be a code smell, so just as well make the programmer see passing those as typing errors.
pandas/core/indexes/range.py
Outdated
| def __contains__(self, key): | ||
| hash(key) | ||
| try: | ||
| key = cast.ensure_python_int(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import directly
pandas/core/indexes/range.py
Outdated
| def all(self): | ||
| return 0 not in self._range | ||
|
|
||
| def any(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to wait for this until #26581 is merged, as mypy doesn't understand/see attributes defined in __new__ or _simple_new . If that one is merged, I can go through this again wrt. typing.
jreback
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments
34706ce to
fa4c649
Compare
|
Comments addressed. The typing issue must wait until #26581 is merged for mypy to see _range. |
3d52e46 to
949d452
Compare
|
can you add the issue number onto the one you already have in Performance section; merge master as well. ping on green. |
949d452 to
71ab9ed
Compare
71ab9ed to
4aea46d
Compare
git diff upstream/master -u -- "*.py" | flake8 --diffFollow-up to #26565. Make
RangeIndex._databe created in fewer cases.Performance examples (but the larger gain is from memory savings by not creating the
_dataarray):