@jbrockmendel
Member

jbrockmendel commented Jan 11, 2020

Small simplification gives a small speedup

In [2]: rng = pd.Index(range(10**5)) 

In [3]: %timeit rng.get_loc(5.0)                                                
6.38 µs ± 381 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  <-- master
1.15 µs ± 40.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- PR

In [5]: %timeit rng.get_loc(5)
845 ns ± 5.71 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- master
860 ns ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- PR

In [9]: def foo(): 
   ...:     try: 
   ...:         return rng.get_loc(None) 
   ...:     except KeyError: 
   ...:         pass 
   ...:                                                                         

In [10]: %timeit foo()                                                          
6.72 µs ± 656 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  <-- master
1.11 µs ± 94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- PR

jbrockmendel added the Indexing label (Related to indexing on series/frames, not to indexes themselves) Jan 11, 2020
except ValueError:
    raise KeyError(key)
if method is None and tolerance is None:
    if is_integer(key) or (is_float(key) and key.is_integer()):
Member

Does the extra condition here imply any behavioral changes?

Member Author

No. Integer-like floats (e.g. 5.0) now go through this branch instead of going through the _maybe_cast_indexer path in Index.get_loc.
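
To illustrate the branch being discussed, here is a minimal pure-Python sketch, assuming the lookup is backed by a plain `range` object. The function name `range_get_loc` and the `isinstance` checks are hypothetical stand-ins for the `is_integer` / `is_float` helpers in the diff context; this is not the pandas implementation.

```python
def range_get_loc(rng: range, key):
    """Hypothetical stand-in for the fast path; not the pandas code."""
    # Assumption of this sketch: bools are not treated as integer labels,
    # even though bool is a subclass of int in Python.
    is_int = isinstance(key, int) and not isinstance(key, bool)
    # 5.0 qualifies; 5.5, nan and inf do not.
    is_intlike_float = isinstance(key, float) and key.is_integer()

    if is_int or is_intlike_float:
        try:
            # For an int argument, range.index works out the position
            # from start/stop/step rather than scanning the elements.
            return rng.index(int(key))
        except ValueError:
            raise KeyError(key) from None

    # Anything else (None, strings, non-integral floats, ...) cannot be
    # a member of a range, so fail immediately.
    raise KeyError(key)


if __name__ == "__main__":
    rng = range(10**5)
    print(range_get_loc(rng, 5))
    print(range_get_loc(rng, 5.0))
```

In this sketch the integer and integer-like-float cases share one branch, which is the shape of the condition shown in the diff above.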

@jorisvandenbossche
Member

How does this change affect __contains__ ?

@jbrockmendel
Member Author

> How does this change affect __contains__ ?

Typo, will update title

jbrockmendel changed the title from "PERF: RangeIndex.__contains__" to "PERF: RangeIndex.get_loc" Jan 13, 2020
@jorisvandenbossche
Member

The timings you showed were then also not relevant?

@jbrockmendel
Member Author

> The timings you showed were then also not relevant?

Huh, not sure how that happened. Will update with get_loc timings, which make a much bigger difference:

In [2]: rng = pd.Index(range(10**5)) 

In [3]: %timeit rng.get_loc(5.0)                                                
6.38 µs ± 381 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  <-- master
1.15 µs ± 40.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- PR

In [5]: %timeit rng.get_loc(5)
845 ns ± 5.71 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- master
860 ns ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- PR

In [9]: def foo(): 
   ...:     try: 
   ...:         return rng.get_loc(None) 
   ...:     except KeyError: 
   ...:         pass 
   ...:                                                                         

In [10]: %timeit foo()                                                          
6.72 µs ± 656 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  <-- master
1.11 µs ± 94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- PR
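
For anyone who wants to rerun these measurements outside IPython, the following is a rough sketch using only the standard library. The helper names `time_per_call` and `lookup_or_none` are made up for illustration. It reports mean and standard deviation per call across runs, loosely mirroring the `%timeit` output format, though the exact figures will not match `%timeit`.

```python
import statistics
import timeit


def time_per_call(func, repeat=7, number=100_000):
    """Return (mean, stdev) of the per-call time in seconds over `repeat` runs."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


def lookup_or_none(index, key):
    """Same idea as foo() above: swallow KeyError so the failing path can be timed."""
    try:
        return index.get_loc(key)
    except KeyError:
        return None
```

Assuming pandas is installed and `rng = pd.Index(range(10**5))` as in the session above, calls such as `time_per_call(lambda: rng.get_loc(5.0))` and `time_per_call(lambda: lookup_or_none(rng, None))` would cover the float and missing-key cases. Absolute numbers depend on the machine and the pandas version.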

@jorisvandenbossche
Member

That looks better :)

jorisvandenbossche added this to the 1.1 milestone Jan 13, 2020
jreback merged commit 030a35c into pandas-dev:master Jan 15, 2020
@jreback
Contributor

jreback commented Jan 15, 2020

thanks @jbrockmendel

jbrockmendel deleted the collect5 branch January 15, 2020 16:40