qwhelan (Contributor) commented Jul 14, 2019

MultiIndex.shape is currently extremely slow because it triggers the creation of ._values, which can be quite expensive for datetime levels. The one mitigating factor is that the result is cached, which makes ._values.shape near-instant on subsequent calls but also makes the slowdown hard to catch in asv benchmarks. This commit adds a suite dedicated to measuring such cached properties on Index objects.
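To illustrate the problem described above, here is a minimal sketch using a hypothetical ToyMultiIndex class (plain Python, not pandas code). It assumes the values are built lazily and cached, and it contrasts a shape that goes through those values with one derived from the length alone. The length-based version is an assumption about one way to avoid the materialization, not a copy of this PR's diff.

```python
from functools import cached_property


class ToyMultiIndex:
    """Hypothetical stand-in for a multi-level index; not pandas code."""

    def __init__(self, codes):
        # codes: one list of integer positions per level, all the same length
        self.codes = codes
        self.materialized = 0  # counts how often the expensive path runs

    @cached_property
    def _values(self):
        # Expensive step: build one tuple per row. Cached after first access.
        self.materialized += 1
        return list(zip(*self.codes))

    def __len__(self):
        # Cheap: the row count is already known from the codes.
        return len(self.codes[0])

    @property
    def slow_shape(self):
        # Goes through the cached values, so the first call pays for them.
        return (len(self._values),)

    @property
    def shape(self):
        # Avoids materializing the values at all.
        return (len(self),)


idx = ToyMultiIndex([[0, 0, 1, 1], [0, 1, 0, 1]])
fast = idx.shape
count_after_fast = idx.materialized  # expensive path not yet touched
slow = idx.slow_shape
count_after_slow = idx.materialized  # expensive path has now run once
```

Both properties return the same tuple; only slow_shape pays the one-time cost of building the values.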

asv results show a ~400,000x speedup for a relatively straightforward case:

       before           after         ratio
     [269d3681]       [d205acf6]
     <master>       <shape>   
-      3.52±0.02s       8.33±0.2μs     0.00  index_cached_properties.MultiIndexCached.time_shape
  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
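As a rough sketch of how a benchmark can catch the cold-cache cost mentioned in the description, the class below follows the asv convention of a plain class with params, setup, and time_* methods. ToyIndex and CachedShapeSuite are hypothetical names, and the toy index stands in for a real pandas index. It relies on asv's documented behavior that setup runs before each sample and the timed method is then called number times, so number = 1 keeps every timed call on a fresh object.

```python
from functools import cached_property


class ToyIndex:
    """Hypothetical index whose values are costly to build (not pandas)."""

    def __init__(self, n):
        self.n = n

    @cached_property
    def _values(self):
        # Stand-in for an expensive materialization; cached after first use.
        return [(i, i) for i in range(self.n)]

    @property
    def shape(self):
        return (len(self._values),)


class CachedShapeSuite:
    # asv reads these class attributes. With number = 1 there is one timed
    # call per sample, so each measurement follows a fresh setup() and
    # therefore starts from a cold cache.
    number = 1
    params = [10, 1000]
    param_names = ["n"]

    def setup(self, n):
        # Rebuild the object so no cached values survive from earlier calls.
        self.idx = ToyIndex(n)

    def time_shape(self, n):
        self.idx.shape
```

Without number = 1, repeated calls inside one sample would mostly measure the warm, cached path and hide the first-call cost.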

WillAyd added the MultiIndex and Performance (memory or execution speed performance) labels Jul 14, 2019
qwhelan (Contributor, Author) commented Jul 18, 2019

Updated asv results show that moving shape into Index significantly benefits a few other classes as well:

       before           after         ratio
     [a4c19e7a]       [3c946017]
     <unsorted_cats~1>       <shape>   
-     2.59±0.07μs      2.16±0.06μs     0.83  index_cached_properties.IndexCache.time_shape('PeriodIndex')
-     2.74±0.09μs       2.26±0.1μs     0.83  index_cached_properties.IndexCache.time_shape('DatetimeIndex')
-      5.06±0.2μs       3.57±0.2μs     0.70  index_cached_properties.IndexCache.time_shape('UInt64Index')
-      5.80±0.4μs       3.70±0.3μs     0.64  index_cached_properties.IndexCache.time_shape('Float64Index')
-      6.40±0.4μs       4.08±0.3μs     0.64  index_cached_properties.IndexCache.time_shape('TimedeltaIndex')
-      6.80±0.3μs       3.88±0.2μs     0.57  index_cached_properties.IndexCache.time_shape('IntervalIndex')
-        65.2±1μs         903±20ns     0.01  index_cached_properties.IndexCache.time_shape('Int64Index')
-      65.1±0.9μs         892±10ns     0.01  index_cached_properties.IndexCache.time_shape('RangeIndex')
-         214±2ms       4.45±0.2μs     0.00  index_cached_properties.IndexCache.time_shape('MultiIndex')
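Because asv rounds the ratio column to 0.00 or 0.01 for the largest improvements, the speedup factors are easier to read when computed directly. The snippet below does that arithmetic for a few rows, using the central before/after timings copied from the asv tables above and ignoring the ± spread.

```python
# (before, after) timings in seconds, copied from the asv tables above
timings = {
    "MultiIndexCached.time_shape": (3.52, 8.33e-6),
    "IndexCache.time_shape('MultiIndex')": (214e-3, 4.45e-6),
    "IndexCache.time_shape('Int64Index')": (65.2e-6, 903e-9),
    "IndexCache.time_shape('RangeIndex')": (65.1e-6, 892e-9),
}

# Speedup factor = time before / time after
speedups = {name: before / after for name, (before, after) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:,.0f}x faster")
```

The first entry is the source of the ~400,000x figure quoted in the original description.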

TomAugspurger added this to the 0.25.0 milestone Jul 18, 2019