Skip to content

Conversation

@debnathshoham
Copy link
Contributor

@debnathshoham debnathshoham commented Aug 28, 2021

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

Added a check as suggested in #39280 (comment)

In [1]: from asv_bench.benchmarks.indexing import InsertColumns

In [2]: self = InsertColumns()

In [3]: self.setup()

In [4]: %timeit self.time_assign_with_setitem()
27.6 ms ± 68.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) #master

In [4]: %timeit self.time_assign_with_setitem()
8.6 ms ± 6.43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) #PR

Edit:

       before           after         ratio
     [2b3a9b16]       [fa9c7140]
     <master>         <indexing.InsertColumns.time_assign_with_setitem>
-         179±4μs        163±0.6μs     0.91  indexing.NumericSeriesIndexing.time_iloc_array(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'unique_monotonic_inc')
-        838±50μs          759±7μs     0.90  indexing.NonNumericSeriesIndexing.time_getitem_list_like('datetime', 'nonunique_monotonic_inc')
-        37.7±4μs       34.0±0.5μs     0.90  indexing.NumericSeriesIndexing.time_getitem_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
-     2.91±0.09ms       2.61±0.1ms     0.90  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'unique_monotonic_inc')
-        60.1±2μs       54.0±0.5μs     0.90  indexing.NumericSeriesIndexing.time_iloc_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     1.56±0.03ms      1.40±0.04ms     0.90  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-        183±20μs          164±1μs     0.89  indexing.NumericSeriesIndexing.time_iloc_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     2.39±0.05ms         2.14±0ms     0.89  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
-        879±30μs         780±50μs     0.89  indexing.CategoricalIndexIndexing.time_get_indexer_list('monotonic_incr')
-        184±20μs          163±1μs     0.88  indexing.NumericSeriesIndexing.time_iloc_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-         131±7μs          115±4μs     0.87  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-     3.02±0.03ms      2.63±0.09ms     0.87  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-     3.19±0.04ms      2.78±0.04ms     0.87  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      11.7±0.2ms       10.2±0.2ms     0.87  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-     1.89±0.04ms         1.65±0ms     0.87  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-      4.63±0.1ms      3.99±0.02ms     0.86  indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     1.82±0.04ms      1.56±0.01ms     0.86  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-      3.69±0.1ms       3.14±0.1ms     0.85  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-     4.08±0.06ms      3.43±0.02ms     0.84  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-        150±30μs          120±3μs     0.80  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'unique_monotonic_inc')
-        961±30μs         762±20μs     0.79  indexing.CategoricalIndexIndexing.time_get_indexer_list('monotonic_decr')
-        295±30μs          231±3μs     0.78  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-      1.11±0.3ms         825±20μs     0.74  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.UInt8Engine'>, <class 'numpy.uint8'>), 'monotonic_incr')
-        741±80μs          288±9μs     0.39  indexing.AssignTimeseriesIndex.time_frame_assign_timeseries_index
-        49.3±2ms       18.2±0.2ms     0.37  indexing.InsertColumns.time_assign_with_setitem
-       54.3±20ms         18.5±1ms     0.34  indexing.InsertColumns.time_assign_list_like_with_setitem

@debnathshoham debnathshoham changed the title PERF: indexing.InsertColumns.time_assign_with_setitem PERF: indexing Aug 28, 2021
@jbrockmendel
Copy link
Member

nice cc @phofl

@jreback
Copy link
Contributor

jreback commented Aug 28, 2021

can u run the indexing asvs and report here

@phofl
Copy link
Member

phofl commented Aug 28, 2021

Really nice, good catch.

@jreback
Copy link
Contributor

jreback commented Aug 28, 2021

also pls add a whatsnew note for 1.4

@mzeitlin11
Copy link
Member

Could also go in 1.3.3, looks like this helps top regression here: https://simonjayhawkins.github.io/fantastic-dollop/#regressions?sort=3&dir=desc&branch=1.3.x

@phofl
Copy link
Member

phofl commented Aug 28, 2021

Yeah good point, this check was added in 1.3

@debnathshoham
Copy link
Contributor Author

should I put the whatsnew in 1.3.3 fixed regression or under others?
@jreback - I have added the indexing asvs report in the top comment

@jreback
Copy link
Contributor

jreback commented Aug 28, 2021

great yeah 1.3.3 is fine (add a perf section)


Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement for :meth:`DataFrame.__setitem__`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify in which cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @phofl - updated. Let me know if this is fine.

@simonjayhawkins simonjayhawkins added this to the 1.3.3 milestone Aug 29, 2021
@simonjayhawkins simonjayhawkins added Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance labels Aug 29, 2021
@simonjayhawkins
Copy link
Member

simonjayhawkins commented Aug 29, 2021

Could also go in 1.3.3, looks like this helps top regression here: https://simonjayhawkins.github.io/fantastic-dollop/#regressions?sort=3&dir=desc&branch=1.3.x

side note: class InsertColumns was updated in #42998. Changing the code or setup code invalidates previous results for that benchmark. So that regression is not currently reported.

@simonjayhawkins
Copy link
Member

side note: class InsertColumns was updated in #42998. Changing the code or setup code invalidates previous results for that benchmark. So that regression is not currently reported.

benchmarks have been re-run and gh-pages updated.


Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement for :meth:`DataFrame.__setitem__` with dataframes containing unique columns (:issue:`43274`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not specific enough. If key is any list-like or frame we don't hit this at all. Key has to be a scalar I think and value is not a df

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. let me know how this looks


Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement for :meth:`DataFrame.__setitem__` when the key or value is not a dataframe, or key is not list-like (:issue:`43274`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:class:DataFrame, otherwise lgtm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @phofl merge when ready.

~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement for :meth:`DataFrame.__setitem__` when the key or value is not a :class:`DataFrame`, or key is not list-like (:issue:`43274`)
-
-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a blank line, docs aren't building like that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah! I see. I was confused why the docs were consistently failing.

@phofl phofl merged commit 1009a7d into pandas-dev:master Aug 30, 2021
@phofl
Copy link
Member

phofl commented Aug 30, 2021

thx @debnathshoham

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Aug 30, 2021
@debnathshoham debnathshoham deleted the indexing.InsertColumns.time_assign_with_setitem branch August 30, 2021 15:49
phofl pushed a commit that referenced this pull request Aug 30, 2021
Co-authored-by: Shoham Debnath <debnathshoham@gmail.com>
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021
* PERF: indexing.InsertColumns.time_assign_with_setitem

* added whatsnew

* more descriptive whatsnew

* rectified whatsnew

* added PR ref in whatsnew

* clear whatsnew

* suggested change

* added blank line
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants