Skip to content

Conversation

@jbrockmendel
Copy link
Member

Following #21582, the biggest avoidable overheads in Period comparisons are in a) isinstance calls that can be short-circuited and b) __ne__ call overhead. This PR moves __eq__ and __ne__ to the cython file, removing the __ne__ overhead, and changes an isinstance check to a try/except.

Using the same profile code from #21582, this brings the set_index runtime from 6.603 seconds down to 2.165 seconds.

@codecov
Copy link

codecov bot commented Jun 23, 2018

Codecov Report

Merging #21606 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21606      +/-   ##
==========================================
- Coverage    91.9%   91.89%   -0.01%     
==========================================
  Files         154      154              
  Lines       49562    49543      -19     
==========================================
- Hits        45549    45530      -19     
  Misses       4013     4013
Flag Coverage Δ
#multiple 90.29% <100%> (-0.01%) ⬇️
#single 41.84% <100%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/offsets.py 97.11% <100%> (-0.05%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 36422a8...92e6fdd. Read the comment docs.

@jreback jreback added Performance Memory or execution speed performance Period Period data type labels Jun 25, 2018
@jreback jreback added this to the 0.24.0 milestone Jun 25, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this issue to your existing perf issue (always do that on follow ons)


def __eq__(self, other):
if is_string_object(other):
from pandas.tseries.frequencies import to_offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you want to define a function to return this import, we use this in a number of places?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this just be trading one one-liner for another?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no then then its a function which we can import at the top (and of course if we eventually are able to actually move to_offset to cython its a big win). but its 'cleaner' I think. (certainly can followup on this in new PR). I just don't really like the imports hidden in the functions (as we must do currently), and rather centrailize in a single place.


@property
def _params(self):
from pandas.errors import AbstractMethodError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you import this at the top?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a minor comment. ping on green.


def __eq__(self, other):
if is_string_object(other):
from pandas.tseries.frequencies import to_offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no then then its a function which we can import at the top (and of course if we eventually are able to actually move to_offset to cython its a big win). but its 'cleaner' I think. (certainly can followup on this in new PR). I just don't really like the imports hidden in the functions (as we must do currently), and rather centrailize in a single place.

Since 0 is a bit weird, we suggest avoiding its use.
"""
_params = cache_readonly(BaseOffset._params.fget)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove this blank line


@property # NB: non-cython subclasses override with cache_readonly
def _params(self):
all_paras = self.__dict__.copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give an expl of what this is doing (and you can move the comment inside the function too)

@jreback jreback merged commit 838e8da into pandas-dev:master Jun 26, 2018
@jreback
Copy link
Contributor

jreback commented Jun 26, 2018

thanks @jbrockmendel !

@jbrockmendel jbrockmendel deleted the pcache2 branch July 1, 2018 01:27
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Memory or execution speed performance Period Period data type

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants