-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
BENCH/REF: parametrize CSV benchmarks on engine #38442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Atm not sure what's going on with the benchmark failure - the failure is in |
|
#38476 should fix the unrelated benchmark failure. |
WillAyd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good small question
asv_bench/benchmarks/io/csv.py
Outdated
| fname = "__test__.csv" | ||
| params = [None, 10000] | ||
| param_names = ["skiprows"] | ||
| params = ([None, 10000], ["c"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be c and python or just c?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On current master we only have the default (I think that's c)
We have c and python in all the other benchmarks so it seems reasonable to add that here. If there's a reason we don't want that I'll revert
Thanks @mroeschke!! |
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Is there some logic in which classes are parameterized and which not?
|
|
||
| class ReadCSVCategorical(BaseIO): | ||
| fname = "__test__.csv" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could also parametrize this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Not that I know of beyond (AFAIK) that the python engine supports all these whereas the c/pyarrow engines only support a proper subset. I'll do a pass through the entire file and parametrize where both c and python engines are supported. |
|
CI green |
|
thanks @arw2019 |
black pandasgit diff upstream/master -u -- "*.py" | flake8 --diffThe diff in basic PR implementing the
pyarrow-based CSV engine (#38370) is quite big. A part of that PR is a small refactor of the CSV I/O benchmarks such that they takeengineas a parameter. Most of that refactor is not dependent on having the pyarrow engine so I'm spinning it off here to de-bloat #38370.