GH-45755: [C++][Python][Compute] Add winsorize function #45763
zanmato1984
left a comment
Some first glance questions.
Plus, no update to doc? https://github.com/apache/arrow/blob/ab4263476d9d5078cd0fa2cce6b922eb7d90c0af/docs/source/cpp/compute.rst
Oops, I had entirely forgotten.
Ok, I've added the docs now. Do you want to take another look?
@github-actions crossbow submit -g cpp
@github-actions crossbow submit -g cpp
zanmato1984
left a comment
LGTM with two nits.
        QuantileOptions::NEAREST);
    ARROW_ASSIGN_OR_RAISE(
        auto quantile,
        CallFunction("quantile", {input}, &quantile_options, ctx->exec_context()));
@pitrou Do you think there is any benefit to resolving the quantile kernel and using it here directly, as opposed to using CallFunction?
I suppose it is easier this way (using CallFunction), but I wonder, in general, when writing a kernel that uses other kernels/functions, whether it is better to use CallFunction or to resolve the kernel and use kernel->Exec.
Resolving the kernel could perhaps save some nanoseconds, but I'm not sure that's significant compared to the other costs.
icexelloss
left a comment
Left a question. Otherwise LGTM.
@github-actions crossbow submit -g cpp
Revision: 6750680 Submitted crossbow builds: ursacomputing/crossbow @ actions-cd646b53f4
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 026a933. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.
…#45763)

### Rationale for this change

Add a "winsorize" vector function as described here: https://en.wikipedia.org/wiki/Winsorizing
and implemented in e.g. Scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html

Also make the "quantile" function supported on decimal32/decimal64.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No, only a new compute function.

* GitHub Issue: apache#45755

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
Add a "winsorize" vector function as described here:
https://en.wikipedia.org/wiki/Winsorizing
and implemented in e.g. Scipy:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html
Also make the "quantile" function supported on decimal32/decimal64.
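For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what winsorizing does: values below a lower quantile are raised to that quantile, and values above an upper quantile are lowered to it. This is an illustration only, not the Arrow kernel from this PR. The function name, the `lower_limit`/`upper_limit` parameters (taken here as quantiles in [0, 1]), and the rounding rule for picking the nearest element are all assumptions of the sketch. The sample data is the example from the Wikipedia article linked above.

```python
def winsorize(values, lower_limit, upper_limit):
    """Clamp values to the [lower_limit, upper_limit] quantile range.

    Illustrative sketch only; limits are quantiles in [0, 1] (an assumption).
    """
    if not 0.0 <= lower_limit <= upper_limit <= 1.0:
        raise ValueError("limits must satisfy 0 <= lower_limit <= upper_limit <= 1")
    if not values:
        return []

    ordered = sorted(values)
    n = len(ordered)

    def nearest_quantile(q):
        # Pick the element nearest to position q * (n - 1) in the sorted data.
        # Rounding half-way cases up is an assumption of this sketch.
        return ordered[int(q * (n - 1) + 0.5)]

    low = nearest_quantile(lower_limit)
    high = nearest_quantile(upper_limit)

    # Clamp every value into [low, high], preserving the input order.
    return [min(max(v, low), high) for v in values]


data = [92, 19, 101, 58, 1053, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 28, -5, 41]
result = winsorize(data, 0.05, 0.95)
print(result)
```

With these limits, the two extreme values (1053 and -40) are replaced by the nearest values inside the 5%..95% range (101 and -5); all other values are unchanged.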
Are these changes tested?
Yes.
Are there any user-facing changes?
No, only a new compute function.