Skip to content

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Mar 13, 2025

Rationale for this change

Add a "winsorize" vector function as described here:
https://en.wikipedia.org/wiki/Winsorizing

and implemented in e.g. Scipy:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html

Also make the "quantile" function supported on decimal32/decimal64.

Are these changes tested?

Yes.

Are there any user-facing changes?

No, only a new compute function.

@pitrou pitrou changed the title GH-45755: [C++][Compute] Add winsorize function GH-45755: [C++][Python][Compute] Add winsorize function Mar 13, 2025
@pitrou pitrou force-pushed the winsorize-function branch 2 times, most recently from 167656f to 32e1126 Compare March 13, 2025 14:27
@pitrou pitrou marked this pull request as ready for review March 13, 2025 15:18
@pitrou pitrou requested a review from zanmato1984 March 13, 2025 15:51
Copy link
Contributor

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Mar 14, 2025
@pitrou
Copy link
Member Author

pitrou commented Mar 14, 2025

@pitrou
Copy link
Member Author

pitrou commented Mar 17, 2025

Ok, I've added the docs now. Do you want to take another look?

@pitrou pitrou requested a review from icexelloss March 17, 2025 13:32
@pitrou
Copy link
Member Author

pitrou commented Mar 17, 2025

@github-actions crossbow submit -g cpp

@github-actions

This comment was marked as outdated.

@pitrou
Copy link
Member Author

pitrou commented Mar 17, 2025

@github-actions crossbow submit -g cpp

@github-actions

This comment was marked as outdated.

Copy link
Contributor

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with two nits.

@pitrou pitrou force-pushed the winsorize-function branch from be3f7aa to b203c3d Compare March 20, 2025 10:03
QuantileOptions::NEAREST);
ARROW_ASSIGN_OR_RAISE(
auto quantile,
CallFunction("quantile", {input}, &quantile_options, ctx->exec_context()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou Do you think There is any benefit to resolve and use the quantile kernel here directly as supposed to use CallFunction?

I suppose it is easier this way (using CallFunction), but I wonder, in general, when writing a kernel that uses other kernel/functions, whether it is better to use CallFunction or resolve it kernel and use kernel->Exec

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolving the kernel could perhaps save some nanoseconds, but I'm not sure that's significant compared to the other costs.

Copy link
Contributor

@icexelloss icexelloss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a question. Otherwise LGTM.

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review awaiting changes Awaiting changes labels Mar 20, 2025
@pitrou pitrou force-pushed the winsorize-function branch from b203c3d to 6750680 Compare March 24, 2025 14:23
@pitrou
Copy link
Member Author

pitrou commented Mar 24, 2025

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 6750680

Submitted crossbow builds: ursacomputing/crossbow @ actions-cd646b53f4

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-meson GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou pitrou merged commit 026a933 into apache:main Mar 24, 2025
37 checks passed
@pitrou pitrou removed the awaiting merge Awaiting merge label Mar 24, 2025
@pitrou pitrou deleted the winsorize-function branch March 24, 2025 15:26
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 026a933.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Apr 15, 2025
…#45763)

### Rationale for this change

Add a "winsorize" vector function as described here:
https://en.wikipedia.org/wiki/Winsorizing

and implemented in e.g. Scipy:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html

Also make the "quantile" function supported on decimal32/decimal64.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No, only a new compute function.

* GitHub Issue: apache#45755

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants