ARROW-12744: [C++][Compute] Add rounding kernel #10349
Conversation
6a6e01e to 49f232b
@bkietz @jorisvandenbossche Need feedback on this PR. Specifically, the rounding options provided and kernel implementations.
402ab23 to 0240f37
There are 2 round functions (`round` and `round_to_multiple`). For tests, I wanted to use the […]. Also, I could not find a way to create the generator dispatchers without explicitly using the […]. @lidavidm @bkietz Any comments or suggestions would be gladly appreciated.
e7f5d07 to bf4d80d
There seem to be some Windows-specific test failures :/
d58bc48 to 8c3943c
lidavidm
left a comment
Thanks, this looks good to me. I left two small comments.
lidavidm
left a comment
Thanks, this LGTM.
7ef8e65 to dfe6de2
Should you undraft it, or is it still WIP?
It is basically complete and undrafted. There are a few minor comments I made w.r.t. doubts that I have.
14d6244 to 3e82fd0
pitrou
left a comment
Nice! A bunch of comments below.
3e82fd0 to 14382ff
14382ff to dd001b9
Ready for review cc @pitrou
Looks great!
@rok It is a possibility, if we have the semantics pinned down w.r.t. how to shift specific timestamps (forward, backward, delta). This would be a new set of compute functions: "round_time" and "round_time_to_multiple", where the latter could be a quaternary/varargs function to support multiples for hour, min, sec, ms/ns.
Indeed that would probably need some work. Does this currently support arbitrary rounding to multiple on timestamps? If yes it might be good to limit it to timezoneless and UTC timestamps to avoid ambiguous and nonexistent timestamp issues. |
lidavidm
left a comment
Looks fine to me. One minor comment.
Should this just return OptionsWrapper::Init like below?
No, it is `RoundOptionsWrapper` so that its constructor is invoked via `make_unique`, which initializes the `pow10` data member. If we use `OptionsWrapper` then `pow10` is not available. I tried invoking `OptionsWrapper::Init`, but it returns a `std::unique_ptr` which would require "casting" to `RoundOptionsWrapper` first and then to `KernelState` to match the return type. The `unique_ptr` casting caused too many issues, so I reverted to mimicking the `OptionsWrapper::Init` method.
The way I used these `OptionsWrapper` classes:
- consider `Init` for options validation, because `Init` can return a `Status::Invalid`
- constructor for initializing non-options state, which can then be accessed in the kernels' `Call` via `ctx->state()`
I templatized the `Init` method of `KernelState`-derived classes so that derived constructors can be invoked without the need to duplicate the `Init` definition. This makes `KernelState` fully functional for supporting options validation and extending kernel state via the constructor, which can most likely benefit other scalar kernels as well.
For example, by extending `OptionsWrapper` as follows:

```cpp
template <typename OptionsType>
struct OptionsWrapper : public KernelState {
  template <typename KernelStateType = OptionsWrapper>
  static Result<std::unique_ptr<KernelState>> Init(KernelContext* ctx,
                                                   const KernelInitArgs& args) {
    if (auto options = static_cast<const OptionsType*>(args.options)) {
      return ::arrow::internal::make_unique<KernelStateType>(*options);
    }
    ...
  }
  ...
};
```

now we can extend custom states as follows:
```cpp
struct RoundOptionsWrapper<RoundOptions> : public OptionsWrapper<RoundOptions> {
  using OptionsType = RoundOptions;
  using State = RoundOptionsWrapper<OptionsType>;

  double pow10;

  explicit RoundOptionsWrapper(OptionsType options) : OptionsWrapper(std::move(options)) {
    pow10 = RoundUtil::Pow10(std::abs(options.ndigits));
  }

  static Result<std::unique_ptr<KernelState>> Init(KernelContext* ctx,
                                                   const KernelInitArgs& args) {
    return OptionsWrapper<OptionsType>::Init<State>(ctx, args);
  }
};
```
Ok, I tried the template variant of `OptionsWrapper` and, although it made the code cleaner, it failed to compile on some systems, so I reverted to duplicating the `OptionsWrapper::Init` definition. I think there needs to be a refactoring of `KernelState` and related parts to support validating kernel options and extending kernel state in a simpler manner. There are different patterns being used in the code to fulfill these needs. But this is a separate issue from this PR.
Just for the record, it feels a bit weird to call `GenerateArithmeticRound` at kernel execution time. That said, a quick benchmark in Python shows there doesn't seem to be any large overhead:
```python
>>> import pyarrow as pa, pyarrow.compute as pc
>>> floor = pc.get_function("floor")
>>> round = pc.get_function("round")
>>> arr = pa.array([None], type=pa.float64())
>>> %timeit floor.call([arr])
2.57 µs ± 10.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit floor.call([arr])
2.58 µs ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit round.call([arr])
2.65 µs ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit round.call([arr])
2.53 µs ± 12.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
I agree. The purpose of doing this here is to generate dispatchers once prior to kernel invocation. Previously, I tried 2 solutions:
- Have a vector of precomputed `GenerateArithmeticFloatingPoint` dispatchers and use `options.round_mode` to index this vector. But this requires the vector to be ordered identically to the round modes in `enum RoundMode`.
- Similar to the above but using an `unordered_map` indexed by `options.round_mode`. This requires adding hash support for the `RoundMode` data type. This approach does not impose a full ordering constraint on `RoundMode`.
Which one do you think is best?
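For illustration, here is a minimal sketch of the first option (using a hypothetical `KernelGenerator` stand-in rather than the actual Arrow generator types), showing how the lookup only works if the table order mirrors the enum declaration order:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a precomputed kernel generator/dispatcher.
using KernelGenerator = void (*)();

// Round modes as listed in this PR; the declaration order is the constraint.
enum class RoundMode : int8_t {
  DOWN, UP, TOWARDS_ZERO, TOWARDS_INFINITY,
  HALF_DOWN, HALF_UP, HALF_TOWARDS_ZERO, HALF_TOWARDS_INFINITY,
  HALF_TO_EVEN, HALF_TO_ODD
};

// One precomputed entry per RoundMode, filled in the same order as the enum.
std::array<KernelGenerator, 10> kGenerators{};

KernelGenerator LookupGenerator(RoundMode mode) {
  // Correct only while kGenerators stays ordered identically to RoundMode.
  return kGenerators[static_cast<std::size_t>(mode)];
}
```

The `unordered_map` variant drops the ordering requirement but needs a hash specialization for `RoundMode`.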
Actually, it is doing the same thing as the other kernels. Other kernels pass `exec` (an `ArrayKernelExec`) to the `AddKernel` method without invoking it. `exec` is invoked during kernel dispatching because it requires `KernelContext`, `ExecBatch`, and `Datum` parameters.
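As a rough sketch of that pattern (simplified, hypothetical types rather than the real Arrow kernel machinery), the exec callable is only stored at registration time and is invoked later at dispatch, once the execution parameters exist:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-ins for the kernel machinery discussed above.
struct KernelContext {};
struct ExecBatch {};
struct Datum {};
using ArrayKernelExec = std::function<void(KernelContext*, const ExecBatch&, Datum*)>;

struct Function {
  std::vector<ArrayKernelExec> kernels;

  // AddKernel stores the exec callable without invoking it.
  void AddKernel(ArrayKernelExec exec) { kernels.push_back(std::move(exec)); }

  // The callable runs only at dispatch time, when a KernelContext, ExecBatch,
  // and output Datum are available.
  void Execute(KernelContext* ctx, const ExecBatch& batch, Datum* out) const {
    kernels.front()(ctx, batch, out);
  }
};
```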
@rok This PR only supports rounding for basic arithmetic data types (unsigned/signed int and floating-point).
@edponce looks like you need to rebase again here as well.
d6b909d to 65cb707
0496356 to 302c5f1
Are there any additional comments/reviews? cc @pitrou @bkietz @jorisvandenbossche
pitrou
left a comment
Thanks for the update! I'll push a few minor changes and will merge.
302c5f1 to f91a1a5
f91a1a5 to ae740e3
This PR adds rounding compute functions, namely "round" and "round_to_multiple".

* `round(x, RoundOptions(ndigits, round_mode))` - round `x` to the precision indicated by `ndigits`
* `round_to_multiple(x, RoundToMultipleOptions(multiple, round_mode))` - round `x` to the scale of `multiple`

Rounding modes supported are: DOWN, UP, TOWARDS_ZERO, TOWARDS_INFINITY, HALF_DOWN, HALF_UP, HALF_TOWARDS_ZERO, HALF_TOWARDS_INFINITY, HALF_TO_EVEN, HALF_TO_ODD. By default, values are rounded to the nearest integer and ties are resolved with HALF_TO_EVEN.

The rounding functions expect floating-point inputs and return output of the same type. Integral inputs are implicitly cast and the output is float64.

Closes apache#10349 from edponce/ARROW-12744-Add-rounding-kernel

Authored-by: Eduardo Ponce <edponce00@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
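For context, a small usage sketch in C++ (assuming the option fields described above; minor details may differ from the released API):

```cpp
#include <arrow/api.h>
#include <arrow/compute/api.h>
#include <iostream>

int main() {
  // Build a small float64 array.
  arrow::DoubleBuilder builder;
  (void)builder.AppendValues({1.15, 2.5, -3.745});
  auto arr = builder.Finish().ValueOrDie();

  // Round to 2 fractional digits, resolving ties with HALF_TO_EVEN (the default).
  arrow::compute::RoundOptions options(/*ndigits=*/2,
                                       arrow::compute::RoundMode::HALF_TO_EVEN);
  auto rounded = arrow::compute::CallFunction("round", {arr}, &options).ValueOrDie();

  std::cout << rounded.make_array()->ToString() << std::endl;
  return 0;
}
```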