ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels #10727

edponce · 2021-07-15T19:30:31Z

This PR adds floor, ceiling, and truncate scalar kernels. For all integral inputs, output is a 64-bit floating-point value.

github-actions · 2021-07-15T19:30:50Z

https://issues.apache.org/jira/browse/ARROW-12745

edponce · 2021-07-15T19:43:27Z

Food for thought: tests fail for std::numeric_limits<Int64/Uint64>::min/max() with an invalid-range error when a Scalar[Int64/Uint64] input is checked against a Scalar[Float64] output. Similarly, if the output is Scalar[Float32], tests fail for the Int32/Uint32 cases. The error is triggered by the Arrow testing logic, which casts integers to floating point in order to compare values in test assertions. This is expected behavior, as not every integer has an exact floating-point representation.

An example test is:

auto min = std::numeric_limits<unsigned long long>::min();
auto max = std::numeric_limits<unsigned long long>::max();
this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));

and the error message is:

'_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
  not in range: 0 to 9007199254740992

The meaning of these numbers is:

max(Uint64) = 18446744073709551615
2^53 = 9007199254740992  // Float64 has a 53-bit significand (mantissa)
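
To make the 2^53 bound concrete, here is a minimal standalone sketch. It does not use Arrow; the names kMaxExactInDouble, FitsExactlyInDouble, and NextIntegerCollapses are hypothetical and only illustrate the range check reported in the error message above, assuming an IEEE 754 double.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Upper end of the contiguous range of non-negative integers that a double
// represents exactly: 2^53, because double has a 53-bit significand.
constexpr std::uint64_t kMaxExactInDouble =
    std::uint64_t{1} << std::numeric_limits<double>::digits;

// Hypothetical helper (not an Arrow API): true if `value` lies in the range
// 0 to 2^53 that the error message above reports.
constexpr bool FitsExactlyInDouble(std::uint64_t value) {
  return value <= kMaxExactInDouble;
}

static_assert(kMaxExactInDouble == 9007199254740992ULL, "2^53");
static_assert(!FitsExactlyInDouble(std::numeric_limits<std::uint64_t>::max()),
              "max(Uint64) is outside the exactly representable range");

// 2^53 + 1 has no exact double, so the conversion has to round it to a
// neighboring representable value.
inline bool NextIntegerCollapses() {
  const double at_limit = static_cast<double>(kMaxExactInDouble);
  const double past_limit = static_cast<double>(kMaxExactInDouble + 1);
  return at_limit == past_limit;
}
```

Under the default round-to-nearest mode, NextIntegerCollapses() is expected to return true, which is why a checked integer-to-Float64 cast has to reject values beyond 2^53.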

There are two alternatives to handle min/max tests:

1. Do not test min/max for cases that have integral inputs and floating-point outputs.
2. Create a TYPED_TEST_SUITE that uses integral types of up to a width that is less than the mantissa of the floating-point output type.
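
One way to decide which integral types could go into such a suite is a compile-time check on bit widths. The sketch below is only an illustration of that idea, not code from this PR: AllValuesExact is a hypothetical trait, and it assumes IEEE 754 floating-point types.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical trait (not an Arrow API): true when every value of integer
// type Int converts to floating-point type Float without rounding, i.e. Int
// has no more value bits than Float has significand bits.
template <typename Int, typename Float>
constexpr bool AllValuesExact() {
  static_assert(std::numeric_limits<Int>::is_integer, "Int must be integral");
  static_assert(std::numeric_limits<Float>::is_iec559,
                "Float must be an IEEE 754 type");
  return std::numeric_limits<Int>::digits <=
         std::numeric_limits<Float>::digits;
}

// 32-bit integers can be compared against Float64 at their min/max ...
static_assert(AllValuesExact<std::int32_t, double>(), "");
static_assert(AllValuesExact<std::uint32_t, double>(), "");
// ... but 64-bit integers cannot, and 32-bit integers do not fit Float32.
static_assert(!AllValuesExact<std::int64_t, double>(), "");
static_assert(!AllValuesExact<std::uint64_t, double>(), "");
static_assert(!AllValuesExact<std::int32_t, float>(), "");
static_assert(AllValuesExact<std::int16_t, float>(), "");
```

A typed test suite restricted to the types for which this check holds could keep the min/max assertions without hitting the range error.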

lidavidm

compute.rst needs to be updated as well.

lidavidm · 2021-07-15T21:09:00Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

This means we truncate towards negative infinity? (e.g. truncate(-1.1) = -2 since -1 > -1.1?)

Ah, not greater in magnitude according to cppreference. Maybe that should be clarified.
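
For reference, a small sketch of how the three C++ standard library functions behave on a negative input. It uses only <cmath>, not the Arrow kernels; the Rounded struct and RoundAll helper are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cmath>

// Results of the three rounding functions applied to the same input.
struct Rounded {
  double floored;    // std::floor: rounds toward negative infinity
  double ceiled;     // std::ceil: rounds toward positive infinity
  double truncated;  // std::trunc: rounds toward zero
};

// Hypothetical helper that applies all three functions to `x`.
inline Rounded RoundAll(double x) {
  return Rounded{std::floor(x), std::ceil(x), std::trunc(x)};
}
```

With this helper, RoundAll(-1.1) gives floored == -2.0, ceiled == -1.0, and truncated == -1.0, so trunc matches ceil for negative inputs and floor for positive ones.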

jorisvandenbossche · 2021-07-16T10:29:35Z

cpp/src/arrow/compute/api_scalar.cc

Suggested change

- SCALAR_EAGER_UNARY(Ceiling, "ceiling")
+ SCALAR_EAGER_UNARY(Ceil, "ceil")

Is there a reason to name it "ceiling" instead of "ceil"? The C++ function is ceil, as is NumPy's; SQL seems to have both.

I agree that ceil should be the name used to invoke the function. The only reason for using the long form for the internal variable names is to be somewhat consistent with other compute functions.

@jorisvandenbossche Compute function names (for both the C++ API and the CallFunction name) are inconsistent w.r.t. short vs. long form. For example, Ceiling, Negate, Power, etc. I think this should be revisited in another JIRA issue and will require updating the Python and R bindings.

I used the short form for compute function names in this PR, and opened this JIRA to revisit the names of compute functions.

jorisvandenbossche · 2021-07-16T10:31:59Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

Suggested change

-    "Calculate the greatest integer in magnitude less than or equal to the "
-    "argument element-wise",
-    "",
+    "Round down to the nearest integer",
+    "Calculate the greatest integer in magnitude less than or equal to the "
+    "argument element-wise",

(That's how you explained it in the api_scalar.h doc comments, which I find easier to understand as a short summary of the function.)

lidavidm

LGTM. One comment about the tests, one comment about the docs.

lidavidm · 2021-07-16T16:03:03Z

docs/source/cpp/compute.rst

This is worded a little confusingly in my opinion. If we're going to reference rounding strategy here, the notes column should describe the rounding behavior for each function (even if it's just the 'obvious' or 'expected' one).

Or alternatively, something like 'rounding functions find the nearest integer (as a floating-point value) to the argument based on a rounding strategy'.

I added the rounding functions section/text based on the soon-to-be-ready round and mround functions. Although floor, ceil, and trunc do round to the nearest integer, round/mround do not necessarily: they have options to specify fractional precision and to select among various rounding strategies.

Ah, ok, sounds good.

lidavidm · 2021-07-16T16:07:41Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

This drops the tests for atan2?

Thanks for catching this, I had moved atan2 to the binary DispatchBest but apparently skipped it during a rebase/merge. I will add them back.

lidavidm

LGTM

lidavidm · 2021-07-16T19:26:20Z

Thanks @edponce!

github-actions bot added the Component: C++ label Jul 15, 2021

lidavidm reviewed Jul 15, 2021

edponce force-pushed the ARROW-12745-Compute-Add-floor-ceiling-and-truncate-k branch from 720e80f to 1df410e Compare July 16, 2021 06:13

edponce marked this pull request as ready for review July 16, 2021 06:14

jorisvandenbossche reviewed Jul 16, 2021

edponce force-pushed the ARROW-12745-Compute-Add-floor-ceiling-and-truncate-k branch from 1df410e to 3a657c4 Compare July 16, 2021 15:27

edponce requested review from jorisvandenbossche and lidavidm July 16, 2021 15:55

lidavidm approved these changes Jul 16, 2021

edponce added 9 commits July 16, 2021 13:26

add scalar API

7c8c83e

add impl floor, ceil, trunc kernels

feab8d2

extend combinations of parameter types for AssertUnaryOp

95e9cd6

add tests

fa5184a

add ceiling tests

81dd2c6

improve function doc

2dd16a3

add info to C++/Python docs

28b5fdb

rename functions to short form and update description

2b3a402

restore atan2 DispatchBest tests

c0b1041

edponce force-pushed the ARROW-12745-Compute-Add-floor-ceiling-and-truncate-k branch from 3a657c4 to c0b1041 Compare July 16, 2021 17:38

lidavidm approved these changes Jul 16, 2021

lidavidm closed this in 8ce0c01 Jul 16, 2021

asfimport mentioned this pull request Jul 20, 2021

[C++][Compute] Add floor, ceiling, and truncate kernels #28487

Closed
