Skip to content

Conversation

@edponce
Copy link
Contributor

@edponce edponce commented Jul 15, 2021

This PR adds floor, ceiling, and truncate scalar kernels. For all integral inputs, output is a 64-bit floating-point value.

@github-actions
Copy link

@edponce
Copy link
Contributor Author

edponce commented Jul 15, 2021

Food for thought: Tests fail for std::numeric_limits<Int64/Uint64>::min/max() due to an invalid range when a Scalar[Int64/Uint64] input is checked against a Scalar[Float64] output. Similarly, if output is Scalar[Float32] tests will fail for Int32/Uint32 cases. This error is triggered by the Arrow testing logic when casting integer-to-floating for comparing values in test assertions. This is normal behavior as not all integers have a floating-point representation.

An example test is:

auto min = std::numeric_limits<unsigned long long>::min();
auto max = std::numeric_limits<unsigned long long>::max();
this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));

and the error message is:

'_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
  not in range: 0 to 9007199254740992

The meaning of these numbers is:

max(Uint64) = 18446744073709551615
max(2^53) = 9007199254740992  // mantissa of Float64 = 53

There are two alternatives to handle min/max tests:

  1. Do not test min/max for cases that have integral inputs and floating-point outputs
  2. Create a TYPED_TEST_SUITE that uses integral types of up to a width that is less than the mantissa of the floating-point output type

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compute.rst needs to be updated as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we truncate towards negative infinity? (e.g. truncate(-1.1) = -2 since -1 > -1.1?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, not greater in magnitude according to cppreference. Maybe that should be clarified.

@edponce edponce force-pushed the ARROW-12745-Compute-Add-floor-ceiling-and-truncate-k branch from 720e80f to 1df410e Compare July 16, 2021 06:13
@edponce edponce marked this pull request as ready for review July 16, 2021 06:14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SCALAR_EAGER_UNARY(Ceiling, "ceiling")
SCALAR_EAGER_UNARY(Ceil, "ceil")

Is there a reason to have it "ceiling" instead of "ceil" (the C++ function is ceil, as well as numpy. SQL seems to have both)

Copy link
Contributor Author

@edponce edponce Jul 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that ceil should be the name used to invoked the function. The only reason for using the long form for the internal variable names is to be somewhat consistent with other compute functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Compute function names (for both C++ API and CallFunction name) are inconsistent w.r.t. to short vs. long form. For example, Ceiling, Negate, Power, etc. I think this should be revisited in another JIRA issue and will require updating Python and R bindings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the short form for compute function names in this PR, and opened this JIRA to revisit the names of compute functions.

Comment on lines 1593 to 1595
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Calculate the greatest integer in magnitude less than or equal to the "
"argument element-wise",
"",
"Round down to the nearest integer",
"Calculate the greatest integer in magnitude less than or equal to the "
"argument element-wise",

(that's how you explained it in the api_scalar.h doc comments, which I find easier to understand as short summary of the function.

@edponce edponce force-pushed the ARROW-12745-Compute-Add-floor-ceiling-and-truncate-k branch from 1df410e to 3a657c4 Compare July 16, 2021 15:27
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One comment about the tests, one comment about the docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worded a little confusingly in my opinion. If we're going to reference rounding strategy here, the notes column should describe the rounding behavior for each function (even if it's just the 'obvious' or 'expected' one).

Or alternatively, something like 'rounding functions find the nearest integer (as a floating-point value) to the argument based on a rounding strategy'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the rounding functions section/text based on the soon-to-be-ready round and mround functions. Although, floor, ceil, and trunc do round to the nearest integer, round/mround do not necessarily. They have options to specify fractional precision and have options for various rounding strategies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, sounds good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This drops the tests for atan2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this, I had moved atan2 to the binary DispatchBest but apparently skipped it during a rebase/merge. I will add them back.

@edponce edponce force-pushed the ARROW-12745-Compute-Add-floor-ceiling-and-truncate-k branch from 3a657c4 to c0b1041 Compare July 16, 2021 17:38
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lidavidm lidavidm closed this in 8ce0c01 Jul 16, 2021
@lidavidm
Copy link
Member

Thanks @edponce!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants