Skip to content

Conversation

@tshauck
Copy link
Contributor

@tshauck tshauck commented Jul 21, 2024

Which issue does this PR close?

Closes #11583 11583

Rationale for this change

Many math unary functions aren't unbounded, but because the macro doesn't include a way to modify this on a per function basis the trait default (of unbounded) is used.

What changes are included in this PR?

  1. Change the make_math_unary_udf macro to take a bounds function
  2. Implement a function for math functions where the bounds are known a priori.

Are these changes tested?

Yes, see unittests.

Are there any user-facing changes?

No

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate and removed core Core DataFusion crate labels Jul 21, 2024
@tshauck tshauck marked this pull request as ready for review July 21, 2024 17:12
pub(super) fn unbounded(_input: &[&Interval]) -> crate::Result<Interval> {
// We cannot assume the input datatype is the same of output type.
Interval::make_unbounded(&DataType::Null)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better here returning not_impl_err? The current version may cause type mismatch errors and debugging would require more time.

Copy link
Contributor Author

@tshauck tshauck Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would cause an error when trying to use these functions? The update macro is going to call whatever is passed as $EVALUATE_BOUNDS. Right now this implementation matches the default of the ScalarUDFImpl trait.

I think an alternative is to make something like Interval::make_symmetric_inf. I'm using unbounded here somewhat loosely, but probably in most of these cases it should be (-inf, inf) which could be given a type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead and did that here: cf2b023

@tshauck tshauck requested a review from berkaysynnada July 22, 2024 15:00
}

/// Creates an interval between -∞ to ∞.
pub fn make_infinity_interval(data_type: &DataType) -> Result<Self> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use ScalarValue::Float(None) for infinite bounds by convention. Even if you give Float(Some(f::INF)) or Float(Some(f::NaN)), they are converted to Float(None)'s during interval creation. We had given such a decision to have unique representation of unboundedness. You can check the details here:

macro_rules! handle_float_intervals {

I recommend here to use make_unbounded API with floating types to not break this convention.

}

/// Create an interval from 0 to infinity.
pub fn make_non_negative_infinity_interval(data_type: &DataType) -> Result<Self> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion applies here as well

@berkaysynnada
Copy link
Contributor

Thanks @tshauck. This is a nice step towards having comprehensive interval analysis. I have left a small suggestion. Once that is addressed, the PR will be good to go 🚀

@ozankabak
Copy link
Contributor

Apart from @berkaysynnada's outstanding suggestions the only thing I see is that we would need to use two different π values, depending on whether it is in the lower bound or the upper bound of the resulting interval.

If π will be in the upper bound, we should use the smallest floating point number that is greater than the mathematical (precise) value of π. Let's call this pi_upper. Conversely, if π will be in the lower bound, we should use the largest floating point number that is less than the mathematical (precise) value of π.

Since π is not exactly representable, the pi you get from Rust is actually either pi_upper or pi_lower (you need to check). You can obtain the other one by using the nextafter function (let's just be careful to do this only once and use it as a constant everywhere).

Apart from this detail that is addressable by a small fix, this seems to be a great PR. Thank you.

@tshauck
Copy link
Contributor Author

tshauck commented Jul 23, 2024

Thanks to both of you for the thoughtful feedback.

@berkaysynnada, I updated infinity to use unbounded in 231a717.

@ozankabak, for handling π, do you mean libm::nextafter? There's also a .next_up, but unfortunately, it's in nightly. Did I overlook the actual function, or if not, any opinion on bringing in libm as dependency vs something w/i datafusion, albeit less robust?

@ozankabak
Copy link
Contributor

IIRC nextafter is the IEEE standard name for this function, and Rust's next_up does something similar (in the upward direction). If it is in nightly, we probably should just create constants with these values explicitly given for each float type.

@tshauck tshauck requested a review from berkaysynnada July 24, 2024 00:04
@tshauck
Copy link
Contributor Author

tshauck commented Jul 24, 2024

Thanks @ozankabak, would you please have a look at the latest changes and lmk what you think?

@berkaysynnada
Copy link
Contributor

I checked the new version for both unbounded creation and rounded PI usage. Everything looks good and seems ready to merge. Thanks @tshauck!

Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good to me -- thanks for this great work @tshauck!

@tshauck tshauck force-pushed the add-unary-udf-bounds branch from 4337de8 to a10a150 Compare July 24, 2024 17:05
@tshauck
Copy link
Contributor Author

tshauck commented Jul 24, 2024

Thanks! -- it looks like signum was changed in the meantime, so removing that for now as it doesn't use make_math_unary_udf anymore. The rebase seems ok locally, but I'll check to see if things don't go through for w/e reason here.

@alamb
Copy link
Contributor

alamb commented Jul 24, 2024

I took a quick look and this looks good to me. Thank you @tshauck @berkaysynnada and @ozankabak 🚀

@alamb alamb merged commit 5901df5 into apache:main Jul 24, 2024
@tshauck tshauck deleted the add-unary-udf-bounds branch July 24, 2024 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement evaluate_bounds for math unary functions

4 participants