-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Ensure that math functions fulfil the ColumnarValue contract #12922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -226,9 +226,8 @@ macro_rules! make_math_unary_udf { | |
| $EVALUATE_BOUNDS(inputs) | ||
| } | ||
|
|
||
| fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> { | ||
| let args = ColumnarValue::values_to_arrays(args)?; | ||
|
|
||
| fn invoke(&self, col_args: &[ColumnarValue]) -> Result<ColumnarValue> { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that @simonvandel and @jonahgao have been removing use of (or example #12889) We should probably make sure we aren't missing a similar problem in their reviews
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it necessary to add an attribute like
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess it's not necessarily true for volatile functions. But I wonder then how will the function know the array length? E.g. for a
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If a function needs to maintain scalar, we can call // in file: datafusion/physical-expr/src/scalar_function.rs
// evaluate the function
let output = match self.args.is_empty() {
true => self.fun.invoke_no_args(batch.num_rows()),
false => self.fun.invoke(&inputs),
}?;
if let ColumnarValue::Array(array) = &output {
if self.fun.need_maintain_scalar() && !inputs.is_empty() {
output = ColumnarValue::from_args_and_result(inputs, array)
}
}
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean that if the function is volatile but it has only scalar arguments, how does it know how many rows to produce?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah anyway, I was just trying to show that this is a deficiency of the UDF model. Ideally, Regarding adding a flag with default
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Make sense to me. This makes it easy to support
This can be a general solution. The disadvantage is that if the arguments are not real scalar, it will cause unnecessary overhead, as the final result of queries will still need to be converted back into arrays.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah, but it sounds very unlikely that we will have a batch of one row. In any case, we can do the same check that all arguments are scalar 👍
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
it's unlikely, but you must not assume that. a trivial example is a source table with one row.
👍 |
||
| let args = ColumnarValue::values_to_arrays(col_args)?; | ||
| let arr: ArrayRef = match args[0].data_type() { | ||
| DataType::Float64 => { | ||
| Arc::new(make_function_scalar_inputs_return_type!( | ||
|
|
@@ -255,7 +254,8 @@ macro_rules! make_math_unary_udf { | |
| ) | ||
| } | ||
| }; | ||
| Ok(ColumnarValue::Array(arr)) | ||
|
|
||
| ColumnarValue::from_args_and_result(col_args, arr) | ||
| } | ||
| } | ||
| } | ||
|
|
@@ -336,9 +336,8 @@ macro_rules! make_math_binary_udf { | |
| $OUTPUT_ORDERING(input) | ||
| } | ||
|
|
||
| fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> { | ||
| let args = ColumnarValue::values_to_arrays(args)?; | ||
|
|
||
| fn invoke(&self, col_args: &[ColumnarValue]) -> Result<ColumnarValue> { | ||
| let args = ColumnarValue::values_to_arrays(col_args)?; | ||
| let arr: ArrayRef = match args[0].data_type() { | ||
| DataType::Float64 => Arc::new(make_function_inputs2!( | ||
| &args[0], | ||
|
|
@@ -364,7 +363,8 @@ macro_rules! make_math_binary_udf { | |
| ) | ||
| } | ||
| }; | ||
| Ok(ColumnarValue::Array(arr)) | ||
|
|
||
| ColumnarValue::from_args_and_result(col_args, arr) | ||
| } | ||
| } | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we allow such code to exist, wouldn't it be better to just have it during constant-folding, rather than inside function implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It already exists in constant folding, but we shouldn't rely on constant folding for the correct function implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That depends how we define "correct".
IF constant folding is the only party where maintaining ColumnarValue::Scalar matters, THEN we can redefine "correct", loosening the contract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not the only place where it matters. If for some reason the expression is not constant folded, you will get an error that the array lengths of different columns don't match.