Skip to content

Conversation

@jorgecarleitao
Copy link
Member

This PR is done on top of #7971 ,

Its current runtime consequence is that all our math functions now return float32 or float64 depending on their incoming column (and use float32 for other numeric types). Its API consequences is that it allows to register UDFs of variable return types.

This PR hits both physical plans and logical plans, and abstracts a bit the typing of some of our structs. In particular, it introduces a new trait that only cares about the return datatype of an object (PhysicalExpr and LogicalExpr).

@jorgecarleitao jorgecarleitao marked this pull request as draft August 16, 2020 10:39
@jorgecarleitao jorgecarleitao changed the title ARROW-9756: [Rust] [DataFusion] Added support for scalar UDFs of arbitrary types ARROW-9756: [Rust] [DataFusion] Added support for scalar UDFs of arbitrary return types Aug 16, 2020
@jorgecarleitao
Copy link
Member Author

FYI @andygrove @alamb @houqp : this is a draft because it depends on other PRs being reviewed and accepted.

@github-actions
Copy link

Copy link
Member Author

@jorgecarleitao jorgecarleitao Aug 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to drop partialEq for expressions, and therefore also dropped this since it was strictly not needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo

#[derive(Debug)]
struct SumAccumulator {
sum: Option<ScalarValue>,
pub sum: Option<ScalarValue>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed?


pub mod dataframe;
pub mod datasource;
mod datatyped;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub so that others can implement a different DataTyped?

args: Vec<Expr>,
/// The `DataType` the expression will yield
return_type: DataType,
return_type: ReturnType,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THIS IS A MAJOR CHANGE: it makes Expr hold an Arc<&dyn>, which make them unserializable. I was unable to find another way, unfortunately.

Expr::AggregateFunction { return_type, .. } => Ok(return_type.clone()),
Expr::ScalarFunction {
args, return_type, ..
} => return_type(&args.iter().map(|x| x.as_datatyped()).collect(), schema),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not very beautiful, but was unable to find a better pattern for this :(

match self.schema_provider.get_function_meta(&name) {
Some(fm) => {
let args = if name == "count" {
// optimization to avoid computing expressions
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be moved to another place (an optimizer?). For now I left it where it here as it was.

@jorgecarleitao jorgecarleitao deleted the scalar branch October 28, 2020 04:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants