-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add FunctionRewrite API, Move Array specific rewrites to datafusion_functions_array
#9583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…sion-array-functions crate
| /// For example, concatenating arrays `a || b` is represented as | ||
| /// `Operator::ArrowAt`, but can be implemented by calling a function | ||
| /// `array_concat` from the `functions-array` crate. | ||
| pub trait FunctionRewrite { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the new trait used to rewrite Exprs to function calls
| schema: &DFSchema, | ||
| _config: &ConfigOptions, | ||
| ) -> datafusion_common::Result<Transformed<Expr>> { | ||
| let transformed = match expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole point of this PR is to move these rules out of datafusion/optimizer/src/analyzer/rewrite_expr.rs in the optimizer crate and into the functions-array crate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading this PR, a question for me is in #9563, get_field function also works for a single ScalarValue of Struct or Map
Should I put the rewrite logic in this file or create a new rewrite file under "datafusion/functions"? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we add another match in this same file
We could also add another FunctionRewriter but that would result in another (unecessary walk down the tree I think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, make sense. I thought it's OK for function but may not suitable for semantic, but now adding in the same file works, we may separate another file if there're more function_rewrite.
| && is_func(&left, "make_array") | ||
| && is_func(&right, "make_array") => | ||
| { | ||
| Transformed::yes(array_has_all(*left, *right)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also rewrote these rules to avoid cloneing (as they all get an owned Expr)
| @@ -0,0 +1,123 @@ | |||
| // Licensed to the Apache Software Foundation (ASF) under one | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Github renders this as a new file, but it is the old datafusion/optimizer/src/analyzer/rewrite_expr.rs with the array specific logic removed and instead invokes each FunctionRewrite individually
| pub mod rewrite_expr; | ||
| pub mod subquery; | ||
| pub mod type_coercion; | ||
| use std::sync::Arc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why the imports were moved -- I think my editor did it
FunctionRewrite API, port array specific rewrites to the datafu…FunctionRewrite API, Move Array specific rewrites to datafusion_functions_array
| ) -> datafusion_common::Result<Transformed<Expr>> { | ||
| let transformed = match expr { | ||
| // array1 @> array2 -> array_has_all(array1, array2) | ||
| Expr::BinaryExpr(BinaryExpr { left, op, right }) |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
| plan.with_new_exprs(new_expr, new_inputs) | ||
| } | ||
| } | ||
| struct OperatorToFunctionRewriter<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
references are added for performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think so. In order to pass down the ConfigOptions without clone'ing the them specifically
I think in general we could improve planning efficiency in DataFusion by avoiding clone but that is a project for another day
jayzhan211
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks really nice!
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review @jayzhan211
I am going to merge this in to unblock more function migration work
| schema: &DFSchema, | ||
| _config: &ConfigOptions, | ||
| ) -> datafusion_common::Result<Transformed<Expr>> { | ||
| let transformed = match expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we add another match in this same file
We could also add another FunctionRewriter but that would result in another (unecessary walk down the tree I think)
| plan.with_new_exprs(new_expr, new_inputs) | ||
| } | ||
| } | ||
| struct OperatorToFunctionRewriter<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think so. In order to pass down the ConfigOptions without clone'ing the them specifically
I think in general we could improve planning efficiency in DataFusion by avoiding clone but that is a project for another day
Which issue does this PR close?
Closes #9519
Closes #9557
This is an alternate to the design @jayzhan22 implemented in #9557
Rationale for this change
While reviewing XX I realized that the suggestion to use
AnalyzerRules in the original ticket was somewhat overkill. The core function needed is rewritingExprs (operators specifically) to Functions.Using
AnalyzerRulerequires an entire LogicalPlanWhat changes are included in this PR?
Formalizes the general pattern of rewriting certain expressions / operators to function calls, specifically:
FunctionRewritetraitFunctionRewriters withFunctionRegistryAre these changes tested?
Covered by existing tests
Are there any user-facing changes?
There is a new API that users can use to rewrite DataFusion
Exprs to their own function calls.