[BEAM-11747] Reject SQL mixed with UDFs #14015
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,6 +20,7 @@ | |
| import com.google.zetasql.LanguageOptions; | ||
| import com.google.zetasql.Value; | ||
| import java.util.Collection; | ||
| import java.util.HashSet; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
| import org.apache.beam.sdk.extensions.sql.impl.BeamSqlPipelineOptions; | ||
|
|
@@ -63,6 +64,7 @@ | |
| import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef; | ||
| import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral; | ||
| import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; | ||
| import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexSlot; | ||
| import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.schema.SchemaPlus; | ||
| import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; | ||
| import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlOperator; | ||
|
|
@@ -141,11 +143,14 @@ public static Collection<RuleSet> getZetaSqlRuleSets(Collection<RelOptRule> calc | |
| * group is equal to {@code SqlAnalyzer.USER_DEFINED_JAVA_SCALAR_FUNCTIONS} | ||
| */ | ||
| static boolean hasOnlyJavaUdfInProjects(RelOptRuleCall x) { | ||
| HashSet<Integer> udfs = new HashSet<>(); | ||
| List<RelNode> resList = x.getRelList(); | ||
| for (RelNode relNode : resList) { | ||
| if (relNode instanceof LogicalCalc) { | ||
| LogicalCalc logicalCalc = (LogicalCalc) relNode; | ||
| for (RexNode rexNode : logicalCalc.getProgram().getExprList()) { | ||
| List<RexNode> exprList = logicalCalc.getProgram().getExprList(); | ||
| for (int i = 0; i < exprList.size(); i++) { | ||
| RexNode rexNode = exprList.get(i); | ||
| if (rexNode instanceof RexCall) { | ||
| RexCall call = (RexCall) rexNode; | ||
| final SqlOperator operator = call.getOperator(); | ||
|
|
@@ -160,8 +165,10 @@ static boolean hasOnlyJavaUdfInProjects(RelOptRuleCall x) { | |
| SqlUserDefinedFunction udf = (SqlUserDefinedFunction) call.op; | ||
| if (udf.function instanceof ZetaSqlScalarFunctionImpl) { | ||
| ZetaSqlScalarFunctionImpl scalarFunction = (ZetaSqlScalarFunctionImpl) udf.function; | ||
| if (!scalarFunction.functionGroup.equals( | ||
| if (scalarFunction.functionGroup.equals( | ||
| SqlAnalyzer.USER_DEFINED_JAVA_SCALAR_FUNCTIONS)) { | ||
| udfs.add(i); | ||
| } else { | ||
|
Member
Is it guaranteed that none of the
Contributor
I think there is a case of nested call?
Contributor
E.g. Increment(1 + 1) where Increment is a UDF?
Member
This is exactly what I was thinking. If that is true, it is necessary to recursively call on each operand.
Member
However, it would be a normal approach to low-level code (like the That would cause the project to be turned into: Then recursion is not needed.
Contributor
If there is such a form, Calc splitting will become easy to implement.
Member
Author
Only RexCall has getOperands. I was under the impression that these operands are guaranteed to be prior entries in the list returned by getExprList. It will be a little while before I have time to verify. If that is not the case, it is easy to transform the ExprList into that form. That will need to be true for this to be correct and for calc splitting to be easy.
Member
Yes, that is how I expected calc splitting to be implemented as well. I thought that the program was constrained to have that form. Didn't you demonstrate an example like
Member
Author
The case you describe was solved by #13912. Programs should be normalized before we get here, so RexCall is guaranteed to only reference previous arguments:

To answer my own question here, Calc does set normalize=true: https://github.com/apache/calcite/blob/03e356c656c2bc98b1a273352475033545e0928d/core/src/main/java/org/apache/calcite/rel/core/Calc.java#L215
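To make the conclusion of this thread concrete, here is a minimal sketch of why a flat scan is enough once the program is normalized. It does not use Calcite: `Expr`, `Kind`, and `hasOnlyJavaUdf` are hypothetical stand-ins for `RexNode`, the operator checks, and the method in this diff, and the expression layouts in `main` are assumed for illustration rather than copied from a real `RexProgram`. The only property modeled is the one discussed above: every call operand is an index of an earlier entry in the expression list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a normalized program; not Calcite's API.
class NormalizedScanSketch {
  enum Kind { LITERAL, INPUT_REF, BUILTIN_CALL, JAVA_UDF_CALL }

  static final class Expr {
    final Kind kind;
    final int[] operands; // indexes of earlier entries in the expression list

    Expr(Kind kind, int... operands) {
      this.kind = kind;
      this.operands = operands;
    }
  }

  // Flat scan: nested calls already appear as earlier list entries,
  // so no recursion into operands is needed.
  static boolean hasOnlyJavaUdf(List<Expr> exprs, int[] projects, Integer condition) {
    Set<Integer> udfs = new HashSet<>();
    for (int i = 0; i < exprs.size(); i++) {
      Expr e = exprs.get(i);
      for (int operand : e.operands) {
        if (operand >= i) {
          throw new IllegalArgumentException("program is not normalized");
        }
      }
      if (e.kind == Kind.JAVA_UDF_CALL) {
        udfs.add(i);
      } else if (e.kind == Kind.BUILTIN_CALL) {
        return false; // reject builtin scalar functions anywhere in the list
      }
    }
    for (int project : projects) {
      if (!udfs.contains(project)) {
        return false; // reject non-UDF project
      }
    }
    if (condition != null && !udfs.contains(condition)) {
      return false; // reject non-UDF condition
    }
    return !udfs.isEmpty();
  }

  public static void main(String[] args) {
    // Assumed layout for Increment(1 + 1): [0] 1, [1] 1, [2] +($0, $1), [3] Increment($2)
    List<Expr> nested = Arrays.asList(
        new Expr(Kind.LITERAL),
        new Expr(Kind.LITERAL),
        new Expr(Kind.BUILTIN_CALL, 0, 1),
        new Expr(Kind.JAVA_UDF_CALL, 2));
    System.out.println(hasOnlyJavaUdf(nested, new int[] {3}, null)); // false

    // Assumed layout for Increment(col): [0] input ref, [1] Increment($0)
    List<Expr> simple = Arrays.asList(
        new Expr(Kind.INPUT_REF),
        new Expr(Kind.JAVA_UDF_CALL, 0));
    System.out.println(hasOnlyJavaUdf(simple, new int[] {1}, null)); // true
  }
}
```

In this model the builtin `+` inside `Increment(1 + 1)` is rejected when the scan reaches its own list entry, which is the behavior the thread expects from a normalized program.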
||
| // Reject ZetaSQL Builtin Scalar Functions | ||
| return false; | ||
| } | ||
|
|
@@ -205,9 +212,21 @@ static boolean hasOnlyJavaUdfInProjects(RelOptRuleCall x) { | |
| return false; | ||
| } | ||
| } | ||
| for (RexSlot slot : logicalCalc.getProgram().getProjectList()) { | ||
| if (!udfs.contains(slot.getIndex())) { | ||
| // Reject non-udf project | ||
| return false; | ||
| } | ||
| } | ||
| if (logicalCalc.getProgram().getCondition() != null) { | ||
| if (!udfs.contains(logicalCalc.getProgram().getCondition().getIndex())) { | ||
| // Reject non-udf condition | ||
| return false; | ||
| } | ||
| } | ||
| } | ||
| } | ||
| return true; | ||
| return !udfs.isEmpty(); | ||
| } | ||
|
|
||
| /** | ||
|
|
||
This should be scoped inside `for (RelNode relNode : resList)`. Does anyone know if it's even possible for `x` to contain multiple rels?
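As an illustration of why the scoping matters, the sketch below assumes that a rule call can carry more than one rel, which is exactly the open question above. `Calc` and `check` are hypothetical stand-ins, not Beam or Calcite types, and each rel is reduced to a flag per expression plus a list of projected indexes. Expression indexes are local to each rel, so a set shared across rels can let a UDF index from one rel satisfy the project check of another.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model; not Beam's or Calcite's API.
class ScopedUdfSetSketch {
  // Stand-in for one LogicalCalc.
  static final class Calc {
    final boolean[] isUdf; // isUdf[i]: expression i is a Java UDF call
    final int[] projects;  // indexes of the projected expressions

    Calc(boolean[] isUdf, int... projects) {
      this.isUdf = isUdf;
      this.projects = projects;
    }
  }

  // scopePerRel = false mimics one set declared before the loop over rels;
  // scopePerRel = true mimics a fresh set for every rel.
  static boolean check(List<Calc> rels, boolean scopePerRel) {
    Set<Integer> shared = new HashSet<Integer>();
    boolean sawUdf = false;
    for (Calc calc : rels) {
      Set<Integer> udfs = scopePerRel ? new HashSet<Integer>() : shared;
      for (int i = 0; i < calc.isUdf.length; i++) {
        if (calc.isUdf[i]) {
          udfs.add(i);
          sawUdf = true;
        }
      }
      for (int project : calc.projects) {
        if (!udfs.contains(project)) {
          return false; // reject non-UDF project
        }
      }
    }
    return sawUdf;
  }

  public static void main(String[] args) {
    Calc udfCalc = new Calc(new boolean[] {false, true}, 1);    // projects a UDF
    Calc plainCalc = new Calc(new boolean[] {false, false}, 1); // projects a non-UDF
    List<Calc> rels = Arrays.asList(udfCalc, plainCalc);
    System.out.println(check(rels, false)); // true: index 1 leaks from the first rel
    System.out.println(check(rels, true));  // false: second rel is checked on its own
  }
}
```

If `x` can only ever contain a single rel, the two variants behave the same and the scoping is purely a readability fix.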