[BEAM-11747] Reject SQL mixed with UDFs #14015
if (scalarFunction.functionGroup.equals(
    SqlAnalyzer.USER_DEFINED_JAVA_SCALAR_FUNCTIONS)) {
  udfs.add(i);
} else {
Is it guaranteed that none of the RexNode in call.getOperands() contains any subexpression?
I think there could be a nested call?
E.g. Increment(1 + 1) where Increment is a UDF?
This is exactly what I was thinking. If that is true, it is necessary to recursively call on each operand.
However, it is common for low-level code (like the Program object) to use SSA / A-normal form so that there are no nested expressions. I just don't know what Calcite guarantees.
That would cause the project to be turned into:
x = 1 + 1
y = increment(x)
Then recursion is not needed.
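To make the A-normal form idea concrete, here is a minimal sketch of flattening a nested expression so that every call only refers to earlier slots. This is a toy model, not Calcite's API: `AnfSketch`, `Expr`, `Literal`, `Ref`, `Call`, and `flatten` are all hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (NOT Calcite's classes): an expression is a literal, a reference
// to an earlier slot, or a call whose operands may be nested expressions.
final class AnfSketch {
  interface Expr {}

  static final class Literal implements Expr {
    final int value;
    Literal(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
  }

  static final class Ref implements Expr {
    final int index;
    Ref(int index) { this.index = index; }
    @Override public String toString() { return "$" + index; }
  }

  static final class Call implements Expr {
    final String op;
    final List<Expr> operands;
    Call(String op, List<Expr> operands) { this.op = op; this.operands = operands; }
    @Override public String toString() { return op + operands; }
  }

  /** Appends expr to out in flattened form and returns a reference to its slot. */
  static Ref flatten(Expr expr, List<Expr> out) {
    if (expr instanceof Call) {
      Call call = (Call) expr;
      List<Expr> refs = new ArrayList<>();
      for (Expr operand : call.operands) {
        // Each operand is emitted first, so it lands in an earlier slot.
        refs.add(flatten(operand, out));
      }
      expr = new Call(call.op, refs);
    }
    out.add(expr);
    return new Ref(out.size() - 1);
  }

  public static void main(String[] args) {
    // INCREMENT(1 + 1), with INCREMENT standing in for a UDF.
    Expr nested = new Call("INCREMENT", Arrays.<Expr>asList(
        new Call("+", Arrays.<Expr>asList(new Literal(1), new Literal(1)))));
    List<Expr> out = new ArrayList<>();
    flatten(nested, out);
    System.out.println(out); // prints [1, 1, +[$0, $1], INCREMENT[$2]]
  }
}
```

Once the list has this shape, a single pass over it sees every call exactly once, which is why no recursion into operands is needed.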
If the program is in such a form, Calc splitting will be easy to implement.
Only RexCall has getOperands. I was under the impression that these operands are guaranteed to be prior entries in the list returned by ExprList. It will be a little while before I have time to verify. If that is not the case, it is easy to transform the ExprList into that form. That needs to be true for this to be correct and for Calc splitting to be easy.
Yeah, that is how I expected Calc splitting to be implemented as well. I thought the program was constrained to have that form.
Didn't you demonstrate an example like increment(1 + 1) where the + was executed by BeamCalcRel? This is what prompted me to read the code to see if the operands were checked.
The case you describe was solved by #13912.
Programs should be normalized before we get here, so RexCall is guaranteed to only reference previous arguments:
https://github.com/apache/calcite/blob/12a484a5c364c36e9551e59f4dc33bfb219ecf07/core/src/main/java/org/apache/calcite/rex/RexProgramBuilder.java#L507
To answer my own question here, Calc does set normalize=true: https://github.com/apache/calcite/blob/03e356c656c2bc98b1a273352475033545e0928d/core/src/main/java/org/apache/calcite/rel/core/Calc.java#L215
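As a rough illustration of why normalization removes the need for recursion, the sketch below checks a flattened expression list in a single pass. It is a toy model under stated assumptions, not the PR's code or Calcite's API: `NormalizedProgramSketch`, `Kind`, `Slot`, `isNormalized`, and `mixesUdfsWithBuiltins` are hypothetical names, and each slot simply records its kind and the indexes of its operands.

```java
import java.util.Arrays;
import java.util.List;

// Toy model (NOT Calcite's RexProgram): a program is a list of slots, and a
// call slot names its operands by index into that same list.
final class NormalizedProgramSketch {
  enum Kind { INPUT, LITERAL, BUILTIN_CALL, JAVA_UDF_CALL }

  static final class Slot {
    final Kind kind;
    final int[] operands;
    Slot(Kind kind, int... operands) { this.kind = kind; this.operands = operands; }
  }

  /** True when every operand index points at a strictly earlier slot. */
  static boolean isNormalized(List<Slot> exprs) {
    for (int i = 0; i < exprs.size(); i++) {
      for (int operand : exprs.get(i).operands) {
        if (operand < 0 || operand >= i) {
          return false;
        }
      }
    }
    return true;
  }

  /** Single pass, no recursion: every call already appears as its own slot. */
  static boolean mixesUdfsWithBuiltins(List<Slot> exprs) {
    boolean sawUdf = false;
    boolean sawBuiltin = false;
    for (Slot slot : exprs) {
      if (slot.kind == Kind.JAVA_UDF_CALL) {
        sawUdf = true;
      } else if (slot.kind == Kind.BUILTIN_CALL) {
        sawBuiltin = true;
      }
    }
    return sawUdf && sawBuiltin;
  }

  public static void main(String[] args) {
    // increment(1 + 1) in flattened form: two literals, the +, then the UDF.
    List<Slot> program = Arrays.asList(
        new Slot(Kind.LITERAL),
        new Slot(Kind.LITERAL),
        new Slot(Kind.BUILTIN_CALL, 0, 1),
        new Slot(Kind.JAVA_UDF_CALL, 2));
    System.out.println(isNormalized(program));
    System.out.println(mixesUdfsWithBuiltins(program));
  }
}
```

The flat scan is only sound when `isNormalized` holds. If operands could hide nested calls, a built-in such as `+` inside a UDF argument would be missed.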
 * group is equal to {@code SqlAnalyzer.USER_DEFINED_JAVA_SCALAR_FUNCTIONS}
 */
static boolean hasOnlyJavaUdfInProjects(RelOptRuleCall x) {
  HashSet<Integer> udfs = new HashSet<>();
This should be scoped inside for (RelNode relNode : resList).
Does anyone know if it's even possible for x to contain multiple rels?
This isn't necessary since we merged #14010, right?
At some point we are going to revert #14010. This was intended to be a demonstration of what is required to entirely avoid mixing built-in operations with UDFs.
This should completely reject queries that mix non-UDF operations with Java UDFs.