[FLINK-26520][table] Implement SEARCH operator in codegen #19001
… as suggested by Calcite
Signed-off-by: slinkydeveloper <francescoguard@gmail.com>
Force-pushed 4991fc6 to 8882fd5
…rnal type system. The conversion from Calcite's Comparable type system to Flink's internal type system is provided by a new function RexLiteralUtil#toFlinkInternalValue
Signed-off-by: slinkydeveloper <francescoguard@gmail.com>
Signed-off-by: slinkydeveloper <francescoguard@gmail.com>
Force-pushed 8882fd5 to fde3eaa
@flinkbot run azure
twalthr left a comment
Very nice refactoring @slinkydeveloper. I couldn't add much here. I hope the change is already covered by existing tests.
...link-table-planner/src/main/scala/org/apache/flink/table/planner/codegen/GenerateUtils.scala (outdated, resolved)
```scala
case td: TimestampData =>
  s"$TIMESTAMP_DATA.fromEpochMillis(${td.getMillisecond}L, ${td.getNanoOfMillisecond})"
case decimalData: DecimalData =>
  s"""$DECIMAL_UTIL.castFrom(
```
nit: wouldn't an unscaled long/bytes representation be cheaper? At least we could use the compact representation for simple numbers.
In this particular case it doesn't really make a difference, as we store literal values in static variables anyway. I'd rather keep the implementation simple here.
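The reviewer's nit is about encoding decimal literals more cheaply. Here is a minimal sketch of that idea. It assumes that any unscaled value with at most 18 digits fits in a signed long, and it uses hypothetical placeholder helper names (fromUnscaledLong, fromUnscaledBytes) in the emitted strings. It is not the code this PR generates:

```scala
import java.math.{BigDecimal => JBigDecimal}

// Sketch of a "compact vs. non-compact" decimal literal encoding.
// Assumption: 18 digits is the compact threshold; the helper names in the
// emitted strings are hypothetical placeholders.
object DecimalLiteralSketch {
  // Any unscaled value with at most 18 digits fits in a signed 64-bit long.
  val MaxCompactPrecision: Int = 18

  sealed trait DecimalLiteral
  final case class CompactLiteral(unscaled: Long, precision: Int, scale: Int) extends DecimalLiteral
  final case class BytesLiteral(unscaled: Array[Byte], precision: Int, scale: Int) extends DecimalLiteral

  // Pick an encoding from the declared precision. setScale without a rounding
  // mode throws ArithmeticException if the value would need rounding.
  def encode(value: JBigDecimal, precision: Int, scale: Int): DecimalLiteral = {
    val unscaled = value.setScale(scale).unscaledValue()
    if (precision <= MaxCompactPrecision) CompactLiteral(unscaled.longValueExact(), precision, scale)
    else BytesLiteral(unscaled.toByteArray, precision, scale)
  }

  // Render the Java expression a code generator could emit for the literal.
  def toJavaExpr(lit: DecimalLiteral): String = lit match {
    case CompactLiteral(unscaled, precision, scale) =>
      s"fromUnscaledLong(${unscaled}L, $precision, $scale)"
    case BytesLiteral(bytes, precision, scale) =>
      val elements = bytes.mkString(", ")
      s"fromUnscaledBytes(new byte[] {$elements}, $precision, $scale)"
  }
}
```

As the author points out, a literal stored in a static field is constructed only once, so the saving would be small. The simpler castFrom path is a reasonable trade-off.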
```diff
 if (noneMatcher.matches()) {
-  val reusePattern = ctx.addReusableStringConstants(newPattern)
+  val reusePattern = ctx.addReusableEscapedStringConstant(newPattern)
```
nit: the method name is quite long: addReusableEscapedStringConstant -> addReusableEscapedString
The fact that it adds a constant is quite important; I wouldn't remove it from the name.
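This example shows why "constant" carries meaning in the name. It uses a hypothetical, simplified codegen context. ToyCodegenContext and addEscapedStringConstant are made up for this sketch, and only a few escape sequences are handled. The context escapes a string into a Java literal and declares it once as a static final field, so repeated requests return the same field:

```scala
import scala.collection.mutable

// Hypothetical, simplified codegen context (not Flink's CodeGeneratorContext).
final class ToyCodegenContext {
  // Escaped Java literal -> generated field name, in insertion order.
  private val fieldsByLiteral = mutable.LinkedHashMap.empty[String, String]

  // Escape only the characters this sketch cares about for a Java "..." literal.
  private def escapeJava(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '\\' => sb.append("\\\\")
      case '"'  => sb.append("\\\"")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c    => sb.append(c)
    }
    sb.toString
  }

  // Returns the name of a static final field holding the escaped string;
  // asking twice for the same string reuses the same field.
  def addEscapedStringConstant(value: String): String = {
    val literal = "\"" + escapeJava(value) + "\""
    fieldsByLiteral.getOrElseUpdate(literal, s"str$$${fieldsByLiteral.size}")
  }

  // Field declarations a generated class would contain.
  def declarations: Seq[String] =
    fieldsByLiteral.toSeq.map { case (literal, name) => s"private static final String $name = $literal;" }
}
```

Because the value lives in a field rather than being inlined at each call site, the caller has to treat the returned name as a reference to shared state. That is the property the author wants the method name to keep visible.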
...-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/RexLiteralUtil.scala (outdated, resolved)
```scala
toFlinkInternalValue(value, valueType.asInstanceOf[DistinctType].getSourceType)
```
```scala
case SYMBOL =>
  value.asInstanceOf[Enum[_]]
```
So Calcite symbols remain? Maybe this should be mentioned in the JavaDocs.
Yep, to make the result of this function compatible with generateLiteral.
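Here is a minimal sketch of the conversion discussed in this thread. It uses a toy type model rather than Flink's LogicalType hierarchy, and it rests on these assumptions:

- Calcite exact numerics arrive as java.math.BigDecimal.
- Character literals are shown as plain Strings, although Calcite actually uses NlsString.
- InternalString and InternalDecimal are hypothetical stand-ins for Flink's StringData and DecimalData.

Distinct types unwrap to their source type, and symbols pass through unchanged as enums, as described in the review:

```scala
import java.math.{BigDecimal => JBigDecimal}

object InternalValueSketch {
  // Toy stand-in for a logical type hierarchy (not Flink's LogicalType).
  sealed trait LType
  case object IntT extends LType
  case object BigIntT extends LType
  case object VarcharT extends LType
  final case class DecimalT(precision: Int, scale: Int) extends LType
  final case class DistinctT(sourceType: LType) extends LType
  case object SymbolT extends LType

  // Hypothetical stand-ins for Flink's internal StringData and DecimalData.
  final case class InternalString(value: String)
  final case class InternalDecimal(value: JBigDecimal, precision: Int, scale: Int)

  // Convert a Calcite-style literal value into the toy internal representation.
  def toInternal(value: Any, t: LType): Any = t match {
    case IntT => value.asInstanceOf[JBigDecimal].intValueExact()
    case BigIntT => value.asInstanceOf[JBigDecimal].longValueExact()
    case VarcharT => InternalString(value.asInstanceOf[String])
    case DecimalT(p, s) => InternalDecimal(value.asInstanceOf[JBigDecimal].setScale(s), p, s)
    // Distinct types carry no runtime representation of their own.
    case DistinctT(source) => toInternal(value, source)
    // Symbols stay as enums so literal codegen can keep handling them as before.
    case SymbolT => value.asInstanceOf[java.lang.Enum[_]]
  }
}
```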
...ble-planner/src/main/scala/org/apache/flink/table/planner/codegen/CodeGeneratorContext.scala (outdated, resolved)
```diff
@@ -464,19 +465,10 @@ class ExprCodeGenerator(ctx: CodeGeneratorContext, nullableInput: Boolean)
   override def visitCall(call: RexCall): GeneratedExpression = {
     val resultType = FlinkTypeFactory.toLogicalType(call.getType)
     if (call.getKind == SqlKind.SEARCH) {
```
Do we still need this early check here, or can we simply use the regular switch/case list below?
We need this check because the generateCallExpression method accepts operands that are already converted to expressions, whereas for SEARCH the codegen uses the Sarg directly.
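The following toy visitor sketches why SEARCH needs the early check. All names are hypothetical, it emits Java source strings, and it ignores nulls. The regular path converts every operand to code before combining them, while SEARCH has to read its Sarg operand as a literal value. Expanding a points-only Sarg into an OR chain is a simplification for illustration:

```scala
object VisitCallSketch {
  sealed trait Expr
  final case class InputRef(name: String) extends Expr
  final case class IntLit(value: Int) extends Expr
  // Simplified Sarg holding only points (real Sargs also hold ranges).
  final case class SargLit(points: Set[Int]) extends Expr
  final case class Call(op: String, operands: List[Expr]) extends Expr

  def visit(e: Expr): String = e match {
    case InputRef(name) => name
    case IntLit(v) => v.toString
    // A Sarg has no standalone code form; it only makes sense inside SEARCH.
    case SargLit(_) => throw new IllegalArgumentException("Sarg outside SEARCH")
    case c: Call => visitCall(c)
  }

  def visitCall(call: Call): String =
    if (call.op == "SEARCH") {
      // Early check: read the Sarg operand as a value instead of visiting it.
      call.operands match {
        case List(arg, SargLit(points)) =>
          val needle = visit(arg)
          if (points.isEmpty) "false"
          else points.toList.sorted.map(p => s"$needle == $p").mkString("(", " || ", ")")
        case _ => throw new IllegalArgumentException("SEARCH expects (expr, Sarg)")
      }
    } else {
      // Regular path: every operand is converted to code first.
      val args = call.operands.map(visit)
      call.op match {
        case "=" => s"(${args(0)} == ${args(1)})"
        case "AND" => args.mkString("(", " && ", ")")
        case other => throw new UnsupportedOperationException(other)
      }
    }
}
```

Routing SEARCH through the generic path would call visit on the Sarg operand and fail. That is the situation the early branch avoids.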
...-planner/src/main/scala/org/apache/flink/table/planner/codegen/calls/SearchOperatorGen.scala (outdated, resolved)
Signed-off-by: slinkydeveloper <francescoguard@gmail.com>
twalthr left a comment
Thanks for the update @slinkydeveloper.
Signed-off-by: slinkydeveloper <francescoguard@gmail.com>
Implements the SEARCH operator in the codegen and removes the scalar implementation of IN and NOT_IN. Now every scalar IN/NOT_IN using a constant set is implemented through SEARCH (following Calcite's development on the topic CALCITE-4173) and plans will only have SEARCH. This closes #19001.
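Here is a minimal model of the IN/NOT_IN-to-SEARCH rewrite described above. It is a sketch with these assumptions:

- Values are integers only, and IN lists contain no NULL elements.
- The Sarg is reduced to a list of ranges plus a nullAs flag. Calcite's real Sarg is generic over Comparable values and wraps a Guava RangeSet.
- The object and function names are hypothetical.

IN becomes a set of point ranges, and NOT IN becomes the open gaps between those points. A NULL input evaluates to whatever nullAs says, with None modeling UNKNOWN:

```scala
object SargSketch {
  // A range over Ints; None means the side is unbounded.
  final case class Range(lower: Option[Int], lowerClosed: Boolean, upper: Option[Int], upperClosed: Boolean) {
    def contains(x: Int): Boolean = {
      val aboveLower = lower.forall(l => if (lowerClosed) x >= l else x > l)
      val belowUpper = upper.forall(u => if (upperClosed) x <= u else x < u)
      aboveLower && belowUpper
    }
  }
  // nullAs: result of SEARCH for a NULL input; None models UNKNOWN.
  final case class Sarg(ranges: List[Range], nullAs: Option[Boolean])

  private def point(v: Int): Range = Range(Some(v), true, Some(v), true)

  // x IN (v1, ..., vn): one closed point range per distinct value.
  def inList(values: Int*): Sarg = Sarg(values.distinct.sorted.map(point).toList, nullAs = None)

  // x NOT IN (v1, ..., vn): the open intervals between and around the points.
  def notInList(values: Int*): Sarg = {
    val points = values.distinct.sorted.toList
    val lowers: List[Option[Int]] = None :: points.map(Some(_))
    val uppers: List[Option[Int]] = points.map(Some(_)) :+ None
    Sarg(lowers.zip(uppers).map { case (lo, hi) => Range(lo, false, hi, false) }, nullAs = None)
  }

  // Three-valued evaluation of SEARCH(x, sarg).
  def search(x: Option[Int], sarg: Sarg): Option[Boolean] = x match {
    case None => sarg.nullAs
    case Some(v) => Some(sarg.ranges.exists(_.contains(v)))
  }
}
```

For example, x NOT IN (1, 2) is modeled as the three open ranges (-inf, 1), (1, 2) and (2, +inf). A single SEARCH can therefore express both operators, which is why plans no longer need a separate NOT_IN.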
What is the purpose of the change
This PR implements the SEARCH operator in the codegen and removes the scalar implementation of IN and NOT_IN. Now every scalar IN/NOT_IN using a constant set is implemented through SEARCH (following Calcite's development on the topic https://issues.apache.org/jira/browse/CALCITE-4173), and plans will only have SEARCH.
Brief change log
- NOT_IN from the plans, which was added only by the ConvertToNotInOrInRule. With this commit, every scalar IN/NOT_IN using a constant set is converted to SEARCH; otherwise, it's converted to a chain of disjunctions (see RexUtil#expandSearch).
- GenerateUtils#generateLiteral understands Flink's internal type system. The conversion from Calcite's Comparable type system to Flink's internal type system is provided by a new function, RexLiteralUtil#toFlinkInternalValue.
- SearchOperatorGen to implement the SEARCH operator, starting from the previous generateIn function in ScalarOperatorGens.
Verifying this change
Existing tests already cover IN/NOT_IN thoroughly. I added an additional test for the plan.
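The change log says SearchOperatorGen was derived from the previous generateIn function. The sketch below shows one way SEARCH code generation can branch on the shape of the Sarg. It assumes a non-null long input and assumes that a points-only Sarg maps to a set lookup while anything else becomes an OR of range checks. The object, methods, and field names are hypothetical, not Flink's generated code:

```scala
object SearchCodegenSketch {
  // Hypothetical range over longs; None means the side is unbounded.
  final case class Range(lower: Option[Long], lowerClosed: Boolean, upper: Option[Long], upperClosed: Boolean) {
    def isPoint: Boolean = lower.isDefined && lower == upper && lowerClosed && upperClosed
  }

  // Java condition for a single range over the variable `v`.
  private def rangeCond(v: String, r: Range): String = {
    val lowerOp = if (r.lowerClosed) ">=" else ">"
    val upperOp = if (r.upperClosed) "<=" else "<"
    val conds = r.lower.map(l => s"$v $lowerOp ${l}L").toList ++
      r.upper.map(u => s"$v $upperOp ${u}L").toList
    if (conds.isEmpty) "true" else conds.mkString("(", " && ", ")")
  }

  // Java boolean expression for SEARCH(v, ranges), assuming v is a non-null long.
  // A points-only Sarg becomes a lookup in a static set field `setField`
  // (declared via setDeclaration); anything else becomes an OR of range checks.
  def generate(v: String, ranges: List[Range], setField: String): String =
    if (ranges.isEmpty) "false"
    else if (ranges.forall(_.isPoint)) s"$setField.contains($v)"
    else ranges.map(r => rangeCond(v, r)).mkString(" || ")

  // Declaration for the set field used by the points-only path.
  def setDeclaration(setField: String, ranges: List[Range]): String = {
    val elements = ranges.flatMap(_.lower).map(p => s"${p}L").mkString(", ")
    s"private static final java.util.Set<Long> $setField = " +
      s"new java.util.HashSet<>(java.util.Arrays.asList($elements));"
  }
}
```

Real code generation also has to honor the Sarg's null handling and non-numeric types, which this sketch leaves out.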
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation