Skip to content

Conversation

@slinkydeveloper
Copy link
Contributor

@slinkydeveloper slinkydeveloper commented Mar 7, 2022

What is the purpose of the change

This PR implements the SEARCH operator in the codegen, and removes the scalar implementation of IN and NOT_IN. Now every scalar IN/NOT_IN using a constant set is implemented through SEARCH (following Calcite's development on the topic https://issues.apache.org/jira/browse/CALCITE-4173) and plans will only have SEARCH

Brief change log

  • Remove NOT_IN from the plans, which was added only by the ConvertToNotInOrInRule. With this commit every scalar IN/NOT_IN using a constant set is converted to SEARCH, otherwise it's converted to a chain of disjunctions (see RexUtil#expandSearch)
  • Now GenerateUtils#generateLiteral understands the Flink's internal type system. The conversion from Calcite's Comparable type system to Flink's internal type system is provided by a new function RexLiteralUtil#toFlinkInternalValue
  • Add SearchOperatorGen to implement the SEARCH operator starting from the previous generateIn function in ScalarOperatorGens.

Verifying this change

Existing code is already testing thoroughly IN/NOT_IN. I added an additional test for the plan

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

… as suggested by Calcite

Signed-off-by: slinkydeveloper <francescoguard@gmail.com>
@flinkbot
Copy link
Collaborator

flinkbot commented Mar 7, 2022

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@slinkydeveloper slinkydeveloper force-pushed the remove-search branch 2 times, most recently from 4991fc6 to 8882fd5 Compare March 10, 2022 11:10
…rnal type system. The conversion from Calcite's Comparable type system to Flink's internal type system is provided by a new function RexLiteralUtil#toFlinkInternalValue

Signed-off-by: slinkydeveloper <francescoguard@gmail.com>
Signed-off-by: slinkydeveloper <francescoguard@gmail.com>
@slinkydeveloper
Copy link
Contributor Author

@flinkbot run azure

Copy link
Contributor

@twalthr twalthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice refactoring @slinkydeveloper. I couldn't add much here. I hope the change is already covered by existing tests.

case td: TimestampData =>
s"$TIMESTAMP_DATA.fromEpochMillis(${td.getMillisecond}L, ${td.getNanoOfMillisecond})"
case decimalData: DecimalData =>
s"""$DECIMAL_UTIL.castFrom(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wouldn't unscaled long/bytes be cheaper? at least we could use the compact representation for simple numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this particular case, it doesn't really make a difference, as we store literal values in static variables anyway. I rather prefer to keep the implementation simple here.


if (noneMatcher.matches()) {
val reusePattern = ctx.addReusableStringConstants(newPattern)
val reusePattern = ctx.addReusableEscapedStringConstant(newPattern)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: method is quite long addReusableEscapedStringConstant -> addReusableEscapedString

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that is adding a constant is quite important, i wouldn't remove it from the name.

toFlinkInternalValue(value, valueType.asInstanceOf[DistinctType].getSourceType)

case SYMBOL =>
value.asInstanceOf[Enum[_]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Calcite symbols remain? Maybe this should be mentioned in the JavaDocs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep to make the result of this function compatible with generateLiterals

@@ -464,19 +465,10 @@ class ExprCodeGenerator(ctx: CodeGeneratorContext, nullableInput: Boolean)
override def visitCall(call: RexCall): GeneratedExpression = {
val resultType = FlinkTypeFactory.toLogicalType(call.getType)
if (call.getKind == SqlKind.SEARCH) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need this early check here or can we simply use the regular switch/case list below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this check because the generateCallExpression method accepts the operands already converted to expressions. But in case of search, the codegen uses Sarg directly.

Signed-off-by: slinkydeveloper <francescoguard@gmail.com>
Copy link
Contributor

@twalthr twalthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @slinkydeveloper.

Signed-off-by: slinkydeveloper <francescoguard@gmail.com>
@twalthr twalthr closed this in ae6c8cf Mar 16, 2022
twalthr pushed a commit that referenced this pull request Mar 16, 2022
Implements the SEARCH operator in the codegen and removes
the scalar implementation of IN and NOT_IN. Now every scalar
IN/NOT_IN using a constant set is implemented through SEARCH
(following Calcite's development on the topic CALCITE-4173)
and plans will only have SEARCH.

This closes #19001.
@slinkydeveloper slinkydeveloper deleted the remove-search branch March 16, 2022 09:44
JasonLeeCoding pushed a commit to JasonLeeCoding/flink that referenced this pull request May 27, 2022
Implements the SEARCH operator in the codegen and removes
the scalar implementation of IN and NOT_IN. Now every scalar
IN/NOT_IN using a constant set is implemented through SEARCH
(following Calcite's development on the topic CALCITE-4173)
and plans will only have SEARCH.

This closes apache#19001.
zstraw pushed a commit to zstraw/flink that referenced this pull request Jul 4, 2022
Implements the SEARCH operator in the codegen and removes
the scalar implementation of IN and NOT_IN. Now every scalar
IN/NOT_IN using a constant set is implemented through SEARCH
(following Calcite's development on the topic CALCITE-4173)
and plans will only have SEARCH.

This closes apache#19001.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants