Skip to content

Conversation

@LiBinfeng-01
Copy link
Contributor

@LiBinfeng-01 LiBinfeng-01 commented Aug 31, 2023

Proposed changes

Problem:
When inferring predicate,we lost cast of source expressions and some datatype derivation.

Example:
a = b and cast(a as targetType) = constant
(cast(a as targetType) = constant ) this expression is define as source expression.
we expect getting cast(b as targetType) = constant instead of b = constant

Reason:
When inferring predicate, we will compare original type of a and b. if they can be cast
without precision lost, a new predicate would be created. But created predicate forgot
to cast to target type

Solved:
Add cast to target type, and open make other datatype valid also.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@LiBinfeng-01
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.13 seconds
stream load tsv: 539 seconds loaded 74807831229 Bytes, about 132 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17162154202 Bytes

return expr.rewriteUp(e -> {
if (isDataTypeValid(originDataType, leftSlotEqualToRightSlot)) {
if (ExpressionUtils.isTwoExpressionEqualWithCast(e, leftSlotEqualToRightSlot.child(0))) {
if (!leftSlotEqualToRightSlot.child(1).getDataType().equals(e.getDataType())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some ut case please~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@LiBinfeng-01 LiBinfeng-01 force-pushed the fix_nereids branch 2 times, most recently from 0680917 to fd79201 Compare September 7, 2023 09:48
@LiBinfeng-01
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.13 seconds
stream load tsv: 531 seconds loaded 74807831229 Bytes, about 134 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162091293 Bytes

if (isDataTypeValid(originDataType, leftSlotEqualToRightSlot)) {
if (ExpressionUtils.isTwoExpressionEqualWithCast(e, leftSlotEqualToRightSlot.child(0))) {
if (ExpressionUtils.isTwoExpressionEqualWithCast(e, leftSlotEqualToRightSlot.child(0))
&& !(e instanceof Cast)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why add !(e instance Cast) ? add some comment to explain it.
btw, !(e instance Cast) is a common predicate, should put at L106.
and if (isDataTypeValid(originDataType, leftSlotEqualToRightSlot)) could put before return?
and, we do not use the second parameters at all in isDataTypeValid at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because when rewriteup, it would do rewrite repeatly, assume we have a cast(a) = constant,and a = b for example. When rewrite up, it would replace a to b first and then replace cast(b) -> a when go back to cast layer. isDatatypeValid parameter has been used.

@LiBinfeng-01
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 50.61 seconds
stream load tsv: 585 seconds loaded 74807831229 Bytes, about 121 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17162445335 Bytes

@LiBinfeng-01
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.98 seconds
stream load tsv: 580 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17162203651 Bytes

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 11, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@starocean999 starocean999 merged commit be36183 into apache:master Sep 11, 2023
xiaokang pushed a commit that referenced this pull request Sep 11, 2023
…3692)

Problem:
When inferring predicate,we lost cast of source expressions and some datatype derivation.

Example:
a = b and cast(a as targetType) = constant
(cast(a as targetType) = constant ) this expression is define as source expression.
we expect getting cast(b as targetType) = constant instead of b = constant

Reason:
When inferring predicate, we will compare original type of a and b. if they can be cast
without precision lost, a new predicate would be created. But created predicate forgot
to cast to target type

Solved:
Add cast to target type, and open make other datatype valid also.
xiaokang added a commit that referenced this pull request Sep 12, 2023
xiaokang pushed a commit that referenced this pull request Sep 13, 2023
…3692)

Problem:
When inferring predicate,we lost cast of source expressions and some datatype derivation.

Example:
a = b and cast(a as targetType) = constant
(cast(a as targetType) = constant ) this expression is define as source expression.
we expect getting cast(b as targetType) = constant instead of b = constant

Reason:
When inferring predicate, we will compare original type of a and b. if they can be cast
without precision lost, a new predicate would be created. But created predicate forgot
to cast to target type

Solved:
Add cast to target type, and open make other datatype valid also.
xiaokang added a commit that referenced this pull request Sep 13, 2023
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Sep 28, 2023
morrySnow added a commit that referenced this pull request Oct 7, 2023
vinlee19 pushed a commit to vinlee19/doris that referenced this pull request Oct 7, 2023
@xiaokang
Copy link
Contributor

refactored in other way

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. merge_conflict reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants