@LiBinfeng-01 LiBinfeng-01 commented Jun 26, 2023

Proposed changes

Problem:
When inferring predicates, we assumed that only bare slot references need to be considered. That assumption fails in this case:
create table tb1(l1 smallint) ...;
create table tb2(l2 int) ...;
select * from tb1 inner join tb2 where tb1.l1 = tb2.l2 and tb2.l2 = 1;
We cannot infer the filter tb1.l1 = 1, because a cast is added to l1: the join condition becomes cast(l1 as int) = l2, so l1 is no longer a bare slot reference.
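
The failure can be reproduced with a small model. This is only a sketch in Python, not the Nereids code: the classes Slot, Cast, Literal and EqualTo and the function infer_slots_only are hypothetical names introduced for illustration. It shows that an inference step which only recognises bare slots on both sides of an equality finds nothing once one side is wrapped in a cast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    name: str
    type: str


@dataclass(frozen=True)
class Cast:
    child: Slot
    to_type: str


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class EqualTo:
    left: object
    right: object


def infer_slots_only(predicates):
    """Naive inference: only bare slots on both sides of '=' count as equal."""
    equal_pairs = [(p.left, p.right) for p in predicates
                   if isinstance(p.left, Slot) and isinstance(p.right, Slot)]
    constants = [(p.left, p.right) for p in predicates
                 if isinstance(p.left, Slot) and isinstance(p.right, Literal)]
    inferred = []
    for a, b in equal_pairs:
        for slot, lit in constants:
            if slot == b:
                inferred.append(EqualTo(a, lit))
            elif slot == a:
                inferred.append(EqualTo(b, lit))
    return inferred


l1 = Slot("tb1.l1", "smallint")
l2 = Slot("tb2.l2", "int")

# With the cast in place, the join condition is cast(tb1.l1 as int) = tb2.l2.
with_cast = [EqualTo(Cast(l1, "int"), l2), EqualTo(l2, Literal(1))]
print(infer_slots_only(with_cast))  # prints []
```

Without the cast, the same function derives tb1.l1 = 1 from the two predicates, which is the filter the PR wants to recover.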

Solved:
Take casts into account when inferring predicates, and change the equality check so that a slot reference can be matched against a cast expression over that slot. However, inferring a predicate through a cast from a larger type to a smaller type is a logical error.
For example:
select * from tb1 inner join tb2 where tb1.l1 = cast(tb2.l2 as smallint) and tb2.l2 = (a number between smallint max and int max);
The value of tb2.l2 cannot be inferred to the left side, because it is out of range for tb1.l1 and the inferred predicate on tb1.l1 would be false. If we then add one more condition such as tb1.l1 = tb3.l3 (l3 is smallint), that predicate would become false as well.
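
The widening-only rule can be sketched as a range check. The following Python snippet is an illustration under stated assumptions, not the code in this PR: INT_RANGES, is_widening, can_infer_through_cast and fits are hypothetical names, and only the two integer types from the example are modelled.

```python
# Value ranges of the two integer types used in the example.
INT_RANGES = {
    "smallint": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
}


def is_widening(from_type, to_type):
    """True when every value of from_type is representable in to_type."""
    from_lo, from_hi = INT_RANGES[from_type]
    to_lo, to_hi = INT_RANGES[to_type]
    return to_lo <= from_lo and from_hi <= to_hi


def can_infer_through_cast(from_type, to_type):
    """Only look through a cast when it cannot lose information."""
    return is_widening(from_type, to_type)


def fits(value, type_name):
    """True when value lies inside the range of type_name."""
    lo, hi = INT_RANGES[type_name]
    return lo <= value <= hi


print(can_infer_through_cast("smallint", "int"))  # True: widening cast
print(can_infer_through_cast("int", "smallint"))  # False: narrowing cast
print(fits(40000, "smallint"))                    # False: out of range
```

A literal such as 40000 fits in int but not in smallint, so a predicate tb1.l1 = 40000 derived through cast(tb2.l2 as smallint) could never be satisfied by a smallint column.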


@LiBinfeng-01
Contributor Author

run buildall

@LiBinfeng-01 LiBinfeng-01 force-pushed the fix_infer_predicate branch from c289425 to eaccc91 Compare June 27, 2023 03:10
@LiBinfeng-01
Contributor Author

run feut

@LiBinfeng-01
Contributor Author

run buildall

@LiBinfeng-01 LiBinfeng-01 force-pushed the fix_infer_predicate branch from eaccc91 to 81ac254 Compare June 28, 2023 09:00
@LiBinfeng-01
Contributor Author

run buildall

// under the License.

suite("load") {
sql 'create database if not exists nereids_infer_predicate_test'
Contributor

No need to create a database here; remove load.groovy completely.

Contributor Author

done

@morrySnow morrySnow added the dev/2.0.0 2.0.0 release label Jun 30, 2023
@LiBinfeng-01 LiBinfeng-01 force-pushed the fix_infer_predicate branch from 81ac254 to d503715 Compare July 5, 2023 06:20
@LiBinfeng-01
Contributor Author

run buildall

@hello-stephen
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.08 seconds
stream load tsv: 457 seconds loaded 74807831229 Bytes, about 156 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 56 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 68.2 seconds inserted 10000000 Rows, about 146K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230705071925_clickbench_pr_172566.html

@LiBinfeng-01 LiBinfeng-01 force-pushed the fix_infer_predicate branch from d503715 to b9c4777 Compare July 5, 2023 13:53
@LiBinfeng-01
Contributor Author

run buildall

@hello-stephen
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 48.46 seconds
stream load tsv: 457 seconds loaded 74807831229 Bytes, about 156 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 56 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 67.7 seconds inserted 10000000 Rows, about 147K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230705142027_clickbench_pr_172936.html

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 56.08 seconds
stream load tsv: 516 seconds loaded 74807831229 Bytes, about 138 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 87.3 seconds inserted 10000000 Rows, about 114K ops/s
storage size: 17167412237 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230705231101_clickbench_pr_172941.html

@LiBinfeng-01 LiBinfeng-01 force-pushed the fix_infer_predicate branch from b9c4777 to 42e2ab1 Compare July 6, 2023 09:10
@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 52.65 seconds
stream load tsv: 512 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 90.5 seconds inserted 10000000 Rows, about 110K ops/s
storage size: 17167473914 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230706181641_clickbench_pr_173630.html

starocean999 previously approved these changes Jul 6, 2023
@LiBinfeng-01
Contributor Author

run buildall

@hello-stephen
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 51.7 seconds
stream load tsv: 456 seconds loaded 74807831229 Bytes, about 156 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 57 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 67.7 seconds inserted 10000000 Rows, about 147K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230707011857_clickbench_pr_174031.html

@LiBinfeng-01
Contributor Author

run clickbench

@LiBinfeng-01
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 51.53 seconds
stream load tsv: 502 seconds loaded 74807831229 Bytes, about 142 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 88.8 seconds inserted 10000000 Rows, about 112K ops/s
storage size: 17166995123 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230707114339_clickbench_pr_174139.html

@hello-stephen
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.61 seconds
stream load tsv: 449 seconds loaded 74807831229 Bytes, about 158 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 57 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 68.8 seconds inserted 10000000 Rows, about 145K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230707041006_clickbench_pr_174123.html

@hello-stephen
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 50.13 seconds
stream load tsv: 447 seconds loaded 74807831229 Bytes, about 159 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 57 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 29 seconds loaded 861443392 Bytes, about 28 MB/s
insert into select: 68.7 seconds inserted 10000000 Rows, about 145K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230707043705_clickbench_pr_174136.html

starocean999 previously approved these changes Jul 11, 2023
@LiBinfeng-01 LiBinfeng-01 force-pushed the fix_infer_predicate branch from 9b3a2a8 to b5c9089 Compare July 17, 2023 02:17
@LiBinfeng-01
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 54.66 seconds
stream load tsv: 503 seconds loaded 74807831229 Bytes, about 141 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17169748213 Bytes

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 17, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morrySnow morrySnow merged commit 58f2593 into apache:master Jul 19, 2023
@xiaokang xiaokang added dev/2.0.0-merged and removed dev/2.0.0 2.0.0 release labels Jul 20, 2023
xiaokang pushed a commit that referenced this pull request Jul 20, 2023
… predicate (#21171)

LHG41278 pushed a commit to LHG41278/dorisMine that referenced this pull request Jul 20, 2023
… predicate (apache#21171)

morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Oct 19, 2023
This reverts "[Fix](Nereids) Add cast comparison with slot reference when inferring predicate (apache#21171)"
commit 58f2593.
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Oct 23, 2023
This reverts "[Fix](Nereids) Add cast comparison with slot reference when inferring predicate (apache#21171)"
commit 58f2593.
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Oct 24, 2023
This reverts "[Fix](Nereids) Add cast comparison with slot reference when inferring predicate (apache#21171)"
commit 58f2593.
morrySnow added a commit that referenced this pull request Oct 25, 2023
…25637)

Extract the slot and the literal from each comparison predicate, then infer new predicates through the equality predicates.
Use TypeCoercion to add casts to each new comparison predicate so that it is correctly typed.

This reverts "[Fix](Nereids) Add cast comparison with slot reference when inferring predicate (#21171)"
commit 58f2593.
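
The approach described in this commit message can be sketched as follows. This is a hedged Python model, not the actual Nereids implementation: the classes, the RANK table, strip_widening_cast, coerce and infer are all hypothetical names, and the coercion rule (cast the narrower side up to the wider type) is an assumption made for the example. Narrowing casts are deliberately left in place, so nothing is inferred through them.

```python
from dataclasses import dataclass

RANK = {"smallint": 1, "int": 2}  # hypothetical: larger rank = wider type


@dataclass(frozen=True)
class Slot:
    name: str
    type: str


@dataclass(frozen=True)
class Cast:
    child: object
    to_type: str


@dataclass(frozen=True)
class Literal:
    value: int
    type: str


@dataclass(frozen=True)
class Comparison:
    op: str
    left: object
    right: object


def expr_type(expr):
    """Result type of an expression in this toy model."""
    return expr.to_type if isinstance(expr, Cast) else expr.type


def strip_widening_cast(expr):
    """Remove wrapping casts, but only those that cannot lose information."""
    while isinstance(expr, Cast) and RANK[expr_type(expr.child)] <= RANK[expr.to_type]:
        expr = expr.child
    return expr


def coerce(op, slot, literal):
    """Rebuild a comparison, casting the narrower side up to the wider type."""
    if RANK[slot.type] < RANK[literal.type]:
        return Comparison(op, Cast(slot, literal.type), literal)
    if RANK[slot.type] > RANK[literal.type]:
        return Comparison(op, slot, Cast(literal, slot.type))
    return Comparison(op, slot, literal)


def infer(predicates):
    """Derive new slot-vs-literal predicates from slot = slot equalities."""
    slot_pairs, slot_literals = [], []
    for p in predicates:
        left, right = strip_widening_cast(p.left), strip_widening_cast(p.right)
        if p.op == "=" and isinstance(left, Slot) and isinstance(right, Slot):
            slot_pairs.append((left, right))
        elif isinstance(left, Slot) and isinstance(right, Literal):
            slot_literals.append((p.op, left, right))
    inferred = []
    for a, b in slot_pairs:
        for op, slot, lit in slot_literals:
            other = a if slot == b else b if slot == a else None
            if other is not None:
                inferred.append(coerce(op, other, lit))
    return inferred


l1 = Slot("tb1.l1", "smallint")
l2 = Slot("tb2.l2", "int")
predicates = [
    Comparison("=", Cast(l1, "int"), l2),      # cast(tb1.l1 as int) = tb2.l2
    Comparison("=", l2, Literal(1, "int")),    # tb2.l2 = 1
]
print(infer(predicates))
```

For the query from the PR description, this sketch yields a single new predicate equivalent to cast(tb1.l1 as int) = 1. For tb1.l1 = cast(tb2.l2 as smallint) it yields nothing, because the narrowing cast is not stripped.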
xiaokang pushed a commit that referenced this pull request Oct 25, 2023
…25637)

morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Oct 26, 2023
…pache#25637)

pick from master
PR: apache#25637
commit id: ae66464

xiaokang pushed a commit that referenced this pull request Oct 26, 2023
…25637) (#25930)

pick from master
PR: #25637
commit id: ae66464

dutyu pushed a commit to dutyu/doris that referenced this pull request Oct 28, 2023
…pache#25637)

gnehil pushed a commit to gnehil/doris that referenced this pull request Dec 4, 2023
…pache#25637)

gnehil pushed a commit to gnehil/doris that referenced this pull request Dec 4, 2023
…pache#25637) (apache#25930)

pick from master
PR: apache#25637
commit id: ae66464

XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…pache#25637)


Labels

approved Indicates a PR has been approved by one committer. area/nereids dev/2.0.0-merged kind/test reviewed
