[Fix](Nereids) Fix problem of infer predicates not completely #22145
Problem:
When Nereids infers predicates, a newly inferred predicate cannot serve as a source for the next round of inference. For example:
create table tt1(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1');
create table tt2(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1');
create table tt3(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1');
explain select * from tt1 left join tt2 on tt1.c1 = tt2.c1 left join tt3 on tt2.c1 = tt3.c1 where tt1.c1 = 123;
We expect to get tt3.c1 = 123, but we only get tt2.c1 = 123. The source predicates tt1.c1 = 123 and tt2.c1 = tt3.c1 share no column, so inference cannot find any relationship between them directly.
Solution:
We need to cache the intermediate results of inference, such as tt2.c1 = 123 in the example, so that they can act as source predicates in later rounds.
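The idea of feeding inferred predicates back into the next round can be sketched as a small fixed-point loop. This is a minimal illustration in Python, not the Nereids implementation: the function name `infer_predicates` and the tuple encoding of predicates are assumptions made for the example, and the sketch deliberately ignores join types, which matter for outer joins.

```python
from itertools import product


def infer_predicates(equalities, constants):
    """Propagate `column = constant` facts across `column = column` equalities.

    equalities: iterable of (col_a, col_b) pairs, e.g. ("tt1.c1", "tt2.c1")
    constants:  iterable of (col, value) pairs, e.g. ("tt1.c1", 123)
    Returns every (col, value) fact reachable by repeated inference.
    (Hypothetical helper for illustration only.)
    """
    equalities = list(equalities)
    known = set(constants)      # cache of source and inferred predicates
    frontier = set(constants)   # predicates discovered in the previous round
    while frontier:
        inferred = set()
        for (col, value), (left, right) in product(frontier, equalities):
            if col == left:
                inferred.add((right, value))
            elif col == right:
                inferred.add((left, value))
        # Only genuinely new facts feed the next round, so the loop terminates.
        frontier = inferred - known
        known |= frontier
    return known


result = infer_predicates(
    equalities=[("tt1.c1", "tt2.c1"), ("tt2.c1", "tt3.c1")],
    constants=[("tt1.c1", 123)],
)
```

With a single round, only `tt2.c1 = 123` would be found. Because that result is cached and reused as a source, the second round also reaches `tt3.c1 = 123`.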
We use two facilities for predicate inference: PredicatePropagation and PullUpPredicates. In the previous implementation, we used a set to save the intermediate results of PredicatePropagation. The purpose was to infer new predicates through two equality relations. However, this is the wrong approach, because it can infer incorrect predicates through an outer join. For example:
```sql
select a.c1
from a
left join b on a.c2 = b.c2 and a.c1 = '1'
left join c on a.c2 = c.c2 and a.c1 = '2'
inner join d on a.c3 = d.c3
```
The predicates `a.c1 = '1'` and `a.c1 = '2'` should not be inferred as filters on relation `a`. This PR:
1. Reverts the change from PR apache#22145, commit 3c58e9b.
2. Removes the unreasonable restriction in PullUpPredicates.
3. Uses a new Filter node, rather than a new otherCondition on the join node, to save inferred predicates.
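To see why an ON-clause predicate of a left join must not become a filter on the left relation, the following sketch simulates left outer join semantics on made-up rows. It is a simplified illustration with a single join and hypothetical data, not the query above in full: `left_join` and the sample tables `a` and `b` are assumptions introduced for the example.

```python
def left_join(left_rows, right_rows, on):
    """Left outer join: every left row survives; unmatched rows pair with None."""
    out = []
    for l in left_rows:
        matches = [r for r in right_rows if on(l, r)]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out


# Hypothetical sample data for relations a and b.
a = [{"c1": "1", "c2": 10}, {"c1": "2", "c2": 10}, {"c1": "3", "c2": 10}]
b = [{"c2": 10}]


def on(l, r):
    # ON a.c2 = b.c2 AND a.c1 = '1'
    return l["c2"] == r["c2"] and l["c1"] == "1"


# Correct evaluation: the ON clause only decides which rows find a match.
correct = left_join(a, b, on)

# Unsound rewrite: a.c1 = '1' is additionally applied as a filter on a.
wrong = left_join([row for row in a if row["c1"] == "1"], b, on)

correct_c1 = [l["c1"] for l, _ in correct]
wrong_c1 = [l["c1"] for l, _ in wrong]
```

In the correct plan all three rows of `a` are returned, with the rows where `c1` is `'2'` or `'3'` padded with NULL on the `b` side. The rewritten plan drops those rows, so the two plans are not equivalent.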