[feat](unique function) Add unique function #54414
This reverts commit 147c2cd.
TPC-H: Total hot run time: 33915 ms |
TPC-DS: Total hot run time: 169861 ms |
ClickBench: Total hot run time: 33.68 s |
TPC-H: Total hot run time: 33468 ms |
TPC-DS: Total hot run time: 170462 ms |
ClickBench: Total hot run time: 33.19 s |
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
TPC-H: Total hot run time: 33568 ms |
TPC-DS: Total hot run time: 184582 ms |
ClickBench: Total hot run time: 32.07 s |
PR approved by at least one committer and no changes requested. |
The logical plan builder has bugs in how it handles `between`:
- For the SQL `a between random() and random()`, the two unbound `'random()` expressions compare as equal, so the predicate is rewritten to `a = random()`. After expression binding, however, the two `random()` calls should be different.
- For the SQL `random() between 0.1 and 0.5`, the predicate is rewritten to `random() >= 0.1 and random() <= 0.5`. When expressions are bound later, the two unbound `random()` calls produce two different bound `random()` functions, but they need to be the same.

So the logical plan builder should not compare the lower and upper bounds of a `between`, and should not expand it before expression binding. Related PRs: 1) unique function: #54414 2) remove between: #23421
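The two failure modes above can be illustrated with a small model. This is only a sketch, not the Doris implementation: `Unbound`, `Bound`, and `bind` are hypothetical stand-ins for unbound expressions, bound unique functions, and expression binding.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Unbound:
    """Hypothetical unbound call: equality is purely structural."""
    name: str


@dataclass(frozen=True)
class Bound:
    """Hypothetical bound unique function: equality also uses a unique id."""
    name: str
    unique_id: int


_ids = count(1)


def bind(expr: Unbound) -> Bound:
    """Assumption: every bind of an unbound unique function gets a fresh id."""
    return Bound(expr.name, next(_ids))


# Case 1: `a between random() and random()`.
low, high = Unbound("random"), Unbound("random")
collapses_before_bind = low == high            # builder would emit `a = random()`
distinct_after_bind = bind(low) != bind(high)  # yet the bound calls differ

# Case 2: `random() between 0.1 and 0.5`, expanded before binding.
value = Unbound("random")
lower_cmp, upper_cmp = bind(value), bind(value)  # one bind per comparison
wrongly_distinct = lower_cmp != upper_cmp        # they should have been the same

print(collapses_before_bind, distinct_after_bind, wrongly_distinct)
```

Binding first and expanding afterwards avoids both problems in this model, because equality is then decided on bound expressions that already carry their ids.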
What problem does this PR solve?
For the SQL `random() > 10 and random() < 5`, the two `random()` calls are different. To handle this case, this PR introduces a class `UniqueFunction`. A `UniqueFunction` holds a unique id, and checking whether two unique functions are equal just compares their class types and their unique ids. `UniqueFunction` has four child classes:
In most cases, two unique functions should be treated as different and have different unique ids. In an aggregate, however, some unique functions need to share the same unique id, otherwise the aggregate throws an error. Their unique ids therefore need to be bound to another function's unique id.
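A minimal sketch of this equality rule, under stated assumptions: the class below is a Python stand-in rather than the actual Doris class, `with_unique_id` is a hypothetical helper for rebinding an id, and `Random` and `OtherUnique` are illustrative subclasses only.

```python
from itertools import count

_next_id = count(1)


class UniqueFunction:
    """Stand-in for a function whose every call site is distinct by default."""

    def __init__(self, unique_id=None):
        # A fresh id is allocated unless one is passed in explicitly.
        self.unique_id = next(_next_id) if unique_id is None else unique_id

    def with_unique_id(self, unique_id):
        """Hypothetical helper: copy of this function bound to another id."""
        return type(self)(unique_id)

    def __eq__(self, other):
        # Equal only when both the class type and the unique id agree.
        return type(self) is type(other) and self.unique_id == other.unique_id

    def __hash__(self):
        return hash((type(self).__name__, self.unique_id))


class Random(UniqueFunction):
    pass


class OtherUnique(UniqueFunction):  # hypothetical second subclass
    pass


r1, r2 = Random(), Random()
print(r1 == r2)                               # False: different ids
print(r2.with_unique_id(r1.unique_id) == r1)  # True: id rebound to r1's
print(OtherUnique(r1.unique_id) == r1)        # False: class types differ
```

Rebinding by copy rather than mutation keeps the original expression untouched, which suits immutable expression trees. Whether Doris does it this way is not stated in the description.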
Here are the details:
For unique scalar functions in PROJECT/HAVING/QUALIFY/SORT/AGG OUTPUT/REPEAT OUTPUT, if they have a related AGG plan, their unique ids need to be bound to those of the matching unique functions in the AGG's group by.
The cases are as follows:
example i:
Since it does not contain an aggregate, random1(), random2(), random3(), and random4() are all different and have different unique ids.
example ii:
it will rewrite as:
Since the aggregate's group by list is empty, random1() and random2() are different.
example iii:
Since the group by list (a) does not contain a unique function, random1(), random2(), and random3() are different.
example:
First, handle the group by: if two group by expressions match as a whole, their corresponding unique functions are equal and share the same unique id. So random7() equals random8() and random9() equals random10(), but random7() does not equal random9().
Then handle the PROJECT/HAVING/QUALIFY/SORT/AGG OUTPUT/REPEAT OUTPUT expressions, which gives:
random1()/random3()/random11() are equal to random7(), and random2()/random4()/random12() are equal to random9(); their unique ids are then updated to match.
Note that the FILTER functions random5()/random6() are different from all the other random functions.
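The second step, binding output expressions to the group by, might be sketched as follows. Everything here is an assumption for illustration: expressions are modeled as nested tuples, the helper names are hypothetical, and the sample expressions are invented rather than taken from the example above.

```python
def match(expr, pattern, mapping):
    """Structural match that ignores unique ids.

    On success, mapping holds expr_id -> pattern_id for each unique function.
    """
    if expr[0] != pattern[0] or len(expr) != len(pattern):
        return False
    if expr[0] == "random":
        mapping[expr[1]] = pattern[1]
        return True
    if expr[0] == "col":
        return expr[1] == pattern[1]
    return all(match(e, p, mapping) for e, p in zip(expr[1:], pattern[1:]))


def rebind(expr, mapping):
    """Rewrite unique ids in expr according to mapping."""
    if expr[0] == "random":
        return ("random", mapping.get(expr[1], expr[1]))
    if expr[0] == "col":
        return expr
    return (expr[0],) + tuple(rebind(c, mapping) for c in expr[1:])


def bind_to_group_by(expr, group_bys):
    """Bind unique ids in an output expression to a matching group-by expression."""
    for gb in group_bys:
        mapping = {}  # fresh per attempt, so failed matches leave no trace
        if match(expr, gb, mapping):
            return rebind(expr, mapping)
    if expr[0] in ("random", "col"):
        return expr
    # No whole match at this level: try the children.
    return (expr[0],) + tuple(bind_to_group_by(c, group_bys) for c in expr[1:])


# Hypothetical group by: a + random#7, b + random#9
group_bys = [("add", ("col", "a"), ("random", 7)),
             ("add", ("col", "b"), ("random", 9))]
project = ("add", ("col", "a"), ("random", 1))
having = ("mul", ("add", ("col", "b"), ("random", 2)), ("col", "c"))

print(bind_to_group_by(project, group_bys))
print(bind_to_group_by(having, group_bys))
```

In this model a FILTER expression keeps its own ids simply because it is never passed through `bind_to_group_by`.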
example:
First, handle the grouping sets: random5() equals random6(), and random7() equals random8().
Then handle the repeat output: random1()/random2() are equal to random5(), and random3()/random4() are equal to random7().
example:
it will rewrite as:
for the rewritten aggregate:
First, for the group by: group by expressions are not matched against each other even if they look the same, so random1() does not equal random2(), and random3() does not equal random4().
Then handle the PROJECT/HAVING/QUALIFY/SORT expressions: each matches the first of the longest matching group by expressions, so random5() equals random1() but not random2(), and random6() equals random3() but not random5(); their unique ids are then updated to match.
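One plausible reading of "first matched longest" is sketched below. The assumptions are that "longest" means the largest expression tree, that ties are broken by group-by order, and that expressions are nested tuples; the group-by list is hypothetical and only mirrors the relationships stated above.

```python
def shape(expr):
    """Expression with unique ids erased, used for structural comparison."""
    if expr[0] == "random":
        return ("random",)
    if expr[0] == "col":
        return expr
    return (expr[0],) + tuple(shape(c) for c in expr[1:])


def size(expr):
    """Number of nodes in the expression tree."""
    if expr[0] in ("random", "col"):
        return 1
    return 1 + sum(size(c) for c in expr[1:])


def subtrees(expr):
    """Yield expr and all of its sub-expressions."""
    yield expr
    if expr[0] not in ("random", "col"):
        for child in expr[1:]:
            yield from subtrees(child)


def pick_group_by(expr, group_bys):
    """Index of the group-by expression chosen for expr, or None."""
    shapes = {shape(s) for s in subtrees(expr)}
    # sorted() is stable: equally sized candidates keep their group-by order.
    order = sorted(range(len(group_bys)), key=lambda i: -size(group_bys[i]))
    for i in order:
        if shape(group_bys[i]) in shapes:
            return i
    return None


# Hypothetical group by: random#1, random#2, a + random#3, a + random#4
group_bys = [("random", 1), ("random", 2),
             ("add", ("col", "a"), ("random", 3)),
             ("add", ("col", "a"), ("random", 4))]

print(pick_group_by(("random", 5), group_bys))                       # 0
print(pick_group_by(("add", ("col", "a"), ("random", 6)), group_bys))  # 2
```

Here the bare call picks up the id of random#1 rather than random#2, and the `a + random()` expression picks up the id of random#3 because the larger candidate is tried before the bare `random()` ones.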
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)