[Feat](nereids) add transform rule MergePercentileToArray #34313
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
TPC-DS: Total hot run time: 187124 ms
wait for percentileArray perf opt
e97e3ff to 64401c7
run buildall
64401c7 to 690d4d1
run buildall
TPC-H: Total hot run time: 41112 ms
run xcloud_p1
run cloud_p1
run performance
TPC-H: Total hot run time: 41695 ms
TPC-DS: Total hot run time: 169184 ms
ClickBench: Total hot run time: 30.65 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
MergePercentileToArray rewrites several percentile calls over the same column into one percentile_array call, for example: select ss_item_sk, percentile(ss_quantity,0.9), percentile(ss_quantity,0.6), percentile(ss_quantity,0.3) from store_sales group by ss_item_sk; ==> select ss_item_sk, percentile_array(ss_quantity,[0.3,0.6,0.9]) from store_sales group by ss_item_sk;
Cherry-pick #34313 to branch-2.1. MergePercentileToArray rewrites several percentile calls over the same column into one percentile_array call, for example: select ss_item_sk, percentile(ss_quantity,0.9), percentile(ss_quantity,0.6), percentile(ss_quantity,0.3) from store_sales group by ss_item_sk; ==> select ss_item_sk, percentile_array(ss_quantity,[0.3,0.6,0.9]) from store_sales group by ss_item_sk;
…on (#44783)
Related PR: #34313
Problem Summary
The original PR did not handle the following scenario:
```sql
SELECT SUM(a), PERCENTILE(pk, 0.1) AS c1, PERCENTILE(pk, 0.1) AS c2, PERCENTILE(pk, 0.4) AS c3 FROM test_merge_percentile;
```
In this case, the aggregate outputs include two identical functions (PERCENTILE(pk, 0.1)). When constructing the LogicalProject, a map was used where the key is the child of an Alias and the value is the Alias itself. However, this approach loses information when two Aliases share the same child.
This PR modifies the map structure to use the child of an Alias as the key and a list of Alias objects as the value. This ensures that all Alias instances with the same child are preserved, resolving the issue of lost information in such cases.
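The difference between the two map shapes described above can be illustrated with a minimal, self-contained sketch. It does not use the Nereids classes: `Alias` here is a hypothetical stand-in that stores the output name and the child expression as plain strings, and the class and method names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AliasMapSketch {
    /** Hypothetical stand-in for an Alias: an output name wrapping a child expression. */
    static final class Alias {
        final String name;
        final String child;

        Alias(String name, String child) {
            this.name = name;
            this.child = child;
        }
    }

    /** Child -> single Alias: a second alias with the same child overwrites the first. */
    static Map<String, Alias> singleValued(List<Alias> aliases) {
        Map<String, Alias> byChild = new LinkedHashMap<>();
        for (Alias a : aliases) {
            byChild.put(a.child, a);
        }
        return byChild;
    }

    /** Child -> list of Alias: every alias that shares a child is kept. */
    static Map<String, List<Alias>> multiValued(List<Alias> aliases) {
        Map<String, List<Alias>> byChild = new LinkedHashMap<>();
        for (Alias a : aliases) {
            byChild.computeIfAbsent(a.child, k -> new ArrayList<>()).add(a);
        }
        return byChild;
    }

    public static void main(String[] args) {
        List<Alias> aliases = Arrays.asList(
                new Alias("c1", "PERCENTILE(pk, 0.1)"),
                new Alias("c2", "PERCENTILE(pk, 0.1)"),
                new Alias("c3", "PERCENTILE(pk, 0.4)"));

        // Only c2 survives for PERCENTILE(pk, 0.1); c1 is lost.
        System.out.println(singleValued(aliases).get("PERCENTILE(pk, 0.1)").name);
        // Both c1 and c2 are kept.
        System.out.println(multiValued(aliases).get("PERCENTILE(pk, 0.1)").size());
    }
}
```

With the single-valued map, the output column `c1` has no entry to be projected from, which matches the lost-information problem the summary describes.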
### What problem does this PR solve?
Related PR: #34313
Problem Summary: When the second argument of percentile is not a literal, an "index out of range" error is reported. This PR fixes the bug by taking the second argument of percentile directly and by changing the argument of percentile_array from an array literal to an array expression.
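The contrast between an array literal and an array expression can be sketched as follows. The source does not say where the index-out-of-range error originated, so this sketch does not reproduce it. It only shows, under assumed and simplified types, why code that needs literal values cannot handle a non-literal argument, while code that passes the argument expressions through unchanged can. `Expr`, `DoubleLiteral`, `ColumnRef`, and both methods are hypothetical names, not Doris classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PercentArgSketch {
    /** Hypothetical minimal expression interface. */
    interface Expr {
        String toSql();
    }

    /** Hypothetical literal argument, e.g. 0.3. */
    static final class DoubleLiteral implements Expr {
        final double value;

        DoubleLiteral(double value) {
            this.value = value;
        }

        public String toSql() {
            return Double.toString(value);
        }
    }

    /** Hypothetical non-literal argument, e.g. a column reference. */
    static final class ColumnRef implements Expr {
        final String name;

        ColumnRef(String name) {
            this.name = name;
        }

        public String toSql() {
            return name;
        }
    }

    /** Literal-only approach: it needs numeric values, so it rejects anything else. */
    static List<Double> literalValues(List<Expr> args) {
        List<Double> values = new ArrayList<>();
        for (Expr e : args) {
            if (!(e instanceof DoubleLiteral)) {
                throw new IllegalArgumentException("not a literal: " + e.toSql());
            }
            values.add(((DoubleLiteral) e).value);
        }
        return values;
    }

    /** Expression approach: each argument becomes an array element as-is. */
    static String arrayExpression(List<Expr> args) {
        StringBuilder sb = new StringBuilder("[");
        String sep = "";
        for (Expr e : args) {
            sb.append(sep).append(e.toSql());
            sep = ",";
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        List<Expr> percents = Arrays.asList(new DoubleLiteral(0.3), new ColumnRef("p"));
        System.out.println(arrayExpression(percents)); // prints [0.3,p]
    }
}
```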
MergePercentileToArray performs the following transformation:
select ss_item_sk, percentile(ss_quantity,0.9), percentile(ss_quantity,0.6), percentile(ss_quantity,0.3) from store_sales group by ss_item_sk;
==>
select ss_item_sk, percentile_array(ss_quantity,[0.3,0.6,0.9]) from store_sales group by ss_item_sk;
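The core idea of the rewrite, grouping percentile calls by their first argument and emitting one percentile_array call per group, can be sketched outside the optimizer. This is a simplified illustration and not the Nereids rule itself: `PercentileCall` and `merge` are hypothetical names, expressions are modeled as strings, and sorting the percentages in ascending order is an assumption that happens to match the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class MergePercentileSketch {
    /** Hypothetical stand-in for one percentile(column, p) call in the select list. */
    static final class PercentileCall {
        final String column;
        final double p;

        PercentileCall(String column, double p) {
            this.column = column;
            this.p = p;
        }
    }

    /** Groups calls by column and emits one percentile_array call per column. */
    static List<String> merge(List<PercentileCall> calls) {
        // LinkedHashMap keeps first-seen column order; TreeSet sorts and deduplicates percentages.
        Map<String, TreeSet<Double>> byColumn = new LinkedHashMap<>();
        for (PercentileCall c : calls) {
            byColumn.computeIfAbsent(c.column, k -> new TreeSet<>()).add(c.p);
        }
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Double>> e : byColumn.entrySet()) {
            StringBuilder sb = new StringBuilder("percentile_array(").append(e.getKey()).append(",[");
            String sep = "";
            for (double p : e.getValue()) {
                sb.append(sep).append(p);
                sep = ",";
            }
            merged.add(sb.append("])").toString());
        }
        return merged;
    }

    public static void main(String[] args) {
        List<PercentileCall> calls = Arrays.asList(
                new PercentileCall("ss_quantity", 0.9),
                new PercentileCall("ss_quantity", 0.6),
                new PercentileCall("ss_quantity", 0.3));
        System.out.println(merge(calls)); // prints [percentile_array(ss_quantity,[0.3,0.6,0.9])]
    }
}
```

A real rewrite also has to map each original output column back to the matching element of the merged array, which this sketch leaves out.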