[Feature] Extract wide common factors by EmmyMiao87 · Pull Request #6083 · apache/doris

EmmyMiao87 · 2021-06-22T13:18:59Z

Proposed changes

This PR adds a rewrite rule, `ExtractCommonFactorsRule`, that extracts wide common factors from an `Expr` during the planning stage.
The main purpose of this rule is to find the range or IN predicates on a column that appears in every OR branch,
and to combine them into one wider predicate per column.
For example:
Original expression: (1<a<3 and b in ('a')) or (2<a<4 and b in ('b'))
Rewritten expression: (1<a<4) and (b in ('a', 'b')) and ((1<a<3 and b in ('a')) or (2<a<4 and b in ('b')))
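The extraction step can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the code of `ExtractCommonFactorsRule`: `WideCommonFactorsSketch`, `Range`, `Branch` and `extract` are hypothetical names, each OR branch is assumed to be already split into per-column integer ranges and per-column IN lists, and predicates are rendered as strings instead of Doris `Expr` nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class WideCommonFactorsSketch {
    // Hypothetical single-column range "lo < col < hi" (each bound optionally inclusive).
    record Range(long lo, boolean loInclusive, long hi, boolean hiInclusive) {
        // Smallest range covering both inputs. It may also cover a gap between
        // them, which is why the extracted factor is "wide" (a superset).
        Range span(Range o) {
            long newLo = Math.min(lo, o.lo);
            boolean newLoInc = lo == o.lo ? (loInclusive || o.loInclusive)
                    : (lo < o.lo ? loInclusive : o.loInclusive);
            long newHi = Math.max(hi, o.hi);
            boolean newHiInc = hi == o.hi ? (hiInclusive || o.hiInclusive)
                    : (hi > o.hi ? hiInclusive : o.hiInclusive);
            return new Range(newLo, newLoInc, newHi, newHiInc);
        }

        String render(String col) {
            return lo + (loInclusive ? "<=" : "<") + col + (hiInclusive ? "<=" : "<") + hi;
        }
    }

    // One OR branch (hypothetical): a conjunction of per-column ranges and IN lists.
    record Branch(Map<String, Range> ranges, Map<String, Set<String>> ins) {}

    // Columns constrained in every branch. A branch that does not mention a column
    // puts no bound on it, so no common factor exists for that column.
    private static Set<String> commonColumns(List<Branch> branches, boolean ranges) {
        Set<String> cols = null;
        for (Branch b : branches) {
            Set<String> keys = ranges ? b.ranges().keySet() : b.ins().keySet();
            if (cols == null) cols = new TreeSet<>(keys); else cols.retainAll(keys);
        }
        return cols == null ? new TreeSet<>() : cols;
    }

    static List<String> extract(List<Branch> branches) {
        List<String> factors = new ArrayList<>();
        for (String col : commonColumns(branches, true)) {
            Range wide = branches.get(0).ranges().get(col);
            for (Branch b : branches) wide = wide.span(b.ranges().get(col));
            factors.add(wide.render(col));
        }
        for (String col : commonColumns(branches, false)) {
            Set<String> values = new TreeSet<>(); // union of the IN lists
            for (Branch b : branches) values.addAll(b.ins().get(col));
            factors.add(values.stream().map(v -> "'" + v + "'")
                    .collect(Collectors.joining(", ", col + " in (", ")")));
        }
        return factors;
    }

    public static void main(String[] args) {
        // (1<a<3 and b in ('a')) or (2<a<4 and b in ('b'))
        List<Branch> branches = List.of(
                new Branch(Map.of("a", new Range(1, false, 3, false)), Map.of("b", Set.of("a"))),
                new Branch(Map.of("a", new Range(2, false, 4, false)), Map.of("b", Set.of("b"))));
        System.out.println(extract(branches)); // [1<a<4, b in ('a', 'b')]
    }
}
```

Taking the span of the ranges rather than their exact union keeps each factor a single range predicate, at the cost of possibly admitting values that no branch accepts; the original OR is kept to filter those out.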
Although the wide common factors cover a larger range than the original predicate, each of them involves only a single column, so it can be pushed down to the scan node.
Because every OR branch implies these factors, adding them does not change the query result;
it only filters out data earlier, reducing the amount of data scanned and speeding up the query.
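The following hypothetical Java sketch (plain in-memory rows, no Doris APIs) illustrates why the rewrite is safe and where the savings come from. The single-column factors act as a pre-filter standing in for the scan node, and the original OR predicate is still evaluated afterwards, so the final result is unchanged while fewer rows reach the full predicate. `WideFactorPushdownSketch`, `Row` and the sample data are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class WideFactorPushdownSketch {
    record Row(int a, String b) {}

    // Original predicate: (1<a<3 and b in ('a')) or (2<a<4 and b in ('b'))
    static final Predicate<Row> ORIGINAL = r ->
            (1 < r.a() && r.a() < 3 && r.b().equals("a"))
            || (2 < r.a() && r.a() < 4 && r.b().equals("b"));

    // Wide common factors: each reads only one column, so a scan could evaluate it.
    static final Predicate<Row> WIDE_A = r -> 1 < r.a() && r.a() < 4;
    static final Predicate<Row> WIDE_B = r -> r.b().equals("a") || r.b().equals("b");

    // Made-up table: a in [0, 10), b in {a, b, c, d}, 40 rows in total.
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        for (int a = 0; a < 10; a++) {
            for (String b : new String[] {"a", "b", "c", "d"}) rows.add(new Row(a, b));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        // "Scan" step: only rows passing the single-column factors survive.
        List<Row> scanned = rows.stream().filter(WIDE_A.and(WIDE_B)).toList();
        // The original OR is still applied, so results match the unrewritten query.
        List<Row> withPushdown = scanned.stream().filter(ORIGINAL).toList();
        List<Row> withoutPushdown = rows.stream().filter(ORIGINAL).toList();
        System.out.println(rows.size() + " rows, " + scanned.size()
                + " after wide factors, " + withPushdown.size() + " final"); // 40 rows, 4 after wide factors, 2 final
        System.out.println(withPushdown.equals(withoutPushdown)); // true
    }
}
```

On this made-up data the pre-filter keeps 4 of 40 rows; on real tables the saving depends entirely on how selective the wide factors are.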

Note that this optimization does not suit every scenario.
When the filter rate of the wide common factors is too low,
the query spends extra time evaluating them without filtering out much data.

The strategy can therefore be toggled with the session variable `extract_wide_range_expr`.
It is enabled by default, which means the strategy takes effect.
If the filter rate is unsatisfactory, set the variable to false to turn the strategy off.
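A minimal sketch of how such a switch can gate the rewrite, assuming a hypothetical `Session` holder whose field mirrors `extract_wide_range_expr` (this is not Doris's own session-variable class). From a MySQL-compatible client the variable itself would be changed with a statement such as `SET extract_wide_range_expr = false;`. The sketch also shows that the rewrite only prepends the wide factors and keeps the original OR predicate, so turning it off affects performance, not results.

```java
import java.util.ArrayList;
import java.util.List;

class SessionGatedRuleSketch {
    // Hypothetical per-session settings; enabled by default, as described above.
    static final class Session {
        boolean extractWideRangeExpr = true;
    }

    // Hypothetical rewrite step returning the conjuncts the planner would keep.
    // When enabled, the wide factors are ANDed in front of the untouched original OR;
    // when disabled, only the original predicate is returned.
    static List<String> rewrite(String originalOr, List<String> wideFactors, Session session) {
        List<String> conjuncts = new ArrayList<>();
        if (session.extractWideRangeExpr) {
            conjuncts.addAll(wideFactors);
        }
        conjuncts.add(originalOr);
        return conjuncts;
    }

    public static void main(String[] args) {
        String original = "((1<a<3 and b in ('a')) or (2<a<4 and b in ('b')))";
        List<String> wide = List.of("1<a<4", "b in ('a', 'b')");
        Session session = new Session();
        System.out.println(String.join(" and ", rewrite(original, wide, session)));
        // Stand-in for turning the session variable off.
        session.extractWideRangeExpr = false;
        System.out.println(String.join(" and ", rewrite(original, wide, session)));
    }
}
```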

Fixed #6082

Types of changes

What types of changes does your code introduce to Doris?
Put an x in the boxes that apply

New feature (non-breaking change which adds functionality)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

I have created an issue on (Fix [Feature] Extract wide common factors #6082) and described the bug/feature there in detail
Compiling and unit tests pass locally with my changes
I have added tests that prove my fix is effective or that my feature works
If these changes need document changes, I have updated the document
Any dependent changes have been merged

yangzhg · 2021-06-24T01:43:57Z

docs/zh-CN/administrator-guide/variables.md


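    Doris optimizes this step: if there are many underlying data nodes, the exchange node starts multiple threads to merge in parallel and speed up sorting. This parameter defaults to False, meaning the exchange node does not use a parallel merge sort, which reduces extra CPU and memory consumption.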
+
+* `extract_wide_range_expr`


`extract_wide_range_expr` describes how this is implemented, not what the feature does; you may want to change the name.

Maybe extract_common_factors?

fe/fe-core/src/main/java/org/apache/doris/rewrite/ExprRewriter.java

yangzhg

LGTM

morningman

LGTM

EmmyMiao87 added the area/planner and kind/feature labels Jun 22, 2021

yangzhg reviewed Jun 24, 2021

EmmyMiao87 added 5 commits June 24, 2021 15:08

Add comments

14348f8

Add unit test

6bc1942

Add doc

8dde18c

Fix hash code error

ddb1d11

EmmyMiao87 force-pushed the merge_expr branch from 304ac7e to ddb1d11 June 24, 2021 07:08

fix ut

b4e0abb

morningman reviewed Jun 28, 2021

fe/fe-core/src/main/java/org/apache/doris/rewrite/ExprRewriter.java

yangzhg previously approved these changes Jun 29, 2021

yangzhg added the approved label Jun 29, 2021

Add comment

44fe7bc

EmmyMiao87 dismissed yangzhg’s stale review via 44fe7bc June 30, 2021 03:03

morningman approved these changes Jun 30, 2021

morningman merged commit 2a1b239 into apache:master Jul 1, 2021

morningman merged 7 commits into apache:master from EmmyMiao87:merge_expr

3 participants

