[Feature] Extract wide common factors#6083

Merged
morningman merged 7 commits into apache:master from EmmyMiao87:merge_expr
Jul 1, 2021

Conversation

Contributor

@EmmyMiao87 EmmyMiao87 commented Jun 22, 2021

Proposed changes

This PR adds a rewrite rule, 'ExtractCommonFactorsRule',
which extracts wide common factors from an 'Expr' during the planning stage.
The rule extracts the (Range or In) predicates
that can be combined across all of the OR clauses.
E.g:
Origin expr: (1<a<3 and b in ('a')) or (2<a<4 and b in ('b'))
Rewritten expr: (1<a<4) and (b in ('a', 'b')) and ((1<a<3 and b in ('a')) or (2<a<4 and b in ('b')))
Although a wide common factor covers a larger range than the original predicate,
each one involves only a single column, so it can be pushed down to the scan node,
which reduces the amount of data scanned and speeds up the query.
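
To make the rewrite concrete, here is a minimal Python sketch of the idea. It is not the Java code in this PR: the `Range`, `InList`, and `extract_wide_common_factors` names are hypothetical, and it assumes each OR clause is a conjunction of single-column predicates and that ranges are widened to their enclosing interval.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Union


@dataclass(frozen=True)
class Range:
    """Open interval: low < col < high (hypothetical helper type)."""
    low: float
    high: float


@dataclass(frozen=True)
class InList:
    """col IN (values) (hypothetical helper type)."""
    values: FrozenSet[str]


Predicate = Union[Range, InList]
# One OR clause: a conjunction of single-column predicates keyed by column name.
Clause = Dict[str, Predicate]


def extract_wide_common_factors(clauses: List[Clause]) -> Dict[str, Predicate]:
    """Return one widened predicate per column that every OR clause constrains."""
    if not clauses:
        return {}
    # A column can only yield a factor if it is constrained in every clause.
    common = set(clauses[0])
    for clause in clauses[1:]:
        common &= set(clause)

    factors: Dict[str, Predicate] = {}
    for col in sorted(common):
        preds = [clause[col] for clause in clauses]
        if all(isinstance(p, Range) for p in preds):
            # Assumption: widen to the enclosing interval of all ranges.
            factors[col] = Range(min(p.low for p in preds),
                                 max(p.high for p in preds))
        elif all(isinstance(p, InList) for p in preds):
            # Union of the IN lists.
            factors[col] = InList(frozenset().union(*(p.values for p in preds)))
        # Mixed predicate kinds are skipped in this sketch.
    return factors


# The example from the description above.
clauses = [
    {"a": Range(1, 3), "b": InList(frozenset({"a"}))},
    {"a": Range(2, 4), "b": InList(frozenset({"b"}))},
]
factors = extract_wide_common_factors(clauses)
```

Here `factors` holds the widened predicates for `a` and `b`; the rewritten expression is their conjunction ANDed with the original OR expression, which keeps the result set unchanged.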

Note that this optimization does not help in every scenario.
When the filter rate of the wide common factors is too low,
the query spends extra time computing them for little benefit.
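
As a rough illustration of the filter-rate trade-off, the Python sketch below evaluates the example predicates on two made-up data sets. The `filter_rate` helper and the sample rows are assumptions for illustration only; they are not part of this PR.

```python
def filter_rate(rows, predicate):
    """Fraction of rows the predicate rejects (0.0 means it filters nothing)."""
    rows = list(rows)
    if not rows:
        return 0.0
    kept = sum(1 for row in rows if predicate(row))
    return 1 - kept / len(rows)


def original(row):
    # (1<a<3 and b in ('a')) or (2<a<4 and b in ('b'))
    a, b = row
    return (1 < a < 3 and b == "a") or (2 < a < 4 and b == "b")


def wide(row):
    # Wide common factors: (1<a<4) and (b in ('a', 'b'))
    a, b = row
    return 1 < a < 4 and b in ("a", "b")


# Hypothetical data where the wide factor is selective: a spans 0..99.
selective = [(a, "ab"[a % 2]) for a in range(100)]
# Hypothetical data where every row already satisfies the wide factor.
unselective = [(2 + a % 2, "ab"[a % 2]) for a in range(100)]

selective_rate = filter_rate(selective, wide)
unselective_rate = filter_rate(unselective, wide)
```

On the first data set the wide factor rejects almost every row before the full expression is evaluated. On the second it rejects nothing, so evaluating it is pure overhead, which is the case the paragraph above warns about.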

So the strategy can be switched with the session variable 'extract_wide_range_expr'.
It is enabled by default, which means the strategy takes effect.
If the filtering rate is unsatisfactory, set the variable to false
to turn the strategy off.

Fixed #6082

Types of changes

What types of changes does your code introduce to Doris?
Put an x in the boxes that apply

  • New feature (non-breaking change which adds functionality)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have created an issue on (Fix [Feature] Extract wide common factors #6082) and described the bug/feature there in detail
  • Compiling and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • If these changes need document changes, I have updated the document
  • Any dependent changes have been merged

@EmmyMiao87 EmmyMiao87 added the area/planner and kind/feature labels Jun 22, 2021

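Doris optimizes this step: if there are too many lower-level data nodes, the exchange node starts multiple threads to merge in parallel and speed up sorting. This parameter defaults to False, meaning the exchange node does not use parallel merge sort, which avoids extra CPU and memory consumption.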

* `extract_wide_range_expr`
Member

extract_wide_range_expr describes how this is implemented, not what the feature does; you may want to change the name.

Contributor Author

Maybe extract_common_factors?

yangzhg previously approved these changes Jun 29, 2021
Member

@yangzhg yangzhg left a comment

LGTM

@yangzhg yangzhg added the approved label Jun 29, 2021
Contributor

@morningman morningman left a comment

LGTM

@morningman morningman merged commit 2a1b239 into apache:master Jul 1, 2021
stalary pushed a commit to stalary/doris that referenced this pull request Jul 8, 2021
stalary pushed a commit to stalary/doris that referenced this pull request Jul 27, 2021

Labels

approved: Indicates a PR has been approved by one committer.
area/planner: Issues or PRs related to the query planner
kind/feature: Categorizes issue or PR as related to a new feature.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Extract wide common factors

3 participants