[improvement](mtmv) Add id to statistics map in statement context for cost estimation later #35436
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
TPC-H: Total hot run time: 41317 ms
// May return null, which means the statistics for this id should be calculated normally
// rather than taken from this map.
// The id may be a relation id, a CTE id, or another type of id.
private final Map<Pair<Id, Class<? extends Id>>, Statistics> idToStatisticsMap = new LinkedHashMap<>();
Only relation ids are supported for now.
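The lookup behaviour described in the comment above can be sketched as follows. This is a minimal illustration, not the Doris implementation: `IdStatisticsSketch`, the simplified `Id`, `RelationId`, `CteId`, and `Statistics` classes, and the `putStatistics`/`getStatistics` methods are all hypothetical stand-ins, and `Map.Entry` replaces the `Pair` type. The sketch assumes that ids of different kinds can share the same numeric value, which would be one reason to include the id class in the key.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class IdStatisticsSketch {
    /** Hypothetical id type; equality is by numeric value only (an assumption). */
    public static class Id {
        final int value;
        public Id(int value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof Id && ((Id) o).value == value;
        }
        @Override public int hashCode() { return Integer.hashCode(value); }
    }
    public static final class RelationId extends Id { public RelationId(int v) { super(v); } }
    public static final class CteId extends Id { public CteId(int v) { super(v); } }

    /** Hypothetical statistics holder that carries only a row count. */
    public static final class Statistics {
        public final double rowCount;
        public Statistics(double rowCount) { this.rowCount = rowCount; }
    }

    // Keyed by (id, id class) so that ids of different kinds never collide.
    private final Map<Map.Entry<Id, Class<? extends Id>>, Statistics> idToStatisticsMap =
            new LinkedHashMap<>();

    private static Map.Entry<Id, Class<? extends Id>> key(Id id) {
        return new AbstractMap.SimpleImmutableEntry<Id, Class<? extends Id>>(id, id.getClass());
    }

    public void putStatistics(Id id, Statistics statistics) {
        idToStatisticsMap.put(key(id), statistics);
    }

    // An empty result means the caller should compute the statistics normally.
    public Optional<Statistics> getStatistics(Id id) {
        return Optional.ofNullable(idToStatisticsMap.get(key(id)));
    }
}
```

The sketch returns an `Optional` rather than `null` purely for illustration; the original comment says the lookup may return null.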
TPC-DS: Total hot run time: 169253 ms
ClickBench: Total hot run time: 31.16 s
run buildall
TPC-H: Total hot run time: 41713 ms
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TPC-DS: Total hot run time: 171030 ms
ClickBench: Total hot run time: 30.64 s
… cost estimation later (#35436): Add an id-to-statistics map to the statement context for later cost estimation. This improves the likelihood that a materialized view is used when querying a single table with aggregation and many filters.
…ats to mv scan plan based (#35749): This issue was introduced by #35436. The method `MaterializationContext#getPlanStatistics` returns the statistics of the materialization context's original plan, but the `expressionToColumnStats` in those statistics refers to slots of the original plan. We want the statistics of the original plan, but with `expressionToColumnStats` based on the MV scan plan. This commit therefore adds the method `MaterializationContext#normalizeStatisticsColumnExpression`, which should be called after the plan statistics are generated in `MaterializationContext`.
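The normalization idea in this commit message can be illustrated with a small sketch. It is not the code from #35749: `NormalizeStatsSketch`, its `normalize` method, the use of plain strings for slots, and a single `Double` per column statistic are hypothetical simplifications. The sketch also assumes that a column statistic with no counterpart in the MV scan plan is simply dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NormalizeStatsSketch {
    /**
     * Re-keys column statistics from original-plan slots to MV-scan-plan slots.
     *
     * @param originalPlanColumnStats column statistics keyed by original-plan slot name
     * @param originalToMvScanSlot    mapping from original-plan slot name to MV-scan slot name
     * @return the same statistic values, keyed by MV-scan slot name
     */
    public static Map<String, Double> normalize(
            Map<String, Double> originalPlanColumnStats,
            Map<String, String> originalToMvScanSlot) {
        Map<String, Double> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : originalPlanColumnStats.entrySet()) {
            String mvScanSlot = originalToMvScanSlot.get(entry.getKey());
            // Assumption: statistics without an MV-scan counterpart are skipped.
            if (mvScanSlot != null) {
                normalized.put(mvScanSlot, entry.getValue());
            }
        }
        return normalized;
    }
}
```

The statistic values themselves are left unchanged; only the keys move from one plan's slots to the other's.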
Proposed changes
Add an id-to-statistics map to the statement context for later cost estimation.
This improves the likelihood that a materialized view is used when querying a single table with aggregation and many filters.