[Optimization](statistics) optimize Incremental statistics collection and statistics cleaning #18653
44ee653 to 1a093b5
run buildall
Review comments (outdated, resolved):
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisHelper.java (4 comments)
fe/fe-core/src/main/java/org/apache/doris/statistics/StatisticsRepository.java (1 comment)
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisManager.java (1 comment)
Kikyou1997 left a comment
Further modifications and discussion are needed before this PR can be merged.
d0cf7a1 to a3dc490
run buildall
I have removed the code that is not part of this PR.
a3dc490 to b1d323b
b1d323b to e39194c
run buildall
PR approved by anyone and no changes requested.
Wait until the sampling statistics PR is merged before making further modifications.
…18880)

1. Supports sampling to collect statistics
2. Improved syntax for collecting statistics
3. Supports specifying the number of buckets for histograms
4. Tweaked some code structure

---

The syntax supports WITH and PROPERTIES, using the same syntax as before.

Column statistics collection syntax:

```SQL
ANALYZE [ SYNC ] TABLE table_name
    [ (column_name [, ...]) ]
    [ [WITH SYNC] | [WITH INCREMENTAL] | [WITH SAMPLE PERCENT | ROWS ] ]
    [ PROPERTIES ('key' = 'value', ...) ];
```

Column histogram collection syntax:

```SQL
ANALYZE [ SYNC ] TABLE table_name
    [ (column_name [, ...]) ]
    UPDATE HISTOGRAM
    [ [ WITH SYNC ][ WITH INCREMENTAL ][ WITH SAMPLE PERCENT | ROWS ][ WITH BUCKETS ] ]
    [ PROPERTIES ('key' = 'value', ...) ];
```

Explanation:

- sync: Collect statistics synchronously. Returns after collection completes.
- incremental: Collect statistics incrementally. Incremental collection of histogram statistics is not supported.
- sample percent | rows: Collect statistics by sampling. The sample can be specified as a percentage or as a number of rows.
- buckets: Specifies the maximum number of buckets generated when collecting histogram statistics.
- table_name: The target table for collecting statistics. Can be of the form `db_name.table_name`.
- column_name: The specified target column must be a column that exists in `table_name`; multiple column names are separated by commas.
- properties: Properties used to configure the statistics task. Currently only the following configurations are supported (equivalent to the WITH clauses):
  - 'sync' = 'true'
  - 'incremental' = 'true'
  - 'sample.percent' = '50'
  - 'sample.rows' = '1000'
  - 'num.buckets' = 10

---

TODO:

- Supplement the complete p0 test
- `Incremental` statistics see #18653
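As a rough illustration of the syntax described in this commit message, the statements below instantiate it for a hypothetical table `example_db.orders` with columns `order_id` and `amount` (these names are made up and do not come from this PR). Sampling and bucket counts are passed through PROPERTIES using only the keys listed above. Quoting the `num.buckets` value is an assumption based on the `'key' = 'value'` form in the grammar. This is a sketch of intended usage, not verified output from a running cluster.

```SQL
-- Hypothetical table and column names; not taken from this PR.

-- Collect statistics for two columns synchronously.
ANALYZE TABLE example_db.orders (order_id, amount) WITH SYNC;

-- Collect column statistics incrementally.
ANALYZE TABLE example_db.orders WITH INCREMENTAL;

-- Collect statistics from a 50 percent sample, configured via PROPERTIES.
ANALYZE TABLE example_db.orders (amount)
PROPERTIES ('sample.percent' = '50');

-- Collect a histogram with at most 10 buckets from 1000 sampled rows.
-- 'incremental' is left out because incremental histogram collection
-- is stated to be unsupported.
ANALYZE TABLE example_db.orders (amount) UPDATE HISTOGRAM
PROPERTIES ('sample.rows' = '1000', 'num.buckets' = '10');
```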
753c29c to b95f22c
dc76110 to 58933ca
run buildall
run clickbench
Proposed changes
This PR mainly optimizes the following items:
TODO: Support incremental collection of materialized view statistics.