@weizhengte
Contributor

@weizhengte weizhengte commented May 3, 2023

Proposed changes

This PR enables periodic collection of statistics and is a precursor to automatic statistics collection. It mainly includes the following changes:

  1. Support periodic collection of statistics.
  2. Change the Date type in the statistics p0 tests to DateV2 (see [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type #19077) to support local testing. Also complete the test cases (remove Chinese characters, optimize code, etc.) and improve stability.
  3. Support configuring whether to keep records of synchronous statistics job info, which is convenient for p0 testing.
  4. The statistics job table was modified, and some auxiliary checks were added so that users do not notice the modification. These checks will be removed once the table schema is stable.
    ...

Related syntax:

WITH PERIOD: specifies the interval, in seconds, at which statistics are collected periodically.

-- column statistics
ANALYZE [ SYNC ] TABLE table_name
    [ (column_name [, ...]) ]
    [ [WITH SYNC] | [WITH INCREMENTAL] | [WITH SAMPLE PERCENT | ROWS ] | [WITH PERIOD] ]
    [ PROPERTIES ('key' = 'value', ...) ];

-- histogram statistics
ANALYZE [ SYNC ] TABLE table_name
    [ (column_name [, ...]) ]
    UPDATE HISTOGRAM
    [ [ WITH SYNC ][ WITH SAMPLE PERCENT | ROWS ][ WITH BUCKETS ] | [WITH PERIOD] ]
    [ PROPERTIES ('key' = 'value', ...) ];
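
As a rough illustration of the syntax above, periodic collection might be requested as follows. This is only a sketch: `example_tbl`, `col1`, and `col2` are hypothetical names, and placing the interval (in seconds) directly after `WITH PERIOD` is an assumption, since the grammar above does not show where the value goes.

```sql
-- Hypothetical table and column names, not taken from this PR.
-- Assumption: the interval in seconds follows WITH PERIOD directly.

-- Collect column statistics for two columns once a day (86400 seconds).
ANALYZE TABLE example_tbl (col1, col2) WITH PERIOD 86400;

-- Collect histogram statistics for one column once an hour (3600 seconds).
ANALYZE TABLE example_tbl (col1) UPDATE HISTOGRAM WITH PERIOD 3600;
```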

FE configuration:

  • enable_auto_collect_statistics: if true, allows the system to collect statistics automatically.
  • auto_check_statistics_in_sec: the interval, in seconds, at which the system automatically checks statistics.
  • enable_save_statistics_sync_job: a session variable. If true, the statistics job info is saved when statistics are collected synchronously. Currently used mainly for p0 tests.
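
A minimal sketch of how the session variable might be used in a p0 test, assuming it is set with the usual `SET` statement for session variables. The table name `example_tbl` is hypothetical.

```sql
-- Assumption: the variable is toggled like any other session variable.
SET enable_save_statistics_sync_job = true;

-- Hypothetical table name. With the variable enabled, the job info for this
-- synchronous collection should be kept so that a test can inspect it.
ANALYZE SYNC TABLE example_tbl;
```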

Issue Number: close #xxx

Problem summary

Describe your changes.

Checklist(Required)

  • Does it affect the original behavior
  • Has unit tests been added
  • Has document been added or modified
  • Does it need to update dependencies
  • Is this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added the area/planner and kind/test labels May 3, 2023
@weizhengte
Contributor Author

run buildall

@hello-stephen
Contributor

hello-stephen commented May 4, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 35.53 seconds
stream load tsv: 431 seconds loaded 74807831229 Bytes, about 165 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 60 seconds loaded 1101869774 Bytes, about 17 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230506013627_clickbench_pr_139008.html

@weizhengte
Contributor Author

run buildall

Kikyou1997 previously approved these changes May 5, 2023
@github-actions
Contributor

github-actions bot commented May 5, 2023

PR approved by anyone and no changes requested.

@weizhengte
Contributor Author

run buildall

@weizhengte
Contributor Author

run buildall

Contributor

@yiguolei yiguolei left a comment

LGTM

@yiguolei yiguolei merged commit 3f6e511 into apache:master May 6, 2023
Reminiscent pushed a commit to Reminiscent/doris that referenced this pull request May 15, 2023
…pache#19247)
Labels

area/planner, kind/test, reviewed
