@Kikyou1997
Contributor

@Kikyou1997 Kikyou1997 commented Sep 6, 2023

Proposed changes

  1. Automatically analyze with sampling when the table size is greater than huge_table_lower_bound_size_in_bytes (5G by default). Users can disable this feature with the FE option enable_auto_sample.
  2. Support syntax such as ANALYZE TABLE test WITH FULL to force a full analyze regardless of table size.
  3. Fix bugs where table stats are not updated properly when stats are dropped or when only a few columns are analyzed.
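
As a rough illustration of items 1 and 2, the choice between a sampled and a full analyze could be sketched as below. This is a minimal sketch, not the Doris implementation: the function `choose_analysis_method` and its parameters are hypothetical names made up for this example, and it assumes "5G" means 5 * 1024^3 bytes. Only the option names huge_table_lower_bound_size_in_bytes and enable_auto_sample, the 5G default, and the WITH FULL override come from the description above.

```python
# Hypothetical sketch of the sampling decision; not taken from the Doris code base.

# Assumption: the "5G" default is interpreted as 5 GiB.
HUGE_TABLE_LOWER_BOUND_SIZE_IN_BYTES = 5 * 1024**3


def choose_analysis_method(
    table_size_in_bytes,
    enable_auto_sample=True,
    force_full=False,
    lower_bound=HUGE_TABLE_LOWER_BOUND_SIZE_IN_BYTES,
):
    """Return "SAMPLE" or "FULL" for a table of the given size."""
    # ANALYZE TABLE ... WITH FULL forces a full analyze whatever the size.
    if force_full:
        return "FULL"
    # Sampling applies only when enabled and the table is strictly
    # greater than the lower bound.
    if enable_auto_sample and table_size_in_bytes > lower_bound:
        return "SAMPLE"
    return "FULL"
```

Under these assumptions, a 6 GiB table is sampled by default, while the same table is fully analyzed when `force_full=True` or `enable_auto_sample=False`.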

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@Kikyou1997
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 48.14 seconds
stream load tsv: 538 seconds loaded 74807831229 Bytes, about 132 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17162368657 Bytes

@Kikyou1997
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.24 seconds
stream load tsv: 546 seconds loaded 74807831229 Bytes, about 130 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17161924439 Bytes

@Kikyou1997
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.54 seconds
stream load tsv: 541 seconds loaded 74807831229 Bytes, about 131 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17161911964 Bytes

@Kikyou1997
Contributor Author

run buildall

@shuke987
Collaborator

shuke987 commented Sep 8, 2023

run buildall

@Kikyou1997
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 49.01 seconds
stream load tsv: 530 seconds loaded 74807831229 Bytes, about 134 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17162153498 Bytes

@Kikyou1997
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.45 seconds
stream load tsv: 547 seconds loaded 74807831229 Bytes, about 130 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17162002390 Bytes

@Kikyou1997
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.97 seconds
stream load tsv: 579 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17162357413 Bytes

@Kikyou1997 force-pushed the feat/sample_auto branch 2 times, most recently from 26909c2 to fe34113 on September 11, 2023 07:50
@Kikyou1997
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.37 seconds
stream load tsv: 581 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17162148612 Bytes

@morrySnow changed the title from "[feat](nereids): Support analyze with sample automatically" to "[feat](stats) Support analyze with sample automatically" on Sep 11, 2023
@Kikyou1997
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.5 seconds
stream load tsv: 579 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162148420 Bytes

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.6 seconds
stream load tsv: 581 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17162159832 Bytes

@Kikyou1997
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.28 seconds
stream load tsv: 575 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162006042 Bytes

Collaborator

@wm1581066 wm1581066 left a comment

LGTM

@github-actions
Contributor

PR approved by anyone and no changes requested.

@Kikyou1997
Contributor Author

run buildall

@Kikyou1997
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.16 seconds
stream load tsv: 600 seconds loaded 74807831229 Bytes, about 118 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17162169472 Bytes

@Kikyou1997
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.65 seconds
stream load tsv: 593 seconds loaded 74807831229 Bytes, about 120 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17162550848 Bytes

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Sep 13, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@morrySnow merged commit 786a721 into apache:master on Sep 13, 2023
morningman pushed a commit that referenced this pull request Oct 13, 2023
…anch 2.0 (#25119)

This PR is composed of the following commits, which have been merged to Doris master:

* #24769
* #24672
* #24599
* #24521
* #24405
* #24237
* #24135
* #24074
* #24026
* #23992
* #23978
* #23622
* #23507
* #23354
* #23103
* #22963
* #22896
* #22775
* #22773
morningman pushed a commit that referenced this pull request Oct 15, 2023
….0 (#25421)

This PR is composed of the following commits, which have been merged to Doris master:

* #24769
* #24672
* #24599
* #24521
* #24405
* #24237
* #24135
* #24074
* #24026
* #23992
* #23978
* #23622
* #23507
* #23354
* #23103
* #22963
* #22896
* #22775
* #22773

After this PR, when users upgrade Doris from 2.0.2 to 2.0.3, the original info in AnalysisManager will be ignored, and the new module AnalysisManagerV2 will be saved (with more info).
Labels

approved (Indicates a PR has been approved by one committer.), dev/2.0.3-merged, reviewed

8 participants