@Kikyou1997
Contributor

@Kikyou1997 Kikyou1997 commented Aug 14, 2023

Proposed changes

  1. Significantly reduce the footprint of ResultRow and improve the performance of analyze by inserting partition stats in groups
  2. Remove the table_statistics table and remove support for the with auto grammar
  3. Fix a bug where the literalexpr of columnstats was lost when serialized and sent to other FEs
  4. Make the health threshold configurable

I tested this PR on the SSB dataset, whose table definitions can be found under the tools directory. With this PR, the number of IO requests on it drops by at least 80%.
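
To illustrate why grouping inserts can cut the number of IO requests (item 1 above), here is a minimal sketch, not the code in this PR. The table name `demo_stats`, the row shape, and the helper names are all hypothetical; the only point is that N per-partition rows become roughly N / batch_size statements instead of N.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical row shape for illustration: (partition name, row count, NDV).
Row = Tuple[str, int, int]


def chunked(rows: Sequence[Row], size: int) -> Iterator[Sequence[Row]]:
    """Yield consecutive slices of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_group_inserts(table: str, rows: Sequence[Row], batch_size: int) -> List[str]:
    """Build one multi-row INSERT per batch instead of one INSERT per row.

    String formatting is used only to keep the sketch short; real code
    should bind parameters rather than interpolate values into SQL.
    """
    statements = []
    for batch in chunked(rows, batch_size):
        values = ", ".join(
            f"('{name}', {count}, {ndv})" for name, count, ndv in batch
        )
        statements.append(f"INSERT INTO {table} VALUES {values}")
    return statements


# Five partition rows with a batch size of 2 give three statements, not five.
demo_rows = [(f"p{i}", (i + 1) * 10, i + 3) for i in range(5)]
demo_statements = build_group_inserts("demo_stats", demo_rows, 2)
```

The batch size trades statement count against statement size; the value that suits a real cluster is not something this sketch can tell you.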

@Kikyou1997 Kikyou1997 force-pushed the refactor/remove_automatic_analyze branch 3 times, most recently from 6df3c91 to 2b9266e Compare August 16, 2023 02:39
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Kikyou1997 Kikyou1997 force-pushed the refactor/remove_automatic_analyze branch from 2b9266e to 28bb5ec Compare August 16, 2023 06:43
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Kikyou1997 Kikyou1997 force-pushed the refactor/remove_automatic_analyze branch from 28bb5ec to 5c1ca37 Compare August 16, 2023 07:22
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Kikyou1997 Kikyou1997 force-pushed the refactor/remove_automatic_analyze branch from 5c1ca37 to 1484435 Compare August 16, 2023 09:40
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Kikyou1997 Kikyou1997 force-pushed the refactor/remove_automatic_analyze branch from 1484435 to 74da6c2 Compare August 18, 2023 03:02
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Kikyou1997 Kikyou1997 force-pushed the refactor/remove_automatic_analyze branch from 74da6c2 to 73ac6d9 Compare August 18, 2023 03:56
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Kikyou1997 Kikyou1997 force-pushed the refactor/remove_automatic_analyze branch from 73ac6d9 to 0fec7be Compare August 18, 2023 07:52
@Kikyou1997
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.91 seconds
stream load tsv: 538 seconds loaded 74807831229 Bytes, about 132 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.4 seconds inserted 10000000 Rows, about 340K ops/s
storage size: 17162005570 Bytes
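
As a sanity check on the rates in this report, the sketch below recomputes them from the byte counts and durations. It assumes, as an inference from the numbers rather than from the pipeline's source, that "MB/s" means bytes / seconds / 2^20 truncated to an integer and that "K ops/s" is rows / seconds / 1000 truncated; the function names are hypothetical.

```python
MIB = 1024 * 1024  # assumption: the report's "MB" is 2**20 bytes


def throughput_mib_per_s(loaded_bytes: int, seconds: int) -> int:
    """Truncated throughput in MiB/s for a load of `loaded_bytes` bytes."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return loaded_bytes // seconds // MIB


def thousand_rows_per_s(rows: int, seconds: float) -> int:
    """Truncated insert rate in thousands of rows per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return int(rows / seconds / 1000)


# Figures copied from the report above.
tsv_rate = throughput_mib_per_s(74807831229, 538)
json_rate = throughput_mib_per_s(2358488459, 20)
orc_rate = throughput_mib_per_s(1101869774, 66)
parquet_rate = throughput_mib_per_s(861443392, 31)
insert_rate = thousand_rows_per_s(10000000, 29.4)
```

Under those assumptions the recomputed values match the reported 132, 112, 15 and 26 MB/s and 340K ops/s.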

@Kikyou1997 Kikyou1997 force-pushed the refactor/remove_automatic_analyze branch from 0fec7be to e05b275 Compare August 20, 2023 04:00
@Kikyou1997
Contributor Author

run buildall

@Kikyou1997 Kikyou1997 changed the title from "[draft](nereids) Refactor stats collection framework" to "[refactor](nereids) Refactor stats collection framework" Aug 20, 2023
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.16 seconds
stream load tsv: 559 seconds loaded 74807831229 Bytes, about 127 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.5 seconds inserted 10000000 Rows, about 338K ops/s
storage size: 17161887726 Bytes

@Kikyou1997 Kikyou1997 force-pushed the refactor/remove_automatic_analyze branch from e05b275 to f980783 Compare August 21, 2023 10:33
@Kikyou1997
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.81 seconds
stream load tsv: 542 seconds loaded 74807831229 Bytes, about 131 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 29.6 seconds inserted 10000000 Rows, about 337K ops/s
storage size: 17161950153 Bytes

@Kikyou1997
Contributor Author

run buildall

@github-actions github-actions bot removed the approved label Aug 22, 2023
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.41 seconds
stream load tsv: 546 seconds loaded 74807831229 Bytes, about 130 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.4 seconds inserted 10000000 Rows, about 340K ops/s
storage size: 17162204318 Bytes

englefly previously approved these changes Aug 22, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved label Aug 22, 2023
starocean999 previously approved these changes Aug 22, 2023
@Kikyou1997 Kikyou1997 dismissed stale reviews from starocean999 and englefly via 76dbf28 August 22, 2023 09:18
@Kikyou1997 Kikyou1997 force-pushed the refactor/remove_automatic_analyze branch from 4563f67 to 76dbf28 Compare August 22, 2023 09:18
@Kikyou1997
Contributor Author

run buildall

@github-actions github-actions bot removed the approved label Aug 22, 2023
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.41 seconds
stream load tsv: 546 seconds loaded 74807831229 Bytes, about 130 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17161921055 Bytes

@github-actions github-actions bot added the approved label Aug 22, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@starocean999 starocean999 merged commit 35d0c9e into apache:master Aug 23, 2023
morningman pushed a commit that referenced this pull request Aug 24, 2023
…w count while init table (#23170)

1. Load the cache for the external table row count while initializing the table; this avoids having no row count stats the very first time a SQL statement is run.
2. Show cardinality for an external scan node when explaining the SQL.
3. Fix bugs introduced by #22963
morningman pushed a commit that referenced this pull request Oct 13, 2023
…anch 2.0 (#25119)

This PR is composed of the following commits, which have been merged to Doris master:

* #24769
* #24672
* #24599
* #24521
* #24405
* #24237
* #24135
* #24074
* #24026
* #23992
* #23978
* #23622
* #23507
* #23354
* #23103
* #22963
* #22896
* #22775
* #22773
Jibing-Li added a commit to Jibing-Li/incubator-doris that referenced this pull request Oct 13, 2023
…w count while init table (apache#23170)

1. Load the cache for the external table row count while initializing the table; this avoids having no row count stats the very first time a SQL statement is run.
2. Show cardinality for an external scan node when explaining the SQL.
3. Fix bugs introduced by apache#22963
morningman pushed a commit that referenced this pull request Oct 15, 2023
….0 (#25421)

This PR is composed of the following commits, which have been merged to Doris master:

* #24769
* #24672
* #24599
* #24521
* #24405
* #24237
* #24135
* #24074
* #24026
* #23992
* #23978
* #23622
* #23507
* #23354
* #23103
* #22963
* #22896
* #22775
* #22773

After this PR, when a user upgrades Doris from 2.0.2 to 2.0.3, the original info in AnalysisManager will be ignored, and the new module AnalysisManagerV2 will be saved (with more info).

Labels

approved, dev/2.0.3-merged, reviewed

7 participants