Skip to content

Conversation

@Jibing-Li
Copy link
Contributor

@Jibing-Li Jibing-Li commented Aug 9, 2023

This pr has four changes:

  1. Collect external table row count when execute analyze database.
  2. Support show cached table stats (row count)
  3. Support alter external table column stats.
  4. Refresh/Invalidate table row count stat memory cache when analyze task finished and drop table stats.

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@Jibing-Li
Copy link
Contributor Author

run buildall

@Jibing-Li Jibing-Li marked this pull request as ready for review August 9, 2023 11:40
@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.85 seconds
stream load tsv: 513 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17162290922 Bytes

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some regression test

@Jibing-Li
Copy link
Contributor Author

run buildall

Support show cached table stats
Support alter column stats.
@Jibing-Li
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.81 seconds
stream load tsv: 512 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162056959 Bytes

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.59 seconds
stream load tsv: 514 seconds loaded 74807831229 Bytes, about 138 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17162503405 Bytes

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 11, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit 5b09254 into apache:master Aug 11, 2023
@Jibing-Li Jibing-Li deleted the stats branch August 16, 2023 04:09
Jibing-Li added a commit to Jibing-Li/incubator-doris that referenced this pull request Aug 17, 2023
…pache#22788)

1. Collect external table row count when execute analyze database.
2. Support show cached table stats (row count)
3. Support alter external table column stats.
4. Refresh/Invalidate table row count stat memory cache when analyze task finished and drop table stats.
Jibing-Li added a commit to Jibing-Li/incubator-doris that referenced this pull request Aug 17, 2023
…pache#22788)

1. Collect external table row count when execute analyze database.
2. Support show cached table stats (row count)
3. Support alter external table column stats.
4. Refresh/Invalidate table row count stat memory cache when analyze task finished and drop table stats.
@xiaokang xiaokang mentioned this pull request Aug 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/2.0.1-merged merge_conflict reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants