Skip to content

Conversation

@mrhhsg
Copy link
Member

@mrhhsg mrhhsg commented Apr 26, 2023

Proposed changes

Currently bitmapvalue type is copied between columns, it cost a lot of memory. Use a shared ptr in bitmap value to avoid copy data.

Problem summary

Describe your changes.

Checklist(Required)

  • Does it affect the original behavior
  • Has unit tests been added
  • Has document been added or modified
  • Does it need to update dependencies
  • Is this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@mrhhsg
Copy link
Member Author

mrhhsg commented Apr 26, 2023

run buildall

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

Comment on lines 1165 to 1195
BitmapValue& operator=(const BitmapValue& other) {
_type = other._type;
_sv = other._sv;
_bitmap = other._bitmap;
return *this;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: use '= default' to define a trivial copy-assignment operator [modernize-use-equals-default]

Suggested change
BitmapValue& operator=(const BitmapValue& other) {
_type = other._type;
_sv = other._sv;
_bitmap = other._bitmap;
return *this;
}
BitmapValue& operator=(const BitmapValue& other) = default;

@mrhhsg
Copy link
Member Author

mrhhsg commented Apr 26, 2023

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Copy link
Contributor

hello-stephen commented Apr 26, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.65 seconds
stream load tsv: 421 seconds loaded 74807831229 Bytes, about 169 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 60 seconds loaded 1101869774 Bytes, about 17 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230504033341_clickbench_pr_137893.html

@mrhhsg
Copy link
Member Author

mrhhsg commented Apr 28, 2023

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@mrhhsg
Copy link
Member Author

mrhhsg commented Apr 28, 2023

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@mrhhsg
Copy link
Member Author

mrhhsg commented May 4, 2023

run buildall

@github-actions
Copy link
Contributor

github-actions bot commented May 4, 2023

clang-tidy review says "All clean, LGTM! 👍"

@mrhhsg
Copy link
Member Author

mrhhsg commented May 4, 2023

run buildall

@github-actions
Copy link
Contributor

github-actions bot commented May 4, 2023

clang-tidy review says "All clean, LGTM! 👍"

return;
}

if (_bitmap.use_count() > 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感觉这里用use count 可能不太好,不如记录一个标记位,比如is_owner 这样的,然后再copy constructor 还有operator=的时候,都把is_owner 设置为false;prepare for write的时候,设置为true。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果is owner == true,就直接return了。 我觉得一个bool 的判断效率要比 shared ptr use count 高

@mrhhsg
Copy link
Member Author

mrhhsg commented May 22, 2023

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Copy link
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 35.94 seconds
stream load tsv: 423 seconds loaded 74807831229 Bytes, about 168 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 59 seconds loaded 1101869774 Bytes, about 17 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 79.4 seconds inserted 10000000 Rows, about 125K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230522045521_clickbench_pr_148057.html

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@mrhhsg
Copy link
Member Author

mrhhsg commented May 22, 2023

run buildall

@hello-stephen
Copy link
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 36.68 seconds
stream load tsv: 433 seconds loaded 74807831229 Bytes, about 164 MB/s
stream load json: 22 seconds loaded 2358488459 Bytes, about 102 MB/s
stream load orc: 59 seconds loaded 1101869774 Bytes, about 17 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 80.4 seconds inserted 10000000 Rows, about 124K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230522103952_clickbench_pr_148293.html

Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 24, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit e5eed53 into apache:master May 24, 2023
mrhhsg added a commit to mrhhsg/doris that referenced this pull request Jun 28, 2023
…ying (apache#19101)

Currently bitmapvalue type is copied between columns, it cost a lot of memory. Use a shared ptr in bitmap value to avoid copy data.
yiguolei pushed a commit that referenced this pull request Jul 11, 2023
…ying (#19101) (#21271)

Currently bitmapvalue type is copied between columns, it cost a lot of memory. Use a shared ptr in bitmap value to avoid copy data.
morningman pushed a commit that referenced this pull request Jul 19, 2023
…ying (#19101) (#21271)

Currently bitmapvalue type is copied between columns, it cost a lot of memory. Use a shared ptr in bitmap value to avoid copy data.
GoGoWen pushed a commit to GoGoWen/incubator-doris that referenced this pull request Jul 26, 2023
…ying (apache#19101) (apache#21271)

Currently bitmapvalue type is copied between columns, it cost a lot of memory. Use a shared ptr in bitmap value to avoid copy data.
@mrhhsg mrhhsg deleted the opt_bitmap branch August 18, 2023 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/1.2.5 need-cherry-pick reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants