[opt](mow) reduce memory usage for mow table compaction #36865
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
clang-tidy review says "All clean, LGTM! 👍"
889788b to 079613f
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TPC-H: Total hot run time: 39671 ms
TPC-DS: Total hot run time: 171273 ms
ClickBench: Total hot run time: 30.82 s
be/src/common/config.cpp
Outdated
// rowid conversion correctness check when compaction for mow table
DEFINE_mBool(enable_rowid_conversion_correctness_check, "false");
// missing rows correctness check when compaction for mow table
DEFINE_mBool(enable_missing_rows_correctness_check, "true");
Since the missing rows check is costly for large datasets, we can disable it by default.
done
079613f to a560153
zhannngchen
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
clang-tidy review says "All clean, LGTM! 👍"
dataroaring
left a comment
LGTM
run buildall
TPC-H: Total hot run time: 39953 ms
TPC-DS: Total hot run time: 170136 ms
ClickBench: Total hot run time: 30.8 s
xy720
left a comment
LGTM
run p0
…he#36865) (apache#36998) cherry-pick apache#36865 to branch-2.0
#43502)
Related PR: #36865
Problem Summary: #36865 reduced the memory cost of compaction for MoW tables, but when the code was merged for cloud, the optimization was not applied to cloud compaction. We found several cases where compaction of a MoW table consumed a lot of memory on cloud; this PR tries to fix that issue.
Co-authored-by: Chen Zhang <zhangchen@selectdb.com>
The problem:
Memory usage is huge while compacting a MoW table. For example, one compaction reached a peak of 45.19 GB with
input_row_num=701398908, output_row_num=175349727, filtered_row_num=526049181
How to Fix:
The reason is that `missed_rows` and `location_map` are very expensive to collect. So we optimize it:
- collect the `missed_rows` set only when we really need it;
- add `enable_missing_rows_correctness_check` to control whether we need to collect `missed_rows`;
- collect `location_map` only when `enable_rowid_conversion_correctness_check` is enabled;
After fix:
After the fix, memory usage can be reduced from 45.19 GB to 9.89 GB.
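The gating idea in the description can be sketched roughly as follows. This is a minimal, simplified illustration and not the actual Doris code: `Config`, `RowLoc`, `ConvertedRow`, `CompactionBookkeeping`, and `collect_bookkeeping` are hypothetical names invented for the example. Only the two config names, `missed_rows`, and `location_map` come from this PR. The point is that neither container grows unless its correctness check is switched on.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the two mutable BE configs named in this PR.
struct Config {
    bool enable_missing_rows_correctness_check = false;
    bool enable_rowid_conversion_correctness_check = false;
};

// Simplified row location (segment id, row id); an assumption, not Doris's type.
using RowLoc = std::pair<uint32_t, uint32_t>;

// One input row and where it landed in the output rowset, if it survived.
struct ConvertedRow {
    RowLoc src;
    std::optional<RowLoc> dst;  // empty when the row is absent from the output
};

struct CompactionBookkeeping {
    std::set<RowLoc> missed_rows;           // input rows missing from the output
    std::map<RowLoc, RowLoc> location_map;  // input location -> output location
};

// Populate each container only if the corresponding check is enabled, so the
// default path allocates nothing per row.
CompactionBookkeeping collect_bookkeeping(const std::vector<ConvertedRow>& rows,
                                          const Config& cfg) {
    CompactionBookkeeping out;
    for (const auto& row : rows) {
        if (!row.dst.has_value()) {
            if (cfg.enable_missing_rows_correctness_check) {
                out.missed_rows.insert(row.src);
            }
            continue;
        }
        if (cfg.enable_rowid_conversion_correctness_check) {
            out.location_map.emplace(row.src, *row.dst);
        }
    }
    return out;
}
```

With both flags off, a call over hundreds of millions of rows leaves both containers empty, which is the memory saving the description is after. Turning a flag on restores the per-row bookkeeping for that check only.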