[Optimize] Optimize the execution model of compaction to limit memory consumption #4670
Conversation
be/src/common/config.h
CONF_mInt64(total_permits_for_compaction_score, "10000");

// Whether compaction task is allowed to start when compaction score of current tablet is out of upper limit.
CONF_mBool(enable_over_sold, "true");
Suggested change:
- CONF_mBool(enable_over_sold, "true");
+ CONF_mBool(enable_compaction_permit_over_sold, "true");
OK!
CompactionType compaction_type;
do {
    if (!config::disable_auto_compaction) {
        if (round < config::cumulative_compaction_rounds_for_each_base_compaction_round) {
The default cumulative_compaction_rounds_for_each_base_compaction_round is 9 and the default generate_compaction_tasks_interval_seconds is 2, so generally a base compaction task will be created roughly every 18 seconds?
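For reference, the producer loop around the excerpt above presumably looks roughly like the sketch below (placeholder names such as `stopped`, `data_dirs` and `_generate_compaction_tasks`; not the exact Doris code), which is where the 9 × 2 s ≈ 18 s cadence comes from:

```cpp
// Sketch: the round counter alternates between cumulative and base compaction.
int round = 0;
while (!stopped) {
    if (!config::disable_auto_compaction) {
        CompactionType compaction_type;
        if (round < config::cumulative_compaction_rounds_for_each_base_compaction_round) {
            compaction_type = CompactionType::CUMULATIVE_COMPACTION;
            ++round;
        } else {
            compaction_type = CompactionType::BASE_COMPACTION;
            round = 0;  // start the next cycle of cumulative rounds
        }
        _generate_compaction_tasks(compaction_type, data_dirs);
    }
    // With the defaults above (9 rounds, 2-second interval), a base compaction
    // round comes up roughly every 9 * 2 = 18 seconds.
    sleep(config::generate_compaction_tasks_interval_seconds);
}
```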
I also tested this with the following case:
- Only 1 BE with 1 data dir.
- Create one table with 100 buckets.
- Insert data into this table every 5 seconds.
Compaction is triggered every 2 seconds, and each compaction task costs just 0.x seconds. But the average version count of the tablets stays around 50 and cannot go lower.
So I think generating compaction tasks by polling may not be appropriate. One possible way is to generate compaction tasks by triggering.
With polling, currently only one task can be done every 2 seconds; with triggering, in my case, up to 500 tasks could be done per second (because the amount of data in each batch is very small under high-frequency load).
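To make the "triggering" idea concrete, a purely hypothetical sketch is shown below: a compaction check is enqueued whenever a tablet publishes a new rowset instead of waiting for the next polling round. `on_rowset_published` and `_compaction_candidates` are illustrative names, not Doris APIs.

```cpp
// Hypothetical trigger-based generation: react to each new rowset rather than poll.
void on_rowset_published(const TabletSharedPtr& tablet) {
    if (config::disable_auto_compaction) {
        return;
    }
    // Hand the tablet to the compaction scheduler right away, so many small
    // tasks can be generated per second under high-frequency load.
    _compaction_candidates.put(tablet);
}
```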
Thanks for your suggestions! I optimized the implementation logic of the producer: if all of the produced compaction tasks can hold permits, the producer continues to produce compaction tasks without sleeping. In this way, the production rate can easily keep up with consumer demand.
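A rough sketch of that producer behavior is below (illustrative names such as `_pick_compaction_candidates`, `_permit_limiter` and `_compaction_thread_pool`; not the exact implementation in this PR). The producer only sleeps when some produced task failed to acquire permits or nothing was produced; otherwise it immediately starts the next round.

```cpp
// Sketch: single producer generating compaction tasks gated by permits.
while (!stopped) {
    std::vector<TabletSharedPtr> tablets = _pick_compaction_candidates(compaction_type);
    bool all_permits_acquired = true;
    for (const auto& tablet : tablets) {
        int64_t permits = tablet->calc_compaction_score(compaction_type);
        if (_permit_limiter.request(permits)) {
            // Consumer side: the task runs in the thread pool and releases its
            // permits when it finishes.
            _compaction_thread_pool->submit_func([=]() {
                tablet->execute_compaction(compaction_type);
                _permit_limiter.release(permits);
            });
        } else {
            all_permits_acquired = false;
        }
    }
    if (tablets.empty() || !all_permits_acquired) {
        // Back off only when permits ran out or there was nothing to do.
        sleep(config::generate_compaction_tasks_interval_seconds);
    }
}
```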
        CompactionType compaction_type, std::vector<DataDir*> data_dirs) {
    vector<TabletSharedPtr> tablets_compaction;
    std::random_shuffle(data_dirs.begin(), data_dirs.end());
    for (auto data_dir : data_dirs) {
I think we can find more than one tablet for each data dir at this point; the number of tablets found here could be compaction_task_num_per_disk, for example as in the sketch below.
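An illustrative sketch of that suggestion, picking up to compaction_task_num_per_disk candidates per data dir instead of a single one (the helper `find_best_tablets_to_do_compaction` is hypothetical):

```cpp
// Sketch: collect several compaction candidates per disk, capped by config.
vector<TabletSharedPtr> tablets_compaction;
std::random_shuffle(data_dirs.begin(), data_dirs.end());
for (auto data_dir : data_dirs) {
    int picked = 0;
    for (const auto& tablet :
         find_best_tablets_to_do_compaction(compaction_type, data_dir)) {
        if (picked++ >= config::compaction_task_num_per_disk) {
            break;
        }
        tablets_compaction.push_back(tablet);
    }
}
```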
Hi @weizuo93, I tested this in my environment with a high-frequency load, and I recommend adding the following patch. It does three things:
OK. It seems more reasonable.
morningman
left a comment
LGTM
Proposed changes
Currently, there are M threads doing base compaction and N threads doing cumulative compaction for each disk. Too many compaction tasks may run out of memory, so the maximum concurrency of running compaction tasks is limited by a semaphore. But if the running threads themselves cost too much memory, we cannot defend against it. In addition, reducing concurrency to avoid OOM means some compaction tasks cannot be executed in time, and we may end up with heavier compactions. Therefore, limiting concurrency alone is not enough.
The strategy proposed in #3624 may be effective in solving the OOM problem.
A CompactionPermitLimiter is used to limit compaction, with a single-producer/multi-consumer model. The producer tries to generate compaction tasks and acquire `permits` for each task. A compaction task that can hold `permits` will be executed in the thread pool, and each finished task releases its `permits`. `permits` must be applied for before a compaction task can execute. When the sum of `permits` held by executing compaction tasks reaches a threshold, subsequent compaction tasks are no longer allowed until some `permits` are released. The tablet compaction score is used as the `permits` of a compaction task here. To some extent, memory consumption can be limited by setting an appropriate `permits` threshold.
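A minimal sketch of that permit-limiting idea is shown below, using the config names discussed in this PR (total_permits_for_compaction_score and the over-sold switch, renamed to enable_compaction_permit_over_sold per the review). It is illustrative only, not the exact class added by this change.

```cpp
#include <cstdint>
#include <mutex>

// Sketch of a permit limiter: the sum of permits held by running compaction
// tasks is capped by a configurable threshold.
class CompactionPermitLimiter {
public:
    // Try to hold `permits`; returns false when the threshold would be exceeded.
    // With over-sold enabled, a task whose permits alone exceed the limit is
    // still admitted when nothing else holds permits, so huge tablets never starve.
    bool request(int64_t permits) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_used_permits + permits <= config::total_permits_for_compaction_score ||
            (config::enable_compaction_permit_over_sold && _used_permits == 0)) {
            _used_permits += permits;
            return true;
        }
        return false;
    }

    // Called when a compaction task finishes, allowing waiting tasks to proceed.
    void release(int64_t permits) {
        std::lock_guard<std::mutex> lock(_mutex);
        _used_permits -= permits;
    }

private:
    std::mutex _mutex;
    int64_t _used_permits = 0;
};
```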
Types of changes
What types of changes does your code introduce to Doris?
Put an `x` in the boxes that apply.

Checklist
Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.