[Optimize] Optimize the execution model of compaction to limit memory consumption #4670

weizuo93 · 2020-09-25T04:18:56Z

Proposed changes

Currently, each disk has M threads for base compaction and N threads for cumulative compaction. Too many concurrent compaction tasks may exhaust memory, so the maximum number of running compaction tasks is capped by a semaphore. However, a concurrency cap does not bound memory: if the running tasks themselves consume too much memory, there is no defense against it. In addition, lowering concurrency to avoid OOM means some compaction tasks cannot run in time, and the delayed work leads to heavier compactions later. Limiting concurrency alone is therefore not enough.

The strategy proposed in #3624 may be an effective way to address the OOM problem.

A CompactionPermitLimiter is used to limit compaction, following a single-producer/multi-consumer model. The producer tries to generate compaction tasks and acquire permits for each task. A compaction task that holds its permits is executed in a thread pool, and each finished task releases its permits.

Permits must be acquired before a compaction task can execute. When the sum of permits held by executing compaction tasks reaches a threshold, no further compaction tasks are admitted until some permits are released. The tablet compaction score is used as the number of permits for a compaction task.

Setting an appropriate permits threshold therefore limits memory consumption to some extent.
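
The permit accounting described above can be sketched roughly as follows. This is a minimal illustration, not the CompactionPermitLimiter added by this PR: the class name PermitLimiter and its methods try_acquire, acquire, release and used are hypothetical, and the sketch assumes that a mutex plus a condition variable is an acceptable way to wait for released permits.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of a permit limiter; names are illustrative only.
class PermitLimiter {
public:
    explicit PermitLimiter(int64_t total) : _total(total) {}

    // Non-blocking: takes the permits and returns true only if they fit
    // under the threshold; otherwise leaves the state unchanged.
    bool try_acquire(int64_t permits) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_used + permits > _total) return false;
        _used += permits;
        return true;
    }

    // Blocking: waits until enough permits have been released.
    // Note: in this sketch a request larger than the total waits forever.
    void acquire(int64_t permits) {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [&] { return _used + permits <= _total; });
        _used += permits;
    }

    // Called by a finished task to hand its permits back and wake waiters.
    void release(int64_t permits) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _used -= permits;
        }
        _cv.notify_all();
    }

    int64_t used() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _used;
    }

private:
    const int64_t _total;
    int64_t _used = 0;
    mutable std::mutex _mutex;
    std::condition_variable _cv;
};
```

In such a design the producer would call acquire (or try_acquire) with the tablet's compaction score before submitting the task, and the task would call release with the same value when it finishes.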

Types of changes

What types of changes does your code introduce to Doris?
Put an x in the boxes that apply

[] Bugfix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[] Documentation Update (if none of the other choices apply)
[x] Code refactor (Modify the code structure, format the code, etc.)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

[x] I have created an issue (Fix #3624: [Proposal] The execution model of compaction needs to improve) and have described the bug/feature there in detail
[x] Compiling and unit tests pass locally with my changes
[] I have added tests that prove my fix is effective or that my feature works
[] If this change needs a document change, I have updated the document
[x] Any dependent changes have been merged

morningman · 2020-09-26T03:28:39Z

be/src/common/config.h

+    CONF_mInt64(total_permits_for_compaction_score, "10000")
+
+    // Whether compaction task is allowed to start when compaction score of current tablet is out of upper limit.
+    CONF_mBool(enable_over_sold, "true");


Suggested change

-    CONF_mBool(enable_over_sold, "true");
+    CONF_mBool(enable_compaction_permit_over_sold, "true");
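
One possible reading of the over-sold switch, sketched as a small admission check. The function can_admit and its parameters are hypothetical, and the rule that an oversized task may start only when no other task holds permits is an assumption made for illustration; the quoted diff only states that the flag controls whether a task whose score exceeds the upper limit is allowed to start.

```cpp
#include <cstdint>

// Hypothetical admission check; not taken from the PR.
// used:      permits currently held by running compaction tasks
// requested: compaction score of the candidate tablet
// total:     configured permit threshold
bool can_admit(int64_t used, int64_t requested, int64_t total, bool enable_over_sold) {
    // Normal case: the request fits under the threshold.
    if (used + requested <= total) return true;
    // The request alone exceeds the threshold: assumed to be admitted only
    // when over-selling is enabled and nothing else is running.
    if (requested > total) return enable_over_sold && used == 0;
    // Otherwise the task has to wait for permits to be released.
    return false;
}
```

Without some rule of this kind, a tablet whose score alone exceeds the threshold could never be compacted.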

morningman · 2020-09-26T07:30:56Z

be/src/olap/olap_server.cpp

+    CompactionType compaction_type;
+    do {
+        if (!config::disable_auto_compaction) {
+            if (round < config::cumulative_compaction_rounds_for_each_base_compaction_round) {


The default cumulative_compaction_rounds_for_each_base_compaction_round is 9 and the default generate_compaction_tasks_interval_seconds is 2. So, in general, a base compaction task is created only about once every 18 seconds?

I also tested this with the following case:

Only 1 BE with 1 data dir.

Create one table with 100 buckets.

Insert data into this table every 5 seconds.

Compaction is triggered every 2 seconds, and each compaction task takes only a fraction of a second (0.x seconds). But the average version count of the tablets stays at about 50 and cannot go lower.

So I think generating compaction tasks by polling may not be appropriate. One possible alternative is to generate compaction tasks by triggering.

With polling, currently only one task can be done every 2 seconds; with triggering, in my case, about 500 tasks could be done per second (because each batch contains very little data under high-frequency load).

weizuo93 · 2020-09-28

Thanks for your suggestions! I have optimized the implementation logic of the producer. If all the compaction tasks produced in a round can hold permits, the producer continues to produce compaction tasks without sleeping. This way, the production speed can comfortably keep up with consumer demand.
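
The producer behaviour described in this reply might look roughly like the sketch below. It is not the code from the PR: produce_round and next_delay are hypothetical names, try_acquire stands in for the permit limiter, submit stands in for the thread pool, and treating an empty round as a reason to sleep is an assumption added to avoid a busy loop.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <vector>

// Runs one producer round over the compaction scores of the candidate tasks.
// Returns true when every candidate obtained its permits, meaning the
// producer may start the next round immediately.
bool produce_round(const std::vector<int64_t>& scores,
                   const std::function<bool(int64_t)>& try_acquire,
                   const std::function<void(int64_t)>& submit) {
    // Assumption: with no candidates there is nothing to do, so sleep.
    if (scores.empty()) return false;
    bool all_admitted = true;
    for (int64_t score : scores) {
        if (try_acquire(score)) {
            submit(score);         // task holds its permits, hand it to the pool
        } else {
            all_admitted = false;  // limiter is saturated, back off after this round
        }
    }
    return all_admitted;
}

// Delay before the next round: none if everything was admitted,
// otherwise the configured polling interval.
std::chrono::seconds next_delay(bool all_admitted, std::chrono::seconds interval) {
    return all_admitted ? std::chrono::seconds(0) : interval;
}
```

With this shape the polling interval only applies once the limiter is saturated or no tablet needs compaction, which is what allows production to keep pace with fast consumers.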

morningman · 2020-09-26T07:48:10Z

be/src/olap/olap_server.cpp

+        CompactionType compaction_type, std::vector<DataDir*> data_dirs) {
+    vector<TabletSharedPtr> tablets_compaction;
+    std::random_shuffle(data_dirs.begin(), data_dirs.end());
+    for (auto data_dir : data_dirs) {


I think we can find more than one tablet for each data dir at this point.
The number of tablets found here could be compaction_task_num_per_disk.
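
Picking several tablets per data dir could be sketched as a top-N selection by compaction score. TabletCandidate and pick_top_tablets are hypothetical names used only for illustration, and the limit argument plays the role that compaction_task_num_per_disk would play in the suggestion above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tablet and its compaction score.
struct TabletCandidate {
    int64_t tablet_id;
    uint32_t score;
};

// Returns up to `limit` candidates with the highest scores, best first.
std::vector<TabletCandidate> pick_top_tablets(std::vector<TabletCandidate> candidates,
                                              std::size_t limit) {
    const std::size_t n = std::min(limit, candidates.size());
    // Only the first n positions need to be ordered.
    std::partial_sort(candidates.begin(), candidates.begin() + n, candidates.end(),
                      [](const TabletCandidate& a, const TabletCandidate& b) {
                          return a.score > b.score;
                      });
    candidates.erase(candidates.begin() + n, candidates.end());
    return candidates;
}
```

As a side note on the quoted diff, std::random_shuffle was deprecated in C++14 and removed in C++17; std::shuffle with an explicit random engine is the standard replacement.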

morningman · 2020-09-29T10:39:25Z

Hi @weizuo93, I tested this in my environment with high-frequency load, and I recommend adding the following patch.

diff --git a/be/src/olap/compaction.cpp b/be/src/olap/compaction.cpp
index f81a029cc..aa0f0b452 100644
--- a/be/src/olap/compaction.cpp
+++ b/be/src/olap/compaction.cpp
@@ -39,13 +39,13 @@ OLAPStatus Compaction::do_compaction(int64_t permits) {
     TRACE("start to do compaction");
     _tablet->data_dir()->disks_compaction_score_increment(permits);
     _tablet->data_dir()->disks_compaction_num_increment(1);
-    OLAPStatus st = do_compaction_impl();
+    OLAPStatus st = do_compaction_impl(permits);
     _tablet->data_dir()->disks_compaction_score_increment(-permits);
     _tablet->data_dir()->disks_compaction_num_increment(-1);
     return st;
 }

-OLAPStatus Compaction::do_compaction_impl() {
+OLAPStatus Compaction::do_compaction_impl(int64_t permits) {
     OlapStopWatch watch;

     // 1. prepare input and output parameters
@@ -63,7 +63,8 @@ OLAPStatus Compaction::do_compaction_impl() {
     _tablet->compute_version_hash_from_rowsets(_input_rowsets, &_output_version_hash);

     LOG(INFO) << "start " << compaction_name() << ". tablet=" << _tablet->full_name()
-            << ", output version is=" << _output_version.first << "-" << _output_version.second;
+            << ", output version is=" << _output_version.first << "-" << _output_version.second
+            << ", score: " << permits;

     RETURN_NOT_OK(construct_output_rowset_writer());
     RETURN_NOT_OK(construct_input_rowset_readers());
diff --git a/be/src/olap/compaction.h b/be/src/olap/compaction.h
index 20a567cdc..f43bc6f1d 100644
--- a/be/src/olap/compaction.h
+++ b/be/src/olap/compaction.h
@@ -55,7 +55,7 @@ protected:
     virtual ReaderType compaction_type() const = 0;

     OLAPStatus do_compaction(int64_t permits);
-    OLAPStatus do_compaction_impl();
+    OLAPStatus do_compaction_impl(int64_t permits);

     void modify_rowsets();
     OLAPStatus gc_unused_rowsets();
diff --git a/be/src/olap/storage_engine.cpp b/be/src/olap/storage_engine.cpp
index 020b40fca..c2cfcc162 100644
--- a/be/src/olap/storage_engine.cpp
+++ b/be/src/olap/storage_engine.cpp
@@ -598,8 +598,8 @@ void StorageEngine::_perform_cumulative_compaction(TabletSharedPtr best_tablet)

     OLAPStatus res = cumulative_compaction.compact();
     if (res != OLAP_SUCCESS) {
-        best_tablet->set_last_cumu_compaction_failure_time(UnixMillis());
         if (res != OLAP_ERR_CUMULATIVE_NO_SUITABLE_VERSIONS) {
+            best_tablet->set_last_cumu_compaction_failure_time(UnixMillis());
             DorisMetrics::instance()->cumulative_compaction_request_failed->increment(1);
             LOG(WARNING) << "failed to do cumulative compaction. res=" << res
                         << ", table=" << best_tablet->full_name();
diff --git a/be/src/olap/tablet_manager.cpp b/be/src/olap/tablet_manager.cpp
index fdfa99631..bf7d09890 100644
--- a/be/src/olap/tablet_manager.cpp
+++ b/be/src/olap/tablet_manager.cpp
@@ -763,7 +763,7 @@ TabletSharedPtr TabletManager::find_best_tablet_to_compaction(
     }

     if (best_tablet != nullptr) {
-        LOG(INFO) << "Found the best tablet for compaction. "
+        VLOG(1) << "Found the best tablet for compaction. "
                   << "compaction_type=" << compaction_type_str
                   << ", tablet_id=" << best_tablet->tablet_id()
                   << ", highest_score=" << highest_score;

The patch does 3 things:

1. Avoid too many "Found the best tablet for compaction. " log lines.
2. Still show the highest score in the log, but along with the compaction start log.
3. Do not set last_cumu_compaction_failure_time of the tablet if the error is OLAP_ERR_CUMULATIVE_NO_SUITABLE_VERSIONS. This lets the compaction process continue running.

weizuo93 · 2020-09-30T02:34:31Z

OK. It seems more reasonable.

morningman

LGTM


weizuo93 added 22 commits September 18, 2020 22:18

1089cda compaction memory optimize and code refactor
d96e0ee compaction memory optimize and code refactor
908788a compaction memory optimize and code refactor
4bea5d5 compaction memory optimize and code refactor
72e4e79 compaction memory optimize and code refactor
506afdb add compaction status for tablet
dce66cd add lambda function for submit callback
ab4722a code refactor
71cffec code refactor
6468197 code refactor
5f784bf code refactor
e9406e7 add metrics and comment for compaction limiter
f7b614c modify metrics
fbe9b57 modify metrics
7200398 modify metrics
8652b55 modify metrics
644b126 modify config
6ce8595 modify config
b7dd382 modify metrics
256d258 modify metrics
9484f52 modify metrics
18c2933 modify metrics

weizuo93 changed the title ~~[Optimize] optimize the execution model of compaction to limit memory consumption~~ [Optimize] Optimize the execution model of compaction to limit memory consumption Sep 25, 2020

morningman added the area/compact and kind/improvement labels Sep 26, 2020

morningman self-assigned this Sep 26, 2020

morningman reviewed Sep 26, 2020


weizuo93 added 3 commits September 27, 2020 22:16

5c6879b optimize compaction tasks producer
7c343f4 modify compaction tasks producer
3e49f53 modify compaction tasks producer

ab5f6b3 optimize compaction memory and code refactor
a0206b3 compaction log modification

morningman approved these changes Oct 10, 2020


morningman added the approved label Oct 10, 2020

morningman merged commit eba5955 into apache:master Oct 11, 2020

yangzhg mentioned this pull request Feb 9, 2021

Release Notes 0.14.0 #5374

Closed
