
[Routine Load][BE] Too many VersionCount caused by wrong compaction failure time  #4551

@caoyang10

Description


I started a routine load job with the following settings in my BE config:

cumulative_compaction_num_threads_per_disk=10
base_compaction_num_threads_per_disk=5
push_write_mbytes_per_sec=100
cumulative_compaction_check_interval_seconds=1
cumulative_compaction_skip_window_seconds=5

The job properties are set to:
(
"desired_concurrent_number"="25",
"max_batch_interval" = "60",
"max_batch_rows" = "1000000000",
"max_batch_size" = "1073741824",
"strict_mode" = "false",
"format" = "json",
"strip_outer_array" = "false"
)

The partitions are divided by an "HOUR" time unit.

I have 25 BE nodes and the throughput is almost 300K messages per second (data size: 700 MB/s).
Then I find that the performance of queries over recent time (or now) is terrible.
For example, now is 2020-09-07 14:30:
a query like timestamp >= "2020-09-07 14:00" and timestamp < "2020-09-07 15:00" costs 4 sec,
while a query like timestamp >= "2020-09-07 13:00" and timestamp < "2020-09-07 14:00" costs 0.4 sec.

I notice that the VersionCount is almost 300+, while it is usually 2-30. It means some recent data has not been compacted. I increased cumulative_compaction_num_threads_per_disk, but it does not help.

I reviewed the compaction code (src/olap/tablet_manager.cpp:715):

            int64_t last_failure_ms = tablet_ptr->last_cumu_compaction_failure_time();
            if (compaction_type == CompactionType::BASE_COMPACTION) {
                last_failure_ms = tablet_ptr->last_base_compaction_failure_time();
            }
            if (now_ms - last_failure_ms <= config::min_compaction_failure_interval_sec * 1000) {
                VLOG(1) << "Too often to check compaction, skip it."
                        << "compaction_type=" << compaction_type_str
                        << ", last_failure_time_ms=" << last_failure_ms
                        << ", tablet_id=" << tablet_ptr->tablet_id();
                continue;
            } 

It means a tablet is not compacted while its last failure time is too close to now.
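In isolation, that skip rule behaves like the sketch below (hypothetical helper name, constant hard-coded only for illustration): a tablet whose last recorded failure is within min_compaction_failure_interval_sec of now is skipped, while a failure time of 0 always lets it through.

#include <cstdint>

// Default value of min_compaction_failure_interval_sec, hard-coded here for illustration.
static constexpr int64_t kMinCompactionFailureIntervalSec = 600;

// Hypothetical helper mirroring the check in tablet_manager.cpp: returns true when
// the tablet should be skipped because its last failure is too recent.
bool should_skip_compaction(int64_t now_ms, int64_t last_failure_ms) {
    // last_failure_ms == 0 (set after a successful compaction) makes the difference
    // huge, so the tablet is always eligible again.
    return now_ms - last_failure_ms <= kMinCompactionFailureIntervalSec * 1000;
}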
The last failure time is set in src/olap/storage_engine.cpp (line 557 _perform_cumulative_compaction, line 593 _perform_base_compaction):

OLAPStatus res = cumulative_compaction.compact();
if (res != OLAP_SUCCESS) {
    best_tablet->set_last_cumu_compaction_failure_time(UnixMillis());
    if (res != OLAP_ERR_CUMULATIVE_NO_SUITABLE_VERSIONS) {
        DorisMetrics::instance()->cumulative_compaction_request_failed.increment(1);
        LOG(WARNING) << "failed to do cumulative compaction. res=" << res
                    << ", table=" << best_tablet->full_name();
    }
    return;
}
best_tablet->set_last_cumu_compaction_failure_time(0);

It means that when compaction succeeds, the last failure time is set to 0; when compaction fails, it is set to the current time instead.
I don't know what error actually occurs while compacting, or whether it is truly a failure. Either way, the last failure time is set to now and the tablet will not be compacted during the next min_compaction_failure_interval_sec seconds (default value 600). So more and more routine load data piles up on the BE and queries become slow.
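To make the effect concrete, here is a hedged back-of-envelope sketch (the one-commit-every-2-seconds rate per tablet is my assumption, not a measured number): a single failed compaction blocks the tablet for the full 600-second window while routine load keeps adding rowset versions, which is enough to explain a VersionCount in the 300+ range.

#include <cstdint>
#include <cstdio>

int main() {
    const int64_t kIntervalMs = 600 * 1000;   // default min_compaction_failure_interval_sec * 1000
    const int64_t kCommitEveryMs = 2 * 1000;  // assumed routine-load commit rate for one tablet

    const int64_t failure_ms = 1000;  // one failed compaction at t = 1s stamps the failure time
    int64_t version_count = 30;       // a "normal" VersionCount, per the report above

    int64_t now_ms = failure_ms;
    // Same check as in tablet_manager.cpp: the tablet is skipped while inside the window.
    while (now_ms - failure_ms <= kIntervalMs) {
        ++version_count;           // routine load keeps committing rowsets, nothing gets merged
        now_ms += kCommitEveryMs;  // next time the scheduler (and the load) comes around
    }
    // ~330 versions by the time the tablet becomes eligible for compaction again.
    std::printf("VersionCount when compaction resumes: %lld\n", (long long)version_count);
    return 0;
}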

A simple way to solve the problem is to distinguish between the compaction result codes and set the correct last failure time.
Just change the code as follows:

OLAPStatus res = cumulative_compaction.compact();
if (res != OLAP_SUCCESS) {
    if (res == OLAP_ERR_BE_TRY_BE_LOCK_ERROR) {
        best_tablet->set_last_cumu_compaction_failure_time(UnixMillis());
    } else {
        best_tablet->set_last_cumu_compaction_failure_time(0);
    }
    if (res != OLAP_ERR_CUMULATIVE_NO_SUITABLE_VERSIONS) {
        DorisMetrics::instance()->cumulative_compaction_request_failed.increment(1);
        LOG(WARNING) << "failed to do cumulative compaction. res=" << res
                    << ", table=" << best_tablet->full_name();
    }
    return;
}
best_tablet->set_last_cumu_compaction_failure_time(0);

I think that whatever the status of the thread that owns the compaction lock is, the last compaction failure time needs to be set to 0 so that the tablet can be scheduled next time.
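Expressed as a pure function over the result code (hypothetical names, only to make the policy explicit; the real code works on OLAPStatus and the Tablet object), the proposal and the open question look like this:

#include <cstdint>

// Hypothetical result enum standing in for the relevant OLAPStatus values.
enum class CompactResult { kSuccess, kTryLockError, kNoSuitableVersions, kOtherError };

// Value to store as last_cumu_compaction_failure_time: 0 keeps the tablet
// schedulable on the next round, now_ms blocks it for min_compaction_failure_interval_sec.
int64_t failure_time_to_record(CompactResult res, int64_t now_ms) {
    switch (res) {
        case CompactResult::kTryLockError:
            // The proposed patch records now_ms here; the open question above is
            // whether this case should also return 0.
            return now_ms;
        default:
            return 0;  // success and all other errors: do not block the tablet
    }
}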

Any suggestions?
