Description
I started a routine load job with the following settings in my BE config:
cumulative_compaction_num_threads_per_disk=10
base_compaction_num_threads_per_disk=5
push_write_mbytes_per_sec=100
cumulative_compaction_check_interval_seconds=1
cumulative_compaction_skip_window_seconds=5
The job properties are set as follows:
(
"desired_concurrent_number"="25",
"max_batch_interval" = "60",
"max_batch_rows" = "1000000000",
"max_batch_size" = "1073741824",
"strict_mode" = "false",
"format" = "json",
"strip_outer_array" = "false"
)
Partitions use the "HOUR" time unit.
I have 25 BE nodes, and throughput is almost 300K messages per second (data size: 700 MB/s).
I then found that queries over the most recent time range perform terribly.
For example, suppose it is now 2020-09-07 14:30:
a query such as timestamp >= "2020-09-07 14:00" and timestamp < "2020-09-07 15:00" takes 4 sec,
while a query such as timestamp >= "2020-09-07 13:00" and timestamp < "2020-09-07 14:00" takes 0.4 sec.
I noticed that VersionCount is around 300+, whereas it is usually 2-30. This means some recent data has not been compacted. I increased cumulative_compaction_num_threads_per_disk, but it did not help.
I reviewed the compaction code (src/olap/tablet_manager.cpp, line 715):
int64_t last_failure_ms = tablet_ptr->last_cumu_compaction_failure_time();
if (compaction_type == CompactionType::BASE_COMPACTION) {
    last_failure_ms = tablet_ptr->last_base_compaction_failure_time();
}
if (now_ms - last_failure_ms <= config::min_compaction_failure_interval_sec * 1000) {
    VLOG(1) << "Too often to check compaction, skip it."
            << "compaction_type=" << compaction_type_str
            << ", last_failure_time_ms=" << last_failure_ms
            << ", tablet_id=" << tablet_ptr->tablet_id();
    continue;
}
This means a tablet is skipped for compaction when its last failure time is too close to now.
The last failure time is set in src/olap/storage_engine.cpp (line 557, _perform_cumulative_compaction; line 593, _perform_base_compaction):
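To make the skip condition concrete, here is a minimal standalone sketch of the check quoted above. The function name `skipped_by_failure_backoff` and the constant are hypothetical stand-ins for illustration only; the constant mirrors `config::min_compaction_failure_interval_sec` with the 600-second default mentioned in this report.

```cpp
#include <cstdint>

// Hypothetical stand-in for config::min_compaction_failure_interval_sec
// (600 is the default value mentioned in this report).
constexpr int64_t kMinCompactionFailureIntervalSec = 600;

// Returns true when the tablet would be skipped by the check quoted above:
// the last recorded failure is still within the back-off window.
bool skipped_by_failure_backoff(int64_t now_ms, int64_t last_failure_ms,
                                int64_t interval_sec = kMinCompactionFailureIntervalSec) {
    return now_ms - last_failure_ms <= interval_sec * 1000;
}
```

With a last failure time of 0 (the value written after a successful compaction), the tablet is eligible as long as `now_ms` is a realistic Unix timestamp; with a failure recorded 100 seconds ago, it is skipped.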
OLAPStatus res = cumulative_compaction.compact();
if (res != OLAP_SUCCESS) {
    best_tablet->set_last_cumu_compaction_failure_time(UnixMillis());
    if (res != OLAP_ERR_CUMULATIVE_NO_SUITABLE_VERSIONS) {
        DorisMetrics::instance()->cumulative_compaction_request_failed.increment(1);
        LOG(WARNING) << "failed to do cumulative compaction. res=" << res
                     << ", table=" << best_tablet->full_name();
    }
    return;
}
best_tablet->set_last_cumu_compaction_failure_time(0);
This means that when compaction succeeds, the last failure time is set to 0; when it fails, it is set to the current time.
I don't know what error occurs during compaction or whether it has truly failed. Either way, the last failure time is set to now, and the tablet will not be compacted for the next min_compaction_failure_interval_sec seconds (default 600). As a result, more and more routine load data accumulates in the BE and queries become slow.
A simple way to solve the problem is to distinguish between the compaction result codes and set the last failure time accordingly.
Just change the code as follows:
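As a rough illustration of why the back-off window matters, the sketch below estimates how many versions a tablet could accumulate while it is being skipped. It rests on assumptions that are not confirmed by the code above: that every committed routine load task adds one version to the tablet, and that each task commits once per max_batch_interval. Tasks may commit sooner if they hit max_batch_rows or max_batch_size first. The function name is hypothetical.

```cpp
#include <cstdint>

// Rough estimate of versions added to one tablet during the back-off window.
// Assumption (not taken from the Doris source): each committed routine load
// task adds one version, and each task commits once per batch interval.
int64_t versions_during_backoff(int64_t backoff_sec, int64_t batch_interval_sec,
                                int64_t concurrent_tasks) {
    if (batch_interval_sec <= 0) {
        return 0;  // guard against division by zero on invalid input
    }
    return (backoff_sec / batch_interval_sec) * concurrent_tasks;
}
```

Under these assumptions, `versions_during_backoff(600, 60, 25)` gives 250, which is the same order of magnitude as the VersionCount of 300+ reported above.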
OLAPStatus res = cumulative_compaction.compact();
if (res != OLAP_SUCCESS) {
    if (res == OLAP_ERR_BE_TRY_BE_LOCK_ERROR) {
        best_tablet->set_last_cumu_compaction_failure_time(UnixMillis());
    } else {
        best_tablet->set_last_cumu_compaction_failure_time(0);
    }
    if (res != OLAP_ERR_CUMULATIVE_NO_SUITABLE_VERSIONS) {
        DorisMetrics::instance()->cumulative_compaction_request_failed.increment(1);
        LOG(WARNING) << "failed to do cumulative compaction. res=" << res
                     << ", table=" << best_tablet->full_name();
    }
    return;
}
best_tablet->set_last_cumu_compaction_failure_time(0);
I think that, whatever the status of the thread that owns the compaction lock, the last compaction failure time needs to be set to 0 so that the tablet can be scheduled next time.
Any suggestions?
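The decision in the proposed change can be isolated into two small pure functions, which makes it easier to discuss and unit test. This is only a sketch: the `CompactStatus` enum and both function names are hypothetical and merely mirror the result codes that appear in the snippet above; they are not the Doris definitions.

```cpp
#include <cstdint>

// Hypothetical, reduced status enum mirroring the codes used in the snippet above.
enum class CompactStatus {
    kSuccess,             // OLAP_SUCCESS
    kNoSuitableVersions,  // OLAP_ERR_CUMULATIVE_NO_SUITABLE_VERSIONS
    kTryLockError,        // OLAP_ERR_BE_TRY_BE_LOCK_ERROR
    kOtherError           // any other failure code
};

// Value to store as the last failure time under the proposed change:
// only a lock-acquisition failure starts the back-off window.
int64_t next_last_failure_ms(CompactStatus res, int64_t now_ms) {
    return res == CompactStatus::kTryLockError ? now_ms : 0;
}

// Whether the result should increment the failed-request metric and be logged,
// as in the existing code: everything except success and "no suitable versions".
bool counts_as_failed_request(CompactStatus res) {
    return res != CompactStatus::kSuccess &&
           res != CompactStatus::kNoSuitableVersions;
}
```

With this split, a "no suitable versions" result neither delays the next scheduling attempt nor counts as a failed request, while other errors are still logged but do not block the tablet for the full interval.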