A large number of small segment files makes scan operations inefficient. Compaction merges multiple small files into one large file. We could therefore take tablet scan frequency into account when selecting a tablet for compaction, and preferentially compact tablets that have been scanned frequently over the most recent period.
Taking Kudu's compaction strategy as a reference, the scan frequency of each tablet over the most recent period can be calculated and factored into the compaction score. The new compaction score can be calculated like this:
new_compaction_score = k1 * tablet_scan_frequency + k2 * old_compaction_score
k1 and k2 can be set dynamically through the HTTP interface /api/update_config.
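As a rough illustration of how the weighted score could drive tablet selection, here is a minimal Python sketch. The names `CompactionWeights` and `pick_tablet`, the default weight values, and the dictionary layout of the candidates are assumptions made for this example only; they are not part of any existing implementation.

```python
from dataclasses import dataclass


@dataclass
class CompactionWeights:
    # Hypothetical holder for k1 and k2. The defaults are illustrative only;
    # in the proposal these would be changed at runtime via /api/update_config.
    k1: float = 1.0
    k2: float = 1.0


def new_compaction_score(weights, tablet_scan_frequency, old_compaction_score):
    # new_compaction_score = k1 * tablet_scan_frequency + k2 * old_compaction_score
    return weights.k1 * tablet_scan_frequency + weights.k2 * old_compaction_score


def pick_tablet(weights, candidates):
    # candidates maps a tablet id to (tablet_scan_frequency, old_compaction_score).
    # Returns the id of the tablet with the highest new compaction score.
    return max(
        candidates,
        key=lambda tablet_id: new_compaction_score(weights, *candidates[tablet_id]),
    )


if __name__ == "__main__":
    candidates = {"tablet_a": (0.0, 12.0), "tablet_b": (5.0, 8.0)}
    # With k1 = 0 the choice falls back to the old score alone.
    print(pick_tablet(CompactionWeights(k1=0.0, k2=1.0), candidates))
    # With k1 = 2 the frequently scanned tablet is preferred.
    print(pick_tablet(CompactionWeights(k1=2.0, k2=1.0), candidates))
```

Setting k1 to 0 reproduces the old selection behavior, which gives a simple way to disable the scan-frequency term without a code change.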
We can add a metric query_scan_count for each tablet that records the tablet's scan count. The tablet scan frequency can then be calculated like this:
tablet_scan_frequency = (now_query_scan_count - last_query_scan_count) / (now_time - last_time)
last_query_scan_count and last_time are updated each time the sampling interval elapses.
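The bookkeeping described above could look roughly like the following Python sketch. The class name `TabletScanTracker`, its method names, and the choice to keep the previous frequency when no time has elapsed are assumptions for illustration; only query_scan_count, last_query_scan_count, last_time, and the formula come from the proposal.

```python
class TabletScanTracker:
    """Hypothetical per-tablet tracker for the scan-frequency calculation."""

    def __init__(self, now_time):
        self.query_scan_count = 0        # monotonically increasing scan counter
        self.last_query_scan_count = 0   # counter value at the last refresh
        self.last_time = now_time        # timestamp (seconds) of the last refresh
        self.tablet_scan_frequency = 0.0

    def record_scan(self):
        # Called once for every scan of the tablet.
        self.query_scan_count += 1

    def refresh(self, now_time):
        # Called once per interval: recompute the frequency, then roll the
        # snapshot forward so the next interval starts from the current state.
        elapsed = now_time - self.last_time
        if elapsed <= 0:
            # Assumption: keep the previous value rather than divide by zero.
            return self.tablet_scan_frequency
        self.tablet_scan_frequency = (
            self.query_scan_count - self.last_query_scan_count
        ) / elapsed
        self.last_query_scan_count = self.query_scan_count
        self.last_time = now_time
        return self.tablet_scan_frequency


if __name__ == "__main__":
    tracker = TabletScanTracker(now_time=100.0)
    for _ in range(30):
        tracker.record_scan()
    print(tracker.refresh(now_time=160.0))  # 30 scans over 60 seconds
```

Because the snapshot is rolled forward on every refresh, the frequency reflects only the most recent interval, so a tablet that stops being scanned drops back to zero after one interval.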