[Optimize] Take 'tablet scan frequency' into consideration when selecting a tablet for compaction #4837
morningman
left a comment
Please update the documentation:
administrator-guide/config/be_config.md
be/src/olap/tablet.h
Outdated
    std::unique_ptr<CumulativeCompactionPolicy> _cumulative_compaction_policy;
    std::string _cumulative_compaction_type;

    int64_t _last_update_scan_count;
Add comment for the new fields
done.
    CONF_mInt32(compaction_tablet_scan_frequency_factor, "0");
    CONF_mInt32(compaction_tablet_compaction_score_factor, "1");
Do they need to be normalized? If needed, you should define them as double.
Normalization is not required.
    ### `column_dictionary_key_size_threshold`

    ### `compaction_tablet_compaction_score_factor`
Why is there no documentation for these two configs?
It would be better to give best practices for them.
done.
be/src/olap/tablet_manager.cpp
Outdated
    }
    if (table_score > highest_score) {
    highest_score = table_score;
    double scan_frequency = tablet_ptr->calculate_scan_frequency();
If `compaction_tablet_scan_frequency_factor` is zero, we can skip calling `calculate_scan_frequency()` to save some CPU.
It's reasonable.
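A minimal sketch of the suggested short-circuit follows. The weighted-sum formula, the `Factors` struct, and the `tablet_score` function are assumptions made for illustration; the diff excerpts only show that two integer factors exist and that `calculate_scan_frequency()` is called during tablet selection.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-ins for the two BE config values shown in the diff.
struct Factors {
    int32_t scan_frequency_factor = 0;    // default "0" in the diff
    int32_t compaction_score_factor = 1;  // default "1" in the diff
};

// Assumed scoring rule: a weighted sum of scan frequency and compaction score.
// The scan-frequency callback is invoked only when its factor is non-zero,
// so the default configuration never pays for the calculation.
double tablet_score(const Factors& f, uint32_t compaction_score,
                    const std::function<double()>& calculate_scan_frequency) {
    double score = static_cast<double>(f.compaction_score_factor) * compaction_score;
    if (f.scan_frequency_factor != 0) {
        score += static_cast<double>(f.scan_frequency_factor) * calculate_scan_frequency();
    }
    return score;
}
```

With the default factors (0 and 1), the score reduces to the plain compaction score and the callback is never invoked.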
    time_t now = time(nullptr);
    int64_t current_count = query_scan_count->value();
    double interval = difftime(now, _last_record_scan_count_timestamp);
    double scan_frequency = (current_count - _last_record_scan_count) * 60 / interval;
Why multiply by 60?
Multiplying by 60 gives the average number of tablet scans per minute; otherwise it would be the average number of tablet scans per second.
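The per-minute scaling can be illustrated with a small, self-contained sketch. The function name `scans_per_minute` and its parameters are hypothetical and not part of the PR; the sketch only mirrors the arithmetic in the diff, and the guard against a non-positive interval is an added assumption.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the arithmetic in the diff above.
// Returns the average number of scans per minute between two samples.
// Assumption: a non-positive interval yields 0 instead of dividing by zero.
double scans_per_minute(int64_t current_count, int64_t last_count,
                        double interval_seconds) {
    if (interval_seconds <= 0) {
        return 0.0;
    }
    // delta / interval would be scans per second; multiplying by 60
    // converts the rate to scans per minute.
    return static_cast<double>(current_count - last_count) * 60.0 / interval_seconds;
}
```

For example, 300 new scans over a 120-second interval works out to 150 scans per minute.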
6806c9d to 64ed98d
morningman
left a comment
LGTM
Proposed changes
A large number of small segment files makes scan operations inefficient. Compaction merges multiple small files into a larger one. We can therefore take tablet scan frequency into consideration when selecting a tablet for compaction, and preferentially compact tablets that have been scanned frequently over the most recent period of time.
Using the compaction strategy of Kudu for reference, `scan frequency` can be calculated for a tablet over the most recent period of time and taken into consideration when calculating the compaction score.
Types of changes
What types of changes does your code introduce to Doris?
Put an `x` in the boxes that apply
Checklist
Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.