-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[feature](vertical_compaction) support vertical compaction #13359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
92520a6 to
8d5d0df
Compare
8d5d0df to
d02b8d5
Compare
| // whether enable vertical compaction | ||
| CONF_mBool(enable_vertical_compaction, "false"); | ||
| // In vertical compaction, column number for every group | ||
| CONF_Int32(vertical_compaction_num_columns_per_group, "5"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This config can be changed online and it works according code in merger.cpp
| key_columns.emplace_back(sequence_col_idx); | ||
| } | ||
| delete_sign_idx = tablet_schema->field_index(DELETE_SIGN); | ||
| key_columns.emplace_back(delete_sign_idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not delete sign idx in some table, e.g. created in an older doris.
| if (_has_key) { | ||
| _row_count = _num_rows_written; | ||
| } | ||
| _num_rows_written = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should have a check here?
if (_has_key) {
_row_count = _num_rows_written;
} else {
CHECK_EQ(_row_count, _num_rows_written);
}
| EngineChecksumTask checksum_task(_tablet->tablet_id(), _tablet->schema_hash(), | ||
| _input_rowsets.back()->end_version(), &checksum_after); | ||
| checksum_task.execute(); | ||
| if (checksum_before != checksum_after) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use DCHECK here
| """ | ||
|
|
||
| sql """ INSERT INTO ${tableName} VALUES | ||
| (1, '2017-10-01', '2017-10-01', '2017-10-01 11:11:11.110000', '2017-10-01 11:11:11.110111', 'Beijing', 10, 1, '2020-01-02', '2020-01-02', '2017-10-01 11:11:11.160000', '2017-10-01 11:11:11.100111', '2020-01-02', 1, 31, 19) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't consider delete sign and delete query?
|
|
||
| void add_cur_batch() { _cur_batch_num++; } | ||
|
|
||
| void reset_cur_batch() { _cur_batch_num = 0; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is not used?
| _tablet->data_dir()->disks_compaction_num_increment(-1); | ||
|
|
||
| if (config::enable_compaction_checksum) { | ||
| EngineChecksumTask checksum_task(_tablet->tablet_id(), _tablet->schema_hash(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EngineChecksumTask does not do merge for unique key, so it does not work for now?
| Status BetaRowsetReader::get_segment_iterators(RowsetReaderContext* read_context, | ||
| std::vector<RowwiseIterator*>* out_iters) { | ||
| RETURN_NOT_OK(_rowset->load()); | ||
| _context = read_context; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assignment is duplicated with init? BTW, get methods should not set member.
| if (!can_reuse_schema || _context->reuse_input_schema == nullptr) { | ||
| // In vertical compaction, every column group need new schema | ||
| if (read_context->is_vertical_compaction) { | ||
| _can_reuse_schema = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
| std::vector<uint32_t> _column_ids; | ||
| bool _has_key = true; | ||
| // written when add particial columns | ||
| uint32_t _num_rows_written = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable _num_rows_written is confused. It seems that num_rows_written works if users used segment writer in the order write keys, flush keys, writer colums? We should add more comment here.
// used to count rows to SegmentWriter, in vertical compaction, num_rows_written = actual_row_count * column groups number. or we can find a better name.
| if (_data_dir != nullptr && _data_dir->reach_capacity_limit((int64_t)estimate_segment_size())) { | ||
| return Status::InternalError("disk {} exceed capacity limit.", _data_dir->path_hash()); | ||
| Status SegmentWriter::finalize_columns(uint64_t* index_size) { | ||
| if (_has_key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comment for usage. e.g. appendkey finalize_columns or we can add a method finalize_key_columns and it takes row_count as a argument.
dataroaring
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor reviews.
|
Hi @yixiutt , I think this is a breaking change to Doris core feature, so I created a new branch: And I have pushed the PR: opt compaction task producer and quick compaction (#13495) to it. |
Proposed changes
Issue Number: close Issue #12966
Support vertical compaction to reduce memory usage in compaction。
Mainly Design:
Split columns into several groups and record data's sequence in a RowSourceBuffer when compact key group, and use RowSourceBuffer to compact value groups.
Result:
In clickbench test, vertical compaction use 1/10 percent memory of horizontal compaction and speed up 15%。
Problem summary
Describe your changes.
Checklist(Required)
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...