[fix](inverted index) Fix inverted index for MOR unique table #31051
Thank you for your contribution to Apache Doris.
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TeamCity be ut coverage result:
qidaye
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
        throw new AnalysisException("index should only be used in columns of DUP_KEYS/UNIQUE_KEYS table"
                + " or key columns of AGG_KEYS table. invalid index: " + indexName);
    } else if (keysType == KeysType.UNIQUE_KEYS && !enableUniqueKeyMergeOnWrite
            && indexType == IndexType.INVERTED && properties != null
Why check for an inverted index here? If the index is not an inverted index, for example a bloomfilter index (or another index type we add in the future), this check is wrong.
I tested and found that an index can be used on value columns of a MOR unique table, since there is a guard against possibly wrong results: _should_push_down_value_predicates().
BetaRowsetReader::get_segment_iterators(...) {
    // ...
    if (_should_push_down_value_predicates()) {
        if (_read_context->value_predicates != nullptr) {
            _read_options.column_predicates.insert(_read_options.column_predicates.end(),
                                                   _read_context->value_predicates->begin(),
                                                   _read_context->value_predicates->end());
            for (auto pred : *(_read_context->value_predicates)) {
                if (_read_options.col_id_to_predicates.count(pred->column_id()) < 1) {
                    _read_options.col_id_to_predicates.insert(
                            {pred->column_id(), std::make_shared<AndBlockColumnPredicate>()});
                }
                auto single_column_block_predicate = new SingleColumnBlockPredicate(pred);
                _read_options.col_id_to_predicates[pred->column_id()]->add_column_predicate(
                        single_column_block_predicate);
            }
        }
    }
    // ...
}

bool BetaRowsetReader::_should_push_down_value_predicates() const {
    // if unique table with rowset [0-x] or [0-1] [2-y] [...],
    // value column predicates can be pushdown on rowset [0-x] or [2-y], [2-y]
    // must be compaction, not overlapping and don't have sequence column
    return _rowset->keys_type() == UNIQUE_KEYS &&
           (((_rowset->start_version() == 0 || _rowset->start_version() == 2) &&
             !_rowset->_rowset_meta->is_segments_overlapping() &&
             _read_context->sequence_id_idx == -1) ||
            _read_context->enable_unique_key_merge_on_write);
}
If _should_push_down_value_predicates() returns false, predicates on value columns cannot be pushed down to the storage layer, which is where the index is applied. So it is safe to have an index on a value column. However, a MATCH query is too slow when the index is not applied, so an inverted index with a parser is not allowed on value columns of a MOR unique table.
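To make the guard easier to reason about, here is a minimal standalone sketch of the same boolean expression. The `RowsetInfo` and `ReadInfo` structs and the free function are hypothetical stand-ins introduced only for illustration; they are not the real Doris types, and the real method reads these values from `_rowset` and `_read_context`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for the state the guard consults.
enum class KeysType { DUP_KEYS, UNIQUE_KEYS, AGG_KEYS };

struct RowsetInfo {
    KeysType keys_type;
    int64_t start_version;
    bool segments_overlapping;
};

struct ReadInfo {
    int sequence_id_idx;                    // -1 means no sequence column
    bool enable_unique_key_merge_on_write;  // true for MOW, false for MOR
};

// Same condition as the quoted _should_push_down_value_predicates(),
// rewritten as a pure function over the stand-in structs.
inline bool should_push_down_value_predicates(const RowsetInfo& rs, const ReadInfo& rc) {
    const bool base_or_compacted = rs.start_version == 0 || rs.start_version == 2;
    return rs.keys_type == KeysType::UNIQUE_KEYS &&
           ((base_or_compacted && !rs.segments_overlapping && rc.sequence_id_idx == -1) ||
            rc.enable_unique_key_merge_on_write);
}

// A few cases that follow directly from the expression.
inline void run_examples() {
    // MOR, base rowset, non-overlapping, no sequence column: pushed down.
    assert(should_push_down_value_predicates(RowsetInfo{KeysType::UNIQUE_KEYS, 0, false},
                                             ReadInfo{-1, false}));
    // MOR, rowset starting at a later version: not pushed down,
    // so an index on a value column would not be applied for this rowset.
    assert(!should_push_down_value_predicates(RowsetInfo{KeysType::UNIQUE_KEYS, 5, false},
                                              ReadInfo{-1, false}));
    // MOW: pushed down regardless of version, overlap, or sequence column.
    assert(should_push_down_value_predicates(RowsetInfo{KeysType::UNIQUE_KEYS, 5, true},
                                             ReadInfo{3, true}));
}
```

The second case is the one that matters for this PR: on a MOR table most incremental rowsets fall into it, so a MATCH predicate on a value column would run without the inverted index there.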
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TeamCity be ut coverage result:
TPC-H: Total hot run time: 41313 ms
TPC-DS: Total hot run time: 177143 ms
ClickBench: Total hot run time: 30.84 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TeamCity be ut coverage result:
TPC-H: Total hot run time: 41298 ms
TPC-DS: Total hot run time: 178834 ms
ClickBench: Total hot run time: 31.87 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
airborne12
left a comment
LGTM
qidaye
left a comment
LGTM
PR approved by at least one committer and no changes requested.
zhannngchen
left a comment
LGTM
…#31051)
* [fix](index) Fix index for none key column of unique mor table (apache#31035)
* disable INVERTED index with parser on value columns of MOR unique table
* add debug log for test_build_index
* add debug log
* only do index compaction for dup and mow
Proposed changes
Issue Number: close #xxx