-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Feature](Row store) support column group with store row format for partial columns of table #34089
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
|
|
||
| int32_t rs_column_uid() const { return _rs_column_uid; } | ||
|
|
||
| const std::unordered_set<int32_t> missing_col_uids() const { return _missing_col_uids; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: return type 'const int' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]
| const std::unordered_set<int32_t> missing_col_uids() const { return _missing_col_uids; } | |
| std::unordered_set<int32_t> missing_col_uids() const { return _missing_col_uids; } |
|
|
||
| const std::unordered_set<int32_t> missing_col_uids() const { return _missing_col_uids; } | ||
|
|
||
| const std::unordered_set<int32_t> include_col_uids() const { return _include_col_uids; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: return type 'const int' is 'const'-qualified at the top level, which may reduce code readability without improving const correctness [readability-const-return-type]
| const std::unordered_set<int32_t> include_col_uids() const { return _include_col_uids; } | |
| std::unordered_set<int32_t> include_col_uids() const { return _include_col_uids; } |
|
run buildall |
|
TeamCity be ut coverage result: |
|
run buildall |
TPC-H: Total hot run time: 40423 ms |
TPC-DS: Total hot run time: 188110 ms |
|
TeamCity be ut coverage result: |
xiaokang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fe reviewed
fe/fe-core/src/main/java/org/apache/doris/common/util/PropertyAnalyzer.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/catalog/TableProperty.java
Outdated
Show resolved
Hide resolved
| } | ||
| } | ||
|
|
||
| if (properties != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put it inside row store branch
fe/fe-core/src/main/java/org/apache/doris/analysis/CreateTableStmt.java
Outdated
Show resolved
Hide resolved
xiaokang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be reviewed
be/src/olap/tablet_schema.h
Outdated
|
|
||
| // groups of this column belongs to | ||
| // contains column ids of group, and encode the group of columns into row store | ||
| std::vector<int32_t> _group_ids; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_row_store_group_ids
be/src/olap/tablet_schema.h
Outdated
| void set_store_row_column(bool store_row_column) { _store_row_column = store_row_column; } | ||
| bool store_row_column() const { return _store_row_column; } | ||
| // indicate if full row store column(all the columns encodes as row) exists | ||
| bool has_full_row_store_column() const { return _store_row_column; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consistent name is better
be/src/vec/jsonb/serialize.h
Outdated
| static void block_to_jsonb(const TabletSchema& schema, const Block& block, ColumnString& dst, | ||
| int num_cols, const DataTypeSerDeSPtrs& serdes); | ||
| int num_cols, const DataTypeSerDeSPtrs& serdes, | ||
| const std::unordered_set<int32_t>& group_cids); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consistent name include_cids is better
| std::unordered_map<uint32_t, uint32_t> _col_uid_to_idx; | ||
| std::vector<std::string> _col_default_values; | ||
| // picked rowstore(column group) column unique id | ||
| int32_t _rs_column_uid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the meaning of name rs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_row_store_coloumn_uid
| const TabletColumn& target_rs_column = schema.column_by_uid(target_rs_column_id); | ||
| DCHECK(target_rs_column.is_row_store_column()); | ||
| // The full column group is considered a full match, thus no missing cids | ||
| if (target_rs_column.group_ids().empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider that if it's right after light schema chage
6f0854c to
1981d22
Compare
|
run buildall |
1 similar comment
|
run buildall |
|
TeamCity be ut coverage result: |
|
run buildall |
|
TeamCity be ut coverage result: |
TPC-H: Total hot run time: 40900 ms |
TPC-DS: Total hot run time: 172450 ms |
|
run buildall |
|
TeamCity be ut coverage result: |
|
run buildall |
|
TeamCity be ut coverage result: |
TPC-H: Total hot run time: 40387 ms |
TPC-DS: Total hot run time: 173431 ms |
ClickBench: Total hot run time: 30.77 s |
xiaokang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
…artial columns of table (#34089)
### What problem does this PR solve? Problem Summary: 1. Because defer materialization in BE do not support project on scan node, so we use a little trick to do projection on last fragment's OutputExprs. 2. fix core when table use partial column row store, introduced by #34089 thrift error, reason=No more data to read.F20250211 12:07:07.526710 15029400 rowid_fetcher.cpp:410] Check failed: tablet->tablet_schema()->has_row_store_for_all_columns() *** Check failure stack trace: *** @ 0x10ddd2830 google::LogMessageFatal::~LogMessageFatal() @ 0x10ddce834 google::LogMessageFatal::~LogMessageFatal() @ 0x102a961a8 doris::RowIdStorageReader::read_by_rowids() @ 0x103938624 std::__1::__function::__func<>::operator()() @ 0x103922390 doris::WorkThreadPool<>::work_thread() @ 0x1039228fc std::__1::__thread_proxy<>() @ 0x18300f2e4 _pthread_start
### What problem does this PR solve? Problem Summary: 1. Because defer materialization in BE do not support project on scan node, so we use a little trick to do projection on last fragment's OutputExprs. 2. fix core when table use partial column row store, introduced by apache#34089 thrift error, reason=No more data to read.F20250211 12:07:07.526710 15029400 rowid_fetcher.cpp:410] Check failed: tablet->tablet_schema()->has_row_store_for_all_columns() *** Check failure stack trace: *** @ 0x10ddd2830 google::LogMessageFatal::~LogMessageFatal() @ 0x10ddce834 google::LogMessageFatal::~LogMessageFatal() @ 0x102a961a8 doris::RowIdStorageReader::read_by_rowids() @ 0x103938624 std::__1::__function::__func<>::operator()() @ 0x103922390 doris::WorkThreadPool<>::work_thread() @ 0x1039228fc std::__1::__thread_proxy<>() @ 0x18300f2e4 _pthread_start
### What problem does this PR solve? Problem Summary: 1. Because defer materialization in BE do not support project on scan node, so we use a little trick to do projection on last fragment's OutputExprs. 2. fix core when table use partial column row store, introduced by apache#34089 thrift error, reason=No more data to read.F20250211 12:07:07.526710 15029400 rowid_fetcher.cpp:410] Check failed: tablet->tablet_schema()->has_row_store_for_all_columns() *** Check failure stack trace: *** @ 0x10ddd2830 google::LogMessageFatal::~LogMessageFatal() @ 0x10ddce834 google::LogMessageFatal::~LogMessageFatal() @ 0x102a961a8 doris::RowIdStorageReader::read_by_rowids() @ 0x103938624 std::__1::__function::__func<>::operator()() @ 0x103922390 doris::WorkThreadPool<>::work_thread() @ 0x1039228fc std::__1::__thread_proxy<>() @ 0x18300f2e4 _pthread_start
Proposed changes
Issue Number: close #xxx
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...