[feature-wip](parquet-vec) Support parquet scanner in vectorized engine #9231
| runtime/vdata_stream_mgr.cpp | ||
| runtime/vpartition_info.cpp | ||
| runtime/vsorted_run_merger.cpp | ||
| runtime/vload_channel.cpp |
Was the code for these files not added in this PR?
| #define FOR_ARROW_TYPES(M) \ | ||
| M(::arrow::Type::BOOL, TYPE_BOOLEAN) \ | ||
| M(::arrow::Type::INT8, TYPE_TINYINT) \ | ||
| M(::arrow::Type::UINT8, TYPE_TINYINT) \ |
UINT8 has a larger positive range than TINYINT (0 to 255 versus -128 to 127), so this mapping may overflow.
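To illustrate the concern, here is a minimal sketch of the two usual ways to handle an unsigned 8-bit source value: reject values that do not fit in a signed 8-bit target, or widen to a 16-bit target. Both helper names are hypothetical and are not part of Doris or Arrow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper (not a Doris function): narrows an unsigned 8-bit value
// to a signed 8-bit one, returning std::nullopt when the value does not fit.
std::optional<int8_t> narrow_uint8_to_int8(uint8_t v) {
    if (v > static_cast<uint8_t>(std::numeric_limits<int8_t>::max())) {
        return std::nullopt;  // 128..255 cannot be represented as int8_t
    }
    return static_cast<int8_t>(v);
}

// Hypothetical alternative: widen to a signed 16-bit value, which can hold
// every uint8_t value without loss.
int16_t widen_uint8_to_int16(uint8_t v) {
    return static_cast<int16_t>(v);
}
```

Which option is appropriate depends on whether the loader should fail on out-of-range data or map the column to a wider type.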
| PaddedPODArray<UInt8> & column_chars_t = assert_cast<ColumnString &>(*data_column).get_chars(); | ||
| PaddedPODArray<UInt32> & column_offsets = assert_cast<ColumnString &>(*data_column).get_offsets(); | ||
|
|
||
| const auto & concrete_array = dynamic_cast<const arrow::BinaryArray &>(*array); |
Code format: write `auto&` with no space before the `&`. The same applies to the other occurrences.
| PaddedPODArray<UInt8> & column_chars_t = assert_cast<ColumnString &>(*data_column).get_chars(); | ||
| PaddedPODArray<UInt32> & column_offsets = assert_cast<ColumnString &>(*data_column).get_offsets(); | ||
|
|
||
| const auto & concrete_array = dynamic_cast<const arrow::BinaryArray &>(*array); |
Please rethink whether dynamic_cast is really needed here.
- dynamic_cast usually indicates a design problem in the class or code; consider replacing it with a virtual function.
- This code may throw an exception if the cast fails, which is dangerous here.
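The exception risk can be sketched with a small stand-in hierarchy. The reference form of dynamic_cast throws std::bad_cast on a type mismatch, while the pointer form returns nullptr, which lets the caller return an error status instead. The `Array`, `BinaryArray`, and `Int32Array` types below are hypothetical stand-ins, not the Arrow classes.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical stand-ins for an array class hierarchy.
struct Array {
    virtual ~Array() = default;
    // A virtual function lets callers query the type without any cast.
    virtual std::string type_name() const = 0;
};

struct BinaryArray : Array {
    std::string type_name() const override { return "binary"; }
};

struct Int32Array : Array {
    std::string type_name() const override { return "int32"; }
};

// Reference form: a failed cast throws std::bad_cast.
bool reference_cast_throws(const Array& array) {
    try {
        const auto& concrete = dynamic_cast<const BinaryArray&>(array);
        (void)concrete;
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}

// Pointer form: a failed cast yields nullptr, so no exception is involved.
bool is_binary(const Array& array) {
    return dynamic_cast<const BinaryArray*>(&array) != nullptr;
}
```

If the array type has already been checked before the cast, an unchecked static cast is another common choice, but that assumes the check cannot be bypassed.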
| return Status::OK(); | ||
| } | ||
|
|
||
| Status arrow_column_to_doris_column(const arrow::Array* arrow_column, |
Please add unit tests for this class.
| // init arrow batch | ||
| { | ||
| Status st = init_arrow_batch_if_necessary(); | ||
| if (!st.ok()) { |
This code looks odd. Could you rethink the logic?
|
|
||
| // eval conjuncts, for example: t1 > 1 | ||
| Status VParquetScanner::eval_conjunts(Block* block) { | ||
| for (auto& vctx : _pre_filter_vctxs) { |
Consider changing _pre_filter_vctxs into a single expression tree to avoid the copy operation on every iteration.
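One reading of this suggestion is that all conjuncts should be combined into a single keep-mask so that the data is copied once, rather than once per conjunct. The sketch below shows that idea on a plain integer column. `Predicate` and `filter_once` are hypothetical names, and the real scanner operates on Doris blocks and expression contexts rather than on std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical predicate type standing in for one conjunct, e.g. t1 > 1.
using Predicate = std::function<bool(int)>;

// Evaluates every conjunct into one shared keep-mask, then copies the
// surviving rows exactly once.
std::vector<int> filter_once(const std::vector<int>& column,
                             const std::vector<Predicate>& conjuncts) {
    std::vector<uint8_t> keep(column.size(), 1);
    for (const auto& pred : conjuncts) {
        for (std::size_t i = 0; i < column.size(); ++i) {
            // Rows already rejected are not re-evaluated (short-circuit).
            keep[i] = keep[i] && pred(column[i]);
        }
    }
    std::vector<int> result;
    for (std::size_t i = 0; i < column.size(); ++i) {
        if (keep[i]) {
            result.push_back(column[i]);
        }
    }
    return result;
}
```

With the conjuncts `x > 1` and `x < 5`, the column `{0, 1, 2, 3, 4, 5}` is reduced to the rows that pass both, with a single output copy.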
|
Closing this PR; please use #9433 instead.
Proposed changes
Issue Number: close #xxx
Problem Summary:
Optimization Todo list:
Performance Testing:
Loading a Parquet file with the vectorized version is almost 1x faster than with the rowset version.
Number of rows: 300k
test table schema:
CREATE TABLE parquet (
  id int(11) NOT NULL COMMENT "",
  email varchar(26) NOT NULL COMMENT "",
  c_date32 DATE NOT NULL COMMENT "",
  c_date64 DATETIME NOT NULL COMMENT "",
  c_timestamp DATETIME NOT NULL COMMENT "",
  c_decimal128 DECIMAL(27, 9) NULL COMMENT "",
  c_bool BOOLEAN NULL COMMENT "",
  c_float FLOAT NULL COMMENT "",
  c_double DOUBLE NULL COMMENT "",
  c_fixed_size_binary CHAR(20) NULL COMMENT "",
  c_binary VARCHAR(32) NULL COMMENT "",
  c_uint64 BIGINT NULL COMMENT ""
)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
  "replication_num" = "1"
);
Checklist(Required)
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...