-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[enhancement](parquet) efficient page skipping by page location #32785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
4dcc089 to
2c68efc
Compare
2c68efc to
d14250e
Compare
be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp
Outdated
Show resolved
Hide resolved
be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp
Outdated
Show resolved
Hide resolved
|
clang-tidy review says "All clean, LGTM! 👍" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
| page_data.size -= dl + rl; | ||
| } | ||
|
|
||
| Status ColumnChunkReader::load_page_data() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function 'load_page_data' has cognitive complexity of 80 (threshold 50) [readability-function-cognitive-complexity]
Status ColumnChunkReader::load_page_data() {
^Additional context
be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:129: +1, including nesting penalty of 0, nesting level increased to 1
if (_offset_index) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:131: +2, including nesting penalty of 1, nesting level increased to 2
RETURN_IF_ERROR(_page_reader->parse_page_header());
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:131: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_page_reader->parse_page_header());
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:136: +1, including nesting penalty of 0, nesting level increased to 1
if (UNLIKELY(_state != HEADER_PARSED)) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:143: +1, including nesting penalty of 0, nesting level increased to 1
if (_block_compress_codec != nullptr) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:145: +2, including nesting penalty of 1, nesting level increased to 2
RETURN_IF_ERROR(_page_reader->get_page_data(compressed_data));
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:145: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_page_reader->get_page_data(compressed_data));
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:146: +2, including nesting penalty of 1, nesting level increased to 2
if (header.__isset.data_page_header_v2) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:155: +1
header.__isset.data_page_header_v2 && header.data_page_header_v2.is_compressed;
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:156: +2, including nesting penalty of 1, nesting level increased to 2
if (header.__isset.data_page_header || is_v2_compressed) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:156: +1
if (header.__isset.data_page_header || is_v2_compressed) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:162: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_block_compress_codec->decompress(compressed_data, &_page_data));
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:162: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(_block_compress_codec->decompress(compressed_data, &_page_data));
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:163: +1, nesting level increased to 2
} else {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:167: +1, nesting level increased to 1
} else {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:168: +2, including nesting penalty of 1, nesting level increased to 2
RETURN_IF_ERROR(_page_reader->get_page_data(_page_data));
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:168: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_page_reader->get_page_data(_page_data));
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:169: +2, including nesting penalty of 1, nesting level increased to 2
if (header.__isset.data_page_header_v2) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:175: +1, including nesting penalty of 0, nesting level increased to 1
if (_max_rep_level > 0) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:177: +2, including nesting penalty of 1, nesting level increased to 2
if (header.__isset.data_page_header_v2) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:178: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_rep_level_decoder.init_v2(_v2_rep_levels, _max_rep_level,
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:178: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(_rep_level_decoder.init_v2(_v2_rep_levels, _max_rep_level,
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:180: +1, nesting level increased to 2
} else {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:181: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_rep_level_decoder.init(
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:181: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(_rep_level_decoder.init(
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:186: +1, including nesting penalty of 0, nesting level increased to 1
if (_max_def_level > 0) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:188: +2, including nesting penalty of 1, nesting level increased to 2
if (header.__isset.data_page_header_v2) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:189: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_def_level_decoder.init_v2(_v2_def_levels, _max_def_level,
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:189: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(_def_level_decoder.init_v2(_v2_def_levels, _max_def_level,
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:191: +1, nesting level increased to 2
} else {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:192: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_def_level_decoder.init(
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:192: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(_def_level_decoder.init(
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:197: +1, including nesting penalty of 0, nesting level increased to 1
auto encoding = header.__isset.data_page_header_v2 ? header.data_page_header_v2.encoding
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:200: +1, including nesting penalty of 0, nesting level increased to 1
if (encoding == tparquet::Encoding::PLAIN_DICTIONARY) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:205: +1, including nesting penalty of 0, nesting level increased to 1
if (_decoders.find(static_cast<int>(encoding)) != _decoders.end()) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:207: +1, nesting level increased to 1
} else {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:209: +2, including nesting penalty of 1, nesting level increased to 2
RETURN_IF_ERROR(Decoder::get_decoder(_metadata.type, encoding, page_decoder));
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:209: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(Decoder::get_decoder(_metadata.type, encoding, page_decoder));
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
| page_data.size -= dl + rl; | ||
| } | ||
|
|
||
| Status ColumnChunkReader::load_page_data() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function 'load_page_data' has cognitive complexity of 84 (threshold 50) [readability-function-cognitive-complexity]
Status ColumnChunkReader::load_page_data() {
^Additional context
be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:127: +1, including nesting penalty of 0, nesting level increased to 1
if (_offset_index) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:129: +2, including nesting penalty of 1, nesting level increased to 2
if (UNLIKELY(!_page_reader->has_header_parsed())) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:130: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_page_reader->parse_page_header());
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:130: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(_page_reader->parse_page_header());
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:135: +1, including nesting penalty of 0, nesting level increased to 1
if (UNLIKELY(!_page_reader->has_header_parsed())) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:142: +1, including nesting penalty of 0, nesting level increased to 1
if (_block_compress_codec != nullptr) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:144: +2, including nesting penalty of 1, nesting level increased to 2
RETURN_IF_ERROR(_page_reader->get_page_data(compressed_data));
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:144: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_page_reader->get_page_data(compressed_data));
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:145: +2, including nesting penalty of 1, nesting level increased to 2
if (header.__isset.data_page_header_v2) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:154: +1
header.__isset.data_page_header_v2 && header.data_page_header_v2.is_compressed;
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:155: +2, including nesting penalty of 1, nesting level increased to 2
if (header.__isset.data_page_header || is_v2_compressed) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:155: +1
if (header.__isset.data_page_header || is_v2_compressed) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:161: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_block_compress_codec->decompress(compressed_data, &_page_data));
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:161: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(_block_compress_codec->decompress(compressed_data, &_page_data));
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:162: +1, nesting level increased to 2
} else {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:166: +1, nesting level increased to 1
} else {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:167: +2, including nesting penalty of 1, nesting level increased to 2
RETURN_IF_ERROR(_page_reader->get_page_data(_page_data));
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:167: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_page_reader->get_page_data(_page_data));
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:168: +2, including nesting penalty of 1, nesting level increased to 2
if (header.__isset.data_page_header_v2) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:174: +1, including nesting penalty of 0, nesting level increased to 1
if (_max_rep_level > 0) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:176: +2, including nesting penalty of 1, nesting level increased to 2
if (header.__isset.data_page_header_v2) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:177: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_rep_level_decoder.init_v2(_v2_rep_levels, _max_rep_level,
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:177: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(_rep_level_decoder.init_v2(_v2_rep_levels, _max_rep_level,
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:179: +1, nesting level increased to 2
} else {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:180: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_rep_level_decoder.init(
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:180: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(_rep_level_decoder.init(
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:185: +1, including nesting penalty of 0, nesting level increased to 1
if (_max_def_level > 0) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:187: +2, including nesting penalty of 1, nesting level increased to 2
if (header.__isset.data_page_header_v2) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:188: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_def_level_decoder.init_v2(_v2_def_levels, _max_def_level,
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:188: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(_def_level_decoder.init_v2(_v2_def_levels, _max_def_level,
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:190: +1, nesting level increased to 2
} else {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:191: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(_def_level_decoder.init(
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:191: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(_def_level_decoder.init(
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:196: +1, including nesting penalty of 0, nesting level increased to 1
auto encoding = header.__isset.data_page_header_v2 ? header.data_page_header_v2.encoding
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:199: +1, including nesting penalty of 0, nesting level increased to 1
if (encoding == tparquet::Encoding::PLAIN_DICTIONARY) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:204: +1, including nesting penalty of 0, nesting level increased to 1
if (_decoders.find(static_cast<int>(encoding)) != _decoders.end()) {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:206: +1, nesting level increased to 1
} else {
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:208: +2, including nesting penalty of 1, nesting level increased to 2
RETURN_IF_ERROR(Decoder::get_decoder(_metadata.type, encoding, page_decoder));
^be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'
do { \
^be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:208: +3, including nesting penalty of 2, nesting level increased to 3
RETURN_IF_ERROR(Decoder::get_decoder(_metadata.type, encoding, page_decoder));
^be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^c3e6ee9 to
7893f24
Compare
| // Parse dictionary data when reading | ||
| // RETURN_IF_ERROR(_page_reader->next_page_header()); | ||
| // RETURN_IF_ERROR(_decode_dict_page()); | ||
| _page_reader->seek_page(_metadata.dictionary_page_offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change seek_to_page to seek_page , and remove the above comments.
| return next_page(); | ||
| } else if (_page_reader->get_page_header()->type == tparquet::PageType::DATA_PAGE_V2) { | ||
|
|
||
| if (_offset_index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't add is-else logic, the underlying PageReader supply the unified interface.
| // and skip_page() to skip pages one by one. | ||
| // todo: change this interface to seek_to_page(int64_t page_header_offset, size_t num_parsed_values) | ||
| // and set _chunk_parsed_values = num_parsed_values | ||
| // [[deprecated]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's best to keep this interface, only the page with OffsetIndex can call this function.
| void _get_uncompressed_levels(const tparquet::DataPageHeaderV2& page_v2, Slice& page_data); | ||
|
|
||
| //Returns the number of values in the current page. | ||
| int64_t _get_page_num_values() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to PageReader
| const tparquet::RowGroup& row_group, | ||
| const std::vector<RowRange>& row_ranges, cctz::time_zone* ctz, | ||
| io::IOContext* io_ctx, | ||
| io::IOContext* io_ctx, const tparquet::OffsetIndex* offset_index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The complex type column will get wrong OffsetIndex
| io::IOContext* _io_ctx = nullptr; | ||
| tparquet::PageHeader _cur_page_header; | ||
| Statistics _statistics; | ||
| tparquet::PageHeader _cur_page_header {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add {}
| const FieldSchema* col_schema = schema_desc.get_column(read_col); | ||
| static_cast<void>(page_index.collect_skipped_page_range( | ||
| &column_index, conjuncts, col_schema, skipped_page_range, *_ctz)); | ||
| if (skipped_page_range.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never change the main logic in ParquetReader
| _page_reader->seek_page(_metadata.data_page_offset); | ||
| } | ||
| _state = INITIALIZED; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dictionary page can be parsed lazily.
| if (_offset_index) { | ||
| _page_index++; | ||
| if (_page_index == _offset_index->page_locations.size()) { | ||
| return Status::EndOfFile("End of file"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return other errors, don't return EndOfFile, because is has special meaning in VfileScanner.
Proposed changes
Make skip_page() in ColumnChunkReader more efficient. No more reading page headers if there are pagelocations in chunk.
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...