[feature](hive)Support hive tables after alter type. #25138
|
run buildall |
clang-tidy made some suggestions
be/src/vec/exec/format/convert.h
Outdated
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| #include <gen_cpp/Metrics_types.h> |
warning: 'gen_cpp/Metrics_types.h' file not found [clang-diagnostic-error]
#include <gen_cpp/Metrics_types.h>
^
be/src/vec/exec/format/convert.h
Outdated
|
|
||
| template <typename src_type, typename dst_type, bool is_nullable> | ||
| struct NumberColumnConvert : public ColumnConvert { | ||
| virtual Status convert(const IColumn* src_col, IColumn* dst_col) override; |
warning: 'virtual' is redundant since the function is already declared 'override' [modernize-use-override]
| virtual Status convert(const IColumn* src_col, IColumn* dst_col) override; | |
| Status convert(const IColumn* src_col, IColumn* dst_col) override; |
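For context on the `modernize-use-override` warning above, here is a minimal sketch of the pattern. All types are hypothetical stand-ins (`Status` is a plain enum and the "columns" are raw `int` arrays), not the Doris `Status`, `IColumn`, or `ColumnConvert` definitions.

```cpp
#include <cstddef>

// Hypothetical stand-in for a status type; not doris::Status.
enum class Status { OK, Error };

// Hypothetical base class: only the base declaration needs `virtual`.
struct ColumnConvert {
    virtual ~ColumnConvert() = default;
    virtual Status convert(const int* src, int* dst, std::size_t rows) = 0;
};

struct CopyConvert : public ColumnConvert {
    // `override` can only be applied to a virtual function, so repeating
    // `virtual` here would add nothing. The compiler also rejects this
    // declaration if the signature stops matching the base class.
    Status convert(const int* src, int* dst, std::size_t rows) override {
        if (src == nullptr || dst == nullptr) {
            return Status::Error;
        }
        for (std::size_t i = 0; i < rows; ++i) {
            dst[i] = src[i];
        }
        return Status::OK;
    }
};
```

Calling `convert` through a `ColumnConvert*` still dispatches to `CopyConvert::convert`, so dropping the redundant `virtual` does not change behavior.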
be/src/vec/exec/format/convert.h
Outdated
| } | ||
| template <typename src_type, bool is_nullable> | ||
| struct NumberColumnToStringConvert : public ColumnConvert { | ||
| virtual Status convert(const IColumn* src_col, IColumn* dst_col) override; |
warning: 'virtual' is redundant since the function is already declared 'override' [modernize-use-override]
| virtual Status convert(const IColumn* src_col, IColumn* dst_col) override; | |
| Status convert(const IColumn* src_col, IColumn* dst_col) override; |
be/src/vec/exec/format/convert.h
Outdated
| struct int128totimestamp : public ColumnConvert { | ||
| int128totimestamp(DocTime* pTime) { doc = pTime; } | ||
|
|
||
| inline uint64_t to_timestamp_micros(uint32_t hi, uint64_t lo) const { |
warning: function 'to_timestamp_micros' should be marked [[nodiscard]] [modernize-use-nodiscard]
| inline uint64_t to_timestamp_micros(uint32_t hi, uint64_t lo) const { | |
| [[nodiscard]] inline uint64_t to_timestamp_micros(uint32_t hi, uint64_t lo) const { |
be/src/vec/exec/format/convert.h
Outdated
| return (hi - ParquetInt96::JULIAN_EPOCH_OFFSET_DAYS) * ParquetInt96::MICROS_IN_DAY + | ||
| lo / ParquetInt96::NANOS_PER_MICROSECOND; | ||
| } | ||
| Status convert(const IColumn* src_col, IColumn* dst_col) { |
warning: annotate this function with 'override' or (rarely) 'final' [modernize-use-override]
| Status convert(const IColumn* src_col, IColumn* dst_col) { | |
| Status convert(const IColumn* src_col, IColumn* dst_col) override { |
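The `to_timestamp_micros` helper in the diff context above turns a Parquet INT96 timestamp (Julian day number plus nanoseconds within the day) into microseconds since the Unix epoch. A standalone sketch of that arithmetic follows. The constant values are assumptions based on the usual INT96 convention and are not copied from `ParquetInt96`; the function name `int96_to_timestamp_micros` is hypothetical. Unlike the diff, this sketch uses signed arithmetic so that days before 1970-01-01 yield negative values instead of wrapping.

```cpp
#include <cstdint>

// Assumed values; the PR reads these from ParquetInt96.
constexpr int64_t JULIAN_EPOCH_OFFSET_DAYS = 2440588;  // Julian day number of 1970-01-01
constexpr int64_t MICROS_IN_DAY = 86400000000LL;       // 24 * 60 * 60 * 1'000'000
constexpr int64_t NANOS_PER_MICROSECOND = 1000;

// hi: Julian day number, lo: nanoseconds elapsed within that day.
// Returns microseconds since the Unix epoch; sub-microsecond nanos are truncated.
constexpr int64_t int96_to_timestamp_micros(uint32_t hi, uint64_t lo) {
    return (static_cast<int64_t>(hi) - JULIAN_EPOCH_OFFSET_DAYS) * MICROS_IN_DAY +
           static_cast<int64_t>(lo) / NANOS_PER_MICROSECOND;
}
```

For example, Julian day 2440589 with 1500 ns into the day is one full day plus one microsecond after the epoch.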
be/src/vec/exec/format/convert.h
Outdated
| DocTime* doc; | ||
|
|
||
| public: | ||
| Status convert(const IColumn* src_col, IColumn* dst_col) { |
warning: annotate this function with 'override' or (rarely) 'final' [modernize-use-override]
| Status convert(const IColumn* src_col, IColumn* dst_col) { | |
| Status convert(const IColumn* src_col, IColumn* dst_col) override { |
be/src/vec/exec/format/convert.h
Outdated
| auto* src_data = static_cast<const ColumnVector<NumberType>*>(src_col)->get_data().data(); | ||
| dst_col->resize(rows); | ||
| DecimalScaleParams& scale_params = doc->_decode_params->decimal_scale; | ||
| auto* data = static_cast<ColumnDecimal<Decimal<Int64>>*>(dst_col)->get_data().data(); |
warning: variable 'data' is not initialized [cppcoreguidelines-init-variables]
| auto* data = static_cast<ColumnDecimal<Decimal<Int64>>*>(dst_col)->get_data().data(); | |
| auto* data = nullptr = static_cast<ColumnDecimal<Decimal<Int64>>*>(dst_col)->get_data().data(); |
be/src/vec/exec/format/convert.h
Outdated
| return Status::OK(); | ||
| } | ||
|
|
||
| public: |
warning: redundant access specifier has the same accessibility as the previous access specifier [readability-redundant-access-specifiers]
| public: |
Additional context
be/src/vec/exec/format/convert.h:365: previously declared here
public:
^
| inline uint64_t to_timestamp_micros() const { | ||
| return (hi - JULIAN_EPOCH_OFFSET_DAYS) * MICROS_IN_DAY + lo / NANOS_PER_MICROSECOND; | ||
| } | ||
| inline __int128 to_int128() const { |
warning: function 'to_int128' should be marked [[nodiscard]] [modernize-use-nodiscard]
| inline __int128 to_int128() const { | |
| [[nodiscard]] inline __int128 to_int128() const { |
| } | ||
|
|
||
| BlockUPtr block = ctx->get_free_block(); | ||
| BlockUPtr block = ctx->get_free_block(); //create block <- _output_tuple_desc / the desired result |
warning: variable 'block' is not initialized [cppcoreguidelines-init-variables]
| BlockUPtr block = ctx->get_free_block(); //create block <- _output_tuple_desc / the desired result | |
| BlockUPtr block = 0 = ctx->get_free_block(); //create block <- _output_tuple_desc / the desired result |
|
schema change |
|
run buildall |
clang-tidy made some suggestions
be/src/vec/exec/format/convert.h
Outdated
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| #include <gen_cpp/PlanNodes_types.h> |
warning: 'gen_cpp/PlanNodes_types.h' file not found [clang-diagnostic-error]
#include <gen_cpp/PlanNodes_types.h>
^
TeamCity be ut coverage result: |
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
clang-tidy made some suggestions
| DataTypePtr src_type; | ||
| ParquetConvert::convert_data_type_from_parquet(physical_type, src_type,type,&need_convert); | ||
|
|
||
| ColumnPtr src_column = doris_column; |
warning: variable 'src_column' is not initialized [cppcoreguidelines-init-variables]
| ColumnPtr src_column = doris_column; | |
| ColumnPtr src_column = 0 = doris_column; |
|
run buildall |
clang-tidy made some suggestions
| // do nothing | ||
| break; | ||
| } | ||
| ColumnSelectVector::DataReadType read_type; |
warning: variable 'read_type' is not initialized [cppcoreguidelines-init-variables]
ColumnSelectVector::DataReadType read_type;
^
| while (size_t run_length = select_vector.get_next_run<has_filter>(&read_type)) { | ||
| switch (read_type) { | ||
| case ColumnSelectVector::CONTENT: { | ||
| std::vector<StringRef> string_values; |
warning: variable 'string_values' is not initialized [cppcoreguidelines-init-variables]
| std::vector<StringRef> string_values; | |
| std::vector<StringRef> string_values = 0; |
| } | ||
| string_values.emplace_back(_data->data + _offset, length); | ||
| _offset += length; | ||
| ColumnSelectVector::DataReadType read_type; |
warning: variable 'read_type' is not initialized [cppcoreguidelines-init-variables]
ColumnSelectVector::DataReadType read_type;
^
| while (size_t run_length = select_vector.get_next_run<has_filter>(&read_type)) { | ||
| switch (read_type) { | ||
| case ColumnSelectVector::CONTENT: { | ||
| std::vector<StringRef> string_values; |
warning: variable 'string_values' is not initialized [cppcoreguidelines-init-variables]
| std::vector<StringRef> string_values; | |
| std::vector<StringRef> string_values = 0; |
| } | ||
|
|
||
| // read the bitwidth of each miniblock | ||
| uint8_t* bit_width_data = _delta_bit_widths.data(); |
warning: variable 'bit_width_data' is not initialized [cppcoreguidelines-init-variables]
| uint8_t* bit_width_data = _delta_bit_widths.data(); | |
| uint8_t* bit_width_data = nullptr = _delta_bit_widths.data(); |
| while (size_t run_length = select_vector.get_next_run<has_filter>(&read_type)) { | ||
| switch (read_type) { | ||
| case ColumnSelectVector::CONTENT: { | ||
| std::vector<StringRef> string_values; |
warning: variable 'string_values' is not initialized [cppcoreguidelines-init-variables]
| std::vector<StringRef> string_values; | |
| std::vector<StringRef> string_values = 0; |
| auto& column_data = static_cast<ColumnType&>(*doris_column).get_data(); | ||
| size_t data_index = column_data.size(); | ||
| column_data.resize(data_index + select_vector.num_values() - select_vector.num_filtered()); | ||
| ColumnSelectVector::DataReadType read_type; |
warning: variable 'read_type' is not initialized [cppcoreguidelines-init-variables]
ColumnSelectVector::DataReadType read_type;
^
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| #include <gen_cpp/PlanNodes_types.h> |
warning: 'gen_cpp/PlanNodes_types.h' file not found [clang-diagnostic-error]
#include <gen_cpp/PlanNodes_types.h>
^
| auto* src_data = static_cast<const ColumnVector<NumberType>*>(src_col)->get_data().data(); | ||
| dst_col->resize(rows); | ||
| DecimalScaleParams& scale_params = _convert_params->decimal_scale; | ||
| auto* data = static_cast<ColumnDecimal<Decimal<Int64>>*>(dst_col)->get_data().data(); |
warning: variable 'data' is not initialized [cppcoreguidelines-init-variables]
| auto* data = static_cast<ColumnDecimal<Decimal<Int64>>*>(dst_col)->get_data().data(); | |
| auto* data = nullptr = static_cast<ColumnDecimal<Decimal<Int64>>*>(dst_col)->get_data().data(); |
| DataTypePtr src_type; | ||
| ParquetConvert::convert_data_type_from_parquet(physical_type, src_type, type, &need_convert); | ||
|
|
||
| ColumnPtr src_column = doris_column; |
warning: variable 'src_column' is not initialized [cppcoreguidelines-init-variables]
| ColumnPtr src_column = doris_column; | |
| ColumnPtr src_column = 0 = doris_column; |
|
run buildall |
|
run buildall |
clang-tidy made some suggestions
| DataTypePtr src_type; | ||
| RETURN_IF_ERROR( | ||
|
|
||
| ParquetConvert::convert_data_type_from_parquet(physical_type, src_type, type, |
warning: variable 'src_column' is not initialized [cppcoreguidelines-init-variables]
| ParquetConvert::convert_data_type_from_parquet(physical_type, src_type, type, | |
| ColumnPtr src_column = 0 = doris_column; |
28a1b8c to 7417819
|
run buildall |
|
run buildall |
|
run buildall |
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
|
run buildall |
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
|
run buildall |
|
TeamCity be ut coverage result: |
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
|
run buildall |
|
TeamCity be ut coverage result: |
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
|
run p0 |
|
run p0 |
| } | ||
|
|
||
| template <bool is_nullable> | ||
| struct int128totimestamp : public ColumnConvert { |
Int128ToTimestamp, use upper case.
Is there an int128 column in parquet, and why convert it to a timestamp?
int128 -> decimal128 / largeint
bb8b178 to 0adc21f
|
run buildall |
|
TeamCity be ut coverage result: |
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
|
LGTM |
1. Refactor the parquet decoding logic: the parquet reader first reads the data according to the parquet physical type, and then performs a type conversion. 2. Support hive alter table.
This reverts commit a4e415a.
…ader (#32873)
Following #25138, this unifies the schema change interface for the parquet and orc readers; it can be applied to other format readers as well.
Unified schema change interface for all format readers:
- First, read the data into a source column according to the column type in the file;
- Second, convert the source column to the destination column, whose type is planned by FE.
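The two-step flow described above (read by file type, then convert to the planned type) can be sketched roughly as follows. This is a simplified illustration, not the code in `convert.h`: a "column" is modeled as a plain `std::vector`, and `read_int32_column_from_file`, `convert_numeric`, and `convert_to_string` are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Step 1 (stand-in): read values using the type stored in the file.
// A real reader would decode parquet/orc pages; here the data is hard-coded.
std::vector<int32_t> read_int32_column_from_file() {
    return {1, -2, 300};
}

// Step 2a: numeric-to-numeric conversion, e.g. after ALTER from INT to BIGINT.
template <typename SrcType, typename DstType>
std::vector<DstType> convert_numeric(const std::vector<SrcType>& src) {
    std::vector<DstType> dst;
    dst.reserve(src.size());
    for (SrcType value : src) {
        dst.push_back(static_cast<DstType>(value));
    }
    return dst;
}

// Step 2b: numeric-to-string conversion, e.g. after ALTER from INT to STRING.
template <typename SrcType>
std::vector<std::string> convert_to_string(const std::vector<SrcType>& src) {
    std::vector<std::string> dst;
    dst.reserve(src.size());
    for (SrcType value : src) {
        dst.push_back(std::to_string(value));
    }
    return dst;
}
```

Keeping the read step tied to the file's own type means each decoder only has to handle one physical type, and every table-side type change becomes a separate, reusable conversion.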
Proposed changes
Issue Number: close #xxx
1. Refactor the parquet decoding logic: the parquet reader first reads the data according to the parquet physical type, and then performs a type conversion.
2. Support hive alter table.