-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[enhancement](function truncate) truncate can use column as scale argument #32746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
|
run compile |
|
run compile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
|
run compile |
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|
run compile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
|
run compile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
|
run buildall |
TPC-H: Total hot run time: 38640 ms |
|
TeamCity be ut coverage result: |
TPC-DS: Total hot run time: 186873 ms |
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G' |
|
run p0 |
|
run p0 |
| || (children.get(1) instanceof CastExpr | ||
| && children.get(1).getChild(0) instanceof IntLiteral), | ||
| "2nd argument of function round/floor/ceil/truncate must be literal"); | ||
| "2nd argument of function round/floor/ceil must be literal"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not support all round family function? if truncate support slot as 2nd arg, other round family function should support this feature too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not support all round family function? if truncate support slot as 2nd arg, other round family function should support this feature too
Need extra work and many tests on be, maybe one extra week.
This PR completes a lot of basic work, support for other functions will be much easier, so I list them as TODO task.
be/src/vec/functions/round.h
Outdated
| FunctionRoundingImpl<ScaleMode::Zero>::apply(col->get_data()[i], scale, | ||
| vec_res[i]); | ||
| } else if (scale_arg > 0) { | ||
| size_t scale = int_exp10(col_scale_i32.get_data()[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col_scale_i32.get_data()[i] seems no need get again, could use scale_arg directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can.
| typename ColumnVector<T>::Container& vec_res = col_res->get_data(); | ||
|
|
||
| for (size_t i = 0; i < intput_rows_cound; ++i) { | ||
| const Int16 scale_arg = col_scale_i32.get_data()[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in apply_vec_vec function, have check all data of scale_arg in [min, max]
here seems not have this check, it's no need or in other function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we need to do the check. I will re-organize the check loop
ClickBench: Total hot run time: 30.98 s |
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G' |
|
|
||
| if (scale_arg > std::numeric_limits<Int16>::max() || | ||
| scale_arg < std::numeric_limits<Int16>::min()) { | ||
| throw doris::Exception(ErrorCode::OUT_OF_BOUND, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw maybe will break auto-vectorization. basing on we prefer correct path's performance than wrong path, maybe a flag bool is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update: actually this loop is separated from the loop that processes the data block, so the exception will not break the auto vectorization.
| } else if (children.size() == 2) { | ||
| Expr scaleExpr = children.get(1); | ||
| if (scaleExpr instanceof IntLiteral | ||
| || (scaleExpr instanceof CastExpr && scaleExpr.getChild(0) instanceof IntLiteral)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why consider castexpr here specially? what if it's other exprs which result is constexpr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For expr whose result is constexpr, we will go to branch in L146.
| public: | ||
| static FunctionPtr create() { return std::make_shared<FunctionTruncate>(); } | ||
|
|
||
| ColumnNumbers get_arguments_that_are_always_constant() const override { return {}; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if {}, no need to override this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It must be override. Since function_rounding, instread of IFuntion, is the base class and it has already implemented another retrun result.
| is_column_const(*block.get_by_position(arguments[1]).column)) { | ||
| // truncate(ColumnConst, ColumnConst) | ||
| auto col_general = assert_cast<const ColumnConst&>(*column_general.column) | ||
| .get_data_column() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_data_column_ptr directly? no clone_resized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense
regression-test/suites/query_p0/sql_functions/math_functions/test_function_truncate.groovy
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
| // should NOT behave like two column arguments, so we can not use const column default implementation | ||
| bool use_default_implementation_for_constants() const override { return false; } | ||
|
|
||
| Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function 'execute_impl' exceeds recommended size/complexity thresholds [readability-function-size]
Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
^Additional context
be/src/vec/functions/function_truncate.h:79: 159 lines including whitespace and comments (threshold 80)
Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
^| if constexpr (IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>) { | ||
| using FieldType = typename DataType::FieldType; | ||
| res = Dispatcher<FieldType, RoundingMode::Trunc, | ||
| TieBreakingMode::Auto>::apply_vec_const(col_general, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: redundant boolean literal in conditional return statement [readability-simplify-boolean-expr]
be/src/vec/functions/function_truncate.h:102:
- if constexpr (IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>) {
- using FieldType = typename DataType::FieldType;
- res = Dispatcher<FieldType, RoundingMode::Trunc,
- TieBreakingMode::Auto>::apply_vec_const(col_general, scale_arg);
- return true;
- }
-
- return false;
+ return IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>;|
|
||
| if constexpr (IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>) { | ||
| using FieldType = typename DataType::FieldType; | ||
| res = Dispatcher<FieldType, RoundingMode::Trunc, TieBreakingMode::Auto>:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: redundant boolean literal in conditional return statement [readability-simplify-boolean-expr]
be/src/vec/functions/function_truncate.h:178:
- if constexpr (IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>) {
- using FieldType = typename DataType::FieldType;
- res = Dispatcher<FieldType, RoundingMode::Trunc, TieBreakingMode::Auto>::
- apply_const_vec(&const_col_general, column_scale.column.get());
- return true;
- }
-
- return false;
+ return IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>;|
|
||
| if constexpr (IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>) { | ||
| using FieldType = typename DataType::FieldType; | ||
| res = Dispatcher<FieldType, RoundingMode::Trunc, TieBreakingMode::Auto>:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: redundant boolean literal in conditional return statement [readability-simplify-boolean-expr]
be/src/vec/functions/function_truncate.h:211:
- if constexpr (IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>) {
- using FieldType = typename DataType::FieldType;
- res = Dispatcher<FieldType, RoundingMode::Trunc, TieBreakingMode::Auto>::
- apply_vec_vec(column_general.column.get(), column_scale.column.get());
- return true;
- }
- return false;
+ return IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
| // should NOT behave like two column arguments, so we can not use const column default implementation | ||
| bool use_default_implementation_for_constants() const override { return false; } | ||
|
|
||
| Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function 'execute_impl' exceeds recommended size/complexity thresholds [readability-function-size]
Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
^Additional context
be/src/vec/functions/function_truncate.h:79: 161 lines including whitespace and comments (threshold 80)
Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
^| res = Dispatcher<FieldType, RoundingMode::Trunc, | ||
| TieBreakingMode::Auto>::apply_vec_const(col_general, | ||
| scale_arg); | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: redundant boolean literal in conditional return statement [readability-simplify-boolean-expr]
be/src/vec/functions/function_truncate.h:103:
- if constexpr (IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>) {
- using FieldType = typename DataType::FieldType;
- res = Dispatcher<FieldType, RoundingMode::Trunc,
- TieBreakingMode::Auto>::apply_vec_const(col_general,
- scale_arg);
- return true;
- }
-
- return false;
+ return IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>;| using FieldType = typename DataType::FieldType; | ||
| res = Dispatcher<FieldType, RoundingMode::Trunc, TieBreakingMode::Auto>:: | ||
| apply_vec_const(column_general.column.get(), scale_arg); | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: redundant boolean literal in conditional return statement [readability-simplify-boolean-expr]
be/src/vec/functions/function_truncate.h:145:
- if constexpr (IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>) {
- using FieldType = typename DataType::FieldType;
- res = Dispatcher<FieldType, RoundingMode::Trunc, TieBreakingMode::Auto>::
- apply_vec_const(column_general.column.get(), scale_arg);
- return true;
- }
-
- return false;
+ return IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>;| using FieldType = typename DataType::FieldType; | ||
| res = Dispatcher<FieldType, RoundingMode::Trunc, TieBreakingMode::Auto>:: | ||
| apply_const_vec(&const_col_general, column_scale.column.get()); | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: redundant boolean literal in conditional return statement [readability-simplify-boolean-expr]
be/src/vec/functions/function_truncate.h:180:
- if constexpr (IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>) {
- using FieldType = typename DataType::FieldType;
- res = Dispatcher<FieldType, RoundingMode::Trunc, TieBreakingMode::Auto>::
- apply_const_vec(&const_col_general, column_scale.column.get());
- return true;
- }
-
- return false;
+ return IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>;| using FieldType = typename DataType::FieldType; | ||
| res = Dispatcher<FieldType, RoundingMode::Trunc, TieBreakingMode::Auto>:: | ||
| apply_vec_vec(column_general.column.get(), column_scale.column.get()); | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: redundant boolean literal in conditional return statement [readability-simplify-boolean-expr]
be/src/vec/functions/function_truncate.h:213:
- if constexpr (IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>) {
- using FieldType = typename DataType::FieldType;
- res = Dispatcher<FieldType, RoundingMode::Trunc, TieBreakingMode::Auto>::
- apply_vec_vec(column_general.column.get(), column_scale.column.get());
- return true;
- }
- return false;
+ return IsDataTypeNumber<DataType> || IsDataTypeDecimal<DataType>;|
run buildall |
|
TeamCity be ut coverage result: |
TPC-H: Total hot run time: 38778 ms |
zclllyybb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
…ument (apache#32746) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…ument (#32746) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…ument (#32746) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1. fix compute wrong scale of round_bankers/truncate/round/dround/floor/dfloor/ceil/dceil, introduced by #32746 and #34391 this sql will throw exception: ```sql create table test_round_bankers_scale( id bigint, a decimal(32,6) ) properties ("replication_num" = "1"); select round_bankers(a, 1-2) from test_round_bankers_scale order by id (1105, 'errCode = 2, detailMessage = (172.20.48.119)[INTERNAL_ERROR]output type not match expr type , col name , expected type Nullable(Decimal(32, 6)) , real type Nullable(Decimal(32, 0))') ``` The scale of return type of the round_bankers function depends on the type of its arguments. If the second argument is an integer constant, that constant is used as the scale. If it is not a constant, the scale of the first argument is used instead. This leads to a problem: for expressions like `1 - 2`, the scale used in the computation differs before and after constant folding, which can cause the Backend to throw an exception: type mismatch So we need to fold the arguments into constants before binding the function, to ensure that the scale will not change 2. keep idempotent of function signature. when we select a signature for a function, we should not change signature after do some rewrite
Function truncate can use column as its scale argument. The precision infer rule of truncate is same with mysql:
see: https://dbfiddle.uk/Bi891fp4
close #32086
docs update: apache/doris-website#516