[GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it #8351
[GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it #8351taiyang-li merged 3 commits intoapache:mainfrom
Conversation
|
Run Gluten Clickhouse CI on x86 |
3 similar comments
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
9daf8db to
d92a3f8
Compare
|
Run Gluten Clickhouse CI on x86 |
4 similar comments
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
|
||
| private: | ||
| template <typename T, typename ToDataType> | ||
| template <typename FromDataType, typename ToDataType, typename ColVecType, typename T = FromDataType::FieldType> |
There was a problem hiding this comment.
why not remove T and add "using T = typename FromDataType::FieldType" below
| } | ||
|
|
||
| template <is_decimal FromFieldType, typename ToDataType> | ||
| template <typename FromDataType, typename ToDataType, typename FromFieldType = FromDataType::FieldType> |
There was a problem hiding this comment.
remove FromFieldType, keep template parameters simple.
cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp
Outdated
Show resolved
Hide resolved
a7fc8fe to
8d94e26
Compare
|
Run Gluten Clickhouse CI on x86 |
...c/test/scala/org/apache/gluten/execution/tpch/GlutenClickHouseTPCHSaltNullParquetSuite.scala
Outdated
Show resolved
Hide resolved
| return true; | ||
| } | ||
| } | ||
| else if constexpr (IsDataTypeNumber<FromDataType>) |
There was a problem hiding this comment.
above if and else if should be merged. Remind: use using ColVecType = ColumnVectorOrDecimal<T>;
cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp
Outdated
Show resolved
Hide resolved
cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp
Outdated
Show resolved
Hide resolved
| args.emplace_back(addConstColumn(actions_dag, std::make_shared<DataTypeInt32>(), substrait_type.decimal().scale())); | ||
| result_node = toFunctionNode(actions_dag, "checkDecimalOverflowSparkOrNull", args); | ||
| int decimal_precision = substrait_type.decimal().precision(); | ||
| if (decimal_precision != 0) |
|
Run Gluten Clickhouse CI on x86 |
端到端性能测试 |
| else | ||
| return convertDecimalsImpl<DataTypeDecimal<Decimal256>, ToDataType>(decimal, precision_to, scale_from, scale_to, result); | ||
| { | ||
| if constexpr (std::is_same_v<FromFieldType, BFloat16>) |
| { | ||
| if constexpr (exception_mode == CheckExceptionMode::Null) | ||
| return false; | ||
| else |
There was a problem hiding this comment.
remove useless branch here and any other places.
|
Run Gluten Clickhouse CI on x86 |
|
|
||
| template <typename FromDataType, typename ToDataType> | ||
| requires(IsDataTypeNumber<FromDataType> && IsDataTypeDecimal<ToDataType>) | ||
| static bool convertNumberToDecimalImpl( |
| : static_cast<int>(std::log10(std::fabs(int_part))) + 1; | ||
| /// If the integer part's digits of the number is greater than (precision - scale), e.g. cast(55 as decimal(2, 1)), | ||
| /// then we should return NULL or throw exceptions. | ||
| if (int_part_digits > precision - scale) |
There was a problem hiding this comment.
if and else could be merged return int_part_digits > precision - scale && tryConvertToDecimal<FromDataType, ToDataType>(value, scale, result);
|
|
||
| int int_part_digits = int_part == 0 ? 1 : | ||
| int_part > 0 ? static_cast<int>(std::log10(int_part)) + 1 | ||
| : static_cast<int>(std::log10(std::fabs(int_part))) + 1; |
There was a problem hiding this comment.
I guess std::log10 and std::fabs is too heavy for this function. Maybe it is better:
auto casted_int_part = static_cast<ToDataType::FieldType>(casted_int_part);
bool overflow = casted_int_part >= min_value && casted_int_value <= max_value; min_value/max_value is the minimum/maximum value which could be represented in precision - scale digits. They could be calculated outside for loop, which remove the cost of std::log10 and std::fabs.
| if constexpr (std::is_same_v<FromFieldType, BFloat16>) | ||
| return tryConvertToDecimal<DataTypeFloat32, ToDataType>(static_cast<Float32>(value), scale, result); | ||
| else | ||
| return tryConvertToDecimal<FromDataType, ToDataType>(value, scale, result); |
There was a problem hiding this comment.
I'm curious if (int_part_digits > precision - scale) is true, will tryConvertToDecimal returns false?
| bool success = convertToDecimalImpl<T, ToDataType>(datas[i], precision, scale_from, scale_to, result); | ||
| bool success = convertToDecimalImpl<FromDataType, ToDataType>(datas[i], precision, scale_from, scale_to, result); | ||
|
|
||
| if (success) |
There was a problem hiding this comment.
remove if else in loops if possible
vec_to[i] = static_cast<ToFieldType>(result);
(*vec_null_map_to)[i] = success;|
Run Gluten Clickhouse CI on x86 |
端到端性能测试 |
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
| const typename FromDataType::FieldType & value, | ||
| UInt32 precision, | ||
| UInt32 scale, | ||
| Int64 decimal_int_part_max, |
There was a problem hiding this comment.
It is not enough to represent min/max value. Consider precision = 38 and scale = 0.
There was a problem hiding this comment.
use NativeTypeToDataType::FieldType
|
Run Gluten Clickhouse CI on x86 |
f70615e to
db3a9e2
Compare
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
2 similar comments
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
0b64fdc to
06ec1c3
Compare
|
Run Gluten Clickhouse CI on x86 |
What changes were proposed in this pull request?
Fixes: #8343 and improve performance(#8351 (comment))
How was this patch tested?
test by ut