[GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it by KevinyhZou · Pull Request #8351 · apache/gluten

KevinyhZou · 2024-12-26T06:44:10Z

What changes were proposed in this pull request?

Fixes #8343 and improves performance (see #8351 (comment)).

How was this patch tested?

Tested by unit tests.

github-actions · 2024-12-26T06:44:28Z

#8343

github-actions · 2024-12-26T06:44:42Z

Run Gluten Clickhouse CI on x86

github-actions · 2024-12-26T06:45:41Z

Run Gluten Clickhouse CI on x86

github-actions · 2024-12-26T08:43:52Z

Run Gluten Clickhouse CI on x86

github-actions · 2024-12-27T01:52:14Z

Run Gluten Clickhouse CI on x86

github-actions · 2024-12-27T07:24:28Z

Run Gluten Clickhouse CI on x86

github-actions · 2024-12-27T07:39:04Z

Run Gluten Clickhouse CI on x86

github-actions · 2024-12-27T10:04:22Z

Run Gluten Clickhouse CI on x86

github-actions · 2024-12-30T08:22:54Z

Run Gluten Clickhouse CI on x86

github-actions · 2024-12-30T09:13:58Z

Run Gluten Clickhouse CI on x86

taiyang-li · 2025-01-06T08:08:11Z

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp


 private:
-    template <typename T, typename ToDataType>
+    template <typename FromDataType, typename ToDataType, typename ColVecType, typename T = FromDataType::FieldType>


Why not remove T and add "using T = typename FromDataType::FieldType;" below?
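A minimal sketch of that suggestion, assuming a hypothetical stand-in type (Int32TypeSketch is not a ClickHouse class): the field type is derived inside the function body instead of being exposed as a defaulted template parameter, so a caller cannot pass a mismatched T by accident.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a data type class that exposes its value type.
struct Int32TypeSketch
{
    using FieldType = std::int32_t;
};

// The element types are derived in the body rather than passed as extra
// (defaulted) template parameters, which keeps the parameter list short.
template <typename FromDataType, typename ToDataType>
constexpr bool sameFieldType()
{
    using T = typename FromDataType::FieldType;
    using U = typename ToDataType::FieldType;
    return std::is_same_v<T, U>;
}
```

Callers then name only the two data types, for example sameFieldType<Int32TypeSketch, Int32TypeSketch>().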

taiyang-li · 2025-01-06T08:08:59Z

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp

    }

-    template <is_decimal FromFieldType, typename ToDataType>
+    template <typename FromDataType, typename ToDataType, typename FromFieldType = FromDataType::FieldType>


Remove FromFieldType to keep the template parameters simple.

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp

github-actions · 2025-01-06T09:49:07Z

Run Gluten Clickhouse CI on x86

...c/test/scala/org/apache/gluten/execution/tpch/GlutenClickHouseTPCHSaltNullParquetSuite.scala

taiyang-li · 2025-01-07T01:49:54Z

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp

+                    return true;
+                }
+            }
+            else if constexpr (IsDataTypeNumber<FromDataType>)


The if and else-if branches above should be merged. Reminder: use "using ColVecType = ColumnVectorOrDecimal<T>;".

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp

taiyang-li · 2025-01-07T01:54:21Z

cpp-ch/local-engine/Parser/ExpressionParser.cpp

-                args.emplace_back(addConstColumn(actions_dag, std::make_shared<DataTypeInt32>(), substrait_type.decimal().scale()));
-                result_node = toFunctionNode(actions_dag, "checkDecimalOverflowSparkOrNull", args);
+                int decimal_precision = substrait_type.decimal().precision();
+                if (decimal_precision != 0)


if (decimal_precision)

github-actions · 2025-01-07T07:17:56Z

Run Gluten Clickhouse CI on x86

KevinyhZou · 2025-01-07T07:21:40Z

End-to-end performance test

Test SQL: select count(1) from test_tbl where cast(d as decimal(5,2)) > 1;
Data volume: 60000000
Before this PR:
2.18s, 2.245s, 2.259s
After this PR:
2.623s, 2.618s, 2.659s

Vanilla elapsed time:
15.936s, 15.011s, 16.411s

taiyang-li · 2025-01-07T08:17:51Z

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp

        else
-            return convertDecimalsImpl<DataTypeDecimal<Decimal256>, ToDataType>(decimal, precision_to, scale_from, scale_to, result);
+        {
+            if constexpr (std::is_same_v<FromFieldType, BFloat16>)


Remove the useless branch.

taiyang-li · 2025-01-07T08:18:43Z

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp

+        {
+            if constexpr (exception_mode == CheckExceptionMode::Null)
+                return false;
+            else


Remove the useless branch here and anywhere else it appears.

github-actions · 2025-01-07T08:20:25Z

Run Gluten Clickhouse CI on x86

taiyang-li · 2025-01-07T08:23:42Z

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp

+
+    template <typename FromDataType, typename ToDataType>
+    requires(IsDataTypeNumber<FromDataType> && IsDataTypeDecimal<ToDataType>)
+    static bool convertNumberToDecimalImpl(


ALWAYS_INLINE

taiyang-li · 2025-01-07T08:24:59Z

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp

+                                           : static_cast<int>(std::log10(std::fabs(int_part))) + 1;
+        /// If the integer part's digits of the number is greater than (precision - scale), e.g. cast(55 as decimal(2, 1)),
+        /// then we should return NULL or throw exceptions.
+        if (int_part_digits > precision - scale)


The if and else could be merged: return int_part_digits <= precision - scale && tryConvertToDecimal<FromDataType, ToDataType>(value, scale, result);
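A sketch of the merged form, under stated assumptions. Per the code comment in the hunk, a value whose integer part has more than (precision - scale) digits must be rejected, so the conversion should run only when int_part_digits <= precision - scale. tryConvertSketch is a hypothetical stand-in for tryConvertToDecimal, and the sketch covers only the NULL-returning mode, not the throwing one.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the real conversion: scales the value by 10^scale
// and reports failure if the result would not fit in int64_t.
inline bool tryConvertSketch(double value, int scale, std::int64_t & result)
{
    const double scaled = std::round(value * std::pow(10.0, scale));
    if (!(std::fabs(scaled) < 9.0e18)) // also rejects NaN and infinity
        return false;
    result = static_cast<std::int64_t>(scaled);
    return true;
}

// Merged form: the digit check short-circuits, so the conversion only runs
// when the integer part fits in (precision - scale) digits.
inline bool convertMerged(double value, int int_part_digits, int precision, int scale, std::int64_t & result)
{
    return int_part_digits <= precision - scale && tryConvertSketch(value, scale, result);
}
```

With these stand-ins, cast(55 as decimal(2, 1)) is rejected by the digit check alone, while 5.5 passes the check and is converted.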

taiyang-li · 2025-01-07T08:33:26Z

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp

+
+        int int_part_digits = int_part == 0 ? 1 :
+                              int_part > 0 ? static_cast<int>(std::log10(int_part)) + 1
+                                           : static_cast<int>(std::log10(std::fabs(int_part))) + 1;


I suspect std::log10 and std::fabs are too heavy for this function. Something like this may be better:

auto casted_int_part = static_cast<ToDataType::FieldType>(int_part);
bool overflow = casted_int_part < min_value || casted_int_part > max_value;

min_value/max_value are the minimum/maximum values that can be represented in precision - scale digits. They can be calculated outside the for loop, which removes the cost of std::log10 and std::fabs.
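A rough sketch of the precomputed-bounds idea, not the code from this PR. The helper names are hypothetical, the input is a plain std::vector<double> rather than a ClickHouse column, and it assumes precision - scale is small enough (at most 15 digits) for the bound to be exact as a double.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest integer with `digits` decimal digits, e.g. 3 -> 999.
// Assumes digits <= 18 so the result fits in int64_t.
inline std::int64_t maxIntPart(int digits)
{
    std::int64_t bound = 1;
    for (int i = 0; i < digits; ++i)
        bound *= 10;
    return bound - 1;
}

// Marks each value whose integer part needs more than (precision - scale) digits.
// The bounds are computed once, so the loop body is a truncation and two comparisons.
inline std::vector<bool> findOverflows(const std::vector<double> & values, int precision, int scale)
{
    const double max_value = static_cast<double>(maxIntPart(precision - scale));
    const double min_value = -max_value;
    std::vector<bool> overflow(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        const double int_part = std::trunc(values[i]);
        overflow[i] = int_part < min_value || int_part > max_value;
    }
    return overflow;
}
```

For decimal(5, 2) the bounds are -999 and 999, so 999.99 passes and 1000.0 is flagged. NaN inputs are not flagged by this sketch and would need separate handling.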

taiyang-li · 2025-01-07T08:35:18Z

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp

+            if constexpr (std::is_same_v<FromFieldType, BFloat16>)
+                return tryConvertToDecimal<DataTypeFloat32, ToDataType>(static_cast<Float32>(value), scale, result);
+            else
+                return tryConvertToDecimal<FromDataType, ToDataType>(value, scale, result);


I'm curious: if (int_part_digits > precision - scale) is true, will tryConvertToDecimal return false?

taiyang-li · 2025-01-07T08:42:32Z

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp

-            bool success = convertToDecimalImpl<T, ToDataType>(datas[i], precision, scale_from, scale_to, result);
+            bool success = convertToDecimalImpl<FromDataType, ToDataType>(datas[i], precision, scale_from, scale_to, result);

            if (success)


Remove the if/else inside the loop if possible:

vec_to[i] = static_cast<ToFieldType>(result);
(*vec_null_map_to)[i] = !success;
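A sketch of the branch-free loop body, using plain vectors and a hypothetical convertRowSketch in place of convertToDecimalImpl. It assumes the ClickHouse convention that a null-map entry of 1 marks the row as NULL, so the flag stored is the negation of the success result.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-row conversion: succeeds only for values within [-999, 999].
inline bool convertRowSketch(std::int64_t value, std::int64_t & result)
{
    result = value;
    return value >= -999 && value <= 999;
}

// Writes the value and the null flag unconditionally, with no if/else in the loop.
// A failed row keeps whatever `result` holds; the null flag masks it.
inline void convertColumnSketch(
    const std::vector<std::int64_t> & from, std::vector<std::int64_t> & to, std::vector<std::uint8_t> & null_map)
{
    to.resize(from.size());
    null_map.resize(from.size());
    for (std::size_t i = 0; i < from.size(); ++i)
    {
        std::int64_t result = 0;
        const bool success = convertRowSketch(from[i], result);
        to[i] = result;
        null_map[i] = static_cast<std::uint8_t>(!success); // 1 means NULL
    }
}
```

This only covers the NULL-returning mode; a mode that throws on overflow still needs a check somewhere.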

github-actions · 2025-01-07T12:37:57Z

Run Gluten Clickhouse CI on x86

KevinyhZou · 2025-01-07T12:39:23Z

End-to-end performance test

Test SQL: select count(1) from test_tbl where cast(d as decimal(5,2)) > 1;
Data volume: 60000000
Before this PR:
2.18s, 2.245s, 2.259s
After this PR:
1.988s, 1.891s, 1.933s

Vanilla elapsed time:
15.936s, 15.011s, 16.411s

github-actions · 2025-01-07T12:42:42Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-01-07T13:35:01Z

Run Gluten Clickhouse CI on x86

taiyang-li · 2025-01-07T15:31:36Z

cpp-ch/local-engine/Functions/SparkFunctionCheckDecimalOverflow.cpp

        const typename FromDataType::FieldType & value,
-        UInt32 precision,
        UInt32 scale,
+        Int64 decimal_int_part_max,


Int64 is not wide enough to represent the min/max value. Consider precision = 38 and scale = 0.

Use NativeTypeToDataType::FieldType.
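To illustrate the width problem: the largest integer part for decimal(38, 0) is 10^38 - 1, far above the Int64 maximum of roughly 9.22 * 10^18. The sketch below assumes GCC or Clang, since it uses the __int128 extension rather than ClickHouse's own wide integer types, and its helper names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// 128-bit integer via the GCC/Clang extension (an assumption of this sketch).
using Int128Sketch = __int128;

// Returns 10^digits - 1, the largest integer part with `digits` decimal digits.
// Valid for digits <= 38, where the result still fits in 128 bits.
constexpr Int128Sketch maxIntPartWide(int digits)
{
    Int128Sketch bound = 1;
    for (int i = 0; i < digits; ++i)
        bound *= 10;
    return bound - 1;
}

// True if the bound for (precision - scale) digits fits in a signed 64-bit integer.
constexpr bool boundFitsInInt64(int precision, int scale)
{
    return maxIntPartWide(precision - scale) <= std::numeric_limits<std::int64_t>::max();
}
```

An 18-digit integer part still fits in Int64, but 19 digits or more do not, which is why the bound needs a type as wide as the target decimal's native type.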

github-actions · 2025-01-09T02:24:36Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-01-09T02:31:11Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-01-09T06:05:53Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-01-09T06:33:06Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-01-09T07:47:49Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-01-09T10:05:59Z

Run Gluten Clickhouse CI on x86

github-actions bot added the CLICKHOUSE label Dec 26, 2024

KevinyhZou force-pushed the fix_cast_number_to_decimal branch from 9daf8db to d92a3f8 Compare December 27, 2024 07:23

taiyang-li reviewed Jan 6, 2025

taiyang-li reviewed Jan 6, 2025

taiyang-li reviewed Jan 6, 2025

KevinyhZou force-pushed the fix_cast_number_to_decimal branch from a7fc8fe to 8d94e26 Compare January 6, 2025 09:48

taiyang-li reviewed Jan 7, 2025

taiyang-li reviewed Jan 7, 2025

taiyang-li reviewed Jan 7, 2025

taiyang-li reviewed Jan 7, 2025

taiyang-li requested changes Jan 7, 2025

taiyang-li reviewed Jan 7, 2025

fix cast number to decimal

db3a9e2

KevinyhZou force-pushed the fix_cast_number_to_decimal branch from f70615e to db3a9e2 Compare January 9, 2025 02:30

simply code

5881bc4

fix ci

06ec1c3

KevinyhZou force-pushed the fix_cast_number_to_decimal branch from 0b64fdc to 06ec1c3 Compare January 9, 2025 10:05

taiyang-li approved these changes Jan 10, 2025

taiyang-li merged commit 66e816f into apache:main Jan 10, 2025

taiyang-li changed the title ~~[GLUTEN-8343][CH]Fix cast number to decimal~~ [GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it Jan 10, 2025
