Skip to content

[GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it #8351

Merged
taiyang-li merged 3 commits intoapache:mainfrom
KevinyhZou:fix_cast_number_to_decimal
Jan 10, 2025
Merged

[GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it #8351
taiyang-li merged 3 commits intoapache:mainfrom
KevinyhZou:fix_cast_number_to_decimal

Conversation

@KevinyhZou
Copy link
Copy Markdown
Contributor

@KevinyhZou KevinyhZou commented Dec 26, 2024

What changes were proposed in this pull request?

Fixes: #8343 and improve performance(#8351 (comment))

How was this patch tested?

test by ut

@github-actions
Copy link
Copy Markdown

#8343

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

3 similar comments
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@KevinyhZou KevinyhZou force-pushed the fix_cast_number_to_decimal branch from 9daf8db to d92a3f8 Compare December 27, 2024 07:23
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

4 similar comments
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86


private:
template <typename T, typename ToDataType>
template <typename FromDataType, typename ToDataType, typename ColVecType, typename T = FromDataType::FieldType>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not remove T and add "using T = typename FromDataType::FieldType" below

}

template <is_decimal FromFieldType, typename ToDataType>
template <typename FromDataType, typename ToDataType, typename FromFieldType = FromDataType::FieldType>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove FromFieldType, keep template parameters simple.

@KevinyhZou KevinyhZou force-pushed the fix_cast_number_to_decimal branch from a7fc8fe to 8d94e26 Compare January 6, 2025 09:48
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 6, 2025

Run Gluten Clickhouse CI on x86

return true;
}
}
else if constexpr (IsDataTypeNumber<FromDataType>)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above if and else if should be merged. Remind: use using ColVecType = ColumnVectorOrDecimal<T>;

args.emplace_back(addConstColumn(actions_dag, std::make_shared<DataTypeInt32>(), substrait_type.decimal().scale()));
result_node = toFunctionNode(actions_dag, "checkDecimalOverflowSparkOrNull", args);
int decimal_precision = substrait_type.decimal().precision();
if (decimal_precision != 0)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (decimal_precision)

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 7, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou
Copy link
Copy Markdown
Contributor Author

端到端性能测试

测试sql: select count(1) from test_tbl where cast(d as decimal(5,2)) > 1;
数据量:60000000
PR改动前:
2.18s 2.245s, 2.259s
PR改动后:
2.623s, 2.618s,2.659s;

valian耗时:
15.936s,15.011s,16.411s;

else
return convertDecimalsImpl<DataTypeDecimal<Decimal256>, ToDataType>(decimal, precision_to, scale_from, scale_to, result);
{
if constexpr (std::is_same_v<FromFieldType, BFloat16>)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove useless branch

{
if constexpr (exception_mode == CheckExceptionMode::Null)
return false;
else
Copy link
Copy Markdown
Contributor

@taiyang-li taiyang-li Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove useless branch here and any other places.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 7, 2025

Run Gluten Clickhouse CI on x86


template <typename FromDataType, typename ToDataType>
requires(IsDataTypeNumber<FromDataType> && IsDataTypeDecimal<ToDataType>)
static bool convertNumberToDecimalImpl(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALWAYS_INLINE

: static_cast<int>(std::log10(std::fabs(int_part))) + 1;
/// If the integer part's digits of the number is greater than (precision - scale), e.g. cast(55 as decimal(2, 1)),
/// then we should return NULL or throw exceptions.
if (int_part_digits > precision - scale)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if and else could be merged return int_part_digits > precision - scale && tryConvertToDecimal<FromDataType, ToDataType>(value, scale, result);


int int_part_digits = int_part == 0 ? 1 :
int_part > 0 ? static_cast<int>(std::log10(int_part)) + 1
: static_cast<int>(std::log10(std::fabs(int_part))) + 1;
Copy link
Copy Markdown
Contributor

@taiyang-li taiyang-li Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess std::log10 and std::fabs is too heavy for this function. Maybe it is better:

auto casted_int_part = static_cast<ToDataType::FieldType>(casted_int_part); 
bool overflow = casted_int_part >= min_value && casted_int_value <= max_value;   

min_value/max_value is the minimum/maximum value which could be represented in precision - scale digits. They could be calculated outside for loop, which remove the cost of std::log10 and std::fabs.

if constexpr (std::is_same_v<FromFieldType, BFloat16>)
return tryConvertToDecimal<DataTypeFloat32, ToDataType>(static_cast<Float32>(value), scale, result);
else
return tryConvertToDecimal<FromDataType, ToDataType>(value, scale, result);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if (int_part_digits > precision - scale) is true, will tryConvertToDecimal returns false?

bool success = convertToDecimalImpl<T, ToDataType>(datas[i], precision, scale_from, scale_to, result);
bool success = convertToDecimalImpl<FromDataType, ToDataType>(datas[i], precision, scale_from, scale_to, result);

if (success)
Copy link
Copy Markdown
Contributor

@taiyang-li taiyang-li Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove if else in loops if possible

vec_to[i] = static_cast<ToFieldType>(result);
(*vec_null_map_to)[i] = success;

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 7, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou
Copy link
Copy Markdown
Contributor Author

端到端性能测试

测试sql: select count(1) from test_tbl where cast(d as decimal(5,2)) > 1;
数据量:60000000
PR改动前:
2.18s 2.245s, 2.259s
PR改动后:
1.988s, 1.891s, 1.933s

valian耗时:
15.936s,15.011s,16.411s;

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 7, 2025

Run Gluten Clickhouse CI on x86

1 similar comment
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 7, 2025

Run Gluten Clickhouse CI on x86

const typename FromDataType::FieldType & value,
UInt32 precision,
UInt32 scale,
Int64 decimal_int_part_max,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not enough to represent min/max value. Consider precision = 38 and scale = 0.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use NativeTypeToDataType::FieldType

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 9, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou KevinyhZou force-pushed the fix_cast_number_to_decimal branch from f70615e to db3a9e2 Compare January 9, 2025 02:30
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 9, 2025

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 9, 2025

Run Gluten Clickhouse CI on x86

2 similar comments
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 9, 2025

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 9, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou KevinyhZou force-pushed the fix_cast_number_to_decimal branch from 0b64fdc to 06ec1c3 Compare January 9, 2025 10:05
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 9, 2025

Run Gluten Clickhouse CI on x86

@taiyang-li taiyang-li merged commit 66e816f into apache:main Jan 10, 2025
@taiyang-li taiyang-li changed the title [GLUTEN-8343][CH]Fix cast number to decimal [GLUTEN-8343][CH]Fix cast number to decimal and improve performance of it Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CH] Diff when cast Number to Decimal

2 participants