-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[feature](functions) impl scalar functions normal_cdf,to_iso8601,from_iso8601_date #40695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
|
run buildall |
TPC-H: Total hot run time: 38395 ms |
TPC-DS: Total hot run time: 192793 ms |
ClickBench: Total hot run time: 32.12 s |
b924777 to
77ebd97
Compare
|
run buildall |
TPC-H: Total hot run time: 37981 ms |
TPC-DS: Total hot run time: 198063 ms |
ClickBench: Total hot run time: 31.2 s |
| FunctionSignature.ret(VarcharType.SYSTEM_DEFAULT).args(DateTimeV2Type.SYSTEM_DEFAULT), | ||
| FunctionSignature.ret(VarcharType.SYSTEM_DEFAULT).args(DateV2Type.INSTANCE), | ||
| FunctionSignature.ret(VarcharType.SYSTEM_DEFAULT).args(DateTimeType.INSTANCE), | ||
| FunctionSignature.ret(VarcharType.SYSTEM_DEFAULT).args(DateType.INSTANCE), | ||
| FunctionSignature.ret(StringType.INSTANCE).args(DateTimeV2Type.SYSTEM_DEFAULT), | ||
| FunctionSignature.ret(StringType.INSTANCE).args(DateV2Type.INSTANCE), | ||
| FunctionSignature.ret(StringType.INSTANCE).args(DateTimeType.INSTANCE), | ||
| FunctionSignature.ret(StringType.INSTANCE).args(DateType.INSTANCE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should only return string or varchar. just like c/c++ ret type is not a part of function's signature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will fix it
| * ScalarFunction 'to_iso8601'. This class is generated by GenerateFunction. | ||
| */ | ||
| public class ToIso8601 extends ScalarFunction | ||
| implements UnaryExpression, ExplicitlyCastableSignature, PropagateNullableOnDateLikeV2Args { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use PropagateNullableOnDateLikeV2Args? this is a new function should have same nullable property for date/datetime v1 and date/datetime v2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will fix it.
| return visitScalarFunction(connectionId, context); | ||
| } | ||
|
|
||
| default R visitNormalCdf(NormalCdf normalCdf, C context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please sort by lexicographical order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will fix it
| size_t result, size_t input_rows_count) { | ||
| const auto* src_column_ptr = block.get_by_position(arguments[0]).column.get(); | ||
|
|
||
| std::regex calendar_regex(R"((\d{4})(-)?(\d{2})(-)?(\d{2}))"); // YYYY-MM-DD or YYYYMMDD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never use std::regex, it's worse than RE2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will change it.
| int day_of_year = std::stoi(match[3]); | ||
|
|
||
| if (is_leap(year)) { | ||
| if (day_of_year < 0 || day_of_year > 366) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better for judgement like this, set it to [[unlikely]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will add it
| struct FromIso8601DateV2 { | ||
| static constexpr auto name = "from_iso8601_date"; | ||
|
|
||
| static bool is_variadic() { return true; } //todo : check is or not is_variadic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this function support variadic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't support variadic. I forgot to delete this line.
| static constexpr auto name = "normal_cdf"; | ||
| String get_name() const override { return name; } | ||
| static FunctionPtr create() { return std::make_shared<FunctionNormalCdf>(); } | ||
| DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { | ||
| return make_nullable(std::make_shared<DataTypeFloat64>()); | ||
| } | ||
| DataTypes get_variadic_argument_types_impl() const override { | ||
| return {std::make_shared<DataTypeFloat64>(), std::make_shared<DataTypeFloat64>(), | ||
| std::make_shared<DataTypeFloat64>()}; | ||
| } | ||
| size_t get_number_of_arguments() const override { return 3; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better add blank line between functions/variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will add it
| std::regex calendar_regex(R"((\d{4})(-)?(\d{2})(-)?(\d{2}))"); // YYYY-MM-DD or YYYYMMDD | ||
| std::regex year_month_regex(R"((\d{4})-(\d{2}))"); // YYYY-MM | ||
| std::regex year_only_regex(R"((\d{4}))"); // YYYY | ||
| std::regex week_regex( | ||
| R"((\d{4})(-)?W(\d{2})(-)?(\d)?)"); // YYYY-Www or YYYY-Www-D or YYYYWww or YYYYWwwD | ||
| std::regex day_of_year_regex(R"((\d{4})(-)?(\d{3}))"); // YYYY-DDD or YYYYDDD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider add const for them and move outside of execute() to improve the efficiency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will use static const to fix it.
b212b6c to
3cb8dde
Compare
|
run buildall |
TPC-H: Total hot run time: 42992 ms |
TPC-DS: Total hot run time: 195634 ms |
|
TeamCity be ut coverage result: |
ClickBench: Total hot run time: 31.07 s |
0dce774 to
e56b87e
Compare
|
run buildall |
02aa222 to
d13fe4a
Compare
|
run buildall |
TPC-H: Total hot run time: 43414 ms |
|
TeamCity be ut coverage result: |
TPC-DS: Total hot run time: 195758 ms |
ClickBench: Total hot run time: 31.06 s |
…_iso8601_date (apache#40695) Added three functions: `normal_cdf`, `to_iso8601`, `from_iso8601_date` 1. `normal_cdf(mean, sd, v) → double` : Compute the Normal cdf with given mean and standard deviation (sd): P(N < value; mean, sd). The mean and value must be real values and the standard deviation must be a real and positive value (all of type DOUBLE). 2. `to_iso8601(DATE/DATETIME) → string` For `DATE` type, it will be converted to `YYYY-MM-DD`. For `DATETIME` type, it will be converted to `YYYY-MM-DDThh:mm:ss.xxxxxx`. 3. `from_iso8601_date(string) → DATE` Convert iso8601 string to `DATE` type. The supported iso8601 string formats are as follows: a. Year : YYYY b. Calendar dates : YYYY-MM-DD , YYYYMMDD , YYYY-MM c. Week dates : YYYY-Www , YYYYWww , YYYY-Www-D , YYYYWwwD d. Ordinal dates : YYYY-DDD , YYYYDDD You can refer to this [document](https://en.wikipedia.org/wiki/ISO_8601) to understand what these formats mean Please note that: a. Each date and time value has a fixed number of digits that must be padded with leading zeros. b. The range of YYYY is 0001-9999. outside these rules, the correctness of the results is not guaranteed. For illegal values, the result is NULL.
before pr : #40695 ## Proposed changes Fix the bug that be will have this error when compiling on mac. ``` be/src/vec/functions/math.cpp:416:39: error: no member named 'numbers' in namespace 'std' constexpr double sqrt2 = std::numbers::sqrt2; ~~~~~^ 1 error generated. ```
…_iso8601_date (apache#40695) ## Proposed changes Added three functions: `normal_cdf`, `to_iso8601`, `from_iso8601_date` 1. `normal_cdf(mean, sd, v) → double` : Compute the Normal cdf with given mean and standard deviation (sd): P(N < value; mean, sd). The mean and value must be real values and the standard deviation must be a real and positive value (all of type DOUBLE). 2. `to_iso8601(DATE/DATETIME) → string` For `DATE` type, it will be converted to `YYYY-MM-DD`. For `DATETIME` type, it will be converted to `YYYY-MM-DDThh:mm:ss.xxxxxx`. 3. `from_iso8601_date(string) → DATE` Convert iso8601 string to `DATE` type. The supported iso8601 string formats are as follows: a. Year : YYYY b. Calendar dates : YYYY-MM-DD , YYYYMMDD , YYYY-MM c. Week dates : YYYY-Www , YYYYWww , YYYY-Www-D , YYYYWwwD d. Ordinal dates : YYYY-DDD , YYYYDDD You can refer to this [document](https://en.wikipedia.org/wiki/ISO_8601) to understand what these formats mean Please note that: a. Each date and time value has a fixed number of digits that must be padded with leading zeros. b. The range of YYYY is 0001-9999. outside these rules, the correctness of the results is not guaranteed. For illegal values, the result is NULL.
…41075) before pr : apache#40695 ## Proposed changes Fix the bug that be will have this error when compiling on mac. ``` be/src/vec/functions/math.cpp:416:39: error: no member named 'numbers' in namespace 'std' constexpr double sqrt2 = std::numbers::sqrt2; ~~~~~^ 1 error generated. ```
…_iso8601_date (#40695 and #41075) (#41600) bp #40695 and #41075 #41075 : fix #40695 mac compile ## Proposed changes Added three functions: `normal_cdf`, `to_iso8601`, `from_iso8601_date` 1. `normal_cdf(mean, sd, v) → double` : Compute the Normal cdf with given mean and standard deviation (sd): P(N < value; mean, sd). The mean and value must be real values and the standard deviation must be a real and positive value (all of type DOUBLE). 2. `to_iso8601(DATE/DATETIME) → string` For `DATE` type, it will be converted to `YYYY-MM-DD`. For `DATETIME` type, it will be converted to `YYYY-MM-DDThh:mm:ss.xxxxxx`. 3. `from_iso8601_date(string) → DATE` Convert iso8601 string to `DATE` type. The supported iso8601 string formats are as follows: a. Year : YYYY b. Calendar dates : YYYY-MM-DD , YYYYMMDD , YYYY-MM c. Week dates : YYYY-Www , YYYYWww , YYYY-Www-D , YYYYWwwD d. Ordinal dates : YYYY-DDD , YYYYDDD You can refer to this [document](https://en.wikipedia.org/wiki/ISO_8601) to understand what these formats mean Please note that: a. Each date and time value has a fixed number of digits that must be padded with leading zeros. b. The range of YYYY is 0001-9999. outside these rules, the correctness of the results is not guaranteed. For illegal values, the result is NULL.
Related PR: #40695 Problem Summary: pr #40695 introduced the function `from_iso8601_date`, which parses the string to get year、mouth、 day, and sets the date value through the `set_time_unit` function. Since `set_time_unit` lacks some judgment on mouth, it may get an illegal date in the end, which may cause core in debug mode. sql : `select from_iso8601_date('2023-00-01');` ``` F20250709 09:50:14.366984 3587796 vdatetime_value.h:1222] Check failed: date_v2_value_.month_ != 0 *** Check failure stack trace: *** @ 0x559bd7050d96 google::LogMessage::SendToLog() @ 0x559bd704d7e0 google::LogMessage::Flush() @ 0x559bd70515d9 google::LogMessageFatal::~LogMessageFatal() @ 0x559bc725a570 doris::DateV2Value<>::set_time_unit<>() @ 0x559bc7257380 doris::vectorized::FromIso8601DateV2::execute() @ 0x559bc7255a88 doris::vectorized::FunctionOtherTypesToDateType<>::execute_impl() @ 0x559bc09e0781 doris::vectorized::DefaultExecutable::execute_impl() @ 0x559bc423aa20 doris::vectorized::PreparedFunctionImpl::_execute_skipped_constant_deal() @ 0x559bc4234938 doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns() @ 0x559bc4233f42 doris::vectorized::PreparedFunctionImpl::default_implementation_for_nulls() @ 0x559bc423a773 doris::vectorized::PreparedFunctionImpl::_execute_skipped_constant_deal() @ 0x559bc4234938 doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns() @ 0x559bc4234a57 doris::vectorized::PreparedFunctionImpl::execute() ```
Related PR: apache#40695 Problem Summary: pr apache#40695 introduced the function `from_iso8601_date`, which parses the string to get year、mouth、 day, and sets the date value through the `set_time_unit` function. Since `set_time_unit` lacks some judgment on mouth, it may get an illegal date in the end, which may cause core in debug mode. sql : `select from_iso8601_date('2023-00-01');` ``` F20250709 09:50:14.366984 3587796 vdatetime_value.h:1222] Check failed: date_v2_value_.month_ != 0 *** Check failure stack trace: *** @ 0x559bd7050d96 google::LogMessage::SendToLog() @ 0x559bd704d7e0 google::LogMessage::Flush() @ 0x559bd70515d9 google::LogMessageFatal::~LogMessageFatal() @ 0x559bc725a570 doris::DateV2Value<>::set_time_unit<>() @ 0x559bc7257380 doris::vectorized::FromIso8601DateV2::execute() @ 0x559bc7255a88 doris::vectorized::FunctionOtherTypesToDateType<>::execute_impl() @ 0x559bc09e0781 doris::vectorized::DefaultExecutable::execute_impl() @ 0x559bc423aa20 doris::vectorized::PreparedFunctionImpl::_execute_skipped_constant_deal() @ 0x559bc4234938 doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns() @ 0x559bc4233f42 doris::vectorized::PreparedFunctionImpl::default_implementation_for_nulls() @ 0x559bc423a773 doris::vectorized::PreparedFunctionImpl::_execute_skipped_constant_deal() @ 0x559bc4234938 doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns() @ 0x559bc4234a57 doris::vectorized::PreparedFunctionImpl::execute() ```
Related PR: apache#40695 Problem Summary: pr apache#40695 introduced the function `from_iso8601_date`, which parses the string to get year、mouth、 day, and sets the date value through the `set_time_unit` function. Since `set_time_unit` lacks some judgment on mouth, it may get an illegal date in the end, which may cause core in debug mode. sql : `select from_iso8601_date('2023-00-01');` ``` F20250709 09:50:14.366984 3587796 vdatetime_value.h:1222] Check failed: date_v2_value_.month_ != 0 *** Check failure stack trace: *** @ 0x559bd7050d96 google::LogMessage::SendToLog() @ 0x559bd704d7e0 google::LogMessage::Flush() @ 0x559bd70515d9 google::LogMessageFatal::~LogMessageFatal() @ 0x559bc725a570 doris::DateV2Value<>::set_time_unit<>() @ 0x559bc7257380 doris::vectorized::FromIso8601DateV2::execute() @ 0x559bc7255a88 doris::vectorized::FunctionOtherTypesToDateType<>::execute_impl() @ 0x559bc09e0781 doris::vectorized::DefaultExecutable::execute_impl() @ 0x559bc423aa20 doris::vectorized::PreparedFunctionImpl::_execute_skipped_constant_deal() @ 0x559bc4234938 doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns() @ 0x559bc4233f42 doris::vectorized::PreparedFunctionImpl::default_implementation_for_nulls() @ 0x559bc423a773 doris::vectorized::PreparedFunctionImpl::_execute_skipped_constant_deal() @ 0x559bc4234938 doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns() @ 0x559bc4234a57 doris::vectorized::PreparedFunctionImpl::execute() ```
Related PR: apache#40695 Problem Summary: pr apache#40695 introduced the function `from_iso8601_date`, which parses the string to get year、mouth、 day, and sets the date value through the `set_time_unit` function. Since `set_time_unit` lacks some judgment on mouth, it may get an illegal date in the end, which may cause core in debug mode. sql : `select from_iso8601_date('2023-00-01');` ``` F20250709 09:50:14.366984 3587796 vdatetime_value.h:1222] Check failed: date_v2_value_.month_ != 0 *** Check failure stack trace: *** @ 0x559bd7050d96 google::LogMessage::SendToLog() @ 0x559bd704d7e0 google::LogMessage::Flush() @ 0x559bd70515d9 google::LogMessageFatal::~LogMessageFatal() @ 0x559bc725a570 doris::DateV2Value<>::set_time_unit<>() @ 0x559bc7257380 doris::vectorized::FromIso8601DateV2::execute() @ 0x559bc7255a88 doris::vectorized::FunctionOtherTypesToDateType<>::execute_impl() @ 0x559bc09e0781 doris::vectorized::DefaultExecutable::execute_impl() @ 0x559bc423aa20 doris::vectorized::PreparedFunctionImpl::_execute_skipped_constant_deal() @ 0x559bc4234938 doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns() @ 0x559bc4233f42 doris::vectorized::PreparedFunctionImpl::default_implementation_for_nulls() @ 0x559bc423a773 doris::vectorized::PreparedFunctionImpl::_execute_skipped_constant_deal() @ 0x559bc4234938 doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns() @ 0x559bc4234a57 doris::vectorized::PreparedFunctionImpl::execute() ```
…53050) (#53438) bp #53050 Related PR: #40695 Problem Summary: pr #40695 introduced the function `from_iso8601_date`, which parses the string to get year、mouth、 day, and sets the date value through the `set_time_unit` function. Since `set_time_unit` lacks some judgment on mouth, it may get an illegal date in the end, which may cause core in debug mode. sql : `select from_iso8601_date('2023-00-01');` ``` F20250709 09:50:14.366984 3587796 vdatetime_value.h:1222] Check failed: date_v2_value_.month_ != 0 *** Check failure stack trace: *** @ 0x559bd7050d96 google::LogMessage::SendToLog() @ 0x559bd704d7e0 google::LogMessage::Flush() @ 0x559bd70515d9 google::LogMessageFatal::~LogMessageFatal() @ 0x559bc725a570 doris::DateV2Value<>::set_time_unit<>() @ 0x559bc7257380 doris::vectorized::FromIso8601DateV2::execute() @ 0x559bc7255a88 doris::vectorized::FunctionOtherTypesToDateType<>::execute_impl() @ 0x559bc09e0781 doris::vectorized::DefaultExecutable::execute_impl() @ 0x559bc423aa20 doris::vectorized::PreparedFunctionImpl::_execute_skipped_constant_deal() @ 0x559bc4234938 doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns() @ 0x559bc4233f42 doris::vectorized::PreparedFunctionImpl::default_implementation_for_nulls() @ 0x559bc423a773 doris::vectorized::PreparedFunctionImpl::_execute_skipped_constant_deal() @ 0x559bc4234938 doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns() @ 0x559bc4234a57 doris::vectorized::PreparedFunctionImpl::execute() ``` ### What problem does this PR solve? Issue Number: close #xxx Related PR: #xxx Problem Summary: ### Release note None ### Check List (For Author) - Test <!-- At least one of them must be included. --> - [ ] Regression test - [ ] Unit Test - [ ] Manual test (add detailed scripts or steps below) - [ ] No need to test or manual test. Explain why: - [ ] This is a refactor/code format and no logic has been changed. - [ ] Previous test can cover this change. - [ ] No code files have been changed. - [ ] Other reason <!-- Add your reason? --> - Behavior changed: - [ ] No. - [ ] Yes. <!-- Explain the behavior change --> - Does this need documentation? - [ ] No. - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 --> ### Check List (For Reviewer who merge this PR) - [ ] Confirm the release note - [ ] Confirm test cases - [ ] Confirm document - [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
…_iso8601_date (apache#40695) Added three functions: `normal_cdf`, `to_iso8601`, `from_iso8601_date` 1. `normal_cdf(mean, sd, v) → double` : Compute the Normal cdf with given mean and standard deviation (sd): P(N < value; mean, sd). The mean and value must be real values and the standard deviation must be a real and positive value (all of type DOUBLE). 2. `to_iso8601(DATE/DATETIME) → string` For `DATE` type, it will be converted to `YYYY-MM-DD`. For `DATETIME` type, it will be converted to `YYYY-MM-DDThh:mm:ss.xxxxxx`. 3. `from_iso8601_date(string) → DATE` Convert iso8601 string to `DATE` type. The supported iso8601 string formats are as follows: a. Year : YYYY b. Calendar dates : YYYY-MM-DD , YYYYMMDD , YYYY-MM c. Week dates : YYYY-Www , YYYYWww , YYYY-Www-D , YYYYWwwD d. Ordinal dates : YYYY-DDD , YYYYDDD You can refer to this [document](https://en.wikipedia.org/wiki/ISO_8601) to understand what these formats mean Please note that: a. Each date and time value has a fixed number of digits that must be padded with leading zeros. b. The range of YYYY is 0001-9999. outside these rules, the correctness of the results is not guaranteed. For illegal values, the result is NULL.
Proposed changes
Added three functions:
normal_cdf,to_iso8601,from_iso8601_datenormal_cdf(mean, sd, v) → double:Compute the Normal cdf with given mean and standard deviation (sd): P(N < value; mean, sd). The mean and value must be real values and the standard deviation must be a real and positive value (all of type DOUBLE).
to_iso8601(DATE/DATETIME) → stringFor
DATEtype, it will be converted toYYYY-MM-DD.For
DATETIMEtype, it will be converted toYYYY-MM-DDThh:mm:ss.xxxxxx.from_iso8601_date(string) → DATEConvert iso8601 string to
DATEtype.The supported iso8601 string formats are as follows:
a. Year : YYYY
b. Calendar dates : YYYY-MM-DD , YYYYMMDD , YYYY-MM
c. Week dates : YYYY-Www , YYYYWww , YYYY-Www-D , YYYYWwwD
d. Ordinal dates : YYYY-DDD , YYYYDDD
You can refer to this document to understand what these formats mean
Please note that:
a. Each date and time value has a fixed number of digits that must be padded with leading zeros.
b. The range of YYYY is 0001-9999.
outside these rules, the correctness of the results is not guaranteed.
For illegal values, the result is NULL.