[VL] Support Spark legacy statistical aggregation function behavior#9181
[VL] Support Spark legacy statistical aggregation function behavior#9181rui-mo merged 1 commit intoapache:mainfrom
Conversation
|
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format? See also: |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
@rui-mo Could you help to review this PR? Thanks. |
| (SQLConf.CASE_SENSITIVE.key, SQLConf.CASE_SENSITIVE.defaultValueString), | ||
| (SQLConf.IGNORE_MISSING_FILES.key, SQLConf.IGNORE_MISSING_FILES.defaultValueString), | ||
| (SQLConf.LEGACY_TIME_PARSER_POLICY.key, SQLConf.LEGACY_TIME_PARSER_POLICY.defaultValueString), | ||
| ( |
There was a problem hiding this comment.
The default value is same with native backend value std::to_string(veloxCfg_->get<bool>(kSparkLegacyStatisticalAggregate, false));. Add the key in L470 is enough.
There was a problem hiding this comment.
I think using the default value that aligns with Spark would be great. Maybe delete the default value in std::to_string(veloxCfg_->get<bool>(kSparkLegacyStatisticalAggregate, false));
There was a problem hiding this comment.
If user doesn't set the config, I think it's better we don't set it too.
There was a problem hiding this comment.
If user doesn't set this config, the SQLConf.LEGACY_STATISTICAL_AGGREGATE.defaultValueString will be used rather than false, although it is also false now.
| @@ -509,6 +510,9 @@ object GlutenConfig { | |||
| (SQLConf.CASE_SENSITIVE.key, SQLConf.CASE_SENSITIVE.defaultValueString), | |||
| (SQLConf.IGNORE_MISSING_FILES.key, SQLConf.IGNORE_MISSING_FILES.defaultValueString), | |||
| (SQLConf.LEGACY_TIME_PARSER_POLICY.key, SQLConf.LEGACY_TIME_PARSER_POLICY.defaultValueString), | |||
There was a problem hiding this comment.
So as LEGACY_TIME_PARSER_POLICY.
rui-mo
left a comment
There was a problem hiding this comment.
Better to add test in Gluten to ensure this config could take effect. Thanks.
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
.../spark33/src/test/scala/org/apache/spark/sql/execution/GlutenSQLAggregateFunctionSuite.scala
Outdated
Show resolved
Hide resolved
|
Run Gluten Clickhouse CI on x86 |
|
@baibaichen could you help to show the log of the failed ClickHouse CI? And the failed CI |
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
The workflow has finally passed. Please help merge this pull request, @rui-mo. |
|
Thanks! |
What changes were proposed in this pull request?
To align with Spark, facebookincubator/velox#12566 introduced
spark.legacy_statistical_aggregateconfiguration, which controls whether NULL or NaN is returned when dividing by zero. This PR enables this config ifspark.sql.legacy.statisticalAggregateis set to true.How was this patch tested?
integration tests