Describe the bug
A few Spark tests with distinct count failed while working on #250: https://github.com/apache/arrow-datafusion-comet/actions/runs/8681807652/job/23805034669?pr=250
[info] TwoLevelAggregateHashMapSuite:
[info] - multiple column distinct count *** FAILED *** (530 milliseconds)
[info] Results do not match for query:
...
[info] == Results ==
[info] !== Correct Answer - 1 == == Spark Answer - 1 ==
[info] !struct<> struct<count(key1, key2, key3):bigint>
[info] ![3] [4] (QueryTest.scala:243)
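Spark's distinct count aggregation doesn't count a row if any of its input columns is null. I.e., the update expression only increments the count when no child is null: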
override lazy val updateExpressions = {
  ..
  Seq(
    /* count = */ If(nullableChildren.map(IsNull).reduce(Or), count, count + 1L)
  )
}
But it seems DataFusion's count aggregation function behaves differently.
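To make the difference concrete, here is a minimal sketch in plain Scala (no Spark or DataFusion involved) that contrasts the two semantics. The rows, the object name, and both helper functions are hypothetical and chosen only for illustration. The second function is one possible way a count could diverge from Spark's, not a statement of what DataFusion actually does.

```scala
object DistinctCountNullSemantics {
  // A row of three nullable columns; None models SQL NULL.
  type Row = Seq[Option[String]]

  // Hypothetical sample rows, chosen only for illustration.
  val rows: Seq[Row] = Seq(
    Seq(Some("a"), Some("b"), Some("c")),
    Seq(Some("a"), Some("b"), Some("c")), // duplicate of the first row
    Seq(Some("a"), Some("b"), Some("d")),
    Seq(Some("x"), Some("y"), Some("z")),
    Seq(Some("x"), Some("q"), None)       // contains a NULL column
  )

  // Spark-style semantics: a row is skipped if any column is NULL,
  // then the remaining distinct rows are counted.
  def countDistinctSkippingNulls(input: Seq[Row]): Long =
    input.filter(_.forall(_.isDefined)).distinct.size.toLong

  // Assumed divergent semantics: every distinct row is counted,
  // including rows that contain a NULL column.
  def countDistinctKeepingNulls(input: Seq[Row]): Long =
    input.distinct.size.toLong

  def main(args: Array[String]): Unit = {
    println(countDistinctSkippingNulls(rows)) // prints 3
    println(countDistinctKeepingNulls(rows))  // prints 4
  }
}
```

With these sample rows the null-skipping count is 3 and the null-keeping count is 4, the same kind of off-by-one mismatch as in the failing test output above.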
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response