What is the problem the feature request solves?
We should also support aggregations such as SELECT COUNT(DISTINCT col) FROM tbl. In Spark, this query produces a plan like the following:
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[], functions=[count(distinct _1#9)], output=[count(DISTINCT _1)#16L])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=40]
      +- HashAggregate(keys=[], functions=[partial_count(distinct _1#9)], output=[count#20L])
         +- HashAggregate(keys=[_1#9], functions=[], output=[_1#9])
            +- Exchange hashpartitioning(_1#9, 5), ENSURE_REQUIREMENTS, [plan_id=37]
               +- HashAggregate(keys=[_1#9], functions=[], output=[_1#9])
                  +- Scan parquet [_1#9] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_1:int>
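To make the four HashAggregate stages of this COUNT(DISTINCT) plan concrete, here is a toy, single-process Python sketch of what each stage computes. It is only an illustration under simplifying assumptions: partitions are plain Python lists, the hash shuffle uses Python's built-in hash(), and all function names (dedupe, hash_partition, partial_count, count_distinct) are hypothetical and not part of Spark or any native engine. Reading the plan bottom-up, the column is first deduplicated locally, shuffled by value so equal values meet in one partition, deduplicated again, counted per partition (skipping NULLs, as COUNT does), and finally the partial counts are summed in a single partition.

```python
from typing import Hashable, List, Optional, Set

Value = Optional[Hashable]  # None stands in for SQL NULL


def dedupe(rows: List[Value]) -> Set[Value]:
    # HashAggregate(keys=[_1], functions=[]): grouping by the column with no
    # aggregate functions keeps one row per distinct value (NULL is a group too).
    return set(rows)


def hash_partition(parts: List[Set[Value]], n: int) -> List[List[Value]]:
    # Exchange hashpartitioning(_1, n): every copy of a value is routed to the
    # same output partition, so duplicates coming from different inputs meet there.
    out: List[List[Value]] = [[] for _ in range(n)]
    for part in parts:
        for value in part:
            out[hash(value) % n].append(value)
    return out


def partial_count(distinct_values: Set[Value]) -> int:
    # HashAggregate(keys=[], functions=[partial_count(distinct _1)]): values are
    # already unique within this partition, so counting non-NULLs is enough.
    return sum(1 for v in distinct_values if v is not None)


def count_distinct(input_partitions: List[List[Value]], n: int = 5) -> int:
    stage1 = [dedupe(p) for p in input_partitions]           # local dedupe
    stage2 = [dedupe(p) for p in hash_partition(stage1, n)]  # dedupe after shuffle
    partials = [partial_count(p) for p in stage2]            # per-partition counts
    # Exchange SinglePartition + final count(distinct _1): sum the partial counts.
    return sum(partials)


if __name__ == "__main__":
    print(count_distinct([[1, 2, 2, None], [2, 3, None], [3, 4]]))  # 4
```

The default n=5 mirrors hashpartitioning(_1#9, 5) in the plan above. A similar distinct aggregate such as SUM(DISTINCT) would keep the same deduplicate, shuffle, deduplicate shape and only change the last two stages from counting to summing.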
Describe the potential solution
Add support for COUNT(DISTINCT) (and similar distinct aggregates), so that the Spark physical plan can be properly converted to a native plan and executed.
Additional context
No response