What is the problem the feature request solves?
The benchmarks in CometAggregateBenchmark show that, in the Comet (Scan, Exec) case, COUNT is slower than Spark (0.6X) while SUM is faster than Spark (2.5X). Two aggregates this similar should not differ so much. I could not reproduce the performance difference in standalone DataFusion.
SUM
Grouped HashAgg Exec: single group key (cardinality 1048576), single aggregate SUM
Case                                      Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
------------------------------------------------------------------------------------------------------------------
SQL Parquet - Spark (SUM)                          1672          1698         37        6.3        159.4      1.0X
SQL Parquet - Comet (Scan) (SUM)                   1913          1993        112        5.5        182.5      0.9X
SQL Parquet - Comet (Scan, Exec) (SUM)              669           798        113       15.7         63.8      2.5X
COUNT
Grouped HashAgg Exec: single group key (cardinality 1048576), single aggregate COUNT
Case                                      Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
------------------------------------------------------------------------------------------------------------------
SQL Parquet - Spark (COUNT)                        1796          1827         43        5.8        171.3      1.0X
SQL Parquet - Comet (Scan) (COUNT)                 1810          1853         61        5.8        172.6      1.0X
SQL Parquet - Comet (Scan, Exec) (COUNT)           2827          2867         56        3.7        269.6      0.6X
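To make the gap concrete, here is a small sketch that recomputes the Relative column from the best times reported above. It assumes Relative is the Spark best time divided by each case's best time, which is consistent with the reported figures but is not taken from the benchmark source. The names `best_ms` and `relative` are made up for this illustration.

```python
# Best times in ms, copied from the two benchmark tables above.
best_ms = {
    "SUM": {"Spark": 1672, "Comet (Scan)": 1913, "Comet (Scan, Exec)": 669},
    "COUNT": {"Spark": 1796, "Comet (Scan)": 1810, "Comet (Scan, Exec)": 2827},
}


def relative(agg, case):
    """Speedup over the Spark baseline.

    Assumption: Relative = Spark best time / case best time.
    """
    return best_ms[agg]["Spark"] / best_ms[agg][case]


for agg, cases in best_ms.items():
    for case in cases:
        print(f"{agg:5} {case:20} {relative(agg, case):.1f}X")

# How much slower COUNT is than SUM in the Comet (Scan, Exec) case.
exec_gap = best_ms["COUNT"]["Comet (Scan, Exec)"] / best_ms["SUM"]["Comet (Scan, Exec)"]
print(f"COUNT / SUM best time, Comet (Scan, Exec): {exec_gap:.1f}x")
```

With these inputs, the Comet (Scan, Exec) COUNT run takes roughly 4.2 times as long as the SUM run on the same data, while the Spark baselines for the two aggregates are within about 10% of each other.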
Describe the potential solution
No response
Additional context
No response