Improve performance of COUNT aggregates #744

@andygrove

What is the problem the feature request solves?

The benchmarks in CometAggregateBenchmark show that, with Comet (Scan, Exec), COUNT is slower than Spark (0.6X) while SUM is faster than Spark (2.5X). There should not be such a large difference between these two aggregates. I could not reproduce the performance difference in standalone DataFusion.

SUM

Grouped HashAgg Exec: single group key (cardinality 1048576), single aggregate SUM:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Parquet - Spark (SUM)                                                                    1672           1698          37          6.3         159.4       1.0X
SQL Parquet - Comet (Scan) (SUM)                                                             1913           1993         112          5.5         182.5       0.9X
SQL Parquet - Comet (Scan, Exec) (SUM)                                                        669            798         113         15.7          63.8       2.5X

COUNT

Grouped HashAgg Exec: single group key (cardinality 1048576), single aggregate COUNT:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Parquet - Spark (COUNT)                                                                    1796           1827          43          5.8         171.3       1.0X
SQL Parquet - Comet (Scan) (COUNT)                                                             1810           1853          61          5.8         172.6       1.0X
SQL Parquet - Comet (Scan, Exec) (COUNT)                                                       2827           2867          56          3.7         269.6       0.6X
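
To make the gap easier to see, here is a small sketch that recomputes the Relative column from the Best Time(ms) values in the two tables above. It assumes Relative is the Spark best time divided by each variant's best time, which matches the reported numbers. The helper name `relative_to_spark` is made up for illustration and is not part of the benchmark code.

```python
# Best times (ms) copied from the SUM and COUNT benchmark tables above.
best_ms = {
    "SUM": {"Spark": 1672, "Comet (Scan)": 1913, "Comet (Scan, Exec)": 669},
    "COUNT": {"Spark": 1796, "Comet (Scan)": 1810, "Comet (Scan, Exec)": 2827},
}


def relative_to_spark(times):
    """Assumed definition: Spark best time / variant best time, to one decimal."""
    baseline = times["Spark"]
    return {name: round(baseline / t, 1) for name, t in times.items()}


relative = {agg: relative_to_spark(times) for agg, times in best_ms.items()}

# How much slower native COUNT is than native SUM on the same workload shape.
count_vs_sum = round(
    best_ms["COUNT"]["Comet (Scan, Exec)"] / best_ms["SUM"]["Comet (Scan, Exec)"], 1
)

for agg, rel in relative.items():
    for name, value in rel.items():
        print(f"{agg:5s} {name:20s} {value}X")
print(f"Comet (Scan, Exec): COUNT takes {count_vs_sum}x as long as SUM")
```

On these numbers, the Comet (Scan, Exec) COUNT run takes roughly four times as long as the SUM run, even though the Spark baselines for the two aggregates are within about 10% of each other.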

Describe the potential solution

No response

Additional context

No response
