perf: Use DataFusion's count_udaf instead of SUM(IF(expr IS NOT NULL, 1, 0))#2407
perf: Use DataFusion's count_udaf instead of SUM(IF(expr IS NOT NULL, 1, 0))#2407andygrove merged 7 commits intoapache:mainfrom
count_udaf instead of SUM(IF(expr IS NOT NULL, 1, 0))#2407Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2407 +/- ##
============================================
+ Coverage 56.12% 57.52% +1.39%
- Complexity 976 1296 +320
============================================
Files 119 147 +28
Lines 11743 13469 +1726
Branches 2251 2352 +101
============================================
+ Hits 6591 7748 +1157
- Misses 4012 4457 +445
- Partials 1140 1264 +124 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I ran TPC-H locally and saw no regression in performance |
| benchmark.addCase(s"SQL Parquet - Comet (Scan) ($aggregateFunction)") { _ => | ||
| withSQLConf(CometConf.COMET_ENABLED.key -> "true") { | ||
| spark.sql(query).noop() | ||
| } | ||
| } |
There was a problem hiding this comment.
It no longer makes sense to benchmark Comet with Scan vs Scan & Exec configurations, especially for aggregates. Comet scans won't accelerate aggregates.
mbutrovich
left a comment
There was a problem hiding this comment.
You know how I love a good negative line count. Thanks for following up on my comment in #2397. Glad to see we can rely on DF for this.
comphead
left a comment
There was a problem hiding this comment.
Thanks @andygrove it is nice removing unnecessary evaluations
|
Merged. Thanks for the reviews @mbutrovich and @comphead |
Which issue does this PR close?
Rationale for this change
Use DataFusion's
count_udafinstead of translatingcount(expr)tosum(if(expr is not null, 1, 0)).When we tried to use
count_udaf(about a year ago - see #744) we ran into performance issues, which is why we added thesumworkaround, but this is no longer needed.What changes are included in this PR?
How are these changes tested?
I ran CometAggregateSuite: