perf: Optimize decimal precision check in decimal aggregates (sum and avg) #952
Benchmark results:
[Figures: 10 runs of TPC-H q1 @ 100 GB, main branch vs. this PR]
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #952 +/- ##
============================================
+ Coverage 33.80% 33.81% +0.01%
+ Complexity 852 851 -1
============================================
Files 112 112
Lines 43276 43286 +10
Branches 9572 9572
============================================
+ Hits 14629 14639 +10
Misses 25634 25634
Partials 3013 3013
☔ View full report in Codecov by Sentry.
I was a bit surprised to see a performance win from changing an if-else to a compound boolean expression, since this seems like something that an optimizing compiler should handle well. I think I confirmed that part by putting the example code above in the Rust playground, and in Release mode the

My next guess was that hoisting the implementation from arrow-rs into Comet enabled better inlining opportunities. I am not as proficient in ARM assembly as in x86, but I believe the relevant bits are below. This is current main branch disassembly for

There's a branch and link that isn't present in this PR's disassembly. The compiler is able to better inline the hoisted code. I am not as familiar with Rust's build environment, so I'm not sure if this is expected when calling into code from other crates. I see Comet currently does
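To make the inlining point concrete, here is a minimal sketch, not taken from Comet or arrow-rs. The names `in_range` and `count_in_range` are hypothetical. It assumes the general Rust behavior that a small non-generic function defined in another crate is not always available for inlining unless it is marked `#[inline]` or link-time optimization is enabled, whereas a function defined in the same crate as its caller is always a candidate.

```rust
// Hypothetical helper standing in for a range check that has been hoisted
// into the calling crate. `#[inline]` is only a hint to the compiler, not a
// guarantee; for a function in the same crate it is usually unnecessary, but
// it matters when the function is exported for use by other crates.
#[inline]
fn in_range(value: i128, min: i128, max: i128) -> bool {
    value >= min && value <= max
}

// A hot loop that calls the helper once per value. If the helper is inlined,
// the loop body contains only the two comparisons and no call instruction.
fn count_in_range(values: &[i128], min: i128, max: i128) -> usize {
    values.iter().filter(|v| in_range(**v, min, max)).count()
}

fn main() {
    let values = [-1000_i128, -999, 0, 999, 1000];
    // Three values (-999, 0, 999) fall inside [-999, 999].
    println!("{}", count_in_range(&values, -999, 999));
}
```

Whether the call is actually inlined depends on the optimization level and build profile, so inspecting the generated assembly, as done above, remains the way to confirm it.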
}

fn is_nullable(&self) -> bool {
    // SumDecimal is always nullable because overflows can cause null values
I wonder whether this is true for ANSI mode.
It looks like the previous code also hardcoded true, but this may be a good time to file an issue if there isn't one yet.
Thanks @mbutrovich. This is very insightful. I'd like to go ahead and merge this PR but would also like to have a better understanding of why this is actually faster.
## Which issue does this PR close?

N/A

## Rationale for this change

Apply OSS 0.3.0 changes.

## What changes are included in this PR?

```
84cccf7 docs: Add notes for IntelliJ code size limits for code inspections. (apache#985)
dcc4a8a fix: The spilled_bytes metric of CometSortExec should be size instead of time (apache#984)
f64553b chore: fix compatibility guide (apache#978)
0ee7df8 chore: Enable additional CreateArray tests (apache#928)
a690e9d perf: Remove one redundant CopyExec for SMJ (apache#962)
a8156b5 chore: update rem expression guide (apache#976)
317a534 fix: Use the number of rows from underlying arrays instead of logical row count from RecordBatch (apache#972)
22561c4 doc: add documentation interlinks (apache#975)
b4de8e0 chore: Update benchmarks results based on 0.3.0-rc1 (apache#969)
94093f3 chore: fix publish-to-maven script (apache#966)
f31f6cc Generate changelog for 0.3.0 release (apache#964)
5663fc2 fix: div and rem by negative zero (apache#960)
50517f6 perf: Optimize decimal precision check in decimal aggregates (sum and avg) (apache#952)
5b3f7bc fix: CometScanExec on Spark 3.5.2 (apache#915)
8410c71 chore: clarify tarball installation (apache#959)
459b2b0 fix: window function range offset should be long instead of int (apache#733)
```

## How are these changes tested?


Which issue does this PR close?
Part of #951
Builds on #948
Rationale for this change
I noticed two areas of overhead in the current approach to verifying decimal precision in the decimal aggregates `sum` and `avg`:
I tested the following variations of the decimal precision check in the Rust playground.
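As a rough illustration of the kind of variation involved (this is a sketch, not the playground code from this PR), the two hypothetical functions below contrast a check that returns a `Result` and builds a formatted error message with one that returns a plain `bool` from a compound boolean expression. The names `check_with_error`, `is_valid_precision`, and `max_for_precision` are made up for this example, and the bounds are computed as 10^precision - 1 instead of being read from lookup tables.

```rust
/// Maximum precision of a 128-bit decimal.
const MAX_PRECISION: u8 = 38;

/// Largest value with `precision` decimal digits: 10^precision - 1.
/// Callers must ensure 1 <= precision <= 38 so the power fits in an i128.
fn max_for_precision(precision: u8) -> i128 {
    10_i128.pow(u32::from(precision)) - 1
}

/// Hypothetical variant that reports failure as an error value. The message
/// is formatted on every failure, even if the caller discards it.
fn check_with_error(value: i128, precision: u8) -> Result<(), String> {
    if precision == 0 || precision > MAX_PRECISION {
        return Err(format!("invalid precision {precision}"));
    }
    let max = max_for_precision(precision);
    if value > max {
        Err(format!("{value} is too large for precision {precision}"))
    } else if value < -max {
        Err(format!("{value} is too small for precision {precision}"))
    } else {
        Ok(())
    }
}

/// Hypothetical variant that only answers yes or no, using a single
/// compound boolean expression and no error construction.
fn is_valid_precision(value: i128, precision: u8) -> bool {
    if precision == 0 || precision > MAX_PRECISION {
        return false;
    }
    let max = max_for_precision(precision);
    value >= -max && value <= max
}

fn main() {
    // Both variants should agree on validity for every input.
    for value in [-1000_i128, -999, 0, 999, 1000] {
        let a = check_with_error(value, 3).is_ok();
        let b = is_valid_precision(value, 3);
        assert_eq!(a, b);
        println!("{value}: valid = {b}");
    }
}
```

An aggregate that maps an out-of-range value to null only needs the yes/no answer, so under these assumptions the `bool` form skips work whose result would be thrown away.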
`validate_decimal_precision1` avoids a `memcpy` that appears in `validate_decimal_precision2`:
What changes are included in this PR?
`Err` and avoids a `memcpy`
How are these changes tested?