fix coercion of null for decimal math in binary_rules #3549

kmitchener · 2022-09-20T17:49:00Z

Which issue does this PR close?

Closes #3479 . This fix can close that issue -- there's another issue for adding support for casting utf8 to decimal.

Rationale for this change

Prior to this fix:

❯ select null + 1::decimal;
Plan("'Null + Decimal128(38, 10)' can't be evaluated because there isn't a common type to coerce the types to")

After this fix:

❯ select null + 1::decimal;
+-----------------+
| NULL + Int64(1) |
+-----------------+
|                 |
+-----------------+
1 row in set. Query took 0.020 seconds.

What changes are included in this PR?

Are there any user-facing changes?

kmitchener · 2022-09-20T17:50:03Z

datafusion/expr/src/binary_rule.rs

-                ),
-            }
+        (Null, dec_type @ Decimal128(_, _)) | (dec_type @ Decimal128(_, _), Null) => {
+            Some(dec_type.clone())


this is the actual fix, the rest is some refactoring

Could the same coercion (Null to that type) be applied for all the other numeric types as well?

In other words, I wonder if we need to special case Decimal128 ? Is it to skip the special decimal coercion logic below?

Yes, it's basically to shortcut that next match arm for figuring out the Decimal's precision and scale, which depends on the datatype being math'd to the decimal ... none of that matters if it's a null, you just need to return the same datatype. Back further up the chain, the null gets cast to this datatype.

The other number types don't have the rules that Decimals do, so you'll see in the same match they just return their own datatype, no complexity.

codecov-commenter · 2022-09-20T19:56:40Z

Codecov Report

Merging #3549 (c775f70) into master (c7f3a70) will increase coverage by 0.09%.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##           master    #3549      +/-   ##
==========================================
+ Coverage   85.80%   85.89%   +0.09%     
==========================================
  Files         300      301       +1     
  Lines       55506    56045     +539     
==========================================
+ Hits        47625    48142     +517     
- Misses       7881     7903      +22

Impacted Files	Coverage Δ
datafusion/expr/src/binary_rule.rs	`84.40% <75.00%> (-0.20%)`	⬇️
datafusion/expr/src/logical_plan/plan.rs	`77.75% <0.00%> (-0.10%)`	⬇️
datafusion/proto/src/logical_plan.rs	`17.43% <0.00%> (-0.04%)`	⬇️
datafusion/core/tests/sql/predicates.rs	`100.00% <0.00%> (ø)`
datafusion/core/tests/user_defined_plan.rs	`87.79% <0.00%> (ø)`
...usion/core/src/physical_optimizer/parallel_sort.rs	`100.00% <0.00%> (ø)`
datafusion/optimizer/src/reduce_cross_join.rs	`94.54% <0.00%> (ø)`
...tafusion/optimizer/src/common_subexpr_eliminate.rs	`94.77% <0.00%> (+0.01%)`	⬆️
datafusion/core/src/execution/context.rs	`79.33% <0.00%> (+0.02%)`	⬆️
datafusion/optimizer/src/limit_push_down.rs	`99.69% <0.00%> (+0.02%)`	⬆️
... and 9 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

liukun4515 · 2022-09-21T05:45:22Z

Which issue does this PR close?

It fixes part of #3479 Closes #.

Rationale for this change

Prior to this fix:
❯ select null + 1::decimal;
Plan("'Null + Decimal128(38, 10)' can't be evaluated because there isn't a common type to coerce the types to")
After this fix:
❯ select null + 1::decimal;
+-----------------+
| NULL + Int64(1) |
+-----------------+
|                 |
+-----------------+
1 row in set. Query took 0.020 seconds.
What changes are included in this PR?

Are there any user-facing changes?

why the result is null+int64 not the decimal?

cc @kmitchener

kmitchener · 2022-09-21T10:14:21Z

@liukun4515 it's just the naming of the expression. the actual datatype is Decimal.

❯ select arrow_typeof(null + 1::decimal);
+------------------------------+
| arrowtypeof(NULL + Int64(1)) |
+------------------------------+
| Decimal128(38, 10)           |
+------------------------------+
1 row in set. Query took 0.045 seconds.

Naming of resulting columns after casting is kinda weird, but unrelated to this PR.

❯ select 1::decimal;
+--------------+
| Int64(1)     |
+--------------+
| 1.0000000000 |
+--------------+
1 row in set. Query took 0.008 seconds.
❯ select 1::float;
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row in set. Query took 0.005 seconds.
❯ select 1::text;
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row in set. Query took 0.005 seconds.

alamb

LGTM -- thanks @kmitchener

alamb · 2022-09-21T18:02:42Z

datafusion/expr/src/binary_rule.rs

-        (_, DataType::Null) => {
-            if can_cast_types(&DataType::Null, lhs_type) {
-                Some(lhs_type.clone())
+        (DataType::Null, other_type) | (other_type, DataType::Null) => {


this is a nice cleanup

alamb · 2022-09-21T18:03:11Z

datafusion/expr/src/binary_rule.rs

-                ),
-            }
+        (Null, dec_type @ Decimal128(_, _)) | (dec_type @ Decimal128(_, _), Null) => {
+            Some(dec_type.clone())


Could the same coercion (Null to that type) be applied for all the other numeric types as well?

In other words, I wonder if we need to special case Decimal128 ? Is it to skip the special decimal coercion logic below?

liukun4515 · 2022-09-22T00:00:53Z

@liukun4515 it's just the naming of the expression. the actual datatype is Decimal.

❯ select arrow_typeof(null + 1::decimal);
+------------------------------+
| arrowtypeof(NULL + Int64(1)) |
+------------------------------+
| Decimal128(38, 10)           |
+------------------------------+
1 row in set. Query took 0.045 seconds.

Naming of resulting columns after casting is kinda weird, but unrelated to this PR.

❯ select 1::decimal;
+--------------+
| Int64(1)     |
+--------------+
| 1.0000000000 |
+--------------+
1 row in set. Query took 0.008 seconds.
❯ select 1::float;
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row in set. Query took 0.005 seconds.
❯ select 1::text;
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row in set. Query took 0.005 seconds.

But this make me confused.
I think it's issue related with #3568

cc @alamb Do you have any comments about this.

this fix looks good to me expect this header.

alamb · 2022-09-22T11:00:45Z

cc @alamb Do you have any comments about this.

I agree the header name for constants looks non ideal. I seem to remember a discussion we had somewhere else to maybe make the names of constants different but I couldn't find it

ursabot · 2022-09-22T11:11:30Z

Benchmark runs are scheduled for baseline = b134fa4 and contender = a080c03. a080c03 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

liukun4515 · 2022-09-23T12:30:59Z

I seem to remember a discussion we had somewhere else to maybe make the names of constants different but I couldn't find it

Maybe this can help us. #3568 (comment)

fix

0107d05

github-actions bot added the logical-expr Logical plan and expressions label Sep 20, 2022

kmitchener commented Sep 20, 2022

View reviewed changes

revert refactor -- handedness matters to decimal rules

c775f70

alamb approved these changes Sep 21, 2022

View reviewed changes

alamb merged commit a080c03 into apache:master Sep 22, 2022

kmitchener deleted the 3479-fix-coercion-of-null-for-decimal-math branch September 27, 2022 16:01

fix coercion of null for decimal math in binary_rules #3549

fix coercion of null for decimal math in binary_rules #3549

Uh oh!

Conversation

kmitchener commented Sep 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

Uh oh!

kmitchener Sep 20, 2022

Choose a reason for hiding this comment

Uh oh!

alamb Sep 21, 2022

Choose a reason for hiding this comment

Uh oh!

kmitchener Sep 21, 2022

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Sep 20, 2022

Codecov Report

Uh oh!

liukun4515 commented Sep 21, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

Uh oh!

kmitchener commented Sep 21, 2022

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

alamb Sep 21, 2022

Choose a reason for hiding this comment

Uh oh!

alamb Sep 21, 2022

Choose a reason for hiding this comment

Uh oh!

liukun4515 commented Sep 22, 2022

Uh oh!

alamb commented Sep 22, 2022

Uh oh!

ursabot commented Sep 22, 2022

Uh oh!

liukun4515 commented Sep 23, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

kmitchener commented Sep 20, 2022 •

edited

Loading

liukun4515 commented Sep 21, 2022 •

edited

Loading