Skip to content

Conversation

@kmitchener
Copy link
Contributor

@kmitchener kmitchener commented Sep 20, 2022

Which issue does this PR close?

Closes #3479 . This fix can close that issue -- there's another issue for adding support for casting utf8 to decimal.

Rationale for this change

Prior to this fix:

select null + 1::decimal;
Plan("'Null + Decimal128(38, 10)' can't be evaluated because there isn't a common type to coerce the types to")

After this fix:

select null + 1::decimal;
+-----------------+
| NULL + Int64(1) |
+-----------------+
|                 |
+-----------------+
1 row in set. Query took 0.020 seconds.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Sep 20, 2022
),
}
(Null, dec_type @ Decimal128(_, _)) | (dec_type @ Decimal128(_, _), Null) => {
Some(dec_type.clone())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the actual fix, the rest is some refactoring

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the same coercion (Null to that type) be applied for all the other numeric types as well?

In other words, I wonder if we need to special case Decimal128 ? Is it to skip the special decimal coercion logic below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's basically to shortcut that next match arm for figuring out the Decimal's precision and scale, which depends on the datatype being math'd to the decimal ... none of that matters if it's a null, you just need to return the same datatype. Back further up the chain, the null gets cast to this datatype.

The other number types don't have the rules that Decimals do, so you'll see in the same match they just return their own datatype, no complexity.

@codecov-commenter
Copy link

Codecov Report

Merging #3549 (c775f70) into master (c7f3a70) will increase coverage by 0.09%.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##           master    #3549      +/-   ##
==========================================
+ Coverage   85.80%   85.89%   +0.09%     
==========================================
  Files         300      301       +1     
  Lines       55506    56045     +539     
==========================================
+ Hits        47625    48142     +517     
- Misses       7881     7903      +22     
Impacted Files Coverage Δ
datafusion/expr/src/binary_rule.rs 84.40% <75.00%> (-0.20%) ⬇️
datafusion/expr/src/logical_plan/plan.rs 77.75% <0.00%> (-0.10%) ⬇️
datafusion/proto/src/logical_plan.rs 17.43% <0.00%> (-0.04%) ⬇️
datafusion/core/tests/sql/predicates.rs 100.00% <0.00%> (ø)
datafusion/core/tests/user_defined_plan.rs 87.79% <0.00%> (ø)
...usion/core/src/physical_optimizer/parallel_sort.rs 100.00% <0.00%> (ø)
datafusion/optimizer/src/reduce_cross_join.rs 94.54% <0.00%> (ø)
...tafusion/optimizer/src/common_subexpr_eliminate.rs 94.77% <0.00%> (+0.01%) ⬆️
datafusion/core/src/execution/context.rs 79.33% <0.00%> (+0.02%) ⬆️
datafusion/optimizer/src/limit_push_down.rs 99.69% <0.00%> (+0.02%) ⬆️
... and 9 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@liukun4515
Copy link
Contributor

liukun4515 commented Sep 21, 2022

Which issue does this PR close?

It fixes part of #3479 Closes #.

Rationale for this change

Prior to this fix:

select null + 1::decimal;
Plan("'Null + Decimal128(38, 10)' can't be evaluated because there isn't a common type to coerce the types to")

After this fix:

select null + 1::decimal;
+-----------------+
| NULL + Int64(1) |
+-----------------+
|                 |
+-----------------+
1 row in set. Query took 0.020 seconds.

What changes are included in this PR?

Are there any user-facing changes?

why the result is null+int64 not the decimal?

cc @kmitchener

@kmitchener
Copy link
Contributor Author

@liukun4515 it's just the naming of the expression. the actual datatype is Decimal.

select arrow_typeof(null + 1::decimal);
+------------------------------+
| arrowtypeof(NULL + Int64(1)) |
+------------------------------+
| Decimal128(38, 10)           |
+------------------------------+
1 row in set. Query took 0.045 seconds.

Naming of resulting columns after casting is kinda weird, but unrelated to this PR.

select 1::decimal;
+--------------+
| Int64(1)     |
+--------------+
| 1.0000000000 |
+--------------+
1 row in set. Query took 0.008 seconds.
❯ select 1::float;
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row in set. Query took 0.005 seconds.
❯ select 1::text;
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row in set. Query took 0.005 seconds.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- thanks @kmitchener

(_, DataType::Null) => {
if can_cast_types(&DataType::Null, lhs_type) {
Some(lhs_type.clone())
(DataType::Null, other_type) | (other_type, DataType::Null) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nice cleanup

),
}
(Null, dec_type @ Decimal128(_, _)) | (dec_type @ Decimal128(_, _), Null) => {
Some(dec_type.clone())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the same coercion (Null to that type) be applied for all the other numeric types as well?

In other words, I wonder if we need to special case Decimal128 ? Is it to skip the special decimal coercion logic below?

@liukun4515
Copy link
Contributor

@liukun4515 it's just the naming of the expression. the actual datatype is Decimal.

select arrow_typeof(null + 1::decimal);
+------------------------------+
| arrowtypeof(NULL + Int64(1)) |
+------------------------------+
| Decimal128(38, 10)           |
+------------------------------+
1 row in set. Query took 0.045 seconds.

Naming of resulting columns after casting is kinda weird, but unrelated to this PR.

select 1::decimal;
+--------------+
| Int64(1)     |
+--------------+
| 1.0000000000 |
+--------------+
1 row in set. Query took 0.008 seconds.
❯ select 1::float;
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row in set. Query took 0.005 seconds.
❯ select 1::text;
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row in set. Query took 0.005 seconds.

But this make me confused.
I think it's issue related with #3568

cc @alamb Do you have any comments about this.

this fix looks good to me expect this header.

@alamb
Copy link
Contributor

alamb commented Sep 22, 2022

cc @alamb Do you have any comments about this.

I agree the header name for constants looks non ideal. I seem to remember a discussion we had somewhere else to maybe make the names of constants different but I couldn't find it

@alamb alamb merged commit a080c03 into apache:master Sep 22, 2022
@ursabot
Copy link

ursabot commented Sep 22, 2022

Benchmark runs are scheduled for baseline = b134fa4 and contender = a080c03. a080c03 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@liukun4515
Copy link
Contributor

I seem to remember a discussion we had somewhere else to maybe make the names of constants different but I couldn't find it

Maybe this can help us. #3568 (comment)

@kmitchener kmitchener deleted the 3479-fix-coercion-of-null-for-decimal-math branch September 27, 2022 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

coercion between decimal and other types lacking, compared to other numeric types

5 participants