-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Decimal multiply kernel should not cause precision loss #5980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decimal multiply kernel should not cause precision loss #5980
Conversation
|
Different to #5675, this doesn't add new expression node |
54397f9 to
343ca79
Compare
|
There is a compilation error. Going to fix it at #6029. |
cb7e326 to
0a88516
Compare
| Some(99193548387), // 0.99193548387 | ||
| None, | ||
| None, | ||
| Some(100813008130), // 1.0081300813 | ||
| Some(100000000000), // 1.0 | ||
| ], | ||
| 21, | ||
| 11, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this division losses precision. Now we get it back.
| // subtract: decimal array subtract int32 array | ||
| let schema = Arc::new(Schema::new(vec![ | ||
| Field::new("b", DataType::Int32, true), | ||
| Field::new("a", DataType::Decimal128(10, 2), true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously the field order is incorrect. But as we did coerce type on both side of the op anyway, so it still worked before. Now we don't coerce the decimal field (which is wrongly bound to Int32Array) before into binary expression, so wrong field causes an error.
| sum(l_extendedprice) as sum_base_price, | ||
| sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, | ||
| sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, | ||
| sum(cast(l_extendedprice as decimal(12,2)) * (1 - l_discount) * (1 + l_tax)) as sum_charge, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| cast(cast(sum(case | ||
| when nation = 'BRAZIL' then volume | ||
| else 0 | ||
| end) as decimal(12,2)) / cast(sum(volume) as decimal(12,2)) as decimal(15,2)) as mkt_share |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pub fn i128_to_str(value: i128, precision: &u8, scale: &i8) -> String { | ||
| big_decimal_to_str( | ||
| BigDecimal::from_str(&Decimal::from_i128_with_scale(value, scale).to_string()) | ||
| BigDecimal::from_str(&Decimal128Type::format_decimal(value, *precision, *scale)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
This deals with the decimal precision issue without additional |
|
I wonder if this already fixes #4024 |
Yea, just verified locally that this can pass |
Dandandan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
|
Thanks @Dandandan |
|
Let's wait ~24hrs so other reviewers can have a chance. |
|
FYI @mingmwang @andygrove this PR also has some effect on performance, as casting is changed (mostly reduced). |
|
Ran the benchmarks for TPCH(SF=1) in memory. Performance is mostly the same, except a ~30% improvement for q1 compared to main 🚀 |
|
🎉 |
Which issue does this PR close?
Closes #5674.
Closes #3387.
Closes #4024.
Rationale for this change
Currently decimal multiplication in DataFusion silently truncates precision of result. It happens generally for regular decimal multiplication which doesn't overflow. Looks like DataFusion uses incomplete decimal precision coercion rule from Spark to coerce sides of decimal multiplication (and other arithmetic operators). The coerced type on two sides of decimal multiplication is not the resulting decimal type of multiplication. This (and how we computes decimal multiplication in the kernels) leads to truncated precision in the result decimal type.
What changes are included in this PR?
TypeCoercionto physical binary operatorAre these changes tested?
Are there any user-facing changes?