-
Notifications
You must be signed in to change notification settings - Fork 1.9k
support decimal scalar value #1394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ce0150f to
9aa8f11
Compare
9aa8f11 to
0c3506d
Compare
|
|
|
@Dandandan @alamb @houqp PTAL |
this is likely due to a new rust version being released with more stringent clippy lint rules Thankfully @xudong963 has a PR up to get it working again: #1395 |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good @liukun4515 -- thank you. I had a few suggestions, but overall this looks just about perfect.
datafusion/src/scalar.rs
Outdated
| /// 64bit float | ||
| Float64(Option<f64>), | ||
| /// 128bit decimal, using the i128 to represent the decimal | ||
| Decimal128(Option<i128>, Option<usize>, Option<usize>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 it seems like we always need the precision and scale (to know what type the Decimal is).
Thus, perhaps this could be changed to be
| Decimal128(Option<i128>, Option<usize>, Option<usize>), | |
| Decimal128(Option<i128>, usize, usize), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice point.
I almost forgot this.
| scale: usize, | ||
| ) -> Result<Self> { | ||
| // make sure the precision and scale is valid | ||
| // TODO const the max precision and min scale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arrow spec doesn't seem to say anything about min/max valid scales: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L181-L190
I poked a little around in arrow-rs and I also didn't find any limits on the precision or scale either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use 128bit to represent a decimal value, the max precision is 38.
The scale is always greater or equal to 0 and less than or equal to precision.
In the arrow-rs, the decimal is represented by i128 with precision and scale.
So we can add the MAX_PRECISION = 38 in the datafusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I got this correctly.
The spec doesn't say anything about the the precision/scale of DECIMAL, but we're implementing DECIMAL128 here so 38 should be the max precision.
After apache/arrow-rs#131 is implemented, we can implement DECIMAL256 type in datafusion and the max precision will be 76.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, now in the arrow-rs the decimal256 has not been implemented.
In the arrow-go, it is also not implemented too.
Just in the arrow-java and arrow-c++, the bitwith of 256 has been implemented.
@capkurmagati
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can make this const in other pull request
datafusion/src/scalar.rs
Outdated
| match (self, other) { | ||
| (Decimal128(v1, p1, s1), Decimal128(v2, p2, s2)) => { | ||
| v1.eq(v2) && p1.eq(p2) && s1.eq(s2) | ||
| // TODO how to handle this case: decimal(123,10,1) with decimal(1230,10,2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Int8(Some(100)) is not equal to Int64(Some(100)) with this implementation either , so I think it is consistent with the rest of this comparison of decimal(123,10,1) and decimal(1230,10,2)` are different --
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep.
If we want to compare two values with the different data types, you should convert them to the same data type and then compare them.
datafusion/src/scalar.rs
Outdated
| // any newly added enum variant will require editing this list | ||
| // or else face a compile error | ||
| match (self, other) { | ||
| // TODO decimal type, we just compare the values which have the same precision and scale. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code in this PR is correct -- two decimal's can be compared if they have the same precision / scale and they can't be compared (returns None) if they have different precision/scales).
| p.hash(state); | ||
| s.hash(state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could probably skip hashing the precision and scale - and just hash the value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If two decimal values have the same value and diff precision or scale, the hash value is the same.
But if we hash all data in the decimal, the above case will not happen.
It is better to hash all data in decimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point 👍
datafusion/src/scalar.rs
Outdated
| ScalarValue::Decimal128(_, _, _) => { | ||
| // TODO add the default precision and scale for this case | ||
| // DataType::Decimal(38, 0) | ||
| panic!("The Decimal Scalar value with invalid precision or scale."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you changed Decimal as suggested above to always have precision and scale, this code would not be possible
| ScalarValue::Decimal128(v1, _, _) => v1, | ||
| _ => unreachable!(), | ||
| }) | ||
| .collect::<Vec<Option<i128>>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using DecimalBuilder in this way is awkward. I wonder if we could add a function to create Decimal values from an array of i128 (likely it would go in arrow-rs). Something like
let arr = DecimalArray::from_iter_and_scale(array.into_iter(), precision, scale)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can add the from_iter_and_scale function in arrow-rs later and replace this in the followup pull request.
Do you agree this? @alamb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related pr: apache/arrow-rs#1009
datafusion/src/scalar.rs
Outdated
| scale: &usize, | ||
| ) -> ScalarValue { | ||
| let array = array.as_any().downcast_ref::<DecimalArray>().unwrap(); | ||
| // TODO add checker: the precision and scale are same with array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be a good assert! type check to add, though I don't think it would ever happen unless there is some bug
0c3506d to
bbc0a8f
Compare
|
@alamb I have addressed all comments. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great -- thank you @liukun4515
| /// 64bit float | ||
| Float64(Option<f64>), | ||
| /// 128bit decimal, using the i128 to represent the decimal | ||
| Decimal128(Option<i128>, usize, usize), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| p.hash(state); | ||
| s.hash(state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point 👍
## Which issue does this PR close? Closes apache#1388 ## Rationale for this change Following up on apache#1369 and apache#1386 ## What changes are included in this PR? Updated the doc ## How are these changes tested?
Which issue does this PR close?
Closes #1393
From #122 In order to support the decimal data type in the datafusion, we should add decimal scalar value first.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?