Use decimal precision to determine variant decimal type #13692
rdblue merged 2 commits into apache:main
Conversation
    public void testDecimal4() {
      VariantPrimitive<?> value =
          SerializedPrimitive.from(
              new byte[] {primitiveHeader(8), 0x04, (byte) 0xD2, 0x02, (byte) 0x96, 0x49});
why change the value here?
              0x10,
              0x22,
              0x11
              (byte) 0xFA,
Basically, I'm using the right decimal4 and decimal8 values for testing.
(byte) 0xD2, 0x02, (byte) 0x96, 0x49 has the value 1234567890, which should be decimal8 instead of decimal4.
Maybe I should add a check to error out if it's out of the decimal4 range?
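The range check suggested above could look something like the following minimal sketch. The class and method names here are hypothetical, not part of the Iceberg API; it only illustrates rejecting an unscaled value whose decimal precision exceeds the 9-digit decimal4 limit, even though the value fits in an int32.

```java
import java.math.BigInteger;

public class Decimal4Validation {
    // Hypothetical helper: reject unscaled values whose decimal precision
    // exceeds the 9-digit decimal4 limit, even if they fit in an int32.
    static int checkDecimal4(int unscaled) {
        int precision = BigInteger.valueOf(unscaled).abs().toString().length();
        if (precision > 9) {
            throw new IllegalArgumentException(
                "Unscaled value out of decimal4 range (precision " + precision + "): " + unscaled);
        }
        return unscaled;
    }

    public static void main(String[] args) {
        // 123456789 has 9 digits, so it is a valid decimal4 unscaled value.
        System.out.println(checkDecimal4(123456789));
        try {
            // 1234567890 has 10 digits: it fits in int32 but is rejected.
            checkDecimal4(1234567890);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: 1234567890");
        }
    }
}
```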
      if (bitLength < 32) {
      int precision = value.precision();

      if (precision >= 1 && precision <= 9) {
Are these synonymous? I notice this matches our table in the Spec, but I'm a little confused about whether the previous behavior we wrote here was a bug.
Is it possible to have an unscaledValue bitLength < 32 and a precision above 9?
For a value like 123456.7890, the unscaled value fits in an int32 while the precision is 10. So they are not exactly the same.
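The example above can be checked directly with `java.math.BigDecimal`: the unscaled value 1234567890 has a bit length of 31 (so it fits in an int32), but the decimal precision is 10, above the decimal4 maximum of 9.

```java
import java.math.BigDecimal;

public class DecimalWidthDemo {
    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("123456.7890");
        // The unscaled value is 1234567890, which fits in an int32...
        System.out.println("bitLength = " + value.unscaledValue().bitLength());
        // ...but the precision is 10, above the decimal4 maximum of 9.
        System.out.println("precision = " + value.precision());
    }
}
```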
Yeah, this change makes sense to me. My understanding is that the Parquet physical type must be determined based on precision, because the precision is ultimately what the physical value needs to store. I can see how there are cases where the number of bits required to represent an unscaled value is smaller, but we still need a higher precision for storing the decimal and preserving its semantic value.
@aihuaxu Going through the tests makes me think we had a bug in the previous implementation. Is that correct?
              new byte[] {primitiveHeader(8), 0x04, (byte) 0x15, (byte) 0xCD, (byte) 0x5B, 0x07});

      assertThat(value.type()).isEqualTo(PhysicalType.DECIMAL4);
      assertThat(value.get()).isEqualTo(new BigDecimal("123456.7890"));
So is the issue here that 123456.7890 has precision 10, so by the spec it shouldn't be decimal4, it should be decimal8?
That's correct. Other engines like Spark and Parquet-Java implement this properly.
I think you are right. This may be considered a bug, since values like 123456.7890 should be encoded as decimal8 but were encoded as decimal4.
        Variants.of(new BigDecimal("1234567890.987654321")), // decimal8
        Variants.of(new BigDecimal("-1234567890.987654321")), // decimal8
So these changed because they should be decimal16 right?
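The precision-based selection being discussed can be sketched as follows. This is a minimal illustration, not the actual Iceberg code; the enum and method names are hypothetical (the real API uses `PhysicalType.DECIMAL4`/`DECIMAL8`/`DECIMAL16`), and the 9/18/38 precision bounds come from the Variant spec's decimal table.

```java
import java.math.BigDecimal;

public class DecimalTypeSelection {
    // Hypothetical names; the real Iceberg API uses PhysicalType.DECIMAL4/8/16.
    enum DecimalWidth { DECIMAL4, DECIMAL8, DECIMAL16 }

    // Per the Variant spec, the physical decimal type is chosen by precision,
    // not by the bit length of the unscaled value.
    static DecimalWidth widthFor(BigDecimal value) {
        int precision = value.precision();
        if (precision <= 9) {
            return DecimalWidth.DECIMAL4;   // up to 9 decimal digits
        } else if (precision <= 18) {
            return DecimalWidth.DECIMAL8;   // up to 18 decimal digits
        } else if (precision <= 38) {
            return DecimalWidth.DECIMAL16;  // up to 38 decimal digits
        }
        throw new IllegalArgumentException("precision > 38 not supported: " + value);
    }

    public static void main(String[] args) {
        // precision 10: decimal8, even though the unscaled value fits in int32
        System.out.println(widthFor(new BigDecimal("123456.7890")));
        // precision 19: decimal16, as noted in the comment above
        System.out.println(widthFor(new BigDecimal("1234567890.987654321")));
    }
}
```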
Thanks, @aihuaxu! I merged this to get it into 1.10.
Updated decimal encoding to follow Variant spec
Aligned the decimal encoding logic with the Variant spec, selecting the appropriate decimal type based on precision rather than unscaled value range. Updated corresponding tests to reflect the change.