Use decimal precision to determine variant decimal type #13692
rdblue merged 2 commits into apache:main
Conversation
    public void testDecimal4() {
      VariantPrimitive<?> value =
          SerializedPrimitive.from(
              new byte[] {primitiveHeader(8), 0x04, (byte) 0xD2, 0x02, (byte) 0x96, 0x49});
why change the value here?
              0x10,
              0x22,
              0x11
              (byte) 0xFA,
Basically, I'm using the right decimal4 and decimal8 values for testing.
(byte) 0xD2, 0x02, (byte) 0x96, 0x49 has the value 1234567890, which should be decimal8 instead of decimal4.
Maybe I should add a check to error out if it's out of the decimal4 range?
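The range check suggested above could look something like the following minimal sketch. The class and method names here are hypothetical, not part of the Iceberg API; it only illustrates rejecting an unscaled value whose decimal precision exceeds the 9-digit decimal4 limit, even though the value fits in an int32.

```java
import java.math.BigInteger;

public class Decimal4Validation {
    // Hypothetical helper: reject unscaled values whose decimal precision
    // exceeds the 9-digit decimal4 limit, even if they fit in an int32.
    static int checkDecimal4(int unscaled) {
        int precision = BigInteger.valueOf(unscaled).abs().toString().length();
        if (precision > 9) {
            throw new IllegalArgumentException(
                "Unscaled value out of decimal4 range (precision " + precision + "): " + unscaled);
        }
        return unscaled;
    }

    public static void main(String[] args) {
        // 123456789 has 9 digits, so it is a valid decimal4 unscaled value.
        System.out.println(checkDecimal4(123456789));
        try {
            // 1234567890 has 10 digits: it fits in int32 but is rejected.
            checkDecimal4(1234567890);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: 1234567890");
        }
    }
}
```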
      if (bitLength < 32) {
      int precision = value.precision();

      if (precision >= 1 && precision <= 9) {
Are these synonymous? I notice this matches our table in the Spec, but I'm a little confused about whether the previous behavior we wrote here was a bug.
Is it possible to have an unscaledValue bitLength < 32 and a precision above 9?
For a value like 123456.7890, the unscaled value fits in an int32 while the precision is 10. So they are not exactly the same.
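The example above can be checked directly with `java.math.BigDecimal`: the unscaled value 1234567890 has a bit length of 31 (so it fits in an int32), but the decimal precision is 10, above the decimal4 maximum of 9.

```java
import java.math.BigDecimal;

public class DecimalWidthDemo {
    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("123456.7890");
        // The unscaled value is 1234567890, which fits in an int32...
        System.out.println("bitLength = " + value.unscaledValue().bitLength());
        // ...but the precision is 10, above the decimal4 maximum of 9.
        System.out.println("precision = " + value.precision());
    }
}
```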
Yeah, this change makes sense to me. My understanding is that the Parquet physical type must be determined based on precision, because the precision is ultimately what the physical value needs to store. I can see how there are cases where the number of bits required to represent an unscaled value is smaller, but we still need a higher precision for storing the decimal and preserving its semantic value.
@aihuaxu Going through the tests makes me think we had a bug in the previous implementation. Is that correct?
              new byte[] {primitiveHeader(8), 0x04, (byte) 0x15, (byte) 0xCD, (byte) 0x5B, 0x07});

      assertThat(value.type()).isEqualTo(PhysicalType.DECIMAL4);
      assertThat(value.get()).isEqualTo(new BigDecimal("123456.7890"));
So is the issue here that 123456.7890 has precision 10, so by the spec it shouldn't be decimal4, it should be decimal8?
That's correct. Other engines like Spark and Parquet-Java implement this properly.
I think you are right. This may be considered a bug, since values like 123456.7890 should be encoded as decimal8 but were encoded as decimal4.
        Variants.of(new BigDecimal("1234567890.987654321")), // decimal8
        Variants.of(new BigDecimal("-1234567890.987654321")), // decimal8
So these changed because they should be decimal16 right?
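The precision-based selection being discussed can be sketched as follows. This is a minimal illustration, not the actual Iceberg code; the enum and method names are hypothetical (the real API uses `PhysicalType.DECIMAL4`/`DECIMAL8`/`DECIMAL16`), and the 9/18/38 precision bounds come from the Variant spec's decimal table.

```java
import java.math.BigDecimal;

public class DecimalTypeSelection {
    // Hypothetical names; the real Iceberg API uses PhysicalType.DECIMAL4/8/16.
    enum DecimalWidth { DECIMAL4, DECIMAL8, DECIMAL16 }

    // Per the Variant spec, the physical decimal type is chosen by precision,
    // not by the bit length of the unscaled value.
    static DecimalWidth widthFor(BigDecimal value) {
        int precision = value.precision();
        if (precision <= 9) {
            return DecimalWidth.DECIMAL4;   // up to 9 decimal digits
        } else if (precision <= 18) {
            return DecimalWidth.DECIMAL8;   // up to 18 decimal digits
        } else if (precision <= 38) {
            return DecimalWidth.DECIMAL16;  // up to 38 decimal digits
        }
        throw new IllegalArgumentException("precision > 38 not supported: " + value);
    }

    public static void main(String[] args) {
        // precision 10: decimal8, even though the unscaled value fits in int32
        System.out.println(widthFor(new BigDecimal("123456.7890")));
        // precision 19: decimal16, as noted in the comment above
        System.out.println(widthFor(new BigDecimal("1234567890.987654321")));
    }
}
```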
Thanks, @aihuaxu! I merged this to get it into 1.10.
Updated decimal encoding to follow Variant spec
Aligned the decimal encoding logic with the Variant spec, selecting the appropriate decimal type based on precision rather than unscaled value range. Updated corresponding tests to reflect the change.