feat: Implement Iceberg values by JanKaul · Pull Request #20 · apache/iceberg-rust

JanKaul · 2023-08-02T18:27:38Z

This pull request defines the representation of Iceberg values and implements their serialization and deserialization.

liurenjie1024

Thanks for the effort. I think deriving serialization/deserialization for primitive types is OK, but I'm not sure we should implement ser/de for composite types in this way. When serializing or deserializing composite types, we should associate more information (such as the schema or type) with them, so that we can avoid keeping it in memory.

Fokko

Great work @JanKaul, I left some comments.

Fokko · 2023-08-03T16:40:51Z

+            Value::String(_) => Type::Primitive(PrimitiveType::String),
+            Value::UUID(_) => Type::Primitive(PrimitiveType::Uuid),
+            Value::Decimal(dec) => Type::Primitive(PrimitiveType::Decimal {
+                precision: 38,


I think we want to make the precision configurable as well.

I think we might then need to store the precision in another field. I don't know how to calculate the precision from the scale, and I don't see a way to store it in the Rust Decimal.
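To make the difficulty concrete: counting the decimal digits of the unscaled mantissa gives only the smallest precision that can hold that one value, not the precision declared for the type, which is why a separate field may be needed. A minimal sketch of that lower bound, assuming a plain `i128` mantissa rather than any particular decimal crate (the name `min_precision` is hypothetical):

```rust
/// Smallest precision (count of decimal digits) able to hold `mantissa`.
/// This is a lower bound for a single value, not the declared type precision.
fn min_precision(mantissa: i128) -> u32 {
    let mut digits = 1;
    // Work on the absolute value so the sign does not count as a digit.
    let mut rest = mantissa.unsigned_abs() / 10;
    while rest > 0 {
        digits += 1;
        rest /= 10;
    }
    digits
}
```

For example, the value 1.50 with scale 2 has mantissa 150, so `min_precision(150)` is 3 regardless of whether the type was declared with precision 3 or 38.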

liurenjie1024 · 2023-08-04T07:28:37Z

I would suggest two changes to this PR:

1. Create a PrimitiveValue type so that we can be type-safe when serializing to and deserializing from bytes.
2. Remove the ser/de implementation for composite types. I think we would need to have data types when implementing them, and we can do it later.
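The first suggestion can be illustrated with a small sketch. Raw bytes carry no type tag, so the same four bytes could be an int or a float, and the caller has to supply the expected type. The enums and the `from_bytes` function below are simplified, hypothetical stand-ins, not the types defined in this PR:

```rust
use std::convert::TryFrom;

/// Simplified stand-in for a primitive type descriptor (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PrimitiveType {
    Boolean,
    Int,
    Long,
    Float,
}

/// Simplified stand-in for a typed primitive value (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PrimitiveValue {
    Boolean(bool),
    Int(i32),
    Long(i64),
    Float(f32),
}

/// Decode little-endian bytes according to the expected type.
/// Returns `None` when the byte length does not match the type.
fn from_bytes(bytes: &[u8], ty: PrimitiveType) -> Option<PrimitiveValue> {
    match ty {
        PrimitiveType::Boolean => match bytes {
            [b] => Some(PrimitiveValue::Boolean(*b != 0)),
            _ => None,
        },
        PrimitiveType::Int => {
            let raw = <[u8; 4]>::try_from(bytes).ok()?;
            Some(PrimitiveValue::Int(i32::from_le_bytes(raw)))
        }
        PrimitiveType::Long => {
            let raw = <[u8; 8]>::try_from(bytes).ok()?;
            Some(PrimitiveValue::Long(i64::from_le_bytes(raw)))
        }
        PrimitiveType::Float => {
            let raw = <[u8; 4]>::try_from(bytes).ok()?;
            Some(PrimitiveValue::Float(f32::from_le_bytes(raw)))
        }
    }
}
```

The same input decodes differently depending on the type passed in, which is the reason a composite value cannot be decoded from bytes without its data type.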

Xuanwo

Thanks, LGTM!

liurenjie1024 · 2023-08-08T08:07:54Z

Thanks @JanKaul, I think we are almost there. We just need to change the date/time values to long/int.

liurenjie1024

Thanks, LGTM

Fokko

LGTM, with one question about the naming. A lot has happened since I last checked, so I'll leave it up to you whether to keep it as Literal or revert it to Value.

Fokko · 2023-08-09T06:27:02Z

+                PrimitiveLiteral::String(_) => Type::Primitive(PrimitiveType::String),
+                PrimitiveLiteral::UUID(_) => Type::Primitive(PrimitiveType::Uuid),
+                PrimitiveLiteral::Decimal(dec) => Type::Primitive(PrimitiveType::Decimal {
+                    precision: 38,


This isn't always 38. It actually depends on the number of bytes that are read, but we can break that out into a separate chore.

We might need to store the precision in an extra field because I don't see a way to get the precision from the Rust Decimal. I'm not sure whether it can be calculated from the mantissa and the scale.

This is how we do it in Python:

import math

MAX_PRECISION = tuple(math.floor(math.log10(math.fabs(math.pow(2, 8 * pos - 1) - 1))) for pos in range(24))
REQUIRED_LENGTH = tuple(next(pos for pos in range(24) if p <= MAX_PRECISION[pos]) for p in range(40))


def decimal_required_bytes(precision: int) -> int:
    """Compute the number of bytes required to store a precision.

    Args:
        precision: The number of digits to store.

    Returns:
        The number of bytes required to store a decimal with a certain precision.
    """
    if precision <= 0 or precision >= 40:
        raise ValueError(f"Unsupported precision, outside of (0, 40]: {precision}")
    return REQUIRED_LENGTH[precision]

And how we do it in Java: https://github.com/apache/iceberg/blob/f5f543a54ff7460648bb864f4f06a29eb28938b9/api/src/main/java/org/apache/iceberg/types/TypeUtil.java#L679-L715
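For comparison, here is a rough Rust sketch of the same calculation as the Python snippet quoted above. It is an illustration only, not code from this PR or from the Java implementation, and the function names are chosen here to mirror the Python ones:

```rust
/// Largest decimal precision that fits in `len` bytes of two's-complement
/// storage: floor(log10(2^(8 * len - 1) - 1)).
fn max_precision(len: u32) -> u32 {
    let max_unscaled = 2.0_f64.powi(8 * len as i32 - 1) - 1.0;
    max_unscaled.log10().floor() as u32
}

/// Fewest bytes able to store a decimal of the given precision,
/// or `None` if the precision is outside 1..=39.
fn decimal_required_bytes(precision: u32) -> Option<u32> {
    if precision == 0 || precision >= 40 {
        return None;
    }
    // Same search range as the Python table, minus the empty 0-byte case.
    (1..24).find(|&len| max_precision(len) >= precision)
}
```

Read in the other direction, `max_precision` shows why the hard-coded 38 is not always right: a 16-byte value can hold up to 38 digits, while a 4-byte value can hold at most 9.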

Fokko · 2023-08-09T06:27:53Z

+// under the License.
+
+/*!
+ * Value in iceberg


A lot has happened since I last reviewed this PR. I'm also okay with calling this a value instead of a literal.

Fokko · 2023-08-09T06:28:40Z

+    fn avro_bytes_long() {
+        let bytes = vec![32u8, 0u8, 0u8, 0u8, 0u8, 0u8, 0u8, 0u8];
+
+        check_avro_bytes_serde(


What are we testing here? Keep in mind that Avro uses zig-zag encoding for integers.

Typically, the upper and lower bounds are stored as Avro "bytes" inside the manifest_entry. These bytes conform to the Iceberg binary single-value serialization. This test checks that bytes serialized into Avro "bytes" can be deserialized into a literal value.
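The distinction raised here can be sketched in a few lines. In the Iceberg single-value form a long is 8 little-endian bytes, and that payload sits unchanged inside an Avro "bytes" field, whereas an Avro long is zig-zag encoded and then written as a variable-length integer. The function names below are hypothetical and only the standard library is used:

```rust
use std::convert::TryFrom;

/// Iceberg single-value form of a long: exactly 8 bytes, little-endian.
fn long_from_single_value(bytes: &[u8]) -> Option<i64> {
    <[u8; 8]>::try_from(bytes).ok().map(i64::from_le_bytes)
}

/// Avro's own encoding of a long: zig-zag, then a variable-length integer.
fn avro_zigzag_varint(n: i64) -> Vec<u8> {
    // Zig-zag maps small negative and positive numbers to small unsigned ones.
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    let mut out = Vec::new();
    // Emit 7 bits per byte, setting the high bit while more bytes follow.
    while z >= 0x80 {
        out.push((z as u8 & 0x7f) | 0x80);
        z >>= 7;
    }
    out.push(z as u8);
    out
}
```

With the bytes from the test above, `long_from_single_value(&[32, 0, 0, 0, 0, 0, 0, 0])` yields 32, while the same number written as an Avro long would be the single byte 0x40.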

+    }
+}
+
+fn to_optional_literal(value: Result<Literal, Error>) -> Result<Option<Literal>, Error> {


ZENOTME

Thanks! LGTM!

Fokko

Thanks @JanKaul for working on this, and @liurenjie1024 @Xuanwo @ZENOTME for the reviews 🚀

ZENOTME · 2023-10-16T15:09:57Z

+        let reader = apache_avro::Reader::new(&*encoded).unwrap();
+
+        for record in reader {
+            let result = apache_avro::from_value::<ByteBuf>(&record.unwrap()).unwrap();


Sorry, I can't figure out what the difference between result and bytes is. 🤔 cc @JanKaul

Good catch! The literal is supposed to be written to the Avro Writer instead of the bytes. Like so:

writer.append_ser(Into::<ByteBuf>::into(literal)).unwrap();

I will create a PR to fix this.

I created a PR to fix it.

JanKaul requested review from Fokko, Xuanwo, amogh-jahagirdar and liurenjie1024 August 2, 2023 18:28

liurenjie1024 reviewed Aug 3, 2023

liurenjie1024 requested a review from nastra August 3, 2023 03:14

Xuanwo reviewed Aug 3, 2023

Comment thread crates/iceberg/src/error.rs Outdated

ZENOTME reviewed Aug 3, 2023

Comment thread crates/iceberg/src/spec/values.rs Outdated

Fokko reviewed Aug 3, 2023

Comment thread crates/iceberg/src/spec/values.rs Outdated

JanKaul force-pushed the values branch from d6dec7c to 7e1dd99 Compare August 3, 2023 10:07

Xuanwo reviewed Aug 3, 2023

Comment thread crates/iceberg/src/spec/values.rs Outdated

Comment thread crates/iceberg/src/spec/values.rs Outdated

Comment thread crates/iceberg/src/spec/values.rs Outdated

Comment thread crates/iceberg/src/spec/values.rs Outdated

JanKaul force-pushed the values branch from cf2257e to fbf2aac Compare August 3, 2023 14:12

Fokko reviewed Aug 3, 2023

JanKaul and others added 16 commits August 4, 2023 16:28

implement values

1a5236e

improve getters

1f9de91

fix clippy warnings

20f8e2c

fix clippy warnings

5be7a31

change into bytebuf to from

2ad06e2

add license header

c3fc2c6

Use Long instead of LongInt

50e41e0

Co-authored-by: Renjie Liu <liurenjie2008@gmail.com>

use more general error kind

4775f16

use Naivetime

bb59703

use naivedate

1713492

use naivedatetime

53182e4

fix clippy warnings

af46ba1

use uuid

1982d4c

use orderedfloat

7cab82b

fix clippy warnings

831f821

fix tests

522d85f

JanKaul force-pushed the values branch from 0ec2178 to 8292d42 Compare August 4, 2023 14:34

JanKaul added 4 commits August 4, 2023 19:52

implement list test

7058b93

implement map test

7e032b5

fix error

9fe7c76

fix clippy warnings

0bf2a7b

Xuanwo approved these changes Aug 8, 2023

ZENOTME reviewed Aug 9, 2023

Comment thread crates/iceberg/src/spec/values.rs Outdated

liurenjie1024 mentioned this pull request Aug 9, 2023

feat: Add transform #26

Merged

change timestamps to int/long

06b5676

liurenjie1024 approved these changes Aug 9, 2023

Fokko approved these changes Aug 9, 2023

JanKaul added 4 commits August 9, 2023 08:43

convert nulls to None

29488dc

add tests for null

583e3b1

null test for struct

927fa7f

fix clippy warning

451f3d4

Xuanwo reviewed Aug 9, 2023

Comment thread crates/iceberg/src/error.rs Outdated

Xuanwo reviewed Aug 9, 2023

Comment thread crates/iceberg/src/spec/values.rs Outdated

convert json null to option

c3fa7d7

ZENOTME approved these changes Aug 9, 2023

Fokko approved these changes Aug 9, 2023

Fokko merged commit 4ee05b4 into apache:main Aug 9, 2023

JanKaul deleted the values branch August 9, 2023 14:00

ZENOTME mentioned this pull request Oct 12, 2023

struct value design #77

Closed

ZENOTME reviewed Oct 16, 2023

Xuanwo mentioned this pull request Nov 11, 2023

Replace i64 with DateTime #94

Merged

xxchan pushed a commit to xxchan/iceberg-rust that referenced this pull request Mar 12, 2025

feat(iceberg): remove schemas (apache#20)

826391c

xxchan pushed a commit to xxchan/iceberg-rust that referenced this pull request Mar 25, 2025

feat: support remove schemas (apache#20)

eb746dc

xxchan pushed a commit to xxchan/iceberg-rust that referenced this pull request Mar 25, 2025

feat: support remove schemas (apache#20)

46bdd0e

xxchan pushed a commit to xxchan/iceberg-rust that referenced this pull request Mar 25, 2025

feat: support remove schemas (apache#20)

fa6caa9
