Merged
4 changes: 4 additions & 0 deletions variant/README.md
Original file line number Diff line number Diff line change
@@ -69,5 +69,9 @@ resulting in a single `0` byte:
echo -n 'a' | tr a '\0' > primitive_null.value
```

### Modification 2: Created `TimeNTZ`, `timestamp with time zone (nanos)`, `timestamp without time zone (nanos)`, and `UUID` files with Iceberg test code

Currently, Spark [does not support](https://github.com/apache/spark/blob/master/common/variant/README.md) Variant values containing UUID, Time, or nanosecond-precision Timestamp. The `primitive_time.[metadata/value]`, `primitive_timestamp_nanos.[metadata/value]`, `primitive_timestampntz_nanos.[metadata/value]`, and `primitive_uuid.[metadata/value]` files were therefore generated by [Iceberg test code](https://github.com/apache/iceberg/blob/3a4215dbb714477c89681ab94f1197b6ebcbdfff/parquet/src/test/java/org/apache/iceberg/parquet/TestVariantReaders.java#L355).
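
Per the Variant encoding spec, the first byte of a primitive `.value` file packs the basic type in the low 2 bits (`0` = primitive) and the primitive type ID in the high 6 bits, so the new files can be sanity-checked without Spark. The sketch below is illustrative only; the type-name mapping is taken from the spec's type IDs (17–20) mentioned in this PR:

```python
# Hypothetical sanity check for the new .value files, assuming the
# VariantEncoding.md primitive header layout: low 2 bits = basic_type,
# high 6 bits = primitive type ID.
PRIMITIVE_TYPE_NAMES = {
    17: "time without time zone (micros)",
    18: "timestamp with time zone (nanos)",
    19: "timestamp without time zone (nanos)",
    20: "uuid",
}

def decode_primitive_header(first_byte: int) -> str:
    basic_type = first_byte & 0x03   # 0 means "primitive"
    type_id = first_byte >> 2        # upper 6 bits select the primitive type
    if basic_type != 0:
        raise ValueError(f"not a primitive value (basic_type={basic_type})")
    return PRIMITIVE_TYPE_NAMES.get(type_id, f"type id {type_id}")

# primitive_timestamp_nanos.value starts with byte 0x48 ('H'):
# basic_type = 0, type ID = 18.
print(decode_primitive_header(0x48))  # timestamp with time zone (nanos)
```

This matches the leading bytes visible in the diffs below: `H` (0x48) for `primitive_timestamp_nanos.value`, `L` (0x4C) for `primitive_timestampntz_nanos.value`, and `P` (0x50) for `primitive_uuid.value`.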

[Variant]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
[primitive types listed in the spec]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-data-for-primitive-type-basic_type0
8 changes: 6 additions & 2 deletions variant/data_dictionary.json
@@ -67,7 +67,11 @@
"primitive_int8": 42,
"primitive_null": null,
"primitive_string": "This string is longer than 64 bytes and therefore does not fit in a short_string and it also includes several non ascii characters such as \ud83d\udc22, \ud83d\udc96, \u2665\ufe0f, \ud83c\udfa3 and \ud83e\udd26!!",
"primitive_time": "12:33:54:123456",
"primitive_timestamp": "2025-04-16 12:34:56.78-04:00",
"primitive_timestampntz": "2025-04-16 12:34:56.78",
"short_string": "Less than 64 bytes (\u2764\ufe0f with utf8)"
}
"primitive_timestamp_nanos": "2024-11-07T12:33:54.123456789+00:00",
"primitive_timestampntz_nanos": "2024-11-07T12:33:54.123456789",
"primitive_uuid": "f24f9b64-81fa-49d1-b74e-8c09a6e31c56",
"short_string": "Less than 64 bytes (\u2764\ufe0f with utf8)",
}
Binary file added variant/primitive_time.metadata
Binary file not shown.
Binary file added variant/primitive_time.value
Binary file not shown.
Binary file added variant/primitive_timestamp_nanos.metadata
Binary file not shown.
1 change: 1 addition & 0 deletions variant/primitive_timestamp_nanos.value
@@ -0,0 +1 @@
HA:l��
Binary file added variant/primitive_timestampntz_nanos.metadata
Binary file not shown.
1 change: 1 addition & 0 deletions variant/primitive_timestampntz_nanos.value
@@ -0,0 +1 @@
LA:l��
Binary file added variant/primitive_uuid.metadata
Binary file not shown.
1 change: 1 addition & 0 deletions variant/primitive_uuid.value
@@ -0,0 +1 @@
P�O�d��IѷN� ��V
9 changes: 2 additions & 7 deletions variant/regen.py
@@ -75,12 +75,7 @@
INSERT INTO T VALUES ('primitive_binary', X'31337deadbeefcafe'::Variant);
INSERT INTO T VALUES ('primitive_string', 'This string is longer than 64 bytes and therefore does not fit in a short_string and it also includes several non ascii characters such as 🐢, 💖, ♥️, 🎣 and 🤦!!'::Variant);

-- https://github.com/apache/parquet-testing/issues/79
-- is not clear how to create the following types using Spark SQL
-- TODO TimeNTZ (Type ID 17)
-- TODO 'timestamp with timezone (NANOS)' (Type ID 18)
-- TODO 'timestamp with time zone (NANOS)' (Type ID 19)
-- TODO 'UUID' (Type ID 20)
-- Binary artifacts for 'TimeNTZ', 'timestamp with time zone (NANOS)', 'timestamp without time zone (NANOS)', and 'UUID' were generated by the Iceberg test code; see https://github.com/apache/parquet-testing/pull/92 for details.

-------------------------------
-- Short string (basic_type=1)
@@ -170,4 +165,4 @@
# Note: It is possible to write the output to a single parquet file, using a command
# such as:
# spark.sql("SELECT * FROM output").repartition(1).write.parquet('variant.parquet')
# At the time of writing, this file does not have the logical type annotation for VARIANT
# At the time of writing, this file does not have the logical type annotation for VARIANT