From c219ffda26660cc8a8e2006879944f4eb9780052 Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Wed, 21 May 2025 16:07:08 -0400
Subject: [PATCH 1/3] Add primitive_null metadata

---
 variant/README.md               |  18 ++++++++++++++++--
 variant/primitive_null.metadata | Bin 0 -> 3 bytes
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/variant/README.md b/variant/README.md
index 71e4da8..c1f4b41 100644
--- a/variant/README.md
+++ b/variant/README.md
@@ -45,8 +45,22 @@ Each example consists of 2 files:
 
 ## Regenerating these files
 
-The files were generated by running the [`regen.py`](regen.py) script that uses Apache Spark to
-generate the files.
+The files in this directory were initially generated by running the [`regen.py`](regen.py)
+script which used Apache Spark to generate the files. The files have been subsequently modified
+when necessary to ensure that they conform to the Parquet spec.
+
+### Modification 1: Created metadata for `primitive_null` as a single byte (0x01)
+
+Per , Spark did not generate
+any metadata for `null` and left `primitive_null.metadata` empty.
+The metadata for `primitive_null` should be the same 3 bytes as other primitive types
+* header = `0x01`
+* dictionary_size = `0x00`
+* `dictionary_size+1` `1` byte vales values: `0x00`
+
+```shell
+cp primitive_int8.metadata primitive_null.metadata
+```
 
 [Variant]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
 [primitive types listed in the spec]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-data-for-primitive-type-basic_type0
diff --git a/variant/primitive_null.metadata b/variant/primitive_null.metadata
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..12db4781e63a8c821478a5af5c840908f228181d 100644
GIT binary patch
literal 3
KcmZQ%U;qFB1^@y8

literal 0
HcmV?d00001

From 06803ae0512adefd352d6286c4c8d148c3033f7f Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Wed, 21 May 2025 16:49:57 -0400
Subject: [PATCH 2/3] Update variant/README.md

Co-authored-by: Fokko Driesprong
---
 variant/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/variant/README.md b/variant/README.md
index c1f4b41..f5744de 100644
--- a/variant/README.md
+++ b/variant/README.md
@@ -49,7 +49,7 @@ The files in this directory were initially generated by running the [`regen.py`]
 script which used Apache Spark to generate the files. The files have been subsequently modified
 when necessary to ensure that they conform to the Parquet spec.
 
-### Modification 1: Created metadata for `primitive_null` as a single byte (0x01)
+### Modification 1: Created metadata for `primitive_null` as a single byte (`0x01`)
 
 Per , Spark did not generate
 any metadata for `null` and left `primitive_null.metadata` empty.
From 7a590cca2dee1c0b3f0f48c13b48f1e71153d922 Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Thu, 22 May 2025 13:12:09 -0400
Subject: [PATCH 3/3] Update variant/README.md

---
 variant/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/variant/README.md b/variant/README.md
index f5744de..e335caf 100644
--- a/variant/README.md
+++ b/variant/README.md
@@ -56,7 +56,7 @@ any metadata for `null` and left `primitive_null.metadata` empty.
 The metadata for `primitive_null` should be the same 3 bytes as other primitive types
 * header = `0x01`
 * dictionary_size = `0x00`
-* `dictionary_size+1` `1` byte vales values: `0x00`
+* `dictionary_size + 1 = 1` byte values: `0x00`
 
 ```shell
 cp primitive_int8.metadata primitive_null.metadata
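The README above describes the `primitive_null` metadata as three bytes (header `0x01`, dictionary_size `0x00`, and a single 1-byte offset `0x00`) and recreates it by copying `primitive_int8.metadata`. As a minimal sketch, assuming a POSIX shell with `printf` and `od`, the same bytes can be written directly, and the header can be decoded using the bit layout from the Variant encoding spec (version in bits 0-3, `sorted_strings` in bit 4, `offset_size_minus_one` in bits 6-7). This is an illustration only, not the command used to produce the committed file.

```shell
#!/bin/sh
# Sketch only: writes the 3-byte empty-dictionary Variant metadata directly
# instead of copying primitive_int8.metadata (the README's approach).
set -eu

out=primitive_null.metadata

# Byte 1: header 0x01 -> version 1, sorted_strings 0, offset_size_minus_one 0
# Byte 2: dictionary_size 0x00 (offset_size = 1 byte)
# Byte 3: dictionary_size + 1 = 1 offset of 1 byte: 0x00
printf '\001\000\000' > "$out"

# Read the header byte back as an unsigned decimal and split it into fields
header=$(od -An -tu1 -N1 "$out" | tr -d ' ')
version=$(( header & 15 ))
sorted_strings=$(( (header >> 4) & 1 ))
offset_size=$(( ((header >> 6) & 3) + 1 ))
echo "version=$version sorted_strings=$sorted_strings offset_size=$offset_size"

# Dump all bytes of the file in hex (should be 01 00 00)
od -An -tx1 "$out"
```

If `primitive_int8.metadata` holds the same three bytes as the README states, running `cmp primitive_int8.metadata primitive_null.metadata` afterwards should report no difference.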