diff --git a/variant/README.md b/variant/README.md
index 71e4da8..e335caf 100644
--- a/variant/README.md
+++ b/variant/README.md
@@ -45,8 +45,22 @@ Each example consists of 2 files:
 
 ## Regenerating these files
 
-The files were generated by running the [`regen.py`](regen.py) script that uses Apache Spark to
-generate the files.
+The files in this directory were initially generated by running the [`regen.py`](regen.py)
+script which used Apache Spark to generate the files. The files have been subsequently modified
+when necessary to ensure that they conform to the Parquet spec.
+
+### Modification 1: Created metadata for `primitive_null` as a single byte (`0x01`)
+
+Per , Spark did not generate
+any metadata for `null` and left `primitive_null.metadata` empty.
+The metadata for `primitive_null` should be the same 3 bytes as other primitive types
+* header = `0x01`
+* dictionary_size = `0x00`
+* `dictionary_size + 1 = 1` byte values: `0x00`
+
+```shell
+cp primitive_int8.metadata primitive_null.metadata
+```
 
 [Variant]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
 [primitive types listed in the spec]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-data-for-primitive-type-basic_type0
diff --git a/variant/primitive_null.metadata b/variant/primitive_null.metadata
index e69de29..12db478 100644
Binary files a/variant/primitive_null.metadata and b/variant/primitive_null.metadata differ
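
The three-byte layout listed in the README hunk (header, dictionary_size, one offset) can be spelled out in a short Python sketch. This is illustrative only and not part of `regen.py`; the function names `empty_variant_metadata` and `decode_header` are hypothetical, and the bit positions used in `decode_header` are an assumption based on one reading of the Variant encoding spec (version in the low 4 bits, sorted-strings flag in bit 4, offset size minus one in the top 2 bits).

```python
def empty_variant_metadata() -> bytes:
    """Build the metadata for a value with an empty string dictionary."""
    header = 0x01            # as listed in the README: header = 0x01
    dictionary_size = 0x00   # no dictionary entries
    # dictionary_size + 1 offsets, each one byte wide; all are zero here
    offsets = bytes(dictionary_size + 1)
    return bytes([header, dictionary_size]) + offsets


def decode_header(header: int) -> dict:
    """Split the header byte into fields (bit layout is an assumption)."""
    return {
        "version": header & 0x0F,
        "sorted_strings": bool((header >> 4) & 0x01),
        "offset_size": ((header >> 6) & 0x03) + 1,
    }


if __name__ == "__main__":
    metadata = empty_variant_metadata()
    print(metadata.hex(" "))
    print(decode_header(metadata[0]))
```

Under these assumptions, the bytes produced are the same three bytes the README describes, which is why copying `primitive_int8.metadata` over the empty `primitive_null.metadata` yields a valid file.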