GH-46908: [Docs][Format] Add variant extension type docs #47456
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -417,7 +417,133 @@ better zero-copy compatibility with various systems that also store booleans usi | |
|
|
||
| Metadata is an empty string. | ||
|
|
||
| ========================= | ||
| .. _parquet_variant_extension: | ||
|
|
||
| Parquet Variant | ||
| =============== | ||
|
|
||
| Variant represents a value that may be one of: | ||
|
|
||
| * Primitive: a type and corresponding value (e.g. ``INT``, ``STRING``) | ||
|
|
||
| * Array: An ordered list of Variant values | ||
|
|
||
| * Object: An unordered collection of string/Variant pairs (i.e. key/value pairs). An object may not contain duplicate keys. | ||
|
|
||
| In particular, this type makes it possible to represent semi-structured data stored as a | ||
| `Parquet Variant <https://github.com/apache/parquet-format/blob/master/VariantEncoding.md>`__ value within Arrow columns in | ||
| a lossless fashion. It can also represent `shredded <https://github.com/apache/parquet-format/blob/master/VariantShredding.md>`__ | ||
| variant values. The canonical extension type lets systems pass Variant-encoded data around without special handling unless | ||
| they need to interact directly with the encoded variant data. See the Parquet format specification for details on the | ||
| binary encoding. | ||
|
|
||
| * Extension name: ``arrow.parquet.variant``. | ||
|
|
||
| * The storage type of this extension is a ``Struct`` that obeys the following rules: | ||
|
|
||
| * A *non-nullable* field named ``metadata`` which is of type ``Binary``, ``LargeBinary``, or ``BinaryView``. | ||
|
|
||
|
|
||
| * At least one (or both) of the following: | ||
|
|
||
| * A field named ``value`` which is of type ``Binary``, ``LargeBinary``, or ``BinaryView``. | ||
| (Unshredded variants consist of only the ``metadata`` and ``value`` fields.) | ||
|
|
||
| * A field named ``typed_value`` whose type is either one of the Arrow types listed in :ref:`variant_primitive_type_mapping`, or a ``List``, ``LargeList``, ``ListView`` or ``Struct``. | ||
|
|
||
| * If the ``typed_value`` field is a ``List``, ``LargeList`` or ``ListView``, its elements **must** be *non-nullable* and **must** | ||
| be a ``Struct`` consisting of at least one (or both) of the following: | ||
|
|
||
| * A field named ``value`` which is of type ``Binary``, ``LargeBinary``, or ``BinaryView``. | ||
|
|
||
| * A field named ``typed_value`` which follows the rules outlined above (this allows for arbitrarily nested data). | ||
|
|
||
|
|
||
| * If the ``typed_value`` field is a ``Struct``, then each of its fields represents a field shredded from the objects. Each such field | ||
| **must** be *non-nullable* and **must** be a ``Struct`` consisting of at least one (or both) of the following: | ||
|
|
||
| * A field named ``value`` which is of type ``Binary``, ``LargeBinary``, or ``BinaryView``. | ||
|
|
||
| * A field named ``typed_value`` which follows the rules outlined above (this allows for arbitrarily nested data). | ||
|
|
||
| * Extension type parameters: | ||
|
|
||
|
|
||
| This type does not have any parameters. | ||
|
Contributor
The Parquet spec is versioned for forward compatibility; are we going to ignore that for now?
Member
We can add an optional parameter later to denote future versions of the Variant spec, if any?
Member, Author
The Arrow spec is also versioned, and the data in the metadata array is the raw Parquet Variant bytes, which include the version number. |
||
|
|
||
| * Description of the serialization: | ||
|
|
||
| Extension metadata is an empty string. | ||
|
|
||
| .. note:: | ||
|
|
||
| It is also *permissible* for the ``metadata`` field to be dictionary-encoded with a preferred (*but not required*) index type of ``int8``, | ||
| or run-end-encoded with a preferred (*but not required*) runs type of ``int8``. | ||
|
|
||
| .. note:: | ||
|
|
||
| The fields may be in any order, and thus must be accessed by **name**, not by *position*. Field names are case-sensitive. | ||
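The storage rules and the by-name access note above can be sketched as a small recursive check. This is an illustrative sketch only, not an Arrow API: ``Type``, ``Field``, ``check_variant_storage`` and ``is_valid`` are hypothetical names, types are modeled as plain Python dataclasses rather than real Arrow types, ``PRIMITIVE_KINDS`` is an abbreviated subset of the primitive mapping table, and dictionary-encoded or run-end-encoded ``metadata`` is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BINARY_KINDS = {"Binary", "LargeBinary", "BinaryView"}
LIST_KINDS = {"List", "LargeList", "ListView"}
# Abbreviated subset of the primitive mapping table, for illustration only.
PRIMITIVE_KINDS = {"Boolean", "Int32", "Int64", "Double", "String", "Binary"}


@dataclass(frozen=True)
class Type:
    kind: str
    fields: Tuple["Field", ...] = ()   # children of a Struct
    element: Optional["Field"] = None  # element field of a list type


@dataclass(frozen=True)
class Field:
    name: str
    type: Type
    nullable: bool = True


def check_value_group(struct: Type) -> None:
    # Fields are looked up by name, never by position.
    by_name = {f.name: f for f in struct.fields}
    value, typed = by_name.get("value"), by_name.get("typed_value")
    if value is None and typed is None:
        raise ValueError("need at least one of 'value' or 'typed_value'")
    if value is not None and value.type.kind not in BINARY_KINDS:
        raise ValueError("'value' must be a binary type")
    if typed is not None:
        check_typed_value(typed.type)


def check_typed_value(t: Type) -> None:
    if t.kind in LIST_KINDS:
        children = [t.element]
    elif t.kind == "Struct":
        children = list(t.fields)
    elif t.kind in PRIMITIVE_KINDS:
        return
    else:
        raise ValueError(f"unsupported typed_value kind: {t.kind}")
    # List elements and shredded object fields follow the same rule.
    for child in children:
        if child is None or child.nullable or child.type.kind != "Struct":
            raise ValueError("shredded children must be non-nullable Structs")
        check_value_group(child.type)


def check_variant_storage(t: Type) -> None:
    if t.kind != "Struct":
        raise ValueError("storage type must be a Struct")
    meta = {f.name: f for f in t.fields}.get("metadata")
    if meta is None or meta.nullable or meta.type.kind not in BINARY_KINDS:
        raise ValueError("'metadata' must be a non-nullable binary field")
    check_value_group(t)


def is_valid(t: Type) -> bool:
    try:
        check_variant_storage(t)
        return True
    except ValueError:
        return False


binary = Type("Binary")
# Unshredded variant; 'value' deliberately comes before 'metadata'.
unshredded = Type("Struct", fields=(
    Field("value", binary),
    Field("metadata", binary, nullable=False),
))
# Shredded variant whose typed_value is a list of shredded strings.
element_struct = Type("Struct", fields=(Field("typed_value", Type("String")),))
shredded_list = Type("Struct", fields=(
    Field("metadata", binary, nullable=False),
    Field("typed_value", Type("List", element=Field("element", element_struct, nullable=False))),
))
print(is_valid(unshredded), is_valid(shredded_list))
```

Because every lookup goes through a name-keyed dictionary, reordering the struct fields does not change the result, which is the behavior the note asks of consumers.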
|
|
||
| .. _variant_primitive_type_mapping: | ||
|
|
||
| Primitive Type Mappings | ||
| ----------------------- | ||
|
|
||
| +----------------------+------------------------+ | ||
| | Arrow Primitive Type | Variant Primitive Type | | ||
| +======================+========================+ | ||
| | Null                 | Null                   | | ||
| +----------------------+------------------------+ | ||
| | Boolean              | Boolean (true/false)   | | ||
| +----------------------+------------------------+ | ||
| | Int8                 | Int8                   | | ||
| +----------------------+------------------------+ | ||
| | UInt8                | Int16                  | | ||
| +----------------------+------------------------+ | ||
| | Int16                | Int16                  | | ||
| +----------------------+------------------------+ | ||
| | UInt16               | Int32                  | | ||
| +----------------------+------------------------+ | ||
| | Int32                | Int32                  | | ||
| +----------------------+------------------------+ | ||
| | UInt32               | Int64                  | | ||
| +----------------------+------------------------+ | ||
| | Int64                | Int64                  | | ||
| +----------------------+------------------------+ | ||
| | Float                | Float                  | | ||
| +----------------------+------------------------+ | ||
| | Double               | Double                 | | ||
| +----------------------+------------------------+ | ||
| | Decimal32            | decimal4               | | ||
| +----------------------+------------------------+ | ||
| | Decimal64            | decimal8               | | ||
| +----------------------+------------------------+ | ||
| | Decimal128           | decimal16              | | ||
| +----------------------+------------------------+ | ||
| | Date32               | Date                   | | ||
| +----------------------+------------------------+ | ||
| | Time64               | TimeNTZ                | | ||
| +----------------------+------------------------+ | ||
| | Timestamp(us, UTC)   | Timestamp (micro)      | | ||
| +----------------------+------------------------+ | ||
| | Timestamp(us)        | TimestampNTZ (micro)   | | ||
| +----------------------+------------------------+ | ||
| | Timestamp(ns, UTC)   | Timestamp (nano)       | | ||
| +----------------------+------------------------+ | ||
| | Timestamp(ns)        | TimestampNTZ (nano)    | | ||
| +----------------------+------------------------+ | ||
| | Binary               | Binary                 | | ||
| +----------------------+------------------------+ | ||
| | LargeBinary          | Binary                 | | ||
| +----------------------+------------------------+ | ||
| | BinaryView           | Binary                 | | ||
| +----------------------+------------------------+ | ||
| | String               | String                 | | ||
| +----------------------+------------------------+ | ||
| | LargeString          | String                 | | ||
| +----------------------+------------------------+ | ||
| | StringView           | String                 | | ||
| +----------------------+------------------------+ | ||
| | UUID extension type  | UUID                   | | ||
| +----------------------+------------------------+ | ||
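The table does not say why unsigned Arrow integers map to a wider signed Variant integer. One plausible reading, offered here as an assumption, is that Variant only has signed integers, so each unsigned type is widened to the narrowest signed type that holds its full range. The sketch below checks that reading against the table; ``smallest_signed_for_unsigned`` is a hypothetical helper, not part of any Arrow or Parquet library.

```python
from typing import Optional

# Signed integer widths (in bits) that appear in the Variant column above.
VARIANT_SIGNED_BITS = (8, 16, 32, 64)


def smallest_signed_for_unsigned(bits: int) -> Optional[int]:
    """Return the narrowest signed width that holds every unsigned `bits`-bit value."""
    max_unsigned = 2**bits - 1
    for width in VARIANT_SIGNED_BITS:
        max_signed = 2 ** (width - 1) - 1
        if max_unsigned <= max_signed:
            return width
    return None  # no signed width is wide enough


for bits in (8, 16, 32, 64):
    print(f"unsigned {bits}-bit ->", smallest_signed_for_unsigned(bits))
```

Under this reading, the 8-, 16- and 32-bit unsigned types land on the 16-, 32- and 64-bit signed types as in the table, and a 64-bit unsigned integer has no fitting target, which is consistent with its absence from the table.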
|
|
||
| Community Extension Types | ||
| ========================= | ||
|
|
||
|
|
||