Description
When writing Arrow data to Parquet, we serialise the schema's IPC representation into the file metadata. The Parquet reader reads this schema back and uses it to preserve the array type information from the original Arrow data.
However, we do not rely on this mechanism when reading projected columns from a Parquet file. For example, if a file has 3 columns but we read only 2 of them, we do not yet use the serialised Arrow schema, and can thus lose type information.
This behaviour was deliberately left out, as the function `parquet_to_arrow_schema_by_columns` does not check for the existence of an Arrow schema in the metadata.
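To illustrate the behaviour being asked for, here is a minimal, dependency-free sketch. It does not use the real `arrow` or `parquet` crates: `ArrowType`, `PhysicalType`, `Field`, `default_arrow_type` and `project_schema` are hypothetical stand-ins invented for this example. It also assumes that a string column written as `LargeUtf8` would be read back as `Utf8` when only the Parquet physical type is consulted, which is one plausible instance of the type loss described above.

```rust
/// Hypothetical, heavily simplified stand-in for an Arrow data type.
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Int64,
    Utf8,
    LargeUtf8,
}

/// Hypothetical stand-in for an Arrow field (name plus type).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: ArrowType,
}

/// Hypothetical stand-in for a Parquet physical type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalType {
    Int64,
    ByteArray,
}

/// Type chosen when only the Parquet schema is available.
/// Assumption for this sketch: byte arrays default to Utf8.
fn default_arrow_type(physical: PhysicalType) -> ArrowType {
    match physical {
        PhysicalType::Int64 => ArrowType::Int64,
        PhysicalType::ByteArray => ArrowType::Utf8,
    }
}

/// Build the Arrow fields for the projected column indices.
/// If an embedded Arrow schema is present, prefer its type for each
/// projected column; otherwise fall back to the Parquet-derived default.
fn project_schema(
    parquet_columns: &[(&str, PhysicalType)],
    embedded: Option<&[Field]>,
    projection: &[usize],
) -> Vec<Field> {
    projection
        .iter()
        .map(|&i| {
            let (name, physical) = parquet_columns[i];
            // Look the column up by name in the embedded schema, if any.
            let preserved = embedded
                .and_then(|fields| fields.iter().find(|f| f.name == name))
                .map(|f| f.data_type.clone());
            Field {
                name: name.to_string(),
                data_type: preserved.unwrap_or_else(|| default_arrow_type(physical)),
            }
        })
        .collect()
}
```

Calling `project_schema` with `embedded = None` models the current projected read path, where every projected column gets its default type. Passing the embedded schema models the requested behaviour, where the original Arrow type of each projected column is kept.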
Reporter: Neville Dipale / @nevi-me
Assignee: Carol Nichols / @carols10cents
PRs and other links:
Note: This issue was originally created as ARROW-10168. Please see the migration documentation for further details.