
[C++] [Parquet] Writing uint32 does not preserve parquet's LogicalType #28020

@asfimport

Description


When writing a uint32 column, the Parquet logical type is not written, which limits interoperability with other engines.

Minimal Python example:

import pyarrow as pa
import pyarrow.parquet as pq

data = {"uint32": [1, None, 0]}
schema = pa.schema([pa.field("uint32", pa.uint32())])

t = pa.table(data, schema=schema)
pq.write_table(t, "bla.parquet")

Inspecting the file with Spark:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("bla.parquet")
print(df.select("uint32").schema)

The output is StructType(List(StructField(uint32,LongType,true))). LongType indicates that Spark interprets the field as a signed 64-bit integer. Further inspection of the file metadata shows that neither convertedType nor logicalType is set for the column. This is independent of the Arrow-specific schema that pyarrow writes into the file's key-value metadata.

Reporter: Jorge Leitão / @jorgecarleitao


Note: This issue was originally created as ARROW-12201. Please see the migration documentation for further details.
