
Missing handling of Timestamp Without Timezone type  #2244

@RussellSpitzer


Spark currently has no readers for the Timestamp.withoutZone() type, but it is able to create tables with this schema. Attempting to read these tables from Spark fails with an error.

Currently in master, the only reader for this type that I can see is for the generic case:

case TIMESTAMP:
  return GenericOrcReaders.timestamps();

Other systems that hit a table with this type will fail immediately, since they do not have valid readers for it.

This is a bit troubling because this column type is used by default when non-Iceberg ORC writers create new files. For example:

spark.sql("CREATE EXTERNAL TABLE mytable (foo timestamp) location '/Users/russellspitzer/Temp/foo'")

spark.sql("INSERT INTO mytable VALUES (now())")


This creates files like the following:

File Version: 0.12 with ORC_135
Rows: 1
Compression: SNAPPY
Compression size: 262144
Calendar: Julian/Gregorian
Type: struct<foo:timestamp>

The non-Iceberg Spark and Hive ORC readers and writers have no problem dealing with these files, but if an Iceberg table is created and these files are added to it, they are unreadable by Iceberg's ORC readers and writers.
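For anyone unfamiliar with why the two timestamp flavours need separate readers, here is a minimal sketch of the semantic difference in plain Python. It is not Iceberg or Spark code. It assumes the value is stored as microseconds since the epoch, and the function names `decode_without_zone` and `decode_with_zone` are hypothetical, made up for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: timestamps are stored as microseconds since 1970-01-01.
EPOCH_NAIVE = datetime(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_without_zone(micros: int) -> datetime:
    """Hypothetical decoder: a wall-clock value with no zone attached."""
    return EPOCH_NAIVE + timedelta(microseconds=micros)


def decode_with_zone(micros: int) -> datetime:
    """Hypothetical decoder: an instant on the UTC timeline (zone-aware)."""
    return EPOCH_UTC + timedelta(microseconds=micros)


micros = 1_600_000_000_000_000
local = decode_without_zone(micros)
instant = decode_with_zone(micros)

print(local.isoformat(), local.tzinfo)      # naive value, tzinfo is None
print(instant.isoformat(), instant.tzinfo)  # aware value, tzinfo is UTC
```

The same stored number yields two different kinds of value, so a reader written for one type cannot simply be reused for the other without deciding how the zone should be handled.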

There is also a related problem with Migrate -- @RussellSpitzer Add Link Here
