Spark read fails after migrating a Hive ORC table with a timestamp column  #9784

@zzzzming95

Apache Iceberg version

iceberg-spark-runtime-3.4_2.12-1.4.3.jar

Query engine

spark

Please describe the bug 🐞

spark.sql("CREATE EXTERNAL TABLE mytable (foo timestamp) STORED AS orc LOCATION '/Users/russellspitzer/Temp/foo'")

spark.sql("INSERT INTO mytable VALUES (now())")

spark.sql("CALL spark_catalog.system.migrate('mytable')")


spark.sql("SELECT * FROM mytable")

I have seen the old issue #2245, but with Spark 3.4 and iceberg-spark-runtime-3.4_2.12-1.4.3.jar I still get the following error:

Caused by: java.lang.IllegalArgumentException: Can not promote TIMESTAMP type to TIMESTAMP
    at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:445)
    at org.apache.iceberg.orc.ORCSchemaUtil.buildOrcProjection(ORCSchemaUtil.java:319)
    at org.apache.iceberg.orc.ORCSchemaUtil.buildOrcProjection(ORCSchemaUtil.java:284)
    at org.apache.iceberg.orc.ORCSchemaUtil.buildOrcProjection(ORCSchemaUtil.java:265)

I think this happens because Hive and Spark treat the timestamp data type as a timestamp with time zone, yet write it to the ORC file as the ORC timestamp type. Strictly speaking, the Hive timestamp data type should be stored as timestamp_instant in the ORC file.

Iceberg strictly follows the ORC type specification: an ORC timestamp is read as a timestamp without time zone, and an ORC timestamp_instant is read as a timestamp with time zone. The type stored in the migrated files therefore does not match the type in the table schema, which causes the exception.
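To make the mismatch concrete, here is a minimal sketch in plain Python of the mapping described above. It is a simplified model, not Iceberg's actual code: the names `ORC_TO_ICEBERG_WITH_ZONE`, `check_projection`, and `migrated_read_fails` are hypothetical, and the assumption is that the migrated table schema expects a timestamp with time zone while the files contain ORC `timestamp`.

```python
# Simplified model of the ORC -> Iceberg timestamp mapping described in this
# issue. All names here are hypothetical; this is NOT Iceberg's implementation.

# ORC type name -> whether the matching Iceberg timestamp carries a time zone.
ORC_TO_ICEBERG_WITH_ZONE = {
    "timestamp": False,         # read as timestamp without time zone
    "timestamp_instant": True,  # read as timestamp with time zone
}


def check_projection(orc_type: str, iceberg_with_zone: bool) -> None:
    """Raise if the ORC file type does not match the expected Iceberg type."""
    if ORC_TO_ICEBERG_WITH_ZONE[orc_type] != iceberg_with_zone:
        # The zone flag is not part of the printed type name, so a message
        # like this can read as "TIMESTAMP to TIMESTAMP".
        raise ValueError(f"Can not promote {orc_type.upper()} type to TIMESTAMP")


def migrated_read_fails() -> bool:
    """Model the reported case: file has ORC `timestamp`, the table schema
    (assumed) expects a timestamp with time zone."""
    try:
        check_projection("timestamp", iceberg_with_zone=True)
    except ValueError:
        return True
    return False
```

In this model, a file written with `timestamp_instant` would pass the same check, which matches the reasoning above about how the column should have been stored.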

Does the community know of a solution or workaround for this problem? Thanks.

Labels

bug, stale
