Hi, I am trying to write a Spark DataFrame to Iceberg where the rows cross an hourly partition boundary (i.e. the DataFrame spans more than one hour). The expected result would be a separate file committed for each partition. However, I am receiving this error:
19/09/30 16:30:52 INFO CodecPool: Got brand-new compressor [.gz]
19/09/30 16:30:52 WARN Writer: Duplicate key: [436073] == [436073]
19/09/30 16:30:52 ERROR Utils: Aborting task
java.lang.IllegalStateException: Already closed file for partition: ec_event_time_hour=2019-09-30-17
at org.apache.iceberg.spark.source.Writer$PartitionedWriter.write(Writer.java:389)
at org.apache.iceberg.spark.source.Writer$PartitionedWriter.write(Writer.java:350)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$2(WriteToDataSourceV2Exec.scala:118)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:116)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.$anonfun$doExecute$2(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Any idea what could be happening here? Do I need to group the DataFrame by hour in order to get two DFs that only contain rows for a single partition?
EDIT: Running the iceberg-spark-runtime built from master.
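For what it's worth, the stack trace suggests the writer keeps only one partition's file open at a time and fails when a row maps back to a partition it has already closed, which would mean the input needs to arrive sorted (or clustered) by the partition key. Below is a minimal, hypothetical pure-Python sketch of that behavior (not Iceberg's actual code; the class and field names are made up for illustration):

```python
# Hypothetical sketch of a partitioned writer that keeps one file open at a
# time: when the partition key changes, the current file is closed. A later
# row that maps back to an already-closed partition triggers an error like
# "Already closed file for partition: ...".

class PartitionedWriter:
    def __init__(self):
        self.current_key = None   # partition currently being written
        self.closed_keys = set()  # partitions whose files are already closed
        self.files = {}           # partition key -> rows written (stand-in for data files)

    def write(self, key, row):
        if key != self.current_key:
            if key in self.closed_keys:
                raise RuntimeError(f"Already closed file for partition: {key}")
            if self.current_key is not None:
                self.closed_keys.add(self.current_key)  # close previous file
            self.current_key = key
            self.files.setdefault(key, [])
        self.files[key].append(row)

writer = PartitionedWriter()
# Rows clustered by partition key succeed:
for key, row in [("2019-09-30-16", "a"), ("2019-09-30-16", "b"), ("2019-09-30-17", "c")]:
    writer.write(key, row)

# An out-of-order row revisits a closed partition and fails:
try:
    writer.write("2019-09-30-16", "d")
except RuntimeError as e:
    print(e)  # Already closed file for partition: 2019-09-30-16
```

If that's what's happening, sorting the DataFrame by the partition expression within each task before writing (rather than splitting it into one DataFrame per hour) might be enough to avoid the error.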