java.lang.IllegalStateException: Already closed file for partition

Hi, I am trying to write a Spark DataFrame to Iceberg whose rows cross an hourly partition boundary (i.e. the DataFrame contains rows from more than one hour). I expected the write to commit a separate file for each partition. However, I am receiving this error:

```
19/09/30 16:30:51 INFO CodecPool: Got brand-new compressor [.gz]
19/09/30 16:30:52 INFO CodecPool: Got brand-new compressor [.gz]
19/09/30 16:30:52 WARN Writer: Duplicate key: [436073] == [436073]
19/09/30 16:30:52 ERROR Utils: Aborting task
java.lang.IllegalStateException: Already closed file for partition: ec_event_time_hour=2019-09-30-17
	at org.apache.iceberg.spark.source.Writer$PartitionedWriter.write(Writer.java:389)
	at org.apache.iceberg.spark.source.Writer$PartitionedWriter.write(Writer.java:350)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$2(WriteToDataSourceV2Exec.scala:118)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:116)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.$anonfun$doExecute$2(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

Any idea what could be happening here? Do I need to split the DataFrame by hour first, so that each resulting DataFrame only contains rows for a single partition?

EDIT: Running the iceberg-spark-runtime built from master.
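
To make the question concrete, here is a minimal sketch of what I suspect is going on. It is a toy model, not Iceberg's actual code: it assumes the writer keeps only one open file per task and refuses to reopen a partition once its file has been closed, which is what the error message suggests. The class and function names (`ClusteredPartitionWriter`, `hour_key`, `write_all`) are made up for illustration.

```python
from datetime import datetime


class ClusteredPartitionWriter:
    """Toy model (assumption): one open file at a time, no reopening."""

    def __init__(self):
        self.current_key = None
        self.completed = set()  # partitions whose file is already closed
        self.files = {}         # partition key -> rows written to its file

    def write(self, key, row):
        if key != self.current_key:
            if self.current_key is not None:
                # Switching partitions closes the file that was open.
                self.completed.add(self.current_key)
            if key in self.completed:
                raise RuntimeError(f"Already closed file for partition: {key}")
            self.current_key = key
            self.files[key] = []
        self.files[key].append(row)


def hour_key(ts):
    # Hourly partition key in the same shape as the one in the log.
    return ts.strftime("%Y-%m-%d-%H")


def write_all(timestamps):
    writer = ClusteredPartitionWriter()
    for ts in timestamps:
        writer.write(hour_key(ts), ts)
    return writer.files


rows = [
    datetime(2019, 9, 30, 17, 5),
    datetime(2019, 9, 30, 18, 1),
    datetime(2019, 9, 30, 17, 40),  # hour 17 shows up again after hour 18
]

# Interleaved hours: the third row hits a partition that was already closed.
try:
    write_all(rows)
    interleaved_ok = True
except RuntimeError:
    interleaved_ok = False

# Sorted input keeps each hour contiguous, so every file is opened once.
sorted_files = write_all(sorted(rows))
```

If this model is right, splitting into one DataFrame per hour should not be necessary. Sorting each task's rows by the partition source column before the write (for example with Spark's `sortWithinPartitions`, on whatever timestamp column `ec_event_time_hour` is derived from) would presumably be enough to keep each partition's rows contiguous.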

java.lang.IllegalStateException: Already closed file for partition #508
