When trying to partition on a StringType field nested within a StructType,
Schema schema = new Schema(
Types.NestedField.required(1, "int", Types.IntegerType.get()),
Types.NestedField.required(2, "struct", Types.StructType.of(
Types.NestedField.required(5, "string", Types.StringType.get())
))
);
PartitionSpec partitionSpec = PartitionSpec.builderFor(schema).identity("struct.string").build();
the following exception is thrown when writing to Iceberg (a fuller repro sketch follows the stack trace):
java.lang.ClassCastException: Cannot cast org.apache.spark.unsafe.types.UTF8String to java.lang.CharSequence
at java.lang.Class.cast(Class.java:3369)
at org.apache.iceberg.spark.source.PartitionKey.get(PartitionKey.java:133)
at org.apache.iceberg.PartitionSpec.get(PartitionSpec.java:150)
at org.apache.iceberg.PartitionSpec.partitionToPath(PartitionSpec.java:166)
at org.apache.iceberg.LocationProviders$DefaultLocationProvider.newDataLocation(LocationProviders.java:58)
at org.apache.iceberg.spark.source.Writer$WriterFactory$OutputFileFactory.newOutputFile(Writer.java:362)
at org.apache.iceberg.spark.source.Writer$BaseWriter.openCurrent(Writer.java:437)
at org.apache.iceberg.spark.source.Writer$PartitionedWriter.write(Writer.java:531)
at org.apache.iceberg.spark.source.Writer$PartitionedWriter.write(Writer.java:498)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:118)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
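For reference, a minimal end-to-end sketch of the failing write is below. This is an assumption-laden repro, not the exact job from the report: the table location, the HadoopTables-based table creation, the local SparkSession, and the single test row are all hypothetical; only the shape of the append matters.

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class NestedPartitionRepro {
  public static void main(String[] args) {
    // Hypothetical table location; any writable path works for HadoopTables.
    String location = "/tmp/warehouse/nested_partition_table";

    // Same schema and spec as above: identity partition on a string inside a struct.
    Schema schema = new Schema(
        Types.NestedField.required(1, "int", Types.IntegerType.get()),
        Types.NestedField.required(2, "struct", Types.StructType.of(
            Types.NestedField.required(5, "string", Types.StringType.get())
        ))
    );
    PartitionSpec spec = PartitionSpec.builderFor(schema).identity("struct.string").build();

    // Create the table so the append below goes through the partitioned writer.
    new HadoopTables(new Configuration()).create(schema, spec, location);

    SparkSession spark = SparkSession.builder()
        .master("local[1]")
        .appName("nested-partition-repro")
        .getOrCreate();

    // One row matching the schema: an int column and a struct with a string field.
    Dataset<Row> df = spark.sql(
        "SELECT 1 AS `int`, named_struct('string', 'a') AS `struct`");

    // The ClassCastException above is thrown during this append.
    df.write().format("iceberg").mode("append").save(location);
  }
}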
The accessor does not seem to be generated correctly: a WrappedPositionAccessor is expected to be used to access that string field, but a Position2Accessor is generated instead.
Suspected cause, in PartitionKey#newAccessor():
} else if (accessor instanceof PositionAccessor) {
return new Position2Accessor(position, size, (PositionAccessor) accessor);
} else if (...)
All subclasses of PositionAccessor (such as StringAccessor, DecimalAccessor, and BytesAccessor) are wrapped into a Position2Accessor, which is not expected; they are supposed to be wrapped into a WrappedPositionAccessor.
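The mechanism is easy to reproduce in isolation: an instanceof check against a base class also matches every subclass, so a subclass-specific branch placed after it is never reached. The sketch below is a standalone illustration only; PositionAccessor and StringAccessor here are minimal stand-ins, not the actual Iceberg accessor classes, and the branch ordering is illustrative.

public class InstanceofOrderDemo {
  // Stand-ins for the accessor hierarchy described above.
  static class PositionAccessor {}
  static class StringAccessor extends PositionAccessor {}

  static String wrap(Object accessor) {
    if (accessor instanceof PositionAccessor) {
      // A StringAccessor also matches this branch, mirroring how subclasses
      // end up wrapped in Position2Accessor in the report.
      return "Position2Accessor";
    } else if (accessor instanceof StringAccessor) {
      // Never reached for StringAccessor instances.
      return "WrappedPositionAccessor";
    }
    return "other";
  }

  public static void main(String[] args) {
    // Prints "Position2Accessor" even though the subclass-specific wrapping
    // was the intended behavior.
    System.out.println(wrap(new StringAccessor()));
  }
}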