WrappedPositionAccessor is not generated but Position2Accessor #994

@waterlx

When partitioning a table by a StringType field nested inside a StructType,

Schema schema = new Schema(
    Types.NestedField.required(1, "int", Types.IntegerType.get()),
    Types.NestedField.required(2, "struct", Types.StructType.of(
        Types.NestedField.required(5, "string", Types.StringType.get())
    ))
);

PartitionSpec partitionSpec = PartitionSpec.builderFor(schema).identity("struct.string").build();

the following exception is thrown when writing to Iceberg:

java.lang.ClassCastException: Cannot cast org.apache.spark.unsafe.types.UTF8String to java.lang.CharSequence
	at java.lang.Class.cast(Class.java:3369)
	at org.apache.iceberg.spark.source.PartitionKey.get(PartitionKey.java:133)
	at org.apache.iceberg.PartitionSpec.get(PartitionSpec.java:150)
	at org.apache.iceberg.PartitionSpec.partitionToPath(PartitionSpec.java:166)
	at org.apache.iceberg.LocationProviders$DefaultLocationProvider.newDataLocation(LocationProviders.java:58)
	at org.apache.iceberg.spark.source.Writer$WriterFactory$OutputFileFactory.newOutputFile(Writer.java:362)
	at org.apache.iceberg.spark.source.Writer$BaseWriter.openCurrent(Writer.java:437)
	at org.apache.iceberg.spark.source.Writer$PartitionedWriter.write(Writer.java:531)
	at org.apache.iceberg.spark.source.Writer$PartitionedWriter.write(Writer.java:498)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:118)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)

The accessor does not seem to be generated correctly. A WrappedPositionAccessor is expected for this String field, but a Position2Accessor is generated instead.
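To illustrate the failure mode, here is a minimal sketch that reproduces the same kind of ClassCastException without Spark or Iceberg. `Utf8Like` is a hypothetical stand-in for a string-like type that, like UTF8String in the stack trace, does not implement CharSequence; `get` is an assumed simplification of what an accessor-backed getter might do, not the actual PartitionKey code.

```java
import java.nio.charset.StandardCharsets;

class CastFailureSketch {
    // Hypothetical stand-in: holds UTF-8 bytes but does NOT implement CharSequence.
    static final class Utf8Like {
        private final byte[] bytes;

        Utf8Like(String value) {
            this.bytes = value.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Assumed simplification of a typed getter: casts whatever the accessor returned.
    static <T> T get(Object rawValue, Class<T> javaClass) {
        return javaClass.cast(rawValue);
    }

    // Returns true when casting the value to CharSequence throws ClassCastException.
    static boolean castToCharSequenceFails(Object rawValue) {
        try {
            get(rawValue, CharSequence.class);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Utf8Like raw = new Utf8Like("a");
        // The raw value is returned unconverted, so the cast fails.
        System.out.println(castToCharSequenceFails(raw));            // true
        // Converting first (here via toString) makes the cast succeed.
        System.out.println(castToCharSequenceFails(raw.toString())); // false
    }
}
```

The point of the sketch is that the cast only succeeds if some accessor converts the raw value before it is returned.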

Suspected cause:
In PartitionKey#newAccessor():

} else if (accessor instanceof PositionAccessor) {
  return new Position2Accessor(position, size, (PositionAccessor) accessor);
} else if (...)

Because StringAccessor, DecimalAccessor, and BytesAccessor are subclasses of PositionAccessor, they all match this instanceof check and are wrapped in Position2Accessor, which is not intended. They should be wrapped in WrappedPositionAccessor instead.
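The dispatch problem can be shown in isolation. The sketch below uses hypothetical stand-in classes that share names with the ones in this issue but are not the Iceberg sources; the methods return the name of the wrapper that would be chosen. The exact-class check is only one possible way to avoid the problem, shown here as an assumption rather than the actual fix.

```java
class AccessorDispatchSketch {
    // Hypothetical stand-in: reads a value at a position with no conversion.
    static class PositionAccessor {
        final int position;

        PositionAccessor(int position) {
            this.position = position;
        }

        Object get(Object[] row) {
            return row[position];
        }
    }

    // Hypothetical stand-in: a subclass that converts the raw value to a String.
    static class StringAccessor extends PositionAccessor {
        StringAccessor(int position) {
            super(position);
        }

        @Override
        Object get(Object[] row) {
            return row[position].toString();
        }
    }

    // Mirrors the check quoted in the issue: instanceof also matches subclasses.
    static String wrapWithInstanceof(Object accessor) {
        if (accessor instanceof PositionAccessor) {
            return "Position2Accessor";
        }
        return "WrappedPositionAccessor";
    }

    // One possible alternative (an assumption): match only the exact base class.
    static String wrapWithExactClass(Object accessor) {
        if (accessor.getClass() == PositionAccessor.class) {
            return "Position2Accessor";
        }
        return "WrappedPositionAccessor";
    }

    public static void main(String[] args) {
        Object plain = new PositionAccessor(0);
        Object string = new StringAccessor(0);
        System.out.println(wrapWithInstanceof(string)); // Position2Accessor
        System.out.println(wrapWithExactClass(string)); // WrappedPositionAccessor
        System.out.println(wrapWithExactClass(plain));  // Position2Accessor
    }
}
```

With the instanceof check, the converting subclass is routed to the same branch as the plain base class. With the exact-class check, only the base class takes that branch and subclasses fall through to the wrapping branch.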
