druid-deltalake-extensions support for StructType #16782

@Donutellko

Description

Hello, we are trying to load data from a Delta table and are hitting an issue with StructType columns (the error message below is manually formatted; the full stack trace is in the attachment druid-delta-unsupported-StructType.log):

Failed to sample data: Unsupported data type[
  struct(
    StructField(name=FieldOne,type=string,nullable=true,metadata={}), 
    StructField(name=FieldTwo,type=string,nullable=true,metadata={})
  )
] for fieldName[MetaData].
        at org.apache.druid.error.DruidException$DruidExceptionBuilder.build(DruidException.java:460)
        at ...
        at org.apache.druid.error.InvalidInput.exception(InvalidInput.java:30)
        at org.apache.druid.delta.input.DeltaInputRow.getValue(DeltaInputRow.java:201)
        at org.apache.druid.delta.input.DeltaInputRow._getRaw(DeltaInputRow.java:163)
        at org.apache.druid.delta.input.DeltaInputRow.<init>(DeltaInputRow.java:74)
        at org.apache.druid.delta.input.DeltaInputSourceReader$DeltaInputSourceIterator.next(DeltaInputSourceReader.java:140)
        at ...

Using apache/druid:30.0.0

Expected behavior:

  • StructType's StructFields are loaded as a set of columns with a common prefix: MetaData.FieldOne, MetaData.FieldTwo, ...;
  • or (at least) StructType is loaded as a JSON string.
  • Additionally, I would like to discuss the possibility of loading Delta ArrayType columns as JSON strings.
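
To make the two options above concrete, here is a minimal sketch. It assumes a row has already been decoded into nested java.util.Map and java.util.List values. StructFlattenSketch, flatten and toJson are hypothetical names used only for illustration; they are not part of Druid or Delta Kernel, and the real DeltaInputRow code path may look quite different.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not Druid or Delta Kernel code.
final class StructFlattenSketch {
    private StructFlattenSketch() {}

    /** Option 1: expand nested structs into columns with a common prefix. */
    static Map<String, Object> flatten(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", row, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<?, ?> struct, Map<String, Object> out) {
        for (Map.Entry<?, ?> e : struct.entrySet()) {
            String key = String.valueOf(e.getKey());
            String name = prefix.isEmpty() ? key : prefix + "." + key;
            Object value = e.getValue();
            if (value instanceof Map) {
                // Nested struct: recurse, e.g. MetaData -> MetaData.FieldOne
                flattenInto(name, (Map<?, ?>) value, out);
            } else {
                out.put(name, value);
            }
        }
    }

    /** Option 2: serialize a struct (Map) or array (List) to a JSON string. */
    static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        writeJson(value, sb);
        return sb.toString();
    }

    private static void writeJson(Object value, StringBuilder sb) {
        if (value == null) {
            sb.append("null");
        } else if (value instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                writeString(String.valueOf(e.getKey()), sb);
                sb.append(':');
                writeJson(e.getValue(), sb);
            }
            sb.append('}');
        } else if (value instanceof List) {
            sb.append('[');
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) sb.append(',');
                first = false;
                writeJson(item, sb);
            }
            sb.append(']');
        } else if (value instanceof Number || value instanceof Boolean) {
            // Simplification: NaN and Infinity are not handled.
            sb.append(value);
        } else {
            writeString(value.toString(), sb);
        }
    }

    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        sb.append('"');
    }
}
```

With a row containing a MetaData struct that holds FieldOne and FieldTwo, flatten would yield the columns MetaData.FieldOne and MetaData.FieldTwo, while toJson on the struct would yield a single string column. The second form also covers the ArrayType case, since lists are serialized the same way.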

Motivation

  • StructType columns are common in Delta tables. With Spark they are widely used to group related fields, which are then accessed like this: .select(col("MetaData.FieldOne")). Supporting this data seems essential for everyday use of druid-deltalake-extensions.
