Comet native sort lacks row-format support for Struct(Map(...)) sort keys #4123

@andygrove

Describe the bug

When the sort key is a struct containing a Map, Comet's native sort fails with:

org.apache.comet.CometNativeException
Not yet implemented: Row format support not yet implemented for: [SortField {
  options: SortOptions { descending: false, nulls_first: true },
  data_type: Struct([Field {
    name: "data",
    data_type: Map(Field { name: "entries", data_type: Struct([
      Field { name: "key", data_type: Utf8 },
      Field { name: "value", data_type: Utf8 }
    ]) }, false)
  }])
}]

This surfaces in Spark 4.1.1's new having-and-order-by-recursive-type-name-resolution.sql at query #38:

SELECT col1.data['key']
FROM VALUES (NAMED_STRUCT('data', MAP('key', 'value', 'num', '42'))) t (col1)
GROUP BY col1
HAVING col1.data['num'] IS NOT NULL
ORDER BY col1.data['key'];

Expected behavior

Comet should fall back to Spark when the sort key includes types not supported by the Arrow row format (Struct/Map combinations are a known gap upstream).
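The fallback decision described above could be sketched as a recursive check over the sort key types. This is only an illustration under stated assumptions: `SortKeyType`, `row_format_supported`, `can_sort_natively`, and `struct_of_map_key` are hypothetical names, not part of Comet, DataFusion, or arrow-rs, and the rule that any nested Map makes a key unsupported is an assumption taken from this report; the actual set of supported types depends on the arrow-row version in use.

```rust
// Hypothetical, simplified stand-in for an Arrow-style data type.
// These names are illustrative only and are not the arrow-rs or Comet API.
#[derive(Debug, Clone, PartialEq)]
enum SortKeyType {
    Int64,
    Utf8,
    List(Box<SortKeyType>),
    Struct(Vec<(String, SortKeyType)>),
    Map(Box<SortKeyType>, Box<SortKeyType>),
}

// Assumption: a Map anywhere inside the sort key makes the key unsupported
// by the row format. The real rule depends on the arrow-row version.
fn row_format_supported(t: &SortKeyType) -> bool {
    match t {
        SortKeyType::Int64 | SortKeyType::Utf8 => true,
        // Nested types are supported only if everything inside them is.
        SortKeyType::List(elem) => row_format_supported(elem),
        SortKeyType::Struct(fields) => fields.iter().all(|(_, ft)| row_format_supported(ft)),
        SortKeyType::Map(_, _) => false,
    }
}

// Hypothetical planner decision: sort natively only when every sort key
// type is supported; otherwise leave the sort to Spark.
fn can_sort_natively(keys: &[SortKeyType]) -> bool {
    keys.iter().all(row_format_supported)
}

// Builds the key type from the error message above:
// Struct { data: Map<Utf8, Utf8> }.
fn struct_of_map_key() -> SortKeyType {
    SortKeyType::Struct(vec![(
        "data".to_string(),
        SortKeyType::Map(Box::new(SortKeyType::Utf8), Box::new(SortKeyType::Utf8)),
    )])
}
```

With this sketch, `can_sort_natively(&[struct_of_map_key()])` returns false, so the sort would stay on the Spark side instead of reaching the native code path that raises the exception.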

Workaround

For now, dev/diffs/4.1.1.diff adds --SET spark.comet.enabled = false at the top of the test file, so its queries run with Comet disabled.

Additional context

PR #4093 enables Spark 4.1.1 in the Spark SQL Tests workflow. The underlying limitation lives upstream in the arrow-row crate (DataFusion / Arrow).
