Description
Describe the enhancement requested
This request is a continuation of the enhancement made in #39065 to support nested fields where the nesting type is a struct.
Here I would like to apply predicate pushdown on the x subfield of a list<element: struct<x: double not null, y: double not null>> column, in a file with the following Parquet schema:
required group field_id=-1 schema {
  optional binary field_id=-1 id (String);
  optional group field_id=-1 geometry (List) {
    repeated group field_id=-1 list {
      optional group field_id=-1 element {
        required double field_id=-1 x;
        required double field_id=-1 y;
      }
    }
  }
}
When I pass the following expression to arrow::dataset::ScannerBuilder::Filter(),
// cp is assumed to be an alias for arrow::compute
auto fieldRefX = arrow::FieldRef(arrow::FieldRef("geometry", "element"), "x");
expression = cp::less_equal(cp::field_ref(fieldRefX), cp::literal(m_sFilterEnvelope.MaxX));
I get the following error: nested paths only supported for structs
(I tried to remove that check, but I then get the following error: Function 'struct_field' has no kernel matching input types (list<element: struct<x: double not null, y: double not null>>))
Beyond the technical difficulties of implementing this, I guess there is a potential ambiguity about what such filtering means. Would a row be selected only if all corresponding entries in the list match the predicate, or as soon as at least one does? For my use case (spatial filtering applied directly to GeoArrow struct/separated encoded geometry columns, for non-Point geometry types, in GeoParquet files), the latter is what I'm looking for.
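To make the two possible meanings concrete, here is a minimal sketch in plain C++. It deliberately does not use the Arrow API: Point, Geometry, ListMatch, RowMatches and FilterRows are hypothetical names invented for illustration, and null handling is not modeled. It only shows how "any entry matches" and "all entries match" select different rows for the predicate x <= max_x.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one list element: struct<x: double, y: double>.
struct Point {
  double x;
  double y;
};

// Hypothetical stand-in for one row's `geometry` value (a list of points).
using Geometry = std::vector<Point>;

// The two candidate semantics for a predicate applied through a list.
enum class ListMatch { kAny, kAll };

// True if the row passes `x <= max_x` under the chosen semantics.
// Note the empty-list edge case: kAny yields false, kAll yields true.
bool RowMatches(const Geometry& geom, double max_x, ListMatch mode) {
  auto pred = [max_x](const Point& p) { return p.x <= max_x; };
  return mode == ListMatch::kAny
             ? std::any_of(geom.begin(), geom.end(), pred)
             : std::all_of(geom.begin(), geom.end(), pred);
}

// Returns the indices of the rows that pass the filter.
std::vector<std::size_t> FilterRows(const std::vector<Geometry>& rows,
                                    double max_x, ListMatch mode) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (RowMatches(rows[i], max_x, mode)) selected.push_back(i);
  }
  return selected;
}

// Small usage example: four rows, filtered with max_x = 2.0.
void Example() {
  std::vector<Geometry> rows = {
      {{1.0, 0.0}, {1.5, 0.0}},  // every x <= 2
      {{1.0, 0.0}, {5.0, 0.0}},  // only one x <= 2
      {{3.0, 0.0}, {4.0, 0.0}},  // no x <= 2
      {},                        // empty list
  };
  assert((FilterRows(rows, 2.0, ListMatch::kAny) ==
          std::vector<std::size_t>{0, 1}));
  assert((FilterRows(rows, 2.0, ListMatch::kAll) ==
          std::vector<std::size_t>{0, 3}));
}
```

The "any" variant is the one that fits the spatial filtering use case: a geometry can intersect the filter envelope as soon as one of its vertices satisfies the bound, so rows must not be discarded just because other vertices fall outside it.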
CC @jorisvandenbossche @paleolimbot
Component(s)
C++, Parquet