
GetIndexedField doesn't support indexing Map types #7824

@swgillespie


Describe the bug

If you load a Parquet file that has a column of type Map, you can't write a query that uses GetIndexedField (the col['key'] indexing syntax) on that column. This appears to be because GetIndexedField only supports Struct and List types, not Map.
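For reference, here is a minimal sketch of what indexing a Map column by key would mean. It is plain Rust, not DataFusion or arrow-rs code. It loosely models the layout shown in the schema below, where a Map is a list of "entries" structs with "key" and "value" fields: each row owns a slice of flattened key and value arrays, delimited by an offsets array. MapColumn and get_indexed are hypothetical names. The sketch also assumes that keys are unique within a row and that a missing key yields NULL (None).

```rust
/// Hypothetical, simplified model of an Arrow-style Map column (not the
/// arrow-rs MapArray type): row i owns the entries at indices
/// offsets[i]..offsets[i + 1] of `keys` and `values`.
struct MapColumn<V> {
    offsets: Vec<usize>,
    keys: Vec<String>,
    values: Vec<V>,
}

impl<V: Clone> MapColumn<V> {
    /// Evaluate `col['key']` for every row: the value of the matching entry,
    /// or None (SQL NULL) when the row has no entry with that key.
    /// Assumes keys are unique within a row, so the first match is the only one.
    fn get_indexed(&self, key: &str) -> Vec<Option<V>> {
        self.offsets
            .windows(2)
            .map(|w| {
                (w[0]..w[1])
                    .find(|&j| self.keys[j] == key)
                    .map(|j| self.values[j].clone())
            })
            .collect()
    }
}

fn main() {
    // Two hypothetical rows: {"bytes": 10, "status": 200} and {"status": 404}.
    let ints = MapColumn {
        offsets: vec![0, 2, 3],
        keys: vec!["bytes".into(), "status".into(), "status".into()],
        values: vec![10_i64, 200, 404],
    };
    // The second row has no "bytes" key, so its result is None (NULL).
    let bytes = ints.get_indexed("bytes");
    assert_eq!(bytes, vec![Some(10), None]);
    println!("{:?}", bytes);
}
```

Returning NULL for an absent key is the conventional SQL treatment; whether DataFusion should behave this way is an assumption of the sketch.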

To Reproduce

DataFusion CLI v31.0.0
❯ create external table test stored as parquet location '../scratch';
0 rows in set. Query took 0.014 seconds.

❯ show columns from test;
+---------------+--------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type

                                                                                               | is_nullable |
+---------------+--------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| datafusion    | public       | test       | ints        | Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false) | NO          |
| datafusion    | public       | test       | strings     | Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)  | NO          |
| datafusion    | public       | test       | timestamp   | Utf8

                                                                                               | NO          |
+---------------+--------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
❯ select avg(ints['bytes']), strings['method'] from test group by strings['method'];
Error during planning: The expression to get an indexed field is only valid for `List` or `Struct` types, got Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)

Expected behavior

I would expect the above query

SELECT avg(ints['bytes']), strings['method']
FROM test 
GROUP BY strings['method'];

to succeed and return a result set with two columns, avg(ints['bytes']) and strings['method'], with one row per distinct value of strings['method'].
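The result the query should produce can be sketched with plain Rust collections. This illustrates the expected SQL semantics under stated assumptions and is not DataFusion code. Each Map column is modeled as a per-row HashMap, and the rows are hypothetical. A missing key is treated as SQL NULL: AVG skips NULL inputs, and rows without a "method" key fall into a single NULL group. Row, row and avg_bytes_by_method are hypothetical names.

```rust
use std::collections::{BTreeMap, HashMap};

/// One hypothetical row of `test`, with each Map column modeled as a HashMap.
struct Row {
    ints: HashMap<String, i64>,
    strings: HashMap<String, String>,
}

/// Hypothetical helper that builds a Row from key/value slices.
fn row(ints: &[(&str, i64)], strings: &[(&str, &str)]) -> Row {
    Row {
        ints: ints.iter().map(|&(k, v)| (k.to_string(), v)).collect(),
        strings: strings.iter().map(|&(k, v)| (k.to_string(), v.to_string())).collect(),
    }
}

/// Models: SELECT avg(ints['bytes']), strings['method'] FROM test GROUP BY strings['method']
/// A missing "method" becomes the None (NULL) group; a missing "bytes" is skipped
/// by the average, and a group with no "bytes" values averages to None.
fn avg_bytes_by_method(rows: &[Row]) -> BTreeMap<Option<String>, Option<f64>> {
    // Per group: (sum of bytes, count of non-NULL bytes).
    let mut acc: BTreeMap<Option<String>, (i64, u64)> = BTreeMap::new();
    for r in rows {
        let group = r.strings.get("method").cloned();
        let entry = acc.entry(group).or_insert((0, 0));
        if let Some(&b) = r.ints.get("bytes") {
            entry.0 += b;
            entry.1 += 1;
        }
    }
    acc.into_iter()
        .map(|(k, (sum, n))| (k, if n == 0 { None } else { Some(sum as f64 / n as f64) }))
        .collect()
}

fn main() {
    let rows = vec![
        row(&[("bytes", 100)], &[("method", "GET")]),
        row(&[("bytes", 300)], &[("method", "GET")]),
        row(&[("bytes", 50)], &[("method", "POST")]),
        row(&[], &[]),
    ];
    // Each output row: (avg(ints['bytes']), strings['method']).
    for (method, avg) in avg_bytes_by_method(&rows) {
        println!("{:?}\t{:?}", avg, method);
    }
}
```

A BTreeMap is used only so the groups print in a deterministic order; SQL itself makes no ordering guarantee without ORDER BY.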

Additional context

No response

Metadata


Assignees

No one assigned

Labels

bug (Something isn't working)
