-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
If you load a Parquet file that has a column of type Map, you can't write a query involving GetIndexedField that queries it. This would appear to be because GetIndexedField only specifically supports structs and lists and not maps.
To Reproduce
DataFusion CLI v31.0.0
❯ create external table test stored as parquet location '../scratch';
0 rows in set. Query took 0.014 seconds.
❯ show columns from test;
+---------------+--------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type
| is_nullable |
+---------------+--------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| datafusion | public | test | ints | Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false) | NO |
| datafusion | public | test | strings | Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false) | NO |
| datafusion | public | test | timestamp | Utf8
| NO |
+---------------+--------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
❯ select avg(ints['bytes']), strings['method'] from test group by strings['method'];
Error during planning: The expression to get an indexed field is only valid for `List` or `Struct` types, got Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)
Expected behavior
I would expect the above query
SELECT avg(ints['bytes']), strings['method']
FROM test
GROUP BY strings['method'];to work and produce a result set with two columns.
Additional context
No response
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working