feat: Add map_extract module and function #11969
support related array tests after #11712
/// The function takes a single argument that must be a MapArray
MapArray,
/// Specialized Signature for MapExtract and similar functions
MapArrayAndKey,
Would it be possible to use TypeSignature::UserDefined to avoid enumeration bloat and make it more cohesive?
We could add it only if more than one function uses the same signature.
I plan to review this tomorrow
| {POST: 41, HEAD: 33, PATCH: } | ||
| ``` | ||
|
|
||
| ### `map_extract` |
ClickHouse provides a similar function
e0bcabb to 5e978b4
# map_extract with columns
query ???
select map_extract(column1, 1), map_extract(column1, 4), map_extract(column1, 5) from map_array_table_1;
----
[1, , 3] NULL NULL
NULL [1, , 3] [4, , 6]
NULL NULL NULL

query ???
select map_extract(column1, column2), map_extract(column1, column3), map_extract(column1, column4) from map_array_table_1;
----
[1, , 3] [1, , 3] [1, , 3]
[4, , 6] [4, , 6] [4, , 6]
NULL NULL NULL

query ???
select map_extract(column1, column2), map_extract(column1, column3), map_extract(column1, column4) from map_array_table_2;
----
[1, , 3] NULL [1, , 3]
[4, , 6] NULL [4, , 6]
NULL NULL NULL

query ???
select map_extract(column1, 1), map_extract(column1, 4), map_extract(column1, 5) from map_array_table_2;
----
[1, , 3] NULL NULL
NULL [1, , 3] [4, , 6]
NULL NULL NULL
test map_extract in a table
Thank you -- I found it now
Description: Alias of
Example: map_extract(map([100, 5], [42, 43]), 100)
Result:
select map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'a'), map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'b'),
map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'c'), map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'd');
----
1 NULL 3 NULL
By my understanding of DuckDB's docs, the values should be a list, not a scalar. In other words, the result should be [1], [], [3], [], not 1, NULL, 3, NULL:
D select map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'a');
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ map_extract(main.map(main.list_value('a', 'b', 'c'), main.list_value(1, NULL, 3)), 'a') │
│ int32[] │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ [1] │
└─────────────────────────────────────────────────────────────────────────────────────────┘
D
D select map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'd');
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ map_extract(main.map(main.list_value('a', 'b', 'c'), main.list_value(1, NULL, 3)), 'd') │
│ int32[] │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ [] │
└─────────────────────────────────────────────────────────────────────────────────────────┘
While this choice is strange to me (a list that will always have 1 or 0 elements), I think it will make for an easy implementation -- you shouldn't have to make special implementations for each type of key -- instead you should just be able to calculate the offsets directly and make a list array that points to the keys array
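The offsets idea in the comment above can be sketched without Arrow at all. This is a minimal model, not the PR's code: `ToyMap`, `extract_offsets`, and `materialize` are hypothetical names, plain vectors stand in for Arrow buffers, and the sketch assumes the resulting list's child is drawn from the map's values. Each row contributes a list of length 1 when the key is found and length 0 otherwise, so the lookup never depends on the value type.

```rust
/// Toy stand-in for a map column (hypothetical, not Arrow's MapArray):
/// row `r` owns entries `offsets[r]..offsets[r + 1]` of `keys` and `values`.
struct ToyMap<'a> {
    offsets: Vec<usize>,
    keys: Vec<&'a str>,
    values: Vec<Option<i32>>,
}

/// For each row, look up `query[row]` among that row's keys.
/// Returns (list offsets, indices into `values`); the list for row `r`
/// covers `indices[list_offsets[r]..list_offsets[r + 1]]`.
fn extract_offsets(map: &ToyMap, query: &[&str]) -> (Vec<usize>, Vec<usize>) {
    let mut list_offsets = vec![0];
    let mut value_indices = Vec::new();
    for (row, wanted) in query.iter().enumerate() {
        let (start, end) = (map.offsets[row], map.offsets[row + 1]);
        // Only the keys are inspected; the value type never matters here.
        if let Some(pos) = map.keys[start..end].iter().position(|k| k == wanted) {
            value_indices.push(start + pos);
        }
        // The offset advances by 1 on a hit and by 0 on a miss.
        list_offsets.push(value_indices.len());
    }
    (list_offsets, value_indices)
}

/// Materialize the per-row lists so the result can be inspected.
fn materialize(map: &ToyMap, offsets: &[usize], indices: &[usize]) -> Vec<Vec<Option<i32>>> {
    offsets
        .windows(2)
        .map(|w| indices[w[0]..w[1]].iter().map(|&i| map.values[i]).collect())
        .collect()
}
```

With a row holding keys `a`, `b`, `c` and values `1`, `NULL`, `3`, looking up `a` gives a one-element list and looking up `d` gives an empty list, matching the DuckDB output quoted earlier. In an Arrow-based version the selected indices would presumably be used to gather the values into the list's child array; that step is left out here.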
@Weijun-H Here is a link to an old conversation about how DataFusion strives not to invent / determine semantics.
#11436 (comment)
I should have pointed to it earlier, sorry about that. We should be consistent with Postgres/DuckDB.
I find it unintuitive to return a list in this context because, in a Map, each key is unique by definition. Returning a list implies the possibility of multiple values for a single key, which contradicts the core concept of how a Map operates. The key uniqueness in a Map should naturally lead to a single value or NULL, making a list redundant and potentially confusing. I refactored the code so that it no longer needs a special implementation for each type of key.
I checked the history of map_extract in DuckDB and found no discussion about it. It was created by one man in the early days:
https://github.com/duckdb/duckdb/blob/a196a022ffcf427aa21184d295bb70caa42e0655/src/function/scalar/nested/map_extract.cpp
Given that the rationale is not clear, I don't have a strong opinion on strictly following what DuckDB does in this case.
It is created by one man
I think this describes quite a bit of duckdb 😆
594f463 to c5b23ec
Changed to follow DuckDB now
let query_key = query_keys_array.slice(row_index, 1);

let value_index = (0..len)
    .find(|&i| keys.slice(start + i, 1) == Arc::<dyn Array>::clone(&query_key));
Nit: I don't think we need a clone for every iteration. How about creating the clone once, outside of this loop?
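A small sketch of what hoisting the clone could look like. It is only an illustration under stated assumptions: `Arc<str>` stands in for the `Arc<dyn Array>` handle in the snippet, and `find_clone_each` and `find_clone_once` are hypothetical names, not functions from the PR.

```rust
use std::sync::Arc;

/// Mirrors the shape of the reviewed snippet: the shared handle is cloned
/// (one reference-count bump) on every comparison.
fn find_clone_each(keys: &[Arc<str>], query: &Arc<str>) -> Option<usize> {
    (0..keys.len()).find(|&i| keys[i] == Arc::clone(query))
}

/// The clone is taken once, before the search, and reused by the closure.
fn find_clone_once(keys: &[Arc<str>], query: &Arc<str>) -> Option<usize> {
    let query = Arc::clone(query);
    (0..keys.len()).find(|&i| keys[i] == query)
}
```

Both functions return the same index; the second just avoids repeating the reference-count update inside the closure.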
    return exec_err!("map_extract expects two arguments");
}
let map_type = &arg_types[0];
let map_fields = get_map_entry_field(map_type)?;
It seems like it would be better to return the data type directly?
The value type is not List, so we cannot return it directly.
Co-authored-by: Jay Zhan <jayzhan211@gmail.com>
Thanks All
Which issue does this PR close?
Closes #11935
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?