
The struct value should not have duplicate and null name #11438

@goldmedal


Is your feature request related to a problem or challenge?

I'm filing this issue as discussed in #11361 (comment).
In most databases, the struct type (aka row or record type) doesn't allow duplicate field names or null field names. However, DataFusion allows both:

query I
select {'scalar': 27, null: 1, 'null': NULL}['null'];
----
1

query I
select {'scalar': 27, 'scalar': 1, 'null': NULL}['scalar'];
----
27

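Structs created with duplicate or null names behave oddly. In the first query the null name is matched by the string 'null', and in both queries the lookup returns the first matching field, so the later field with the same name is unreachable.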

Similar behaviors are not allowed by other databases (e.g. DuckDB):

D select {'1':1, '1':1};
Binder Error: Duplicate struct entry name "1"
D select {'1':1, null:1};
Parser Error: syntax error at or near "null"
LINE 1: select {'1':1, null:1};
                       ^

As @alamb mentioned in #11361 (comment), the StructArray spec says nothing about these restrictions, so we might need to enforce them in DataFusion.

Describe the solution you'd like

named_struct should check for duplicate or null field names when it is invoked and return an error if it finds any.
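A minimal sketch of what such a check could look like, written against plain Rust types only. The function name `validate_struct_field_names` and the `Option<&str>` representation of a possibly-null name are assumptions made for illustration. They are not DataFusion's actual API, and the real check would need to work on the arguments passed to named_struct.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of DataFusion): validates the field names
/// given to a `named_struct`-style constructor.
/// `None` stands for a SQL NULL used as a field name.
fn validate_struct_field_names(names: &[Option<&str>]) -> Result<(), String> {
    // Names seen so far; `insert` returns false if the name was already present.
    let mut seen = HashSet::new();
    for (i, name) in names.iter().enumerate() {
        match name {
            // Reject a NULL name outright.
            None => return Err(format!("struct field name at position {i} is null")),
            Some(n) => {
                // Reject the second occurrence of any name.
                if !seen.insert(*n) {
                    return Err(format!("duplicate struct field name \"{n}\""));
                }
            }
        }
    }
    Ok(())
}
```

With this sketch, the names from the two queries above (`'scalar', null, 'null'` and `'scalar', 'scalar', 'null'`) would both be rejected, while `'scalar', 'null'` would pass because the string 'null' is an ordinary name.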

Describe alternatives you've considered

No response

Additional context

No response

Labels

enhancement (New feature or request)
