Type Coercion fails for List with inner type struct which has large/view types #14154

@ion-elgreco

Description

Describe the bug

A LargeList(Struct({"foo": LargeUtf8})) cannot be coerced to List(Struct({"foo": Utf8})). However, coercion works fine for LargeList(LargeUtf8) -> List(Utf8) and for Struct({"foo": LargeUtf8}) -> Struct({"foo": Utf8}).

To Reproduce

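```python
import polars as pl
from deltalake import DeltaTable

tmp_path = "test_table__"
# "bar" is a list-of-struct column whose struct has one string field, "foo"
df = pl.DataFrame({"foo": [1], "bar": [[{"foo": "!"}]]})
df.write_delta(tmp_path, mode="overwrite", overwrite_schema=True)

# Merge the same rows back in; with compat_level=1 the source arrives as
# LargeList(Struct({"foo": Utf8View})), per the error below
DeltaTable(tmp_path).merge(
    df.to_arrow(compat_level=1),
    predicate="s.foo = t.foo",
    source_alias="s",
    target_alias="t",
    large_dtypes=None,
).when_matched_update_all().execute()
```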
```
DeltaError: Generic DeltaTable error: type_coercion
caused by
Error during planning: Failed to coerce then ([LargeList(Field { name: "item", data_type: Struct([Field { name: "foo", data_type: Utf8View, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "element", data_type: Struct([Field { name: "foo", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "element", data_type: Struct([Field { name: "foo", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "element", data_type: Struct([Field { name: "foo", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })]) and else (None) to common types in CASE WHEN expression
```

Expected behavior

Large/view Arrow types and their regular counterparts should coerce to each other even when they are deeply nested.
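
The expected behaviour can be sketched as a recursive coercion that descends through list items and struct fields, so large/view variants unify at every depth rather than only at the top level. This is a minimal sketch in plain Python with a hypothetical type model (`Prim`, `ListT`, `StructT` and `coerce` are illustrative names, not DataFusion, arrow-rs or PyArrow APIs). Picking the regular `List`/`Utf8` variants as the common type is an assumption that mirrors the target type in the error; DataFusion's real rules may choose differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple, Union


# Hypothetical toy model of Arrow types, for illustration only
# (not the DataFusion, arrow-rs, or PyArrow API).
@dataclass(frozen=True)
class Prim:
    name: str  # e.g. "Utf8", "LargeUtf8", "Utf8View", "Int64"


@dataclass(frozen=True)
class ListT:
    item: DType
    large: bool = False  # True models LargeList, False models List


@dataclass(frozen=True)
class StructT:
    fields: Tuple[Tuple[str, DType], ...]  # ordered (name, type) pairs


DType = Union[Prim, ListT, StructT]

# Assumption: these string variants are interchangeable for coercion.
STRING_TYPES = {"Utf8", "LargeUtf8", "Utf8View"}


def coerce(a: DType, b: DType) -> Optional[DType]:
    """Return a common type for a and b, or None if there is none.

    Recurses into list items and struct fields so that large/view
    variants unify at any nesting depth. Assumption: the result uses the
    regular List and Utf8 variants, matching the target type in the error.
    """
    if isinstance(a, Prim) and isinstance(b, Prim):
        if a.name in STRING_TYPES and b.name in STRING_TYPES:
            return Prim("Utf8")
        return a if a == b else None
    if isinstance(a, ListT) and isinstance(b, ListT):
        item = coerce(a.item, b.item)  # recurse into the element type
        return None if item is None else ListT(item)
    if isinstance(a, StructT) and isinstance(b, StructT):
        if [n for n, _ in a.fields] != [n for n, _ in b.fields]:
            return None  # field names and order must match
        fields = []
        for (name, ta), (_, tb) in zip(a.fields, b.fields):
            common = coerce(ta, tb)  # recurse into each struct field
            if common is None:
                return None
            fields.append((name, common))
        return StructT(tuple(fields))
    return None  # mismatched kinds, e.g. list vs struct


# The failing case from this issue: LargeList(Struct({"foo": Utf8View}))
# vs List(Struct({"foo": Utf8})).
source = ListT(StructT((("foo", Prim("Utf8View")),)), large=True)
target = ListT(StructT((("foo", Prim("Utf8")),)))
print(coerce(source, target) == target)  # True
```

A real implementation would also have to reconcile field nullability and list item field names (the error above shows both "item" and "element"), which this sketch ignores.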

Additional context

Luckily, we can still downcast in Python by passing large_dtypes=False, but DataFusion should be able to coerce any deeply nested dtype.

Metadata

Labels: bug (Something isn't working), regression (Something that used to work no longer does)
