
Be more permissive with nullability in merge_insert #4518

@wjones127

Description


If a dataset has non-nullable columns, and we pass data whose fields are declared nullable but contain no actual nulls, we should be able to write without error. We support this in insert(), but it does not currently work with merge_insert().

import pyarrow as pa
import lance

schema = pa.schema(
    [
        pa.field("id", pa.int64(), nullable=False),
        pa.field("value", pa.int64(), nullable=False),
    ]
)
data = pa.table({"id": [1, 2, 3], "value": [0, 0, 0]}, schema=schema)

ds = lance.write_dataset(data, "memory://")

# Nullable, but no actual nulls
new_schema = pa.schema(
    [
        pa.field("id", pa.int64(), nullable=True),
        pa.field("value", pa.int64(), nullable=True),
    ]
)
new_data = pa.table({"id": [4, 5, 6], "value": [10, 20, 30]}, schema=new_schema)

ds.insert(new_data)  # Works

# Fails:
(
    ds.merge_insert("id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(new_data)
)

Metadata

Labels

bug (Something isn't working), good first issue (Good for newcomers)
