If a dataset has non-nullable columns and we pass data with nullable fields but no actual nulls, we should be able to write without error. `insert()` supports this, but `merge_insert()` currently does not.
```python
import pyarrow as pa
import lance

schema = pa.schema(
    [
        pa.field("id", pa.int64(), nullable=False),
        pa.field("value", pa.int64(), nullable=False),
    ]
)
data = pa.table({"id": [1, 2, 3], "value": [0, 0, 0]}, schema=schema)
ds = lance.write_dataset(data, "memory://")

# Nullable, but no actual nulls
new_schema = pa.schema(
    [
        pa.field("id", pa.int64(), nullable=True),
        pa.field("value", pa.int64(), nullable=True),
    ]
)
new_data = pa.table({"id": [4, 5, 6], "value": [10, 20, 30]}, schema=new_schema)

ds.insert(new_data)  # Works

# Fails:
(
    ds.merge_insert("id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(new_data)
)
```