Design
When compaction happens, the row addresses change. To handle row-level conflicts, compaction will need to emit a change_of_address mapping from each surviving row's old address to its new address.
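As a minimal sketch of how compaction could build this mapping, assume a 64-bit row address whose upper 32 bits hold the fragment id and whose lower 32 bits hold the row offset, and assume all input fragments are rewritten into a single new fragment. The names build_change_of_address, OldFragment and row_addr are hypothetical and not part of the lance API. Surviving rows are copied in order, so each gets the next free offset in the new fragment; already-deleted rows are skipped and therefore never get an entry.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical encoding: upper 32 bits = fragment id, lower 32 bits = row offset.
fn row_addr(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

/// A fragment as seen by compaction: id, physical row count, and deleted offsets.
struct OldFragment {
    id: u32,
    num_rows: u32,
    deleted: BTreeSet<u32>,
}

/// Rewrites `old` into one new fragment and records where every surviving row lands.
/// Rows that are already deleted are skipped, so they have no entry in the mapping.
fn build_change_of_address(old: &[OldFragment], new_fragment_id: u32) -> HashMap<u64, u64> {
    let mut mapping = HashMap::new();
    let mut next_offset: u32 = 0;
    for frag in old {
        for offset in 0..frag.num_rows {
            if frag.deleted.contains(&offset) {
                continue; // compacted out: no new address
            }
            mapping.insert(row_addr(frag.id, offset), row_addr(new_fragment_id, next_offset));
            next_offset += 1;
        }
    }
    mapping
}

fn main() {
    let old = vec![
        OldFragment { id: 1, num_rows: 3, deleted: BTreeSet::from([1]) },
        OldFragment { id: 2, num_rows: 2, deleted: BTreeSet::new() },
    ];
    let mapping = build_change_of_address(&old, 7);
    // Fragment 1, row 1 was already deleted, so it has no new address.
    assert!(!mapping.contains_key(&row_addr(1, 1)));
    assert_eq!(mapping[&row_addr(2, 0)], row_addr(7, 2));
    println!("{} rows remapped", mapping.len());
}
```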
Update and delete transactions generate an affected_rows bitmap that tells them which row addresses need to be masked from the table. When they rebase on top of a compaction, they need to run their affected_rows bitmap through the change_of_address mapping.
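A sketch of that rebase step follows, using a BTreeSet<u64> in place of a real bitmap and the same assumed fragment-id-in-the-upper-32-bits address encoding. Two behaviours here are assumptions rather than part of the design above: addresses in fragments the compaction did not rewrite pass through unchanged, and an affected row from a rewritten fragment with no new address is reported as Err so the caller can treat it as a conflict. The function names are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical encoding: upper 32 bits hold the fragment id.
fn fragment_of(addr: u64) -> u32 {
    (addr >> 32) as u32
}

/// Translates an update/delete transaction's affected_rows into post-compaction
/// addresses. Rows in fragments the compaction did not rewrite keep their address.
/// Err(addr) signals an affected row the compaction dropped (assumed to be a conflict).
fn rebase_affected_rows(
    affected_rows: &BTreeSet<u64>,
    change_of_address: &HashMap<u64, u64>,
    rewritten_fragments: &HashSet<u32>,
) -> Result<BTreeSet<u64>, u64> {
    let mut rebased = BTreeSet::new();
    for &addr in affected_rows {
        if let Some(&new_addr) = change_of_address.get(&addr) {
            rebased.insert(new_addr);
        } else if rewritten_fragments.contains(&fragment_of(addr)) {
            return Err(addr);
        } else {
            rebased.insert(addr);
        }
    }
    Ok(rebased)
}

fn main() {
    let addr = |frag: u64, off: u64| (frag << 32) | off;
    // Compaction rewrote fragment 1 (row 1 already deleted) into fragment 7.
    let mapping = HashMap::from([(addr(1, 0), addr(7, 0)), (addr(1, 2), addr(7, 1))]);
    let rewritten = HashSet::from([1u32]);
    let affected = BTreeSet::from([addr(1, 2), addr(3, 5)]);
    let rebased = rebase_affected_rows(&affected, &mapping, &rewritten).unwrap();
    assert_eq!(rebased, BTreeSet::from([addr(7, 1), addr(3, 5)]));
}
```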
When compaction rebases on top of update/delete transactions, it needs to run the new deletion files through its change_of_address mapping to generate new deletion files for the compacted data. Rows that were already deleted, and therefore compacted out of the data, will not appear in the mapping and are dropped. Rows deleted after the compaction started, however, will be mapped to their new locations.
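The sketch below illustrates this direction, representing a deletion file as a set of deleted row offsets per fragment and keeping the assumed fragment-id-in-the-upper-32-bits address encoding. It assumes the caller passes only the deletion files of fragments the compaction rewrote; deletion files of untouched fragments stay as they are. rebase_deletions is a hypothetical name. Offsets with no entry in the mapping were compacted out and are dropped, while the rest are regrouped by their new fragment.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Rewrites deletion files written by a concurrent update/delete against the old
/// (rewritten) fragments. `deletions` maps an old fragment id to its deleted offsets.
/// Returns deleted offsets keyed by the new, compacted fragment id.
fn rebase_deletions(
    deletions: &BTreeMap<u32, BTreeSet<u32>>,
    change_of_address: &HashMap<u64, u64>,
) -> BTreeMap<u32, BTreeSet<u32>> {
    let mut rebased: BTreeMap<u32, BTreeSet<u32>> = BTreeMap::new();
    for (&fragment_id, offsets) in deletions {
        for &offset in offsets {
            let old_addr = ((fragment_id as u64) << 32) | offset as u64;
            // No entry: the row was already deleted and compacted out, so drop it.
            if let Some(&new_addr) = change_of_address.get(&old_addr) {
                let new_fragment = (new_addr >> 32) as u32;
                let new_offset = new_addr as u32; // keeps the low 32 bits
                rebased.entry(new_fragment).or_default().insert(new_offset);
            }
        }
    }
    rebased
}

fn main() {
    let addr = |frag: u64, off: u64| (frag << 32) | off;
    // Compaction of fragment 1 (row 1 already deleted) into fragment 7.
    let mapping = HashMap::from([(addr(1, 0), addr(7, 0)), (addr(1, 2), addr(7, 1))]);
    // The concurrent delete's file for fragment 1 still lists row 1 and adds row 2.
    let deletions = BTreeMap::from([(1u32, BTreeSet::from([1u32, 2]))]);
    let rebased = rebase_deletions(&deletions, &mapping);
    assert_eq!(rebased, BTreeMap::from([(7u32, BTreeSet::from([1u32]))]));
}
```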
TODO:
- update/delete then compact
- … change_of_address() parameter / method to the CommitBuilder
- … row_id_map or the changed_row_addr field (https://github.com/lancedb/lance/blob/0fad40d676b24d9355a6b2dbe626111684f1696a/rust/lance/src/dataset/optimize.rs#L590-L596) to the change_of_address() mapping.
- … plan_compaction then CompactionTask.execute())