Auto conflict resolution for update/delete then compact #4604

@wjones127

Description

Design

When compaction happens, row addresses change. To handle row-level conflicts, compaction will need to emit a change_of_address mapping from old row addresses to new ones.

Update and delete transactions generate an affected_rows bitmap that tells them which row addresses need to be masked from the table. When they rebase on top of a compaction, they need to run their affected_rows bitmap through the change_of_address mapping.
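
A minimal sketch of that remapping step, under stated assumptions: `remap_affected_rows` and `row_addr` are hypothetical helpers, not existing Lance APIs; a `BTreeSet<u64>` stands in for the affected_rows bitmap; and the mapping is modeled as a plain `HashMap` from old to new address. Addresses missing from the map are assumed to belong to fragments the compaction did not rewrite, so they are kept unchanged.

```rust
use std::collections::{BTreeSet, HashMap};

/// Pack a fragment id and a row offset into one u64 row address
/// (fragment id in the upper 32 bits, offset in the lower 32 bits).
fn row_addr(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

/// Hypothetical helper: translate the rows an update/delete touched into
/// their post-compaction addresses.
///
/// Assumption: an address absent from `change_of_address` lives in a
/// fragment that was not compacted, so it is passed through as is.
fn remap_affected_rows(
    affected_rows: &BTreeSet<u64>,
    change_of_address: &HashMap<u64, u64>,
) -> BTreeSet<u64> {
    affected_rows
        .iter()
        .map(|old| *change_of_address.get(old).unwrap_or(old))
        .collect()
}
```

For example, if compaction moved row 5 of fragment 0 to row 2 of fragment 3, an affected_rows set containing `row_addr(0, 5)` and `row_addr(1, 7)` would come back as `row_addr(3, 2)` and `row_addr(1, 7)`.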

When compaction rebases on top of update/delete transactions, it needs to run the new deletion files through its change_of_address mapping to generate new deletion files for the compacted data. Rows that were already deleted when compaction ran were compacted out of the data, so they will not appear in the mapping and will be dropped. Rows deleted afterward will be mapped to an actual location in the compacted data.
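
The compaction-side rebase could look roughly like the sketch below. It is illustrative only: `remap_deletions` is a hypothetical name, a `BTreeSet` stands in for the contents of a deletion file, and the mapping is again modeled as a `HashMap` from old to new row address, with the fragment id in the upper 32 bits of each address.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: rebuild deletion sets for the compacted fragments.
///
/// `deleted` holds the old addresses marked deleted in the fragments being
/// rewritten. The result maps each new fragment id to the row offsets that
/// must be marked deleted in that fragment.
fn remap_deletions(
    deleted: &BTreeSet<u64>,
    change_of_address: &HashMap<u64, u64>,
) -> HashMap<u32, BTreeSet<u32>> {
    let mut new_deletions: HashMap<u32, BTreeSet<u32>> = HashMap::new();
    for old in deleted {
        // Rows that were compacted out are absent from the map and are dropped.
        if let Some(&new) = change_of_address.get(old) {
            let fragment_id = (new >> 32) as u32;
            let offset = (new & 0xFFFF_FFFF) as u32;
            new_deletions.entry(fragment_id).or_default().insert(offset);
        }
    }
    new_deletions
}
```

A row deleted before compaction read its fragment produces no entry in the output, while a row deleted concurrently ends up in the deletion set of whichever new fragment now holds it.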

TODO: update/delete then compact

  • Add change_of_address() parameter / method to the CommitBuilder
  • Wire the row_id_map or the changed_row_addr field (https://github.com/lancedb/lance/blob/0fad40d676b24d9355a6b2dbe626111684f1696a/rust/lance/src/dataset/optimize.rs#L590-L596) to the change_of_address() mapping.
  • Implement the rebase with the change of address, using the map to adjust the deletion vectors.
  • Write a test validating that if we commit an update prior to compaction, we can successfully rebase the compaction:
    • Run compaction without committing (plan_compaction then CompactionTask.execute())
    • Run an update query to update one row
    • Run a delete query to delete one row
    • Commit compaction
    • Validate the row that was deleted is now gone
    • Validate the row that was updated has the new value and the old value no longer exists
