feat(core): Support file deletion in incremental scan #4
vustef
left a comment
Thanks Gerald. A few clarifications. Very nice tests btw!
are row numbers supposed to be 0-based?
Hmm good question. In the positional delete files they are, so I did this for consistency.
    vec![], // No appends (file was added and fully deleted)
    vec![
        // Positions from positional deletes (snapshot 3): 0, 1, 2
        // The deleted file task (which would emit 0, 1, 2) is NOT emitted
Would it make sense to modify the test so that it emits something different for the positional deletes than for the deleted file task?
I suspect I'd have to expose this in the public API somehow, which I'd rather not do.
You can just change the test so that it fails if it returns positional deletes only (e.g. delete only row 0 with positional deletes)
Closes RAI-43291
Support Deleted Files in Incremental Scans
Summary
This PR implements support for deleted data files in incremental scans by introducing streaming logic for IncrementalFileScanTask::Delete. Previously, encountering a deleted file during an incremental scan would result in a "Feature Unsupported" error. Now, deleted files are handled by emitting delete records for all positions in the file (0 to N-1, where N is the total record count).
Changes
Implementation
iceberg-rust/crates/iceberg/src/arrow/incremental.rs
New Helper Function: process_incremental_deleted_file_task (lines 314-353)
- Generates delete positions from 0 to total_records - 1 (0-indexed)
- Builds a pos column (UInt64) and a _file column (file path)
Updated IncrementalFileScanTask::Delete Match Arm (lines 126-161)
- Reads deleted_file_task.base.record_count
- Calls process_incremental_deleted_file_task to generate delete records
- Sends the records on the deletes_tx channel (correct channel for delete records)
Tests
iceberg-rust/crates/iceberg/src/scan/incremental/tests.rs
Three comprehensive tests were added/updated to verify the implementation:
- test_incremental_scan_append_then_delete_file (lines 1591-1667)
- test_incremental_scan_positional_deletes_then_file_delete (lines 1669-1756)
- test_incremental_scan_with_deleted_files_cancellation (updated, lines 1758+)
Key Design Decisions
Position Indexing
Delete records use 0-indexed positions (0 to N-1) to match Iceberg's positional delete semantics and ensure consistency with existing positional delete handling.
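
As a rough illustration of this position scheme, the sketch below builds one delete record per row from a record count alone. It is not the PR's code: DeleteRecord and delete_records_for_file are hypothetical names for this example, and a plain Vec of structs stands in for the pos and _file columns that the real helper produces.

```rust
/// Hypothetical stand-in for one emitted delete record:
/// the path of the deleted data file plus a 0-indexed row position.
#[derive(Debug, Clone, PartialEq, Eq)]
struct DeleteRecord {
    file_path: String,
    pos: u64,
}

/// Builds a delete record for every row of a deleted data file.
/// Only the record count is needed; the file contents are never consulted.
/// Positions run from 0 to record_count - 1, and a count of 0 yields no records.
fn delete_records_for_file(file_path: &str, record_count: u64) -> Vec<DeleteRecord> {
    (0..record_count)
        .map(|pos| DeleteRecord {
            file_path: file_path.to_string(),
            pos,
        })
        .collect()
}
```

Under these assumptions, calling delete_records_for_file with a record count of 3 yields positions 0, 1 and 2, the same numbering a positional delete file would use for those rows.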
No File I/O Required
The implementation generates delete records without reading the actual deleted file. It uses only the record_count metadata from the manifest entry, which avoids any data-file I/O.
Interaction with Positional Deletes
The existing cancellation logic in the incremental scan planner (at mod.rs:481) prevents double-deletes by filtering positional deletes when appropriate.
Double-Delete Behavior
The system may emit "redundant" deletes in certain scenarios. For example:
This is expected behavior because incremental scans are stateless with respect to individual record states - they only track file-level changes between snapshots.
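
A consumer that prefers not to see the same row deleted twice could treat deletes as idempotent and drop repeats on its side. The sketch below is one hypothetical way to do that and is not part of this PR: dedup_deletes is an invented name, and deletes are modeled as plain (file path, position) pairs.

```rust
use std::collections::HashSet;

/// Drops repeated (file path, position) pairs, keeping first-seen order.
/// Hypothetical consumer-side helper; the scan itself stays stateless.
fn dedup_deletes(deletes: &[(String, u64)]) -> Vec<(String, u64)> {
    let mut seen: HashSet<(String, u64)> = HashSet::new();
    let mut unique = Vec::new();
    for delete in deletes {
        // HashSet::insert returns false when the pair was already present.
        if seen.insert(delete.clone()) {
            unique.push(delete.clone());
        }
    }
    unique
}
```

Keeping this step in the consumer leaves the scan free to track only file-level changes, at the cost of holding the seen pairs in memory.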