fix: filter stale row IDs in TakeExec for FTS/vector after delete#6042
Xuanwo merged 4 commits into lance-format:main from
When stable row IDs are enabled, FTS and vector indexes may return row IDs for rows that have since been deleted. The row ID index excludes deleted rows, so get_row_addrs() would silently drop these entries via filter_map, producing an addresses array shorter than the input batch. The downstream merge_with_schema then failed with "Attempt to merge two RecordBatch with different sizes".

Fix: track which row IDs are valid in get_row_addrs() and return a validity mask. In map_batch(), filter the input batch to remove rows whose IDs no longer exist before merging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
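The failure mode and the fix can be modeled in a few lines of plain Rust. This is a minimal, self-contained sketch, not Lance's code. A HashMap stands in for the row ID index and plain Vecs stand in for the Arrow columns. RowIdIndex, Batch, get_row_addrs_lossy, filter_batch and merge are hypothetical helpers. Only the names get_row_addrs and map_batch come from the description, and their signatures here are assumptions.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the row ID index: stable row ID -> row address.
// Deleted rows are simply absent, as the description says.
type RowIdIndex = HashMap<u64, u64>;

// Hypothetical batch from an FTS or vector search: row IDs plus scores.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    row_ids: Vec<u64>,
    scores: Vec<f32>,
}

// Pre-fix behavior (sketch): filter_map silently drops stale IDs, so the
// result can be shorter than the input.
fn get_row_addrs_lossy(index: &RowIdIndex, row_ids: &[u64]) -> Vec<u64> {
    row_ids.iter().filter_map(|id| index.get(id).copied()).collect()
}

// Post-fix behavior (sketch, signature assumed): also return one validity
// flag per input ID, true where the ID still resolves to an address.
fn get_row_addrs(index: &RowIdIndex, row_ids: &[u64]) -> (Vec<u64>, Vec<bool>) {
    let mut addrs = Vec::with_capacity(row_ids.len());
    let mut valid = Vec::with_capacity(row_ids.len());
    for id in row_ids {
        match index.get(id) {
            Some(addr) => {
                addrs.push(*addr);
                valid.push(true);
            }
            None => valid.push(false),
        }
    }
    (addrs, valid)
}

// Hypothetical helper: keep only the rows whose validity flag is set.
fn filter_batch(batch: &Batch, valid: &[bool]) -> Batch {
    assert_eq!(batch.row_ids.len(), valid.len(), "mask must cover every row");
    let mut out = Batch { row_ids: Vec::new(), scores: Vec::new() };
    for ((id, score), keep) in batch.row_ids.iter().zip(&batch.scores).zip(valid) {
        if *keep {
            out.row_ids.push(*id);
            out.scores.push(*score);
        }
    }
    out
}

// Simplified stand-in for merge_with_schema: a column-wise merge that
// rejects inputs with different row counts.
fn merge(batch: &Batch, addrs: &[u64]) -> Result<Vec<(u64, f32, u64)>, String> {
    if batch.row_ids.len() != addrs.len() {
        return Err("Attempt to merge two RecordBatch with different sizes".to_string());
    }
    Ok(batch
        .row_ids
        .iter()
        .zip(&batch.scores)
        .zip(addrs)
        .map(|((id, score), addr)| (*id, *score, *addr))
        .collect())
}

// Sketch of the fixed map_batch: filter the input batch with the mask
// before merging, so both sides have the same number of rows.
fn map_batch(index: &RowIdIndex, batch: &Batch) -> Result<Vec<(u64, f32, u64)>, String> {
    let (addrs, valid) = get_row_addrs(index, &batch.row_ids);
    let filtered = filter_batch(batch, &valid);
    merge(&filtered, &addrs)
}
```

For example, take IDs [10, 20, 30] where row 20 was deleted. The lossy lookup yields two addresses for three input rows, and the merge fails. map_batch instead drops row 20 from the input batch and merges the remaining two rows. Returning the mask alongside the addresses lets the caller decide how to align the input. An Arrow-based implementation would more likely apply such a mask with a filter kernel like arrow-rs's filter_record_batch. The description does not say which mechanism the PR actually uses.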
PR Review: fix: filter stale row IDs in TakeExec for FTS/vector after delete

No P0/P1 issues found. The fix is correct and well-targeted.

Summary
The root cause is clear:

Minor observations (not blocking)
LGTM
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Xuanwo left a comment:
Thank you for working on this!
# Conflicts:
#	rust/lance/tests/query/inverted.rs