fix: take_blobs_by_indices fails with stable row IDs on fragment 1+ #5392
Merged
jackye1995 merged 6 commits into lance-format:main on Dec 4, 2025
When `enable_stable_row_ids=true`, `take_blobs_by_indices` was failing with "index out of bounds" for rows in fragment 1+. The bug was caused by passing row addresses to `blob::take_blobs`, which expected row IDs.

Root cause:
- `take_blobs_by_indices` converts indices to row addresses
- It passed these addresses to `take_blobs` which calls `take_builder`
- `TakeBuilder.get_row_addrs()` looked up the values in the row ID index
- For fragment 0: addresses (0,1,2) matched row IDs (0,1,2) by accident
- For fragment 1+: addresses (4294967296+) didn't match any row IDs
- This caused empty results and missing `_rowaddr` column → panic

Fix:
- Add `take_blobs_by_addresses()` that uses `TakeBuilder::try_new_from_addresses` to bypass the row ID index lookup
- Update `take_blobs_by_indices` to call the new function
- Add defensive fix in `do_take_rows` to include `_rowaddr` column in empty batches

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
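The mismatch described in the root cause is easier to see with the address layout written out. The sketch below assumes a row address packs the fragment ID into the upper 32 bits and the row offset into the lower 32 bits, which is consistent with fragment 1 starting at 4294967296 (2^32). The `row_address` helper, the sequential row ID assignment, and the `HashMap` standing in for the row ID index are hypothetical illustrations, not Lance's actual types or code.

```rust
use std::collections::HashMap;

/// Pack a fragment ID and an in-fragment offset into a 64-bit row address.
/// Assumed layout: fragment ID in the high 32 bits, offset in the low 32 bits.
fn row_address(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

/// Hypothetical stand-in for the row ID index: stable row ID -> row address.
/// Two fragments of three rows each; row IDs are assumed to be assigned
/// sequentially (0..6) in write order.
fn build_row_id_index() -> HashMap<u64, u64> {
    let mut index = HashMap::new();
    let mut next_row_id = 0u64;
    for fragment_id in 0..2u32 {
        for offset in 0..3u32 {
            index.insert(next_row_id, row_address(fragment_id, offset));
            next_row_id += 1;
        }
    }
    index
}

/// Mimics the buggy path: treat every input value as a row ID and look it up.
/// Values that are not valid row IDs are dropped from the result.
fn lookup_as_row_ids(index: &HashMap<u64, u64>, values: &[u64]) -> Vec<u64> {
    values.iter().filter_map(|v| index.get(v).copied()).collect()
}
```

Under these assumptions, feeding fragment 0 addresses (0, 1, 2) into `lookup_as_row_ids` happens to succeed because they coincide with row IDs 0, 1, 2, while fragment 1 addresses (2^32 and up) match nothing and yield an empty result.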
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
jackye1995 reviewed on Dec 3, 2025
jackye1995 reviewed on Dec 3, 2025
Address PR feedback:
- Make `take_blobs_by_addresses` public (was `pub(super)`)
- Add `Dataset::take_blobs_by_addresses` public method
- Simplify `take_blobs_by_indices` doc to remove internal details
- Add proper public documentation for `take_blobs_by_addresses`

This allows callers to use row addresses directly when they already have them, providing flexibility alongside `take_blobs` (for row IDs) and `take_blobs_by_indices` (for row indices/offsets).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
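To make the difference between a row index and a row address concrete, here is a minimal sketch of how a dataset-wide row index could be translated into a row address. It assumes an address holds the fragment ID in the high 32 bits and the in-fragment offset in the low 32 bits, and it takes a hypothetical list of `(fragment_id, row_count)` pairs. It ignores deletions and is not the actual Lance implementation.

```rust
/// Translate a dataset-wide row index (offset) into a row address.
/// `fragments` is a hypothetical list of (fragment_id, row_count) pairs in
/// dataset order. Returns None when the index is past the last row.
fn index_to_address(fragments: &[(u32, u32)], index: u64) -> Option<u64> {
    let mut remaining = index;
    for &(fragment_id, row_count) in fragments {
        if remaining < row_count as u64 {
            // The index falls inside this fragment: combine ID and offset.
            return Some(((fragment_id as u64) << 32) | remaining);
        }
        // Skip past this fragment's rows and keep searching.
        remaining -= row_count as u64;
    }
    None
}
```

With two fragments of three rows each, index 3 maps to address `1 << 32` (fragment 1, offset 0). A value like that is only meaningful to an address-based entry point, not to one that expects row IDs.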
Contributor
@jmhsieh looks like there is a merge conflict, could you resolve?
# Conflicts:
#	rust/lance/src/dataset/blob.rs
Contributor
Author
Updated and giving it a go.
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request on Dec 5, 2025
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request on Dec 5, 2025
jackye1995 pushed a commit that referenced this pull request on Dec 5, 2025
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request on Jan 21, 2026