Skip to content

fix: take_blobs_by_indices fails with stable row IDs on fragment 1+#5392

Merged
jackye1995 merged 6 commits intolance-format:mainfrom
jmhsieh:jmhsieh/fix-stable_row_id-blobs
Dec 4, 2025
Merged

fix: take_blobs_by_indices fails with stable row IDs on fragment 1+#5392
jackye1995 merged 6 commits intolance-format:mainfrom
jmhsieh:jmhsieh/fix-stable_row_id-blobs

Conversation

@jmhsieh
Copy link
Copy Markdown
Contributor

@jmhsieh jmhsieh commented Dec 2, 2025

When enable_stable_row_ids=true, take_blobs_by_indices was failing with "index out of bounds" for rows in fragment 1+. The bug was caused by passing row addresses to blob::take_blobs which expected row IDs.

Root cause:

  • take_blobs_by_indices converts indices to row addresses
  • It passed these addresses to take_blobs which calls take_builder
  • TakeBuilder.get_row_addrs() looked up the values in the row ID index
  • For fragment 0: addresses (0,1,2) matched row IDs (0,1,2) by accident
  • For fragment 1+: addresses (4294967296+) didn't match any row IDs
  • This caused empty results and missing _rowaddr column → panic

Fix:

  • Add take_blobs_by_addresses() that uses TakeBuilder::try_new_from_addresses to bypass the row ID index lookup
  • Update take_blobs_by_indices to call the new function
  • Add defensive fix in do_take_rows to include _rowaddr column in empty batches

🤖 Generated with Claude Code

When `enable_stable_row_ids=true`, `take_blobs_by_indices` was failing
with "index out of bounds" for rows in fragment 1+. The bug was caused
by passing row addresses to `blob::take_blobs` which expected row IDs.

Root cause:
- `take_blobs_by_indices` converts indices to row addresses
- It passed these addresses to `take_blobs` which calls `take_builder`
- `TakeBuilder.get_row_addrs()` looked up the values in the row ID index
- For fragment 0: addresses (0,1,2) matched row IDs (0,1,2) by accident
- For fragment 1+: addresses (4294967296+) didn't match any row IDs
- This caused empty results and missing `_rowaddr` column → panic

Fix:
- Add `take_blobs_by_addresses()` that uses `TakeBuilder::try_new_from_addresses`
  to bypass the row ID index lookup
- Update `take_blobs_by_indices` to call the new function
- Add defensive fix in `do_take_rows` to include `_rowaddr` column in empty batches

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions github-actions Bot added the bug Something isn't working label Dec 2, 2025
@codecov
Copy link
Copy Markdown

codecov Bot commented Dec 2, 2025

Codecov Report

❌ Patch coverage is 68.88889% with 28 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/dataset/blob.rs 78.37% 7 Missing and 9 partials ⚠️
rust/lance/src/dataset.rs 12.50% 7 Missing ⚠️
rust/lance/src/dataset/take.rs 37.50% 4 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread rust/lance/src/dataset.rs Outdated
Comment thread rust/lance/src/dataset/blob.rs Outdated
jmhsieh and others added 2 commits December 2, 2025 19:08
Address PR feedback:
- Make take_blobs_by_addresses public (was pub(super))
- Add Dataset::take_blobs_by_addresses public method
- Simplify take_blobs_by_indices doc to remove internal details
- Add proper public documentation for take_blobs_by_addresses

This allows callers to use row addresses directly when they already
have them, providing flexibility alongside take_blobs (for row IDs)
and take_blobs_by_indices (for row indices/offsets).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@jmhsieh jmhsieh marked this pull request as ready for review December 3, 2025 04:22
@jmhsieh jmhsieh requested a review from jackye1995 December 3, 2025 04:22
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copy link
Copy Markdown
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me!

@jackye1995
Copy link
Copy Markdown
Contributor

@jmhsieh looks like there is a merge conflict, could you resolve?

# Conflicts:
#	rust/lance/src/dataset/blob.rs
@jmhsieh
Copy link
Copy Markdown
Contributor Author

jmhsieh commented Dec 3, 2025

@jmhsieh looks like there is a merge conflict, could you resolve?

updated and giving a go

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@jackye1995 jackye1995 merged commit 88c7fcb into lance-format:main Dec 4, 2025
26 checks passed
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Dec 5, 2025
…ance-format#5392)

When `enable_stable_row_ids=true`, `take_blobs_by_indices` was failing
with "index out of bounds" for rows in fragment 1+. The bug was caused
by passing row addresses to `blob::take_blobs` which expected row IDs.

Root cause:
- `take_blobs_by_indices` converts indices to row addresses
- It passed these addresses to `take_blobs` which calls `take_builder`
- `TakeBuilder.get_row_addrs()` looked up the values in the row ID index
- For fragment 0: addresses (0,1,2) matched row IDs (0,1,2) by accident
- For fragment 1+: addresses (4294967296+) didn't match any row IDs
- This caused empty results and missing `_rowaddr` column → panic

Fix:
- Add `take_blobs_by_addresses()` that uses
`TakeBuilder::try_new_from_addresses` to bypass the row ID index lookup
- Update `take_blobs_by_indices` to call the new function
- Add defensive fix in `do_take_rows` to include `_rowaddr` column in
empty batches

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Dec 5, 2025
…ance-format#5392)

When `enable_stable_row_ids=true`, `take_blobs_by_indices` was failing
with "index out of bounds" for rows in fragment 1+. The bug was caused
by passing row addresses to `blob::take_blobs` which expected row IDs.

Root cause:
- `take_blobs_by_indices` converts indices to row addresses
- It passed these addresses to `take_blobs` which calls `take_builder`
- `TakeBuilder.get_row_addrs()` looked up the values in the row ID index
- For fragment 0: addresses (0,1,2) matched row IDs (0,1,2) by accident
- For fragment 1+: addresses (4294967296+) didn't match any row IDs
- This caused empty results and missing `_rowaddr` column → panic

Fix:
- Add `take_blobs_by_addresses()` that uses
`TakeBuilder::try_new_from_addresses` to bypass the row ID index lookup
- Update `take_blobs_by_indices` to call the new function
- Add defensive fix in `do_take_rows` to include `_rowaddr` column in
empty batches

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
jackye1995 pushed a commit that referenced this pull request Dec 5, 2025
…5392)

When `enable_stable_row_ids=true`, `take_blobs_by_indices` was failing
with "index out of bounds" for rows in fragment 1+. The bug was caused
by passing row addresses to `blob::take_blobs` which expected row IDs.

Root cause:
- `take_blobs_by_indices` converts indices to row addresses
- It passed these addresses to `take_blobs` which calls `take_builder`
- `TakeBuilder.get_row_addrs()` looked up the values in the row ID index
- For fragment 0: addresses (0,1,2) matched row IDs (0,1,2) by accident
- For fragment 1+: addresses (4294967296+) didn't match any row IDs
- This caused empty results and missing `_rowaddr` column → panic

Fix:
- Add `take_blobs_by_addresses()` that uses
`TakeBuilder::try_new_from_addresses` to bypass the row ID index lookup
- Update `take_blobs_by_indices` to call the new function
- Add defensive fix in `do_take_rows` to include `_rowaddr` column in
empty batches

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
…ance-format#5392)

When `enable_stable_row_ids=true`, `take_blobs_by_indices` was failing
with "index out of bounds" for rows in fragment 1+. The bug was caused
by passing row addresses to `blob::take_blobs` which expected row IDs.

Root cause:
- `take_blobs_by_indices` converts indices to row addresses
- It passed these addresses to `take_blobs` which calls `take_builder`
- `TakeBuilder.get_row_addrs()` looked up the values in the row ID index
- For fragment 0: addresses (0,1,2) matched row IDs (0,1,2) by accident
- For fragment 1+: addresses (4294967296+) didn't match any row IDs
- This caused empty results and missing `_rowaddr` column → panic

Fix:
- Add `take_blobs_by_addresses()` that uses
`TakeBuilder::try_new_from_addresses` to bypass the row ID index lookup
- Update `take_blobs_by_indices` to call the new function
- Add defensive fix in `do_take_rows` to include `_rowaddr` column in
empty batches

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants