Skip to content

fix: avoid empty range reads for zero-length blobs#6168

Merged
Xuanwo merged 2 commits intolance-format:mainfrom
Xuanwo:xuanwo/fix-blob-empty-range-read
Mar 12, 2026
Merged

fix: avoid empty range reads for zero-length blobs#6168
Xuanwo merged 2 commits intolance-format:mainfrom
Xuanwo:xuanwo/fix-blob-empty-range-read

Conversation

@Xuanwo
Copy link
Copy Markdown
Collaborator

@Xuanwo Xuanwo commented Mar 11, 2026

Blob reads should return empty bytes when the logical blob is empty or the cursor is already at EOF. Today BlobFile::read / read_up_to can still issue a get_range(start..end) request with start == end, which is tolerated by local readers but rejected by cloud object stores.

This showed up while investigating random_blob failures on the original-scale laion10m-full dataset, where legacy blob reads on S3 failed with errors like Range started at 1 and ended at 1. The fix short-circuits empty reads and restores the cursor to blob-relative semantics after read(), and adds regression coverage for both the empty-range case and packed-blob cursor behavior.

@github-actions github-actions Bot added the bug Something isn't working label Mar 11, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR Review

Clean, well-motivated bugfix. The core changes are correct:

  1. read() cursor fix — The old code set the cursor to position + size (an absolute file offset), but do_with_reader stores the value as a blob-relative cursor. This meant a second read() on a packed blob would compute start = position + (position + size), reading from the wrong offset. Returning size is correct.

  2. Empty-range short-circuits — Correctly avoids issuing get_range(n..n) which cloud stores reject.

One minor note: In read_up_to, the second guard if read_size == 0 (after computing len.min((size - cursor) as usize)) is unreachable — the first guard already ensures cursor < size (so size - cursor >= 1) and len > 0. Not a bug, just dead code. Up to you whether to remove it.

Tests look good — the RejectEmptyRangeObjectStore mock directly reproduces the S3 failure mode, and test_blob_file_read_tracks_relative_cursor validates the cursor semantics on a real file.

LGTM

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 11, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 31 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/dataset/blob.rs 50.00% 29 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

Comment thread rust/lance/src/index/vector/ivf.rs Outdated
Comment thread rust/lance/src/index/vector/ivf.rs Outdated
@Xuanwo Xuanwo mentioned this pull request Mar 12, 2026
Xuanwo added a commit that referenced this pull request Mar 12, 2026
This PR moves a few unrelated clippy cleanups out of #6168 so the blob
empty-range fix can stay focused on the regression it addresses. The
changes here are all mechanical simplifications with no intended
behavior change.
@Xuanwo Xuanwo force-pushed the xuanwo/fix-blob-empty-range-read branch 3 times, most recently from 65b6460 to de96032 Compare March 12, 2026 09:02
Short-circuit empty blob reads and keep blob cursors relative to the blob size.
@Xuanwo Xuanwo force-pushed the xuanwo/fix-blob-empty-range-read branch from de96032 to ea2de4b Compare March 12, 2026 09:44
@Xuanwo Xuanwo merged commit 58f3b84 into lance-format:main Mar 12, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants