perf: speed up format v2.2 scans by adding shortcut for full page #5981

Merged
Xuanwo merged 11 commits into main from xuanwo/fix-full-scan-rep-index on Feb 27, 2026
@Xuanwo
Collaborator

@Xuanwo Xuanwo commented Feb 22, 2026

This PR addresses a long-standing issue in the Lance file format (both v2.1 and v2.2): the repetition (rep) index must be loaded before any full-zipped values can be read. This can cause serious head-of-line (HoL) blocking, especially when data is stored on high-latency, high-throughput services like S3.

@westonpace previously reported this issue at #3579. We already support caching this index at the user's request (I implemented that feature). I now believe we should always cache it by default, as it is low-cost and highly beneficial.

Based on a full scan test using the FineWeb dataset, I observed the following improvements:

Local dataset

  • p50: 900,363 µs → 542,266 µs (1.66x faster, –39.8%)
  • p95: 951,820 µs → 579,505 µs (1.64x faster)
  • p99: 994,691 µs → 713,062 µs (1.40x faster)

S3 dataset

  • p50: 3,981,524 µs → 990,825 µs (4.02x faster, –75.1%)
  • p95: 4,056,506 µs → 1,124,499 µs (3.61x faster)
  • p99: 4,106,640 µs → 1,207,027 µs (3.40x faster)

Additionally, the rep index cache grows linearly with the number of rows; for 200k rows it occupies about 1.6 MiB. This cache will be managed by our global metadata cache, so I think it's totally OK for us to handle it.


This PR includes the following changes:

  • always cache the repetition index when present and populate cached_state immediately
  • split io submission so cached paths submit reads before awaiting and keep non-cached behavior for fallback
  • drop unused cache flag/parameter plumbing and update full zip cache test expectations

Parts of this PR were drafted with assistance from Codex (with gpt-5.3-codex), amp (with claude-4.6) and fully reviewed and edited by me. I take full responsibility for all changes.

@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

Xuanwo changed the title from "Refine FullZip repetition index scheduling and caching" to "perf: refine FullZip repetition index scheduling and caching" on Feb 22, 2026
@github-actions
Contributor

PR Review: perf: refine FullZip repetition index scheduling and caching

Overall: Clean refactoring that simplifies the API and improves I/O scheduling. No P0/P1 issues found.

Summary of Changes

  • Removes the enable_cache flag and cache_repetition_index parameter - repetition index is now always cached when present
  • Optimizes I/O in the cached path by submitting data requests before awaiting
  • Sets cached_state directly during initialize() for immediate availability

Minor Observations

  1. I/O pipelining improvement: In the cached path, io.submit_request(byte_ranges, priority) is now called synchronously before the async block, with the await happening inside. This is a valid optimization that fires I/O earlier. Good change.

  2. Lifetime handling: The removal of ranges.to_vec() is correct - all uses of ranges (extract_byte_ranges_from_cached, compute_rep_index_ranges, num_rows calculation) happen before the async block begins, so no ownership issues.

  3. Test updates: Tests correctly updated to expect FullZipCacheableState instead of NoCachedPageData when rep_index is present.
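
The pipelining point in item 1 relies on a general property of Rust futures: an `async` block does nothing until it is first polled, so a request created inside the block is submitted later than one created before it. A minimal sketch of that difference follows. The `Io` type, its `submit_request` signature, and both `read_*` helpers are hypothetical stand-ins for illustration, not Lance's actual scheduler API.

```rust
use std::cell::Cell;
use std::future::Future;
use std::ops::Range;
use std::pin::Pin;

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

/// Hypothetical I/O handle that counts how many requests were submitted.
struct Io {
    submitted: Cell<usize>,
}

impl Io {
    fn new() -> Self {
        Self { submitted: Cell::new(0) }
    }

    /// Records the submission immediately, then returns a future that
    /// resolves to the number of bytes requested.
    fn submit_request(&self, ranges: Vec<Range<u64>>) -> BoxFuture<'static, usize> {
        self.submitted.set(self.submitted.get() + 1);
        let total: u64 = ranges.iter().map(|r| r.end - r.start).sum();
        Box::pin(async move { total as usize })
    }
}

/// Lazy: the request is only submitted once the returned future is polled.
fn read_lazy<'a>(io: &'a Io, ranges: Vec<Range<u64>>) -> BoxFuture<'a, usize> {
    Box::pin(async move { io.submit_request(ranges).await })
}

/// Eager: the request is submitted right away; only the wait is deferred.
fn read_eager(io: &Io, ranges: Vec<Range<u64>>) -> BoxFuture<'static, usize> {
    let pending = io.submit_request(ranges);
    Box::pin(async move { pending.await })
}
```

Creating the future from `read_lazy` leaves the submission counter at 0 until something polls it, while calling `read_eager` bumps the counter to 1 before any polling happens.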

LGTM 👍

Xuanwo changed the title from "perf: refine FullZip repetition index scheduling and caching" to "perf: speed up format v2.2 scan 400% by always cache fullzip rep index" on Feb 22, 2026
Xuanwo changed the title from "perf: speed up format v2.2 scan 400% by always cache fullzip rep index" to "perf: speed up format v2.2 scans by always caching fullzip rep index" on Feb 22, 2026
Member

@westonpace westonpace left a comment

Can we at least leave in the option to disable it?

> Additionally, the rep index cache grows linearly with the number of rows; for 200k rows it occupies about 1.6 MiB. This cache will be managed by our global metadata cache, so I think it's totally OK for us to handle it.

I am a little bit worried about this. If we have billions of rows in a dataset won't that mean we need GBs of RAM (per string/binary column)?
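
To put a rough number on that concern, here is a back-of-the-envelope sketch. It assumes the cache scales exactly linearly from the single data point quoted above (about 1.6 MiB for 200k rows); the real per-row cost depends on the data, and the function names are illustrative only.

```rust
/// Data point quoted in the PR description: about 1.6 MiB for 200k rows.
const SAMPLE_ROWS: f64 = 200_000.0;
const SAMPLE_BYTES: f64 = 1.6 * 1024.0 * 1024.0;

/// Linear extrapolation of the rep index cache size, in bytes, for one column.
/// Assumption: cost per row is constant, which real data may not satisfy.
fn estimated_rep_index_bytes(rows: u64) -> f64 {
    rows as f64 * (SAMPLE_BYTES / SAMPLE_ROWS)
}

/// Converts a byte count to GiB.
fn to_gib(bytes: f64) -> f64 {
    bytes / (1024.0 * 1024.0 * 1024.0)
}
```

Under this assumption the cost is roughly 8.4 bytes per row, so `to_gib(estimated_rep_index_bytes(1_000_000_000))` comes out to about 7.8 GiB per column for one billion rows.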

If the goal is to improve full scan performance I think there is potentially another way. If we know we are going to load an entire page then we could shortcut and just read the entire page. Then schedule_ranges could receive a special reader that just returns slices of the page data. This way we avoid needing two stages of I/O for full scans.
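
A minimal sketch of this idea, kept synchronous for brevity: fetch the whole page in one request, then hand the decoder a reader that serves sub-ranges from memory. The `RangeReader` trait, `CountingStore`, and `WholePageReader` are hypothetical names invented for this illustration; they are not the interfaces used in Lance.

```rust
use std::cell::Cell;
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical, synchronous stand-in for the scheduler's I/O interface.
trait RangeReader {
    fn read(&self, range: Range<u64>) -> Vec<u8>;
}

/// Pretend object store that counts every read it serves.
struct CountingStore {
    bytes: Vec<u8>,
    reads: Cell<usize>,
}

impl CountingStore {
    fn new(bytes: Vec<u8>) -> Self {
        Self { bytes, reads: Cell::new(0) }
    }
}

impl RangeReader for CountingStore {
    fn read(&self, range: Range<u64>) -> Vec<u8> {
        self.reads.set(self.reads.get() + 1);
        self.bytes[range.start as usize..range.end as usize].to_vec()
    }
}

/// Reader backed by one page that was fetched in a single request.
struct WholePageReader {
    page_start: u64,
    data: Arc<[u8]>,
}

impl WholePageReader {
    /// Fetches the full page once; later reads are served from memory.
    fn fetch(io: &dyn RangeReader, page: Range<u64>) -> Self {
        let page_start = page.start;
        let data: Arc<[u8]> = Arc::from(io.read(page));
        Self { page_start, data }
    }
}

impl RangeReader for WholePageReader {
    fn read(&self, range: Range<u64>) -> Vec<u8> {
        // Translate file offsets into offsets within the buffered page.
        let start = (range.start - self.page_start) as usize;
        let end = (range.end - self.page_start) as usize;
        self.data[start..end].to_vec()
    }
}
```

With this shape, any number of slice reads against a `WholePageReader` costs exactly one read on the underlying store, which is the property that removes the second I/O stage for full scans.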

@Xuanwo
Collaborator Author

Xuanwo commented Feb 23, 2026

> Can we at least leave in the option to disable it?

Yep, will do.

> If we know we are going to load an entire page then we could shortcut and just read the entire page. Then schedule_ranges could receive a special reader that just returns slices of the page data. This way we avoid needing two stages of I/O for full scans.

Seems to be an interesting idea, will give it a try first.

@Xuanwo
Collaborator Author

Xuanwo commented Feb 24, 2026

Hi @westonpace, I added a FullZipReadSource that works really well. Thank you for your suggestion! Now we don't need to cache the data, but we can still achieve similar performance improvements.

The latest bench result:

  • local p50: 874,138.5 µs → 591,128.0 µs (1.479x faster, -32.38%)
  • local p95: 937,716.5 µs → 632,214.5 µs (1.483x faster, -32.58%)
  • s3 p50: 4,027,359.0 µs → 1,013,433.5 µs (3.974x faster, -74.84%)
  • s3 p95: 4,116,329.5 µs → 1,091,348.0 µs (3.772x faster, -73.49%)
  • s3 p99: 6,182,523.0 µs → 1,137,305.0 µs (5.436x faster, -81.60%)

Compared to the previous implementation (always caching the rep index):

  • local p50: 542,265.5 µs → 591,128 µs (+9.01%)
  • local p95: 579,505 µs → 632,214.5 µs (+9.10%)
  • local p99: 713,061.5 µs → 720,853.5 µs (+1.09%)
  • s3 p50: 990,825 µs → 1,013,433.5 µs (+2.28%)
  • s3 p95: 1,124,499 µs → 1,091,348 µs (-2.95%)
  • s3 p99: 1,207,027 µs → 1,137,305 µs (-5.78%)

A bit slower on local but I think that's fine.
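
For readers checking the figures, the speedup factor and the percentage change in the lists above are two views of the same pair of latencies. A small sketch of the arithmetic, with helper names chosen here for illustration:

```rust
/// Speedup factor: how many times faster the new latency is than the old one.
fn speedup(before_us: f64, after_us: f64) -> f64 {
    before_us / after_us
}

/// Relative change in latency as a percentage (negative means faster).
fn percent_change(before_us: f64, after_us: f64) -> f64 {
    (after_us - before_us) / before_us * 100.0
}
```

Plugging in the S3 p50 numbers (4,027,359.0 µs before, 1,013,433.5 µs after) gives a speedup of about 3.974 and a change of about -74.84%, matching the reported values.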

Xuanwo changed the title from "perf: speed up format v2.2 scans by always caching fullzip rep index" to "perf: speed up format v2.2 scans by adding shortcut for full page" on Feb 24, 2026
Member

@westonpace westonpace left a comment

I like the shortcut with FullZipReadSource, that looks great. Do we need to get rid of the caching ability though? I think it still might be useful for random access cases.

/// Cached state containing the decoded repetition index
cached_state: Option<Arc<FullZipCacheableState>>,
/// Whether to enable caching of repetition indices
enable_cache: bool,
Member

Why get rid of this? It might still be useful for users that want 1 IOP random access on relatively small amounts of data?

Collaborator Author

I previously tried to always enable the cache. Now that we have the shortcut, we can bring the flag back.
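
To illustrate why a cached repetition index helps the random-access case raised in this thread, the sketch below uses a deliberately simplified model: the index is treated as a list of byte offsets, one per row boundary. With the offsets in memory, the byte range for any row can be computed locally, so only the data read touches storage. `CachedRepIndex` and `round_trips` are hypothetical and do not reflect Lance's actual on-disk layout or API.

```rust
use std::ops::Range;

/// Simplified, hypothetical repetition index: `offsets[i]..offsets[i + 1]`
/// is the byte range of row `i` within the page.
struct CachedRepIndex {
    offsets: Vec<u64>,
}

impl CachedRepIndex {
    /// Maps row ranges to byte ranges without performing any I/O.
    fn byte_ranges(&self, rows: &[Range<usize>]) -> Vec<Range<u64>> {
        rows.iter()
            .map(|r| self.offsets[r.start]..self.offsets[r.end])
            .collect()
    }
}

/// Storage round trips for a random-access read in this model:
/// one when the index is cached, two when it must be fetched first.
fn round_trips(cached: Option<&CachedRepIndex>) -> usize {
    match cached {
        Some(_) => 1,
        None => 2,
    }
}
```

In this model a lookup of rows `1..3` against offsets `[0, 5, 12, 12, 30]` resolves to the single byte range `5..12` with no index read.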

@codecov

codecov Bot commented Feb 26, 2026

Codecov Report

❌ Patch coverage is 84.63950% with 49 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../lance-encoding/src/encodings/logical/primitive.rs | 84.63% | 42 Missing and 7 partials ⚠️ |

…rep-index

# Conflicts:
#	rust/lance-encoding/src/encodings/logical/primitive.rs
Member

@westonpace westonpace left a comment

Awesome, thanks for bearing with the review! Also, I'm really happy to see this fix get in, I always felt it was kind of embarrassing we were doing two IOPS in this full page case 😰. I just never got around to fixing it 😆

@Xuanwo Xuanwo merged commit e25f169 into main Feb 27, 2026
29 checks passed
@Xuanwo Xuanwo deleted the xuanwo/fix-full-scan-rep-index branch February 27, 2026 13:06
wjones127 pushed a commit to wjones127/lance that referenced this pull request Mar 4, 2026
…nce-format#5981)
