perf: add a chunk cache to avoid decoding duplicated miniblock chunks#4846
westonpace merged 1 commit into lance-format:main
Conversation
ACTION NEEDED: The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, please inspect the "PR Title Check" action.
```rust
// Now we iterate through each instruction and process it
for (instructions, chunk) in self.instructions.iter() {
    // TODO: It's very possible that we have duplicate `buf` in self.instructions and we
    // don't want to decode the buf again and again on the same thread.
```
This PR partially addresses this TODO. It improves performance unless chunks are accessed in a fully random pattern, which would require a HashMap-based cache at the cost of higher memory usage.
Chunks should always be accessed sequentially I believe. We have a requirement at some point in the decoding process for offsets / ranges to be in sorted order.
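The single-entry cache idea discussed in this thread can be sketched as follows. This is a minimal illustration, not Lance's actual code: `DecodedChunk`, `decode_chunk`, and `ChunkCache` are hypothetical stand-ins, and `decode_calls` exists only to show how many decodes are avoided.

```rust
// Minimal sketch of a single-entry decoded-chunk cache, assuming chunks are
// identified by an index and decoding is deterministic. All names here are
// hypothetical stand-ins, not Lance's real types.
#[derive(Clone, Debug, PartialEq)]
struct DecodedChunk {
    values: Vec<u8>,
}

fn decode_chunk(chunk_idx: usize) -> DecodedChunk {
    // Placeholder for the expensive miniblock decode (decompression etc.).
    DecodedChunk { values: vec![chunk_idx as u8; 4] }
}

struct ChunkCache {
    last: Option<(usize, DecodedChunk)>, // (chunk index, decoded data)
    decode_calls: usize,                 // counts actual decodes, for illustration
}

impl ChunkCache {
    fn new() -> Self {
        Self { last: None, decode_calls: 0 }
    }

    /// Return the decoded chunk, reusing the previous result when the same
    /// chunk is requested again (the common case for sorted row indices).
    fn get(&mut self, chunk_idx: usize) -> &DecodedChunk {
        let hit = matches!(&self.last, Some((idx, _)) if *idx == chunk_idx);
        if !hit {
            self.decode_calls += 1;
            self.last = Some((chunk_idx, decode_chunk(chunk_idx)));
        }
        &self.last.as_ref().unwrap().1
    }
}

fn main() {
    let mut cache = ChunkCache::new();
    // Sorted, clustered indices map to chunk indices with many repeats.
    for chunk_idx in [0, 0, 0, 1, 1, 2, 2, 2, 2, 3] {
        let _ = cache.get(chunk_idx);
    }
    // 10 accesses, but only 4 runs of distinct chunks => 4 decodes.
    println!("decodes: {}", cache.decode_calls);
}
```

Because only the most recent chunk is held, memory overhead stays at one decoded chunk regardless of how many chunks the page contains; the trade-off is that a fully random access order would get no hits, which is where the HashMap alternative mentioned above would come in.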
Codecov Report ❌ Patch coverage is

Additional details and impacted files:

```
@@ Coverage Diff @@
##             main    #4846      +/-   ##
==========================================
- Coverage   81.67%   81.67%   -0.01%
==========================================
  Files         334      334
  Lines      132492   132508      +16
==========================================
+ Hits       108215   108227      +12
- Misses      20640    20645       +5
+ Partials     3637     3636       -1
```
Force-pushed 611a69c to b7620b1
Force-pushed b7620b1 to 8632a91
This PR is ready for review. The CI still reports one test failure on
I confirmed that the test case
8632a91 to
24f9d3f
Compare
I rebased onto the latest main branch and encountered another flaky test
@westonpace could you please help review this PR when you have a moment? I believe it partially addresses one of the TODO comments in the code you previously wrote for this part.
Hi @westonpace, just wanted to gently check in to see if you've had a chance to take a look at this PR.
westonpace left a comment:
Oh, very nice. Do you have a test case / benchmark of any kind that you've been running to verify performance? No need to get it in for this PR but in a future PR it might be nice to add some kind of benchmark like that to help prevent regressions. I guess it would be a benchmark that is reading every other row (or a bunch of rows in the same page) or something like that.
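The access pattern suggested above (reading every other row in a page) can be sketched without any Lance APIs: with sorted indices, consecutive rows land in the same chunk, so a single-entry cache turns most lookups into hits. `ROWS_PER_CHUNK` below is an illustrative value, not Lance's actual miniblock size, and `decodes_needed` is a hypothetical helper.

```rust
// Illustrative chunk size; Lance's real miniblock sizing differs.
const ROWS_PER_CHUNK: usize = 128;

/// Count how many times the chunk changes between consecutive accesses;
/// with a single-entry cache this equals the number of decodes performed.
fn decodes_needed(row_indices: &[usize]) -> usize {
    let mut decodes = 0;
    let mut last_chunk = None;
    for &row in row_indices {
        let chunk = row / ROWS_PER_CHUNK;
        if last_chunk != Some(chunk) {
            decodes += 1;
            last_chunk = Some(chunk);
        }
    }
    decodes
}

fn main() {
    // Every other row out of 10_000 rows: 5_000 accesses in sorted order.
    let every_other: Vec<usize> = (0..10_000).step_by(2).collect();
    let without_cache = every_other.len(); // one decode per access
    let with_cache = decodes_needed(&every_other);
    println!("without cache: {without_cache}, with cache: {with_cache}");
}
```

Under these assumptions the cached version decodes each of the 79 touched chunks once instead of once per row, which is the kind of gap a regression benchmark could track.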
…us chunks don't have to be decoded multiple times.
Force-pushed 24f9d3f to 3ef8e50
Rebased and will merge on green
Sorry, this took way longer to get to than it should have.
I tested it within my application, as described in the PR description. I can try to add a benchmark within Lance itself to verify this improvement more systematically later. Thanks.
…lance-format#4846)

# Description

When miniblock encoding is used in a Lance file, reading the file with the v2 `FileReader` via the `read_stream_projected` API can become inefficient if the provided `ReadBatchParams::Indices` contains many nearby but non-contiguous row indices. For example:

```
29, 168, 180, 194, 376, 559, 574, 665, 666, 667, ..., 968, 969, 970, 973, 975, ...
```

This kind of access pattern causes the same chunk to be decoded repeatedly, resulting in slow performance and high CPU usage.

# Solution

This PR introduces a lightweight single-entry cache in `DecodePageTask`. While it only helps when chunks are accessed in a somewhat sequential manner, row indices are typically sorted in ascending order, so the cache strikes a balance between saving memory and improving performance.

# Test

On a local setup with a Lance file containing 100k rows (each row with a text column of 200+ bytes):

* Reading 1700+ nearby but non-contiguous rows at random
* `zstd` is used for general compression
* With this change, performance improved by 3x–5x, depending on the dataset.
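For fully random access patterns, the review discussion notes that a HashMap-based cache would be needed at the cost of higher memory usage. A minimal sketch of that alternative, with hypothetical names (`MapCache`, `decode_chunk`) that are not Lance code:

```rust
use std::collections::HashMap;

// Sketch of the HashMap-based alternative: it also avoids re-decoding under
// fully random chunk access, but it keeps every decoded chunk alive, so
// memory grows with the number of distinct chunks touched.
fn decode_chunk(chunk_id: usize) -> Vec<u8> {
    // Stand-in for the expensive miniblock decode.
    vec![chunk_id as u8; 4]
}

struct MapCache {
    map: HashMap<usize, Vec<u8>>,
    decode_calls: usize, // counts actual decodes, for illustration
}

impl MapCache {
    fn new() -> Self {
        Self { map: HashMap::new(), decode_calls: 0 }
    }

    /// Decode on first access, then serve every later access from the map,
    /// regardless of access order.
    fn get(&mut self, chunk_id: usize) -> &Vec<u8> {
        if !self.map.contains_key(&chunk_id) {
            self.decode_calls += 1;
            self.map.insert(chunk_id, decode_chunk(chunk_id));
        }
        self.map.get(&chunk_id).unwrap()
    }
}

fn main() {
    let mut cache = MapCache::new();
    // Non-sequential order: a single-entry cache would miss on the revisits,
    // but the map still hits after each chunk's first decode.
    for id in [3, 0, 3, 7, 0, 3] {
        let _ = cache.get(id);
    }
    println!(
        "decodes: {}, chunks held in memory: {}",
        cache.decode_calls,
        cache.map.len()
    );
}
```

This is why the single-entry cache is the better fit for this PR's target workload: sorted indices make revisits adjacent, so one entry captures most hits without retaining every decoded chunk.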