perf: use binary search to skip documents #5636
Conversation
**Code Review Summary**

This PR optimizes the […]

**Issues**

- **P1: Redundant block skipping logic.** The new code at lines 239-244 still performs the block-skipping loop before entering the new binary search loop. However, the new while-loop (lines 245-269) re-calculates […]
- **P1: Potential infinite loop if decompression produces empty doc_ids.** In the new loop at line 261, if […]

**Minor Observation**

The test coverage is good: it tests cross-block seeking and seeking past the end. Consider adding an edge-case test for seeking within the same block (e.g., seek from index 5 to 7 within block 0) to ensure `partition_point` works correctly within a single block without block transitions.

Overall, this is a solid performance optimization with appropriate test coverage.
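The within-block step the review refers to can be sketched as follows. This is a minimal illustration, not the PR's actual code: `seek_in_block`, `doc_ids`, and the `BLOCK_SIZE` value are hypothetical names, and the slice stands in for an already-decompressed block. It shows how `partition_point` binary-searches a single block for the first doc id >= the target.

```rust
// Hypothetical sketch of the within-block seek discussed in the review.
const BLOCK_SIZE: usize = 128;

// Find the absolute index of the first doc id >= `least_id`, searching
// only inside the block that contains `start_index`. If every id in the
// block is smaller, this returns the index one past the block's end.
fn seek_in_block(doc_ids: &[u32], start_index: usize, least_id: u32) -> usize {
    let block_idx = start_index / BLOCK_SIZE;
    let block_start = block_idx * BLOCK_SIZE;
    let block_end = (block_start + BLOCK_SIZE).min(doc_ids.len());
    let block = &doc_ids[block_start..block_end];
    // partition_point returns the first position where the predicate
    // flips from true to false, i.e. a binary search over the block.
    block_start + block.partition_point(|&id| id < least_id)
}
```

Seeking from index 5 to the first id >= 14 inside block 0 (the edge case the review suggests testing) never leaves the block, so no block transition occurs.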
Codecov Report: ✅ All modified and coverable lines are covered by tests.
```rust
while self.index < length {
    let block_idx = self.index / BLOCK_SIZE;
    let block_offset = self.index % BLOCK_SIZE;
    let compressed = unsafe {
```
The posting lists are compressed in memory, and we decompress the needed blocks while searching.
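A minimal sketch of that decompress-on-demand pattern, with hypothetical names (`PostingList`, `doc`, `decompress_count`) and plain `Vec`s standing in for real compressed bytes: the list keeps blocks compressed and only decompresses the block a lookup actually touches, caching the most recent one.

```rust
// Sketch of a posting list that decompresses blocks lazily. Names and
// representation are illustrative, not the crate's actual types.
const BLOCK_SIZE: usize = 4;

struct PostingList {
    // Each inner Vec stands in for one compressed block; a real index
    // would store bit-packed / delta-encoded bytes instead.
    compressed_blocks: Vec<Vec<u32>>,
    cached_block_idx: Option<usize>,
    cached_block: Vec<u32>,
    decompress_count: usize, // counts how often we pay the expensive step
}

impl PostingList {
    fn new(doc_ids: &[u32]) -> Self {
        PostingList {
            compressed_blocks: doc_ids.chunks(BLOCK_SIZE).map(|c| c.to_vec()).collect(),
            cached_block_idx: None,
            cached_block: Vec::new(),
            decompress_count: 0,
        }
    }

    // Return the doc id at `index`, decompressing its block only when it
    // is not the one already cached. Repeated lookups within one block
    // are cheap; crossing into a new block pays the decompression cost.
    fn doc(&mut self, index: usize) -> u32 {
        let block_idx = index / BLOCK_SIZE;
        if self.cached_block_idx != Some(block_idx) {
            // Stand-in for the real decompression step.
            self.cached_block = self.compressed_blocks[block_idx].clone();
            self.cached_block_idx = Some(block_idx);
            self.decompress_count += 1;
        }
        self.cached_block[index % BLOCK_SIZE]
    }
}
```

Under this model, two lookups in the same block trigger one decompression, while a lookup in a new block triggers another, which is why reducing per-document `doc()` calls matters.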
```rust
}
self.index = self.index.max(block_idx * BLOCK_SIZE);
let length = self.list.len();
while self.index < length && (self.doc().unwrap().doc_id() as u32) < least_id {
```
Why is the new PR faster? Is `self.doc()` a heavy operation? Would it be better to provide an API like `compressed_doc(doc_index)` instead of maintaining complex logic inside `next`?
The idea is to use binary search to avoid scanning the entire block.
`doc()` is the most costly operation during FTS search because it decompresses the block if needed.
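The combined strategy the author describes can be sketched roughly as: skip whole blocks by peeking only at each block's last (maximum) doc id, then binary-search inside the one block that can contain the target, instead of calling the costly per-document `doc()` once per posting. The function and names below are hypothetical, and the slice stands in for decompressed data.

```rust
// Illustrative seek: block-level skip, then binary search within the
// candidate block. Not the PR's actual implementation.
const BLOCK_SIZE: usize = 4;

fn seek(doc_ids: &[u32], least_id: u32) -> Option<usize> {
    let mut block_idx = 0;
    // Block-level skip: only the last id of each full block is examined,
    // so whole blocks are passed over without touching their contents.
    while (block_idx + 1) * BLOCK_SIZE <= doc_ids.len()
        && doc_ids[(block_idx + 1) * BLOCK_SIZE - 1] < least_id
    {
        block_idx += 1;
    }
    let start = block_idx * BLOCK_SIZE;
    if start >= doc_ids.len() {
        return None; // sought past the end of the list
    }
    let end = (start + BLOCK_SIZE).min(doc_ids.len());
    // Binary search within the single candidate block.
    let within = doc_ids[start..end].partition_point(|&id| id < least_id);
    let index = start + within;
    (index < doc_ids.len()).then_some(index)
}
```

Compared with the old linear `while … < least_id` scan, the inner loop here examines O(log BLOCK_SIZE) entries of the candidate block rather than every entry before the target.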
Searching becomes ~7% faster.

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>