
perf: calculate cardinality lazily#5783

Merged
Xuanwo merged 2 commits into main from luban/fix-bitmap-btree-build-t
Jan 27, 2026

Conversation

@Xuanwo
Collaborator

@Xuanwo Xuanwo commented Jan 22, 2026

This PR will fix #5714.

Cardinality calculation is slow, but we only need it when building the dictionary. This PR changes the calculation to a lazy approach.
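The lazy-and-cache idea can be sketched roughly as below. This is an illustrative standalone example, not the actual lance code: the real implementation caches stats in a `block_info` map behind an `RwLock`, while here a `OnceLock` and an exact distinct count stand in for the cached HLL estimate.

```rust
use std::sync::OnceLock;

// Hypothetical sketch: the expensive cardinality computation is deferred
// until a caller actually asks for it, then cached for later calls.
struct Block {
    values: Vec<u64>,
    cardinality: OnceLock<u64>,
}

impl Block {
    fn cardinality(&self) -> u64 {
        *self.cardinality.get_or_init(|| {
            // stand-in for the expensive estimate: exact distinct count
            let mut v = self.values.clone();
            v.sort_unstable();
            v.dedup();
            v.len() as u64
        })
    }
}

fn main() {
    let b = Block {
        values: vec![1, 2, 2, 3],
        cardinality: OnceLock::new(),
    };
    assert_eq!(b.cardinality(), 3);
    assert_eq!(b.cardinality(), 3); // second call hits the cache
    println!("ok");
}
```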

```shell
BITMAP 5M low-float: 229.725 (base) → 264.306 (bad) → 215.792 (good)
BTREE  5M low-float: 216.628 (base) → 340.683 (bad) → 222.974 (good)
```

Parts of this PR were drafted with assistance from Codex (with gpt-5.2) and fully reviewed and edited by me. I take full responsibility for all changes.

@github-actions
Contributor

Code Review

Summary: The PR correctly implements lazy cardinality calculation to fix the performance regression in BITMAP/BTREE index building. The approach is sound.

P1: Minor inefficiency in get_stat implementation

In FixedWidthDataBlock::get_stat(), the computed cardinality is cloned one more time than necessary:

```rust
let computed = self.cardinality();
let mut block_info = self.block_info.0.write().unwrap();
Some(
    block_info
        .entry(stat)
        .or_insert_with(|| computed.clone())  // clone here
        .clone(),  // and clone again here
)
```

Since computed is already materialized before the insertion, it can be moved into the map directly; the .clone() inside or_insert_with only exists because the closure captures computed by reference. Consider:

```rust
let computed = self.cardinality();
let mut block_info = self.block_info.0.write().unwrap();
Some(block_info.entry(stat).or_insert(computed).clone())
```

This avoids one clone when the stat wasn't already present. The entry API's or_insert will only insert computed if the key is vacant, so it's equivalent but slightly more efficient.
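The entry-API behavior the suggestion relies on can be checked with a plain HashMap (a standalone illustration, not the lance code; the key and value types are simplified):

```rust
use std::collections::HashMap;

fn main() {
    let mut stats: HashMap<&str, i32> = HashMap::new();
    stats.insert("cardinality", 42);

    // or_insert leaves an existing entry untouched and returns a
    // reference to it; the provided value is simply dropped
    let v = *stats.entry("cardinality").or_insert(7);
    assert_eq!(v, 42);

    // for a vacant key the provided value is moved in as-is,
    // with no extra clone needed
    let w = *stats.entry("min").or_insert(7);
    assert_eq!(w, 7);
    println!("ok");
}
```

The one caveat is that or_insert evaluates its argument eagerly even when the key is present, which is why or_insert_with exists; here the value is already computed, so or_insert is the cheaper choice.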

Note on race condition (non-issue)

The pattern of releasing the read lock before acquiring the write lock could theoretically allow another thread to compute the same stat concurrently. However, this is benign:

  1. The entry().or_insert_with() pattern ensures only one computation is stored
  2. Duplicate computations would produce identical results
  3. The performance win from avoiding unnecessary computation on the happy path outweighs rare double-computation
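The read-then-write pattern described above can be sketched as follows. The names (Stats, get_or_compute) and the plain u64 stat type are hypothetical, not the actual lance-encoding API:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

struct Stats {
    cache: RwLock<HashMap<String, u64>>,
}

impl Stats {
    fn get_or_compute(&self, key: &str, compute: impl Fn() -> u64) -> u64 {
        // fast path: read lock only, released at the end of the if-let
        if let Some(v) = self.cache.read().unwrap().get(key) {
            return *v;
        }
        // slow path: compute with no lock held, then insert under the
        // write lock; a concurrent thread may also compute, but entry()
        // ensures only one value is stored and both results are identical
        let computed = compute();
        let mut guard = self.cache.write().unwrap();
        *guard.entry(key.to_string()).or_insert(computed)
    }
}

fn main() {
    let s = Stats { cache: RwLock::new(HashMap::new()) };
    assert_eq!(s.get_or_compute("card", || 5), 5);
    assert_eq!(s.get_or_compute("card", || 99), 5); // cached value wins
    println!("ok");
}
```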

Tests: The two new tests adequately verify the lazy behavior.

Verdict: LGTM with the minor optimization suggestion above.

@codecov

codecov Bot commented Jan 22, 2026

Codecov Report

❌ Patch coverage is 88.88889% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance-encoding/src/statistics.rs | 88.88% | 4 Missing ⚠️ |


Contributor

@wjones127 wjones127 left a comment


Nice work. If the cardinality check is a threshold, I wonder if we should early return. For example, if we have a rule that says "dictionary encode if cardinality > 100", then maybe we can just stop computing cardinality as soon as our estimate is above 100.
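The early-return idea can be sketched with an exact counter (the function name distinct_exceeds is hypothetical, and the real code would apply the same cutoff to a running HLL estimate rather than a HashSet):

```rust
use std::collections::HashSet;

// Hypothetical sketch: stop counting distinct values as soon as the
// running count exceeds the threshold that decides against dictionary
// encoding; an exact answer is never needed past that point.
fn distinct_exceeds(values: &[u64], threshold: usize) -> bool {
    let mut seen = HashSet::new();
    for v in values {
        seen.insert(*v);
        if seen.len() > threshold {
            return true; // early exit: threshold already crossed
        }
    }
    false
}

fn main() {
    // 1000 values but only 50 distinct ones
    let data: Vec<u64> = (0..1000).map(|i| i % 50).collect();
    assert!(!distinct_exceeds(&data, 100));
    assert!(distinct_exceeds(&data, 10)); // stops after ~11 distinct values
    println!("ok");
}
```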

@wjones127 wjones127 self-assigned this Jan 23, 2026
@Xuanwo
Collaborator Author

Xuanwo commented Jan 26, 2026

@wjones127, Thanks! Makes sense. In our case the dictionary decision is based on estimated size ratio, so the “cardinality threshold” isn’t a fixed constant, but we could derive an upper bound for cardinality from the ratio and early-exit the HLL once the estimate exceeds that bound.

Would you be OK if I keep this PR focused on fixing the regression (lazy cardinality), and do the early-exit logic as a follow-up PR with benchmarks to validate the impact?

@wjones127
Contributor

> Would you be OK if I keep this PR focused on fixing the regression (lazy cardinality), and do the early-exit logic as a follow-up PR with benchmarks to validate the impact?

Yeah, definitely do that as a follow up if you want to do it. I think this PR is already valuable as is!

@Xuanwo Xuanwo merged commit 3044051 into main Jan 27, 2026
28 checks passed
@Xuanwo Xuanwo deleted the luban/fix-bitmap-btree-build-t branch January 27, 2026 12:11
vivek-bharathan pushed a commit to vivek-bharathan/lance that referenced this pull request Feb 2, 2026
@tomtomwombat

You may want to consider using https://crates.io/crates/hyperloglockless in place of the HyperLogLogPlus. It has O(1) cardinality queries, very fast inserts, and better accuracy; I believe it can be a drop-in replacement.
Also, because you are using a low precision of 4, the HLL++ variant is unnecessary: at that precision it switches to the "dense" HLL representation after very few items, so the time spent on HLL++'s sparse → dense conversion is wasted.

Disclaimer: I am the author of hyperloglockless.
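For context on the precision point: an HLL with precision p uses 2^p registers, so p = 4 means just 16 registers, and the relative standard error of the estimate is roughly 1.04/√m. A quick check of those numbers:

```rust
fn main() {
    let p: u32 = 4;            // low precision mentioned in the comment
    let m = 1u32 << p;         // number of HLL registers = 2^p
    assert_eq!(m, 16);

    // standard HLL error bound: ~1.04 / sqrt(m) ≈ 26% at p = 4
    let err = 1.04 / (m as f64).sqrt();
    assert!((err - 0.26).abs() < 0.01);
    println!("registers = {m}, stderr ≈ {err:.2}");
}
```

With only 16 registers to fill, a sparse representation saves almost nothing before the dense switch, which is the commenter's point.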



Successfully merging this pull request may close these issues.

Build time regression for BITMAP and BTREE index

3 participants