Code Review
Summary: The PR correctly implements lazy cardinality calculation to fix the performance regression in BITMAP/BTREE index building. The approach is sound. P1: Minor inefficiency in
|
Codecov Report❌ Patch coverage is
|
wjones127
left a comment
Nice work. If the cardinality check is a threshold, I wonder if we should early return. For example, if we have a rule that says "dictionary encode if cardinality > 100", then maybe we can just stop computing cardinality as soon as our estimate is above 100.
|
@wjones127, Thanks! Makes sense. In our case the dictionary decision is based on estimated size ratio, so the “cardinality threshold” isn’t a fixed constant, but we could derive an upper bound for cardinality from the ratio and early-exit the HLL once the estimate exceeds that bound. Would you be OK if I keep this PR focused on fixing the regression (lazy cardinality), and do the early-exit logic as a follow-up PR with benchmarks to validate the impact? |
Yeah, definitely do that as a follow up if you want to do it. I think this PR is already valuable as is! |
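For reference, the early-exit idea discussed above could look roughly like the sketch below. It is only an illustration: it counts distinct values exactly with a `HashSet` rather than using an HLL estimate, and `cardinality_exceeds` and `limit` are hypothetical names, not part of the Lance codebase. How `limit` would be derived from the size ratio is left open, as in the discussion.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns `true` as soon as more than `limit`
/// distinct values have been seen, without scanning the rest of the input.
/// Uses an exact set instead of HLL to keep the sketch std-only.
fn cardinality_exceeds(values: &[u64], limit: usize) -> bool {
    let mut seen = HashSet::new();
    for v in values {
        seen.insert(*v);
        if seen.len() > limit {
            // Threshold crossed: the exact cardinality no longer matters.
            return true;
        }
    }
    false
}
```

With a rule like "dictionary encode only if cardinality is at most 100", a caller would check `cardinality_exceeds(values, 100)` and skip the dictionary path when it returns `true`.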
This PR will fix lance-format#5714.

Cardinality calculation is slow, but we only need it when building the dictionary. This PR changes the calculation to a lazy approach.

```shell
BITMAP 5M low-float: 229.725 (base) → 264.306 (bad) → 215.792 (good)
BTREE 5M low-float: 216.628 (base) → 340.683 (bad) → 222.974 (good)
```

---

**Parts of this PR were drafted with assistance from Codex (with `gpt-5.2`) and fully reviewed and edited by me. I take full responsibility for all changes.**
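As a rough illustration of the lazy approach described here (not the PR's actual code), the sketch below caches the cardinality in a `std::cell::OnceCell` so it is computed only when first requested. The `Block` type and its methods are hypothetical, and an exact `HashSet` count stands in for the real estimator.

```rust
use std::cell::OnceCell;
use std::collections::HashSet;

/// Hypothetical stand-in for a block of column values.
struct Block {
    values: Vec<u64>,
    // Filled in on first request; stays empty if nobody asks.
    cardinality: OnceCell<usize>,
}

impl Block {
    fn new(values: Vec<u64>) -> Self {
        // No cardinality work happens at construction time.
        Self { values, cardinality: OnceCell::new() }
    }

    /// Distinct-value count, computed on the first call and cached afterwards.
    fn cardinality(&self) -> usize {
        *self
            .cardinality
            .get_or_init(|| self.values.iter().collect::<HashSet<_>>().len())
    }

    /// Reports whether the cardinality has been computed yet.
    fn cardinality_computed(&self) -> bool {
        self.cardinality.get().is_some()
    }
}
```

Code paths that never build a dictionary never call `cardinality()`, so they do not pay for the computation.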
|
You may want to consider using https://crates.io/crates/hyperloglockless for the
Disclaimer: I am the author of hyperloglockless. |