refactor: write bitmap index statistics in file instead #5251

Merged
Xuanwo merged 18 commits into main from debug-4620 on Dec 8, 2025

Xuanwo (Collaborator) commented Nov 17, 2025

Close #4620

This PR writes bitmap index statistics into the index file, so we no longer need to load the entire index file to calculate them.
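As a rough illustration of the idea (not the code in this PR), the sketch below computes a statistic once at write time and stores it in small file-level metadata, so that a later reader can answer a statistics request without touching the index body. `IndexFile`, `BitmapStatistics`, `STATS_KEY`, `write_index`, and this `load_statistics` signature are all hypothetical names chosen for the example; the real Lance types and on-disk layout may differ.

```rust
use std::collections::HashMap;

/// Hypothetical summary of a bitmap index; the field name is illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BitmapStatistics {
    pub num_bitmaps: u64,
}

/// Hypothetical metadata key; not taken from the Lance source.
const STATS_KEY: &str = "bitmap_statistics.num_bitmaps";

/// Stand-in for an index file: a potentially large body plus small metadata.
pub struct IndexFile {
    /// One entry per distinct value, with the row ids that hold that value.
    pub body: Vec<(String, Vec<u64>)>,
    /// Small key/value metadata that can be read without reading `body`.
    pub metadata: HashMap<String, String>,
}

/// Write path: the data is already in memory here, so computing the
/// statistic is cheap. It is stored next to the data as metadata.
pub fn write_index(body: Vec<(String, Vec<u64>)>) -> IndexFile {
    let mut metadata = HashMap::new();
    metadata.insert(STATS_KEY.to_string(), body.len().to_string());
    IndexFile { body, metadata }
}

/// Read path: consults only the metadata. Returns `None` when the key is
/// missing or unparsable, e.g. for a file written before the key existed.
pub fn load_statistics(metadata: &HashMap<String, String>) -> Option<BitmapStatistics> {
    let num_bitmaps = metadata.get(STATS_KEY)?.parse::<u64>().ok()?;
    Some(BitmapStatistics { num_bitmaps })
}
```

Under these assumptions, a caller that only needs statistics passes `&file.metadata` to `load_statistics` and never iterates over `file.body`.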


This PR was primarily authored with Codex using GPT-5-Codex and then hand-reviewed by me. I AM responsible for every change made in this PR. I aimed to keep it aligned with our goals, though I may have missed minor issues. Please flag anything that feels off, I'll fix it quickly.

Signed-off-by: Xuanwo <github@xuanwo.io>
@Xuanwo Xuanwo requested a review from wjones127 November 17, 2025 07:51
chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment thread rust/lance/src/index.rs Outdated
codecov-commenter commented Nov 17, 2025

Codecov Report

❌ Patch coverage is 75.26882% with 23 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance/src/index.rs | 85.13% | 7 Missing and 4 partials ⚠️ |
| rust/lance-index/src/scalar/bitmap.rs | 33.33% | 8 Missing and 2 partials ⚠️ |
| rust/lance-index/src/scalar/bloomfilter.rs | 0.00% | 2 Missing ⚠️ |

wjones127 (Contributor) left a comment

I like the approach. I think we should figure out whether we want an `index_types()` method or something else. Weston is working on something in parallel in #5221.

Comment thread rust/lance-index/src/scalar/bitmap.rs Outdated
Comment thread rust/lance-index/src/scalar/bitmap.rs Outdated
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
Comment thread rust/lance-index/src/scalar/registry.rs Outdated
wjones127 (Contributor) left a comment

Some suggestions to modernize IO tracking.

Comment thread rust/lance/src/index.rs Outdated
Comment thread rust/lance/src/index.rs Outdated
Comment thread rust/lance/src/index.rs Outdated
Xuanwo and others added 4 commits November 22, 2025 02:18
Co-authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
@Xuanwo Xuanwo requested a review from westonpace November 26, 2025 11:06
@Xuanwo Xuanwo requested a review from wjones127 November 26, 2025 11:06
Comment thread rust/lance/src/index.rs
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
westonpace (Member) left a comment

Approve pending CI green.

I think we're good here on backwards / forwards compatibility?

If new code is reading an old index, then `if let Some(stats) = plugin.load_statistics(index_store.clone(), index_details.as_ref()).await?` fails and we fall back to the old approach?

If old code is reading a new index then the old approach should still work since we didn't get rid of any information.

We should probably add "get statistics" to the forwards / backwards compatibility test suites at some point but perhaps can be in a follow-up?
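The compatibility argument above can be sketched as follows. This is a minimal, synchronous stand-in written for illustration, not the PR's code: `IndexFile`, `Statistics`, `Source`, `compute_statistics_by_loading`, and `index_statistics` are hypothetical, and the real `plugin.load_statistics(...)` is async and takes different arguments. The sketch assumes that an index written by older code simply has no stored statistics, so the lookup yields `None` rather than an error.

```rust
/// Hypothetical statistics value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Statistics {
    pub num_bitmaps: u64,
}

/// Records which path produced the statistics (for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum Source {
    Precomputed,
    FullScan,
}

/// Stand-in for an index on disk.
pub struct IndexFile {
    /// `None` models an index written before statistics were stored.
    pub stored_stats: Option<u64>,
    /// The full index body; still present in new files, so old readers work.
    pub body: Vec<Vec<u64>>,
}

/// New path: cheap lookup that yields `Ok(None)` for old indices.
pub fn load_statistics(file: &IndexFile) -> Result<Option<Statistics>, String> {
    Ok(file.stored_stats.map(|n| Statistics { num_bitmaps: n }))
}

/// Old path: derive the statistics by reading the whole index body.
pub fn compute_statistics_by_loading(file: &IndexFile) -> Result<Statistics, String> {
    Ok(Statistics { num_bitmaps: file.body.len() as u64 })
}

/// Try the stored statistics first and fall back to the full load.
pub fn index_statistics(file: &IndexFile) -> Result<(Statistics, Source), String> {
    if let Some(stats) = load_statistics(file)? {
        return Ok((stats, Source::Precomputed));
    }
    Ok((compute_statistics_by_loading(file)?, Source::FullScan))
}
```

In this model, new code reading an old index takes the `FullScan` branch, and old code reading a new index would only ever call the equivalent of `compute_statistics_by_loading`, which still works because the body is unchanged.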

Xuanwo (Collaborator, Author) commented Dec 8, 2025

> I think we're good here on backwards / forwards compatibility?
>
> If new code is reading an old index, then `if let Some(stats) = plugin.load_statistics(index_store.clone(), index_details.as_ref()).await?` fails and we fall back to the old approach?
>
> If old code is reading a new index then the old approach should still work since we didn't get rid of any information.

Yes, we should be good on this; either way works.

> We should probably add "get statistics" to the forwards / backwards compatibility test suites at some point but perhaps can be in a follow-up?

Makes sense to me.

Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
@Xuanwo Xuanwo merged commit c86e97f into main Dec 8, 2025
26 checks passed
@Xuanwo Xuanwo deleted the debug-4620 branch December 8, 2025 16:09
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
…#5251)

Close lance-format#4620

This PR will write bitmap index statistics in file instead so we don't
need to load the entire index file to calculate it.

---

**This PR was primarily authored with Codex using GPT-5-Codex and then
hand-reviewed by me. I AM responsible for every change made in this PR.
I aimed to keep it aligned with our goals, though I may have missed
minor issues. Please flag anything that feels off, I'll fix it
quickly.**

---------

Signed-off-by: Xuanwo <github@xuanwo.io>
Co-authored-by: Will Jones <willjones127@gmail.com>

Development

Successfully merging this pull request may close these issues.

index_statistics on LABEL_LIST index is very slow

4 participants