Skip to content

fix: schema isn't expected for IVF_PQ#3606

Merged
BubbleCal merged 6 commits intolance-format:mainfrom
BubbleCal:fix-schema-not-match
Mar 26, 2025
Merged

fix: schema isn't expected for IVF_PQ#3606
BubbleCal merged 6 commits intolance-format:mainfrom
BubbleCal:fix-schema-not-match

Conversation

@BubbleCal
Copy link
Copy Markdown
Contributor

@BubbleCal BubbleCal commented Mar 26, 2025

now we drop the __ivf_part_id when shuffling, the corner is that num_partitions=1:

  1. if num_partitions=1 then no shuffling is needed
  2. the shuffler reader would return the data directly
  3. then the __ivf_part_id is not dropped, it's written into the index file as well

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@github-actions github-actions Bot added the bug Something isn't working label Mar 26, 2025
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@BubbleCal BubbleCal requested review from westonpace and wkalt March 26, 2025 06:51
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

Attention: Patch coverage is 92.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 78.68%. Comparing base (20cde3b) to head (7d3556e).

Files with missing lines Patch % Lines
rust/lance-index/src/vector/pq/storage.rs 95.16% 1 Missing and 2 partials ⚠️
rust/lance/src/index/vector/ivf/v2.rs 33.33% 2 Missing ⚠️
rust/lance-index/src/vector/storage.rs 90.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3606      +/-   ##
==========================================
+ Coverage   78.67%   78.68%   +0.01%     
==========================================
  Files         258      258              
  Lines       96817    96890      +73     
  Branches    96817    96890      +73     
==========================================
+ Hits        76172    76242      +70     
+ Misses      17578    17576       -2     
- Partials     3067     3072       +5     
Flag Coverage Δ
unittests 78.68% <92.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@BubbleCal BubbleCal marked this pull request as ready for review March 26, 2025 07:45
Copy link
Copy Markdown
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find. A few small questions because I'm not sure why we need extra code to generate an invalid state?

.unwrap()
}

async fn create_pq_storage_with_extra_column() -> ProductQuantizationStorage {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we still get PQ storage with an extra column from a real workflow? Or is this just generating some kind of invalid input for testing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just for testing, we shouldn't see any extra column in real workflow

}

#[tokio::test]
async fn test_remap_with_extra_column() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because some old indices will have this extra column and we need to make sure they are supported?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, we saw some feedbacks about this, so add this test to make sure the old indices could work with this fix

@BubbleCal BubbleCal requested a review from westonpace March 26, 2025 14:11
@BubbleCal BubbleCal merged commit 33634d3 into lance-format:main Mar 26, 2025
30 checks passed
Jay-ju pushed a commit to Jay-ju/lance that referenced this pull request Mar 28, 2025
now we drop the `__ivf_part_id` when shuffling, the corner is that
`num_partitions=1`:
1. if `num_partitions=1` then no shuffling is needed
2. the shuffler reader would return the data directly
3. then the `__ivf_part_id` is not dropped, it's written into the index
file as well

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants