Skip to content

fix: avoid creating empty encoding task and part for PrimitiveFieldEncoder#3607

Merged
westonpace merged 3 commits intolance-format:mainfrom
niyue:bugfix/empty-part
Mar 27, 2025
Merged

fix: avoid creating empty encoding task and part for PrimitiveFieldEncoder#3607
westonpace merged 3 commits intolance-format:mainfrom
niyue:bugfix/empty-part

Conversation

@niyue
Copy link
Copy Markdown
Contributor

@niyue niyue commented Mar 26, 2025

PrimitiveFieldEncoder may generate empty parts and their corresponding encoding tasks, especially when max_page_size is small. This is unnecessary and can be confusing, as some empty part information gets recorded at the end, and redundant encoding tasks are processed needlessly. This PR fixes the issue by exiting the loop early when there is no data to process.

@github-actions github-actions Bot added the bug Something isn't working label Mar 26, 2025
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 26, 2025

Codecov Report

Attention: Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 78.70%. Comparing base (47026e0) to head (08627f7).

Files with missing lines Patch % Lines
.../lance-encoding/src/encodings/logical/primitive.rs 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3607      +/-   ##
==========================================
+ Coverage   78.68%   78.70%   +0.02%     
==========================================
  Files         258      258              
  Lines       96897    96900       +3     
  Branches    96897    96900       +3     
==========================================
+ Hits        76241    76268      +27     
+ Misses      17588    17561      -27     
- Partials     3068     3071       +3     
Flag Coverage Δ
unittests 78.70% <66.66%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@niyue niyue force-pushed the bugfix/empty-part branch from 105d616 to 4b35d01 Compare March 26, 2025 11:01
@LuQQiu LuQQiu requested review from westonpace and wjones127 March 26, 2025 16:52
Copy link
Copy Markdown
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had noticed this in some files with large data (which I guess is similar to small pages) and never really tracked down why it happened. I think I see now how it can happen. We should probably have made the for _ in 0..num_parts loop a while offset < array.len() loop.

This fix is fine too.

@westonpace
Copy link
Copy Markdown
Member

Thanks for figuring this out!

@westonpace westonpace merged commit 40142fb into lance-format:main Mar 27, 2025
27 checks passed
Jay-ju pushed a commit to Jay-ju/lance that referenced this pull request Mar 28, 2025
…coder (lance-format#3607)

`PrimitiveFieldEncoder` may generate empty `part`s and their
corresponding encoding tasks, especially when `max_page_size` is small.
This is unnecessary and can be confusing, as some empty part information
gets recorded at the end, and redundant encoding tasks are processed
needlessly. This PR fixes the issue by exiting the loop early when there
is no data to process.

Co-authored-by: LuQQiu <luqiujob@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants