fix: correct null_count aggregation in boolean statistics collection #4839

Merged
wjones127 merged 1 commit into lance-format:main from YinZheng-Sun:fix_bug
Dec 19, 2025

@YinZheng-Sun (Contributor) commented Sep 29, 2025

The get_boolean_statistics function was incorrectly breaking early when both true and false values were found in the first array, causing subsequent arrays' null counts to be skipped. This resulted in incorrect null_count values when processing multiple batches.
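
To make the failure mode concrete, here is a minimal sketch in plain Rust. It is not the Lance implementation: arrays are modeled as `Vec<Option<bool>>` instead of Arrow boolean arrays, and the function names `null_count_with_early_break` and `true_null_count` are hypothetical. It assumes the old loop stopped visiting arrays once both values had been seen, as described above.

```rust
/// Hypothetical model of the old behavior: stop visiting arrays as soon as
/// both `true` and `false` have been observed.
fn null_count_with_early_break(arrays: &[Vec<Option<bool>>]) -> usize {
    let mut seen_true = false;
    let mut seen_false = false;
    let mut null_count = 0;

    for array in arrays {
        for value in array {
            match value {
                Some(true) => seen_true = true,
                Some(false) => seen_false = true,
                None => null_count += 1,
            }
        }
        // Min/max are settled at this point, but leaving the loop also
        // skips the nulls in every array that has not been visited yet.
        if seen_true && seen_false {
            break;
        }
    }
    null_count
}

/// Reference answer: count the nulls in every array.
fn true_null_count(arrays: &[Vec<Option<bool>>]) -> usize {
    arrays.iter().flatten().filter(|v| v.is_none()).count()
}
```

For two batches `[true, false, null]` and `[null, null, true]`, this model reports one null because the second batch is never visited, while the batches actually contain three.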

github-actions bot added the bug label Sep 29, 2025

@Xuanwo (Collaborator) left a comment

Thank you for working on this and sorry for missing this PR.

Can you fix clippy and format? This PR is almost ready to go.

@codecov-commenter commented Oct 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

@YinZheng-Sun (Contributor, Author) commented Oct 21, 2025

> Thank you for working on this and sorry for missing this PR.
>
> Can you fix clippy and format? This PR is almost ready to go.

done @Xuanwo

@YinZheng-Sun requested a review from Xuanwo October 21, 2025 12:55

@github-actions (Contributor)

Code Review

Summary: This PR fixes a bug where null_count was incorrectly calculated for boolean statistics when multiple arrays were processed. The fix is correct and well-tested.

Assessment: ✅ Approve

The bug fix is straightforward and correct:

  • Before: The break statement exited the loop entirely once both true and false were found, skipping null count accumulation for remaining arrays
  • After: Using continue allows the loop to process all arrays for null count while skipping the expensive value iteration when min/max are already determined

The test case properly validates the fix by checking that null counts are correctly aggregated across multiple arrays.
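
The continue-based approach described in this review can be sketched as follows. This is an illustrative model under stated assumptions, not the merged code: `BoolStats` and `boolean_stats` are hypothetical names, and arrays are represented as `Vec<Option<bool>>` instead of Arrow boolean arrays.

```rust
/// Hypothetical stand-in for the collected statistics.
#[derive(Debug, PartialEq)]
struct BoolStats {
    min: Option<bool>,
    max: Option<bool>,
    null_count: usize,
}

/// Sketch of the fixed loop: nulls are counted for every array, and only
/// the value scan is skipped once both `true` and `false` have been seen.
fn boolean_stats(arrays: &[Vec<Option<bool>>]) -> BoolStats {
    let mut seen_true = false;
    let mut seen_false = false;
    let mut null_count = 0;

    for array in arrays {
        // Always accumulate nulls, regardless of the min/max state.
        null_count += array.iter().filter(|v| v.is_none()).count();

        // Min/max already determined: skip the value scan, keep looping.
        if seen_true && seen_false {
            continue;
        }

        // `flatten` drops the `None` entries and yields `&bool`.
        for value in array.iter().flatten() {
            if *value {
                seen_true = true;
            } else {
                seen_false = true;
            }
            if seen_true && seen_false {
                break; // exits only the inner value scan
            }
        }
    }

    // For booleans, false < true, so min is false whenever false was seen.
    let min = if seen_false {
        Some(false)
    } else if seen_true {
        Some(true)
    } else {
        None
    };
    let max = if seen_true {
        Some(true)
    } else if seen_false {
        Some(false)
    } else {
        None
    };

    BoolStats { min, max, null_count }
}
```

In this sketch the inner `break` is safe because the null count is computed separately before the value scan, so ending the scan early cannot drop any nulls.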

No issues identified.

@wjones127 self-assigned this Dec 19, 2025
@wjones127 merged commit e94d32c into lance-format:main Dec 19, 2025
30 checks passed
Labels: bug

4 participants