
fix: general block decompression mismatch for Lance 2.2 dictionaries #5025

Merged
Xuanwo merged 2 commits into main from fix-general-compression-not-covered
Oct 21, 2025

Xuanwo commented Oct 21, 2025

PR #4900 enabled writing dictionary pages through Compression::General, but the read path, DefaultDecompressionStrategy::create_block_decompressor, still routed those pages directly into BinaryBlockDecompressor. When the compressed payload actually held fixed-width dictionary values (e.g., MiniBlockSchedulerDictionary with fixed Int8 lists), the first byte of the payload was reinterpreted as bits_per_offset, which caused a panic inside BinaryBlockDecompressor.
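
To make the mismatch concrete, here is a minimal, dependency-free sketch of the dispatch problem. Everything in it is hypothetical and simplified for illustration: `BlockEncoding`, `DecodedBlock`, `decode_always_variable`, `decode_by_inner`, and the identity `decompress` stand-in are not Lance's actual types or functions, and the sketch returns an error where the real reader panicked. It only shows the general idea of choosing the decoder from the wrapped encoding rather than assuming a variable-width layout, which may differ from how this PR implements the fix.

```rust
// Hypothetical, simplified stand-ins for illustration; NOT Lance's real types.

/// How the writer encoded a block.
#[derive(Debug, Clone, PartialEq)]
pub enum BlockEncoding {
    /// Fixed-width values of `bytes_per_value` bytes each.
    FixedWidth { bytes_per_value: usize },
    /// Variable-width values; the payload begins with a one-byte
    /// `bits_per_offset` header (32 or 64 in this sketch).
    VariableWidth,
    /// General-purpose compression wrapped around another encoding.
    General { inner: Box<BlockEncoding> },
}

/// What the reader recovered from a block.
#[derive(Debug, PartialEq)]
pub enum DecodedBlock {
    FixedWidth { bytes_per_value: usize, num_values: usize },
    VariableWidth { bits_per_offset: u8, body_len: usize },
}

/// Stand-in for a real codec; identity keeps the sketch self-contained.
fn decompress(payload: &[u8]) -> Vec<u8> {
    payload.to_vec()
}

fn decode_fixed(bytes_per_value: usize, data: &[u8]) -> Result<DecodedBlock, String> {
    if bytes_per_value == 0 || data.len() % bytes_per_value != 0 {
        return Err(format!(
            "{} bytes is not a multiple of {}",
            data.len(),
            bytes_per_value
        ));
    }
    Ok(DecodedBlock::FixedWidth {
        bytes_per_value,
        num_values: data.len() / bytes_per_value,
    })
}

fn decode_variable(data: &[u8]) -> Result<DecodedBlock, String> {
    match data.split_first() {
        // The first byte is interpreted as the offset width.
        Some((&bits, body)) if bits == 32 || bits == 64 => Ok(DecodedBlock::VariableWidth {
            bits_per_offset: bits,
            body_len: body.len(),
        }),
        Some((&bits, _)) => Err(format!("invalid bits_per_offset: {}", bits)),
        None => Err("empty block".to_string()),
    }
}

/// Mirrors the bug: anything wrapped in `General` is assumed to be variable-width.
pub fn decode_always_variable(
    encoding: &BlockEncoding,
    payload: &[u8],
) -> Result<DecodedBlock, String> {
    match encoding {
        BlockEncoding::General { .. } => decode_variable(&decompress(payload)),
        BlockEncoding::FixedWidth { bytes_per_value } => decode_fixed(*bytes_per_value, payload),
        BlockEncoding::VariableWidth => decode_variable(payload),
    }
}

/// Decompresses first, then dispatches on the wrapped encoding.
pub fn decode_by_inner(encoding: &BlockEncoding, payload: &[u8]) -> Result<DecodedBlock, String> {
    match encoding {
        BlockEncoding::General { inner } => decode_by_inner(inner, &decompress(payload)),
        BlockEncoding::FixedWidth { bytes_per_value } => decode_fixed(*bytes_per_value, payload),
        BlockEncoding::VariableWidth => decode_variable(payload),
    }
}
```

With a `General` wrapper around one-byte fixed-width values and the payload `[5, 7, 9]`, `decode_always_variable` reads `5` as the offset width and fails, while `decode_by_inner` recovers three one-byte values.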

We didn't catch this in that PR because it didn't include a test for the 2.2 version.

This PR fixes the issue and adds a new test to cover it.


This PR was primarily authored with Codex using GPT-5-Codex and then hand-reviewed by me. I AM responsible for every change made in this PR. I aimed to keep it aligned with our goals, though I may have missed minor issues. Please flag anything that feels off and I'll fix it quickly.

Xuanwo requested a review from westonpace October 21, 2025 17:20
github-actions bot added the bug (Something isn't working) label Oct 21, 2025
Signed-off-by: Xuanwo <github@xuanwo.io>

@westonpace westonpace left a comment

Do we still need impl BlockDecompressor for CompressedBufferEncoder?

Xuanwo commented Oct 21, 2025

> Do we still need impl BlockDecompressor for CompressedBufferEncoder?

I don't think it's needed anymore. Let me create a new PR to remove it.

@codecov-commenter
Codecov Report

❌ Patch coverage is 72.85714% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.74%. Comparing base (5f60351) to head (7a78a0a).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-encoding/src/compression.rs | 70.17% | 11 Missing and 6 partials ⚠️ |
| ...ust/lance-encoding/src/encodings/physical/block.rs | 84.61% | 0 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5025   +/-   ##
=======================================
  Coverage   81.73%   81.74%           
=======================================
  Files         340      340           
  Lines      137432   137499   +67     
  Branches   137432   137499   +67     
=======================================
+ Hits       112332   112395   +63     
+ Misses      21377    21368    -9     
- Partials     3723     3736   +13     

| Flag | Coverage Δ |
|---|---|
| unittests | 81.74% <72.85%> (+<0.01%) ⬆️ |


Xuanwo merged commit 633aaa5 into main Oct 21, 2025
31 checks passed
Xuanwo deleted the fix-general-compression-not-covered branch October 21, 2025 18:21
