fix: too large data chunk generated by highly compressed yet nested data with RLE#4431
Merged
Conversation
Signed-off-by: Xuanwo <github@xuanwo.io>
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #4431      +/-   ##
==========================================
+ Coverage   81.88%   81.90%   +0.02%
==========================================
  Files         302      302
  Lines      123146   123298     +152
  Branches   123146   123298     +152
==========================================
+ Hits       100839   100990     +151
- Misses      18502    18506       +4
+ Partials     3805     3802       -3

Flags with carried forward coverage won't be shown.
BubbleCal approved these changes on Aug 12, 2025.
Close #4429.
As described in #4429, highly compressed yet nested data encoded with RLE can produce data chunks that exceed our 16KiB threshold. This happens because our RLE encoding currently considers only the data buffer size and does not account for the size of the REP/DEF markers, which can consume up to 4 bytes per value.
Ideally, we would include the REP/DEF sizes in the calculation, but that requires significant changes. This PR implements a workaround that addresses the issue at the cost of a slightly lower compression ratio; a more comprehensive fix will follow after discussion.
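The workaround described above can be sketched as follows. This is a minimal, hypothetical illustration (the names `should_flush`, `MAX_CHUNK_BYTES`, and `LEVEL_BYTES_PER_VALUE` are not from the actual codebase): instead of counting only the compressed data buffer, the chunk-size estimate conservatively charges the worst-case 4 bytes of REP/DEF overhead per value, so highly compressed nested data flushes before the chunk blows past the threshold.

```rust
/// Hypothetical sketch of the conservative chunk-size check.
const MAX_CHUNK_BYTES: usize = 16 * 1024; // 16KiB threshold
const LEVEL_BYTES_PER_VALUE: usize = 4; // REP/DEF markers can take up to 4 bytes per value

/// Decide whether a chunk holding `num_values` values, whose RLE-compressed
/// data occupies `data_bytes`, should be flushed. Counting only `data_bytes`
/// (the old behavior) lets highly compressed nested data accumulate far more
/// values than the REP/DEF markers can fit within the threshold.
fn should_flush(data_bytes: usize, num_values: usize) -> bool {
    let estimated = data_bytes + num_values * LEVEL_BYTES_PER_VALUE;
    estimated >= MAX_CHUNK_BYTES
}

fn main() {
    // A small chunk keeps accumulating values.
    assert!(!should_flush(64, 10));
    // Highly compressed data: the buffer is tiny, but 5000 values can carry
    // up to 20000 bytes of REP/DEF markers, so the estimate triggers a flush.
    assert!(should_flush(64, 5000));
}
```

The trade-off is that chunks flush earlier than strictly necessary when the actual levels compress well, which is the "slightly lower compression ratio" mentioned above.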
This PR also includes a reproduction as a unit test to prevent regression of this bug.