fix: too large data chunk generated by highly compressed yet nested data with RLE #4431

Merged
Xuanwo merged 4 commits into main from 65k-structs-in-one-row
Aug 12, 2025

Xuanwo commented Aug 11, 2025

close #4429


As described in #4429, nested data that RLE compresses very well can produce data chunks that exceed our 16 KiB threshold. This happens because our RLE encoding currently sizes a chunk by its data buffer alone and does not account for the REP/DEF markers, which can consume up to 4 bytes per value.
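To make the arithmetic concrete, here is a minimal sketch of how a chunk can pass a data-only size check and still blow past the limit once REP/DEF markers are counted. The 16 KiB limit and the 4 bytes per value come from the description above; the helper name `chunk_bytes`, the 8-byte run size, and the 65,536-value count are illustrative assumptions, not code or numbers taken from this PR.

```rust
/// Chunk size limit from the PR description (16 KiB).
const MAX_CHUNK_BYTES: usize = 16 * 1024;
/// Worst-case REP/DEF overhead per value, per the PR description.
const MAX_REP_DEF_BYTES_PER_VALUE: usize = 4;

/// Hypothetical helper: worst-case chunk size once REP/DEF markers are
/// added on top of the (RLE-compressed) data buffer.
fn chunk_bytes(data_bytes: usize, num_values: usize) -> usize {
    data_bytes + num_values * MAX_REP_DEF_BYTES_PER_VALUE
}

fn main() {
    // Assumed example: 65,536 identical values collapse into one RLE run,
    // so the data buffer is only a few bytes.
    let num_values = 65_536;
    let data_bytes = 8; // illustrative size of a single (value, run length) pair

    // A check that looks at the data buffer alone passes easily.
    assert!(data_bytes <= MAX_CHUNK_BYTES);

    // Counting REP/DEF markers, the same chunk is far over the limit.
    let total = chunk_bytes(data_bytes, num_values);
    println!("data only: {data_bytes} B, with REP/DEF: {total} B");
    assert!(total > MAX_CHUNK_BYTES);
}
```

The better the data compresses, the more values fit under a data-only check, so the REP/DEF overhead grows exactly in the cases where the data buffer looks smallest.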

Ideally, we should include REP/DEF sizes in the calculation, but that would require significant changes. In this PR, I implemented a workaround to address the issue at the cost of a slightly lower compression ratio. A more comprehensive fix will follow after discussion.
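As a rough illustration of what including REP/DEF sizes in the calculation could mean, the sketch below derives a per-chunk value cap from the 16 KiB budget. It is not the workaround implemented in this PR (which is not shown here) and not the planned comprehensive fix; `max_values_for_chunk` is a hypothetical name, and the sketch assumes every value pays the worst-case 4 bytes.

```rust
/// Chunk size limit from the PR description (16 KiB).
const MAX_CHUNK_BYTES: usize = 16 * 1024;
/// Worst-case REP/DEF overhead per value, per the PR description.
const MAX_REP_DEF_BYTES_PER_VALUE: usize = 4;

/// Hypothetical helper: the largest number of values a chunk can hold if its
/// data buffer takes `data_bytes` and every value carries worst-case REP/DEF.
fn max_values_for_chunk(data_bytes: usize) -> usize {
    // saturating_sub avoids underflow when the data buffer alone
    // already exceeds the budget; the result is then 0.
    MAX_CHUNK_BYTES.saturating_sub(data_bytes) / MAX_REP_DEF_BYTES_PER_VALUE
}

fn main() {
    let data_bytes = 8; // illustrative size of a tiny RLE data buffer
    let limit = max_values_for_chunk(data_bytes);
    println!("at most {limit} values fit next to {data_bytes} B of data");

    // With the cap applied, the worst-case chunk stays within budget.
    assert!(data_bytes + limit * MAX_REP_DEF_BYTES_PER_VALUE <= MAX_CHUNK_BYTES);
}
```

Capping values this way assumes the worst case for every value, so chunks would usually be smaller than strictly necessary; that is the same kind of trade-off the description mentions for the workaround.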

This PR also adds the repro as a unit test to keep this bug from regressing.

Xuanwo added 3 commits August 11, 2025 17:16
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
github-actions bot added the bug (Something isn't working) label Aug 11, 2025
Signed-off-by: Xuanwo <github@xuanwo.io>
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.90%. Comparing base (0c69144) to head (0c758c7).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4431      +/-   ##
==========================================
+ Coverage   81.88%   81.90%   +0.02%     
==========================================
  Files         302      302              
  Lines      123146   123298     +152     
  Branches   123146   123298     +152     
==========================================
+ Hits       100839   100990     +151     
- Misses      18502    18506       +4     
+ Partials     3805     3802       -3     
Flag Coverage Δ
unittests 81.90% <100.00%> (+0.02%) ⬆️

Xuanwo merged commit c9a60f5 into main Aug 12, 2025
30 checks passed
Xuanwo deleted the 65k-structs-in-one-row branch August 12, 2025 11:29
Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Panic with assertion failed: chunk_bytes <= 16 * 1024 when writing nested structs with V2.1 format