feat: large literals block support (>262KB)#30
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughAdded a compile-time assertion that Changes
Sequence Diagram(s)Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
This PR refines the compressed-literals encoding path to remove an unreachable panic for oversized literal sections, documents the RFC size-format behavior, and adds roundtrip + C cross-validation tests targeting larger literal sections up to the Zstd max block size (128KiB).
Changes:
- Replace an
unimplemented!("too many literals")with anunreachable!and add adebug_assert!guarding theMAX_BLOCK_SIZEinvariant incompress_literals. - Add RFC 8878 Size_Format documentation inline in the literals encoding path.
- Add new large-block roundtrip and Rust↔C cross-validation tests for key size boundaries up to 128KiB (plus 512KiB multi-block inputs).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
zstd/src/encoding/blocks/compressed.rs |
Removes unreachable panic, adds invariant debug_assert!, and documents Size_Format behavior. |
zstd/src/tests/roundtrip_integrity.rs |
Adds large-literals roundtrip tests intended to exercise size-format boundaries and multi-block behavior. |
zstd/tests/cross_validation.rs |
Adds Rust↔C cross-validation for larger inputs and multi-block cases with Huffman-friendly data. |
c422f77 to
1aa762d
Compare
…ge literals
- Replace `unimplemented!("too many literals")` with `unreachable!` and
descriptive message explaining the MAX_BLOCK_SIZE invariant
- Add debug_assert validating literals never exceed MAX_BLOCK_SIZE (128KB)
- Add RFC 8878 §3.1.1.3.1.1 size format documentation comments
- Add roundtrip tests exercising all 4 size format boundaries (0b00–0b11)
- Add cross-validation tests (Rust ↔ C FFI) for large blocks up to 128KB
Closes #15
- Remove redundant debug_assert (unreachable! already guards the invariant) - Clarify RFC comment: only formats 0b10/0b11 are reachable in encoder - Fix test docs: roundtrip tests verify correctness, not specific format selection - Rename test to roundtrip_large_literals (accurate scope)
…sert
- Use `const { assert!(MAX_BLOCK_SIZE <= 262143) }` for compile-time safety
- Replace `_ => unreachable!()` with `_ => (0b11, 18)` wildcard arm
- Add assert for alphabet_size > 0 in test helpers
Eliminates uncoverable dead code that caused 0% patch coverage.
1aa762d to
efe30ae
Compare
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.
| Benchmark suite | Current: 0f7cf20 | Previous: 90378eb | Ratio |
|---|---|---|---|
compress/c_ffi/level1 |
3.401 ms |
2.597 ms |
1.31 |
compress/c_ffi/level3 |
5.043 ms |
4.258 ms |
1.18 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
Guard against custom Matcher implementations that might produce literals exceeding the 18-bit size format limit (262143 bytes).
Upgrade debug_assert! to assert! — truncated 18-bit writes in release builds would produce corrupt streams silently. The assert fires in all build profiles, preventing invalid output from custom Matcher impls.
…sage
- Move compile-time MAX_BLOCK_SIZE check to idiomatic `const _: () = assert!(...)`
at module scope instead of inline `const { ... }` expression
- Add static panic message to runtime assert (format args omitted to avoid
uncoverable dead code in coverage instrumentation)
|
@coderabbitai re-review |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
Summary
unimplemented!("too many literals")with compile-time const assert + runtime assertTechnical Details
The 18-bit size format (max 262,143) always covers
MAX_BLOCK_SIZE(128KB = 131,072), so the original panic was unreachable in normal operation. The fix uses a two-layer safety approach:const { assert!(MAX_BLOCK_SIZE <= 262143) }— proves the standard encoder path is safeassert!(literals.len() < 262144)— hard guard against customMatcherimplementations that might exceed the limit (truncated 18-bit writes would produce corrupt streams)The encoder uses
raw_literalsfor blocks ≤ 1024 bytes, so only size formats 0b10 (14-bit) and 0b11 (18-bit) are reachable incompress_literals. The 0b00/0b01 arms are kept for completeness. New tests verify roundtrip correctness with large inputs (1KB–512KB) cross-validated against C zstd.Test Plan
cargo nextest run -p structured-zstd)Closes #15
Summary by CodeRabbit
Bug Fixes
Tests