Skip to content

feat: large literals block support (>262KB)#30

Merged
polaz merged 6 commits intomainfrom
fix/#15-feat-large-literals-block-support-262kb
Mar 25, 2026
Merged

feat: large literals block support (>262KB)#30
polaz merged 6 commits intomainfrom
fix/#15-feat-large-literals-block-support-262kb

Conversation

@polaz
Copy link
Copy Markdown
Member

@polaz polaz commented Mar 25, 2026

Summary

  • Replace unimplemented!("too many literals") with compile-time const assert + runtime assert
  • Add RFC 8878 §3.1.1.3.1.1 size format documentation clarifying encoder vs spec behavior
  • Add roundtrip and cross-validation tests for large literal sections (1KB–128KB)

Technical Details

The 18-bit size format (max 262,143) always covers MAX_BLOCK_SIZE (128KB = 131,072), so the original panic was unreachable in normal operation. The fix uses a two-layer safety approach:

  1. Compile-time: const { assert!(MAX_BLOCK_SIZE <= 262143) } — proves the standard encoder path is safe
  2. Runtime: assert!(literals.len() < 262144) — hard guard against custom Matcher implementations that might exceed the limit (truncated 18-bit writes would produce corrupt streams)

The encoder uses raw_literals for blocks ≤ 1024 bytes, so only size formats 0b10 (14-bit) and 0b11 (18-bit) are reachable in compress_literals. The 0b00/0b01 arms are kept for completeness. New tests verify roundtrip correctness with large inputs (1KB–512KB) cross-validated against C zstd.

Test Plan

  • 72/72 tests pass (cargo nextest run -p structured-zstd)
  • Roundtrip: large inputs at 1KB, 16KB, 64KB, 128KB boundaries
  • Cross-validation: Rust compress → C decompress for large blocks
  • Cross-validation: C compress → Rust decompress for large blocks
  • Multi-block: 512KB data split across multiple 128KB blocks

Closes #15

Summary by CodeRabbit

  • Bug Fixes

    • Added compile-time and runtime safety checks to prevent encoding of out-of-spec literal sizes in compression
  • Tests

    • Added roundtrip integrity tests for large Huffman-friendly data across various sizes (1KB–512KB)
    • Added cross-language roundtrip validation tests for Rust ↔ C compatibility with large payloads

Copilot AI review requested due to automatic review settings March 25, 2026 17:45
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 37a8b705-f6fd-40c0-9333-b050b3d7b1a6

📥 Commits

Reviewing files that changed from the base of the PR and between 90378eb and 0f7cf20.

📒 Files selected for processing (3)
  • zstd/src/encoding/blocks/compressed.rs
  • zstd/src/tests/roundtrip_integrity.rs
  • zstd/tests/cross_validation.rs

📝 Walkthrough

Walkthrough

Added a compile-time assertion that crate::common::MAX_BLOCK_SIZE fits the encoder’s 18-bit literal-size field; replaced the prior unimplemented!() arm with a match branch that emits the 18-bit size format for large literal lengths and added a runtime guard asserting literals.len() <= 262_143. Added deterministic, alphabet-limited test generators and new large/multi-block roundtrip and cross-validation tests.

Changes

Cohort / File(s) Summary
Core encoding logic
zstd/src/encoding/blocks/compressed.rs
Add const/compile-time assert! for MAX_BLOCK_SIZE ≤ 262_143; replace unimplemented!("too many literals") with a match arm that selects the 18-bit size format for large literal lengths and add a runtime assert!(literals.len() <= 262_143, ...); expand Size_Format comments.
Rust unit tests
zstd/src/tests/roundtrip_integrity.rs
Add generate_huffman_friendly deterministic generator and new #[test] cases exercising large literal sizes (around 1KiB, 16KiB, 64KiB, 128KiB) and a 512KiB multi-block roundtrip for both simple and streaming APIs.
Cross-language tests
zstd/tests/cross_validation.rs
Add same deterministic generator and cross-validation #[test] cases verifying Rust→FFI and FFI→Rust roundtrips across large sizes and a 512KiB multi-block scenario.

Sequence Diagram(s)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

  • #15: feat: large literals block support (>262KB) — This change bounds literal-size emission to the 18-bit format and prevents the prior panic; it does not implement the C reference’s 4-stream Huffman encoding for very large literal sections.
  • feat: block splitting for improved compression ratio #23 — Modifies the same literal-compression encoding path referenced by that issue.

Poem

🐰
I nibble bytes that love to sing,
folded small, then stretched a wing.
No panic now, just bounds in sight,
tests hop through blocks from day to night.
Hooray—roundtrips snug and tight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The PR removes the unimplemented panic and adds safety guards for literal sizes, but does not implement 4-stream Huffman encoding for literals >1KB as required by issue #15. Implement 4-stream Huffman encoding for literals >1KB as specified in issue #15 acceptance criteria; currently the code only replaces the panic with safe handling but does not add the required Huffman multi-stream support.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: large literals block support (>262KB)' directly corresponds to the main change: supporting large literal blocks by removing the panic and implementing proper size encoding.
Out of Scope Changes check ✅ Passed All changes are focused on supporting large literal blocks: compile-time and runtime assertions for size limits, documentation updates, and test coverage for roundtrip and cross-validation scenarios.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/#15-feat-large-literals-block-support-262kb

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refines the compressed-literals encoding path to remove an unreachable panic for oversized literal sections, documents the RFC size-format behavior, and adds roundtrip + C cross-validation tests targeting larger literal sections up to the Zstd max block size (128KiB).

Changes:

  • Replace an unimplemented!("too many literals") with an unreachable! and add a debug_assert! guarding the MAX_BLOCK_SIZE invariant in compress_literals.
  • Add RFC 8878 Size_Format documentation inline in the literals encoding path.
  • Add new large-block roundtrip and Rust↔C cross-validation tests for key size boundaries up to 128KiB (plus 512KiB multi-block inputs).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
zstd/src/encoding/blocks/compressed.rs Removes unreachable panic, adds invariant debug_assert!, and documents Size_Format behavior.
zstd/src/tests/roundtrip_integrity.rs Adds large-literals roundtrip tests intended to exercise size-format boundaries and multi-block behavior.
zstd/tests/cross_validation.rs Adds Rust↔C cross-validation for larger inputs and multi-block cases with Huffman-friendly data.

Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
Comment thread zstd/tests/cross_validation.rs Outdated
Comment thread zstd/tests/cross_validation.rs Outdated
Comment thread zstd/src/tests/roundtrip_integrity.rs Outdated
Comment thread zstd/src/tests/roundtrip_integrity.rs Outdated
@polaz polaz force-pushed the fix/#15-feat-large-literals-block-support-262kb branch from c422f77 to 1aa762d Compare March 25, 2026 18:07
@polaz polaz requested a review from Copilot March 25, 2026 18:08
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread zstd/tests/cross_validation.rs
Comment thread zstd/src/tests/roundtrip_integrity.rs
polaz added 3 commits March 25, 2026 20:17
…ge literals

- Replace `unimplemented!("too many literals")` with `unreachable!` and
  descriptive message explaining the MAX_BLOCK_SIZE invariant
- Add debug_assert validating literals never exceed MAX_BLOCK_SIZE (128KB)
- Add RFC 8878 §3.1.1.3.1.1 size format documentation comments
- Add roundtrip tests exercising all 4 size format boundaries (0b00–0b11)
- Add cross-validation tests (Rust ↔ C FFI) for large blocks up to 128KB

Closes #15
- Remove redundant debug_assert (unreachable! already guards the invariant)
- Clarify RFC comment: only formats 0b10/0b11 are reachable in encoder
- Fix test docs: roundtrip tests verify correctness, not specific format selection
- Rename test to roundtrip_large_literals (accurate scope)
…sert

- Use `const { assert!(MAX_BLOCK_SIZE <= 262143) }` for compile-time safety
- Replace `_ => unreachable!()` with `_ => (0b11, 18)` wildcard arm
- Add assert for alphabet_size > 0 in test helpers

Eliminates uncoverable dead code that caused 0% patch coverage.
@polaz polaz force-pushed the fix/#15-feat-large-literals-block-support-262kb branch from 1aa762d to efe30ae Compare March 25, 2026 18:19
@polaz polaz requested a review from Copilot March 25, 2026 18:19
Copy link
Copy Markdown

@sw-release-bot sw-release-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

Benchmark suite Current: 0f7cf20 Previous: 90378eb Ratio
compress/c_ffi/level1 3.401 ms 2.597 ms 1.31
compress/c_ffi/level3 5.043 ms 4.258 ms 1.18

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread zstd/src/encoding/blocks/compressed.rs
Guard against custom Matcher implementations that might produce
literals exceeding the 18-bit size format limit (262143 bytes).
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/encoding/blocks/compressed.rs
Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
Upgrade debug_assert! to assert! — truncated 18-bit writes in release
builds would produce corrupt streams silently. The assert fires in all
build profiles, preventing invalid output from custom Matcher impls.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
Comment thread zstd/src/encoding/blocks/compressed.rs Outdated
…sage

- Move compile-time MAX_BLOCK_SIZE check to idiomatic `const _: () = assert!(...)`
  at module scope instead of inline `const { ... }` expression
- Add static panic message to runtime assert (format args omitted to avoid
  uncoverable dead code in coverage instrumentation)
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@polaz
Copy link
Copy Markdown
Member Author

polaz commented Mar 25, 2026

@coderabbitai re-review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 25, 2026

@polaz Sure! I'll re-review the changes in PR #30 now.


🧠 Learnings used
Learnt from: CR
Repo: structured-world/structured-zstd PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-25T04:29:01.996Z
Learning: Avoid suggesting architectural changes that diverge too far from upstream (ruzstd/KillingSpark/zstd-rs fork)
✅ Actions performed

Full review triggered.

@polaz polaz merged commit 1c9c616 into main Mar 25, 2026
14 of 15 checks passed
@polaz polaz deleted the fix/#15-feat-large-literals-block-support-262kb branch March 25, 2026 21:58
@sw-release-bot sw-release-bot Bot mentioned this pull request Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: large literals block support (>262KB)

2 participants