feat(encoding): write frame content size in encoder output#60
Conversation
- FrameCompressor now buffers compressed blocks and writes the frame header after all input is consumed, so the total uncompressed size is known and encoded as Frame_Content_Size (FCS) - StreamingEncoder gains set_pledged_content_size() for callers who know the input size upfront; finish() verifies the pledge - Fix FCS descriptor flag mapping bug (8 => 3, was 3 => 8) - Add find_fcs_field_size() with correct +256 offset range for 2-byte FCS fields (256–65791 vs find_min_size's 256–65535) - Always include window descriptor alongside FCS (no single_segment) to avoid C zstd incompatibility with compressed blocks whose encoder window exceeds the declared content size Closes #16
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughBuffers encoded blocks to compute and emit Frame Content Size (FCS) in headers; adds FCS sizing/serialization utilities; StreamingEncoder gains pledged-content-size API with write-time enforcement and finish-time validation; README updated and tests added for zstd compatibility. Changes
Sequence DiagramsequenceDiagram
participant App as Application
participant SE as StreamingEncoder
participant FC as FrameCompressor
participant FH as FrameHeader
participant W as Writer
App->>SE: set_pledged_content_size(size)
SE->>SE: store pledged_content_size
App->>SE: write(data)
SE->>SE: check pledge / accept or truncate write
SE->>FC: compress_block(data)
FC->>FC: append BlockHeader + payload to all_blocks (track total_uncompressed)
FC-->>SE: return (buffered)
App->>SE: finish()
SE->>SE: if pledged -> validate bytes_consumed == pledged
alt mismatch
SE-->>App: Error(InvalidInput)
else ok
SE->>FH: compute field_size = find_fcs_field_size(total_uncompressed, single_segment)
FH->>FH: serialize_fcs(total_uncompressed, field_size)
FH->>W: write FrameHeader (header_buf)
SE->>W: write buffered blocks (all_blocks)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
This PR enhances the encoder to emit Zstandard Frame Content Size (FCS) in the frame header when the uncompressed size is known, improving decoder pre-allocation and cross-compatibility with the C zstd implementation.
Changes:
- Add correct FCS sizing/serialization utilities and fix the FCS descriptor flag mapping (8-byte FCS flag value).
- Update
FrameCompressorto buffer compressed blocks so the header can be written after total input size is known (and always include the window descriptor). - Add
StreamingEncoder::set_pledged_content_size()and enforce pledge verification infinish(), plus new tests and README capability update.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| zstd/src/encoding/util.rs | Adds find_fcs_field_size() and unit tests for correct FCS field sizing rules. |
| zstd/src/encoding/frame_header.rs | Uses find_fcs_field_size() for descriptor + payload serialization and fixes the FCS flag mapping bug. |
| zstd/src/encoding/frame_compressor.rs | Buffers blocks to write the header with FCS after total input size is known; adds C-zstd compatibility test. |
| zstd/src/encoding/streaming_encoder.rs | Adds pledged content size API and verifies pledge at finish; adds streaming tests. |
| README.md | Documents support for Frame Content Size. |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.
| Benchmark suite | Current: 3308ec2 | Previous: 98c1be0 | Ratio |
|---|---|---|---|
compress/default/small-1k-random/matrix/pure_rust |
6.012 ms |
4.311 ms |
1.39 |
compress/best/small-1k-random/matrix/pure_rust |
0.312 ms |
0.243 ms |
1.28 |
compress/best/small-1k-random/matrix/c_ffi |
0.837 ms |
0.378 ms |
2.21 |
compress/default/small-10k-random/matrix/pure_rust |
7.707 ms |
6.017 ms |
1.28 |
compress/default/small-10k-random/matrix/c_ffi |
0.028 ms |
0.022 ms |
1.27 |
compress/better/small-10k-random/matrix/c_ffi |
0.113 ms |
0.096 ms |
1.18 |
compress/best/small-10k-random/matrix/pure_rust |
0.863 ms |
0.622 ms |
1.39 |
compress/best/small-10k-random/matrix/c_ffi |
0.787 ms |
0.425 ms |
1.85 |
compress/default/small-4k-log-lines/matrix/pure_rust |
5.615 ms |
4.611 ms |
1.22 |
compress/best/small-4k-log-lines/matrix/c_ffi |
0.714 ms |
0.303 ms |
2.36 |
compress/better/decodecorpus-z000033/matrix/pure_rust |
83.996 ms |
56.997 ms |
1.47 |
compress/best/decodecorpus-z000033/matrix/pure_rust |
100.077 ms |
66.517 ms |
1.50 |
compress/best/decodecorpus-z000033/matrix/c_ffi |
23.067 ms |
19.446 ms |
1.19 |
compress/better/high-entropy-1m/matrix/pure_rust |
125.318 ms |
66.849 ms |
1.87 |
compress/best/high-entropy-1m/matrix/pure_rust |
145.539 ms |
56.907 ms |
2.56 |
compress/best/high-entropy-1m/matrix/c_ffi |
1.789 ms |
0.92 ms |
1.94 |
compress/default/low-entropy-1m/matrix/pure_rust |
11.966 ms |
9.692 ms |
1.23 |
compress/best/low-entropy-1m/matrix/pure_rust |
5.569 ms |
4.59 ms |
1.21 |
compress/best/low-entropy-1m/matrix/c_ffi |
1.72 ms |
1.193 ms |
1.44 |
compress/fastest/large-log-stream/matrix/c_ffi |
2.763 ms |
2.282 ms |
1.21 |
decompress/best/high-entropy-1m/c_stream/matrix/c_ffi |
0.168 ms |
0.026 ms |
6.46 |
compress-dict/better/small-10k-random/matrix/c_ffi_with_dict |
0.122 ms |
0.103 ms |
1.18 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
…ions - Fix find_fcs_field_size doc: only 1-byte encoding is gated on single_segment; 2-byte range (256–65791) works regardless - Fix set_pledged_content_size doc: remove stale Single_Segment claim since encoder always includes the window descriptor - Add debug_assert guards to serialize_fcs against underflow when field_size==2 and val<256, reject unexpected field_size values with unreachable!()
Splice the small header (max 14 bytes) into the front of all_blocks instead of writing header and blocks as two separate drain calls, eliminating the second full-size buffer allocation.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Line 50: Update the README checkbox/line about "Frame Content Size" to clearly
describe the two FCS code paths: state that FrameCompressor automatically learns
and exposes FCS by buffering the entire frame (trading memory for automatic
FCS), whereas StreamingEncoder does not provide FCS unless the caller calls
set_pledged_content_size() before the first write() (allowing streaming with no
buffering); mention the memory tradeoff and behavior so callers know which API
gives FCS automatically and which requires an explicit pledged size.
In `@zstd/src/encoding/frame_compressor.rs`:
- Around line 218-223: The compress() path should reject matchers reporting
window_size() == 0 before attempting any reads: add the same guard used in
StreamingEncoder to check self.state.matcher.window_size() and return an error
(or the same failure variant) from compress() when it equals 0, preventing reads
into an empty block buffer and avoiding emitting an empty frame for non-empty
input; locate the check near the start of frame compressor logic (around the
existing let window_size = self.state.matcher.window_size() usage) and mirror
the error handling behavior used by StreamingEncoder.
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 88-95: The method set_pledged_content_size currently checks
frame_started before verifying the encoder's sticky error state; call
ensure_open() at the start of set_pledged_content_size() and propagate any error
it returns before doing other checks, so that a prior I/O failure is honored;
then keep the existing frame_started check and only set pledged_content_size
when ensure_open() succeeds and frame has not started.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d3c28615-bd74-47de-a17e-c4d9d6a6b932
📒 Files selected for processing (5)
README.mdzstd/src/encoding/frame_compressor.rszstd/src/encoding/frame_header.rszstd/src/encoding/streaming_encoder.rszstd/src/encoding/util.rs
…e API - Add assert for window_size == 0 in FrameCompressor::compress(), matching the existing guard in StreamingEncoder - Call ensure_open() in set_pledged_content_size() to honor sticky error state before accepting a pledge - Clarify README FCS line: auto in FrameCompressor, pledge-based in StreamingEncoder
… header - Move pledge validation in finish() before ensure_frame_started() to avoid emitting a header with incorrect FCS on mismatch - Reject writes exceeding pledged content size in write() rather than only detecting at finish() - Rename bytes_written → bytes_consumed to clarify it tracks uncompressed input, not drain output - Revert splice prepend to two write_all calls — splice is O(n) memmove vs two cheap sequential writes
The comment claimed validation happens "before emitting the frame header", but the header is already written on the first write() via ensure_frame_started(). Reword to accurately describe that validation happens before finalizing the frame.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 936-943: The test pledged_content_size_mismatch_returns_error uses
the literal "only 10 bytes" which is misleading because its length is 13; update
the test in StreamingEncoder::new / set_pledged_content_size / write_all /
finish to use a clearer literal or comment—e.g., replace the string with "short
string" or update the message/comment to state the actual byte length—so the
test name and data are not confusing to future maintainers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 547f59a3-1cbd-4d54-9a91-6e6421f5365c
📒 Files selected for processing (1)
zstd/src/encoding/streaming_encoder.rs
Replace "only 10 bytes" (actually 13 bytes) with "short payload" and add a clarifying comment.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 936-951: Add a new unit test that verifies writes which would
exceed a previously set pledge are rejected: create a StreamingEncoder via
StreamingEncoder::new(...), call set_pledged_content_size(5) on it, then call
write_all with a buffer longer than 5 and assert the call returns an Err whose
kind() is ErrorKind::InvalidInput; name the test
write_exceeding_pledge_returns_error to match existing tests and place it
alongside the other pledged-content-size tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 22e87764-412c-4a3c-a410-e1503d078216
📒 Files selected for processing (1)
zstd/src/encoding/streaming_encoder.rs
…ial writes - Replace overflow-prone addition with checked_sub + truncation to honor Write partial-write semantics (accept up to remaining pledge) - Add write_exceeding_pledge_returns_error test - Update compress() docstring to document O(compressed_size) buffering
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
zstd/src/encoding/streaming_encoder.rs (1)
936-951:⚠️ Potential issue | 🟡 MinorAdd a dedicated test for write-time pledge overflow rejection.
Coverage still lacks a direct test for the
write()guard path (Line 405-410) where a write exceeding the pledge must fail withInvalidInput.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/encoding/streaming_encoder.rs` around lines 936 - 951, Add a test that sets a pledged content size on a StreamingEncoder and then calls write (the path guarded at the write() method) with a payload that would overflow the pledge so the write itself returns an error; specifically, create a StreamingEncoder via StreamingEncoder::new, call set_pledged_content_size(n).unwrap(), then call write (or write_all) with more than n bytes and assert the returned error is ErrorKind::InvalidInput (matching the write-time guard in write()). Ensure the test targets the write-time pledge overflow case rather than the finish-time check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 978-989: The test no_pledged_size_omits_fcs_from_header currently
asserts header.frame_content_size() == 0 which is ambiguous; instead, after
reading the header via crate::decoding::frame::read_frame_header(...) assert the
Frame Header Descriptor's FCS flag is unset (check the descriptor/flags on the
returned header object rather than frame_content_size). Locate the header type
returned by read_frame_header and add an assertion that the FCS/content-checksum
bit (e.g., the descriptor's fcs/content_checksum flag or the header descriptor
bits) is false/zero to prove the FCS was omitted.
---
Duplicate comments:
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 936-951: Add a test that sets a pledged content size on a
StreamingEncoder and then calls write (the path guarded at the write() method)
with a payload that would overflow the pledge so the write itself returns an
error; specifically, create a StreamingEncoder via StreamingEncoder::new, call
set_pledged_content_size(n).unwrap(), then call write (or write_all) with more
than n bytes and assert the returned error is ErrorKind::InvalidInput (matching
the write-time guard in write()). Ensure the test targets the write-time pledge
overflow case rather than the finish-time check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 15ba4574-f70a-4957-8d94-c33821ca4508
📒 Files selected for processing (1)
zstd/src/encoding/streaming_encoder.rs
… tests - Move pledge exhaustion check before ensure_frame_started() so that misuse like pledge(0) + write(data) doesn't emit a partial header - Assert descriptor.frame_content_size_bytes() in FCS tests to distinguish "FCS present with value 0" from "FCS omitted"
|
@coderabbitai re-review |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/frame_compressor.rs`:
- Around line 450-498: The test fcs_header_written_and_c_zstd_compatible
currently exercises inputs that only produce 4-byte FCS; add a 2-byte FCS case
(e.g. a 300‑byte buffer such as vec![0xCD; 300]) to the inputs array so the test
covers the 256..=65_791 / -256 encoding path; update the inputs variable in that
function (fcs_header_written_and_c_zstd_compatible) to include the new &[u8]
reference for the 300-byte vector so the round-trip and FCS presence assertions
exercise the 2-byte FCS encoding.
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 967-973: Add a new assertion that exercises the partial-progress
write() path instead of only write_all(): create the StreamingEncoder via
StreamingEncoder::new and set_pledged_content_size(5), then call write(...) with
a buffer longer than 5 and assert it returns Ok(n) where n equals the remaining
pledge (e.g. 5), and then call write(...) again and assert it returns an Err
with ErrorKind::InvalidInput; reference the existing test
write_exceeding_pledge_returns_error, StreamingEncoder::new,
set_pledged_content_size, write_all and write to locate where to add this
additional case.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ac8f3162-e87a-4fd1-a3d1-797c8a28c2fd
📒 Files selected for processing (5)
README.mdzstd/src/encoding/frame_compressor.rszstd/src/encoding/frame_header.rszstd/src/encoding/streaming_encoder.rszstd/src/encoding/util.rs
… test - Add 300-byte input to FCS interop matrix to exercise the 256–65791 two-byte FCS encoding path with -256 offset - Add write_straddling_pledge_reports_partial_progress test to verify write() returns Ok(n) at the pledge boundary instead of hard error
|
@coderabbitai re-review |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
Summary
FrameCompressornow automatically writes Frame Content Size (FCS) in the frame header by buffering compressed blocks and deferring the header until total input size is knownStreamingEncodergainsset_pledged_content_size()API for callers who know input size upfront;finish()verifies the pledge matches bytes written8 => 3, was3 => 8which would panic on 8-byte FCS values)find_fcs_field_size()with correct +256 offset range for 2-byte FCS fields (256–65791)single_segment) to avoid C zstd incompatibility with compressed blocks whose encoder window exceeds the declared content sizeTechnical Details
The C reference defaults
ZSTD_c_contentSizeFlagto 1 (enabled) and writes FCS whenever the pledged source size is known. This enables decoders to pre-allocate output buffers, significantly improving decompression speed.FrameCompressor: blocks are accumulated in an intermediate buffer instead of being flushed to the drain per-block. After all input is consumed, the correct header (with FCS) is written first, then the buffered blocks. This trades O(compressed_size) memory for automatic FCS detection without requiring callers to pledge.
StreamingEncoder: since the header must be written before data flows, FCS requires an explicit pledge via
set_pledged_content_size(). Without a pledge, the header omits FCS (matching previous behavior).No
single_segment: the C encoder setssingle_segmentwhenwindowSize >= pledgedSrcSize, which makes the decoder use FCS as window size. Our compressed blocks are encoded against the matcher's actual window (128KB+), so settingsingle_segmentwith a small FCS would give the decoder a window too small for the block format. We always include the window descriptor instead (costs 1 byte per frame).Test Plan
fcs_header_written_and_c_zstd_compatible— all 5 levels × 4 input sizes verified against C zstdpledged_content_size_written_in_header— FCS roundtrip via StreamingEncoderpledged_content_size_mismatch_returns_error— pledge verificationpledged_content_size_after_write_returns_error— API misuse guardpledged_content_size_c_zstd_compatible— streaming with pledge decoded by C zstdno_pledged_size_omits_fcs_from_header— backward compatibilityfind_fcs_field_sizeunit tests for both single-segment and non-single-segment pathsCloses #16
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests