
feat(encoding): write frame content size in encoder output #60

Merged: polaz merged 12 commits into main from feat/#16-feat-frame-content-size-in-encoder-output on Apr 4, 2026

Member

@polaz polaz commented Apr 3, 2026

Summary

  • FrameCompressor now automatically writes Frame Content Size (FCS) in the frame header by buffering compressed blocks and deferring the header until total input size is known
  • StreamingEncoder gains set_pledged_content_size() API for callers who know input size upfront; finish() verifies the pledge matches bytes written
  • Fix FCS descriptor flag mapping bug (8 => 3, was 3 => 8 which would panic on 8-byte FCS values)
  • Add find_fcs_field_size() with correct +256 offset range for 2-byte FCS fields (256–65791)
  • Always include window descriptor alongside FCS (no single_segment) to avoid C zstd incompatibility with compressed blocks whose encoder window exceeds the declared content size
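The field-size rules in the bullets above can be sketched as follows. This is a minimal illustration based on the Zstandard frame format, not the crate's code: `sketch_fcs_field_size`, `sketch_fcs_flag`, and `sketch_serialize_fcs` are hypothetical names, and the real `find_fcs_field_size()` may differ in detail.

```rust
/// Hypothetical sketch: pick the FCS field width in bytes.
fn sketch_fcs_field_size(val: u64, single_segment: bool) -> usize {
    match val {
        // The 1-byte form only exists when Single_Segment is set.
        0..=255 if single_segment => 1,
        // The 2-byte form stores `val - 256`, so it covers 256..=65_791.
        256..=65_791 => 2,
        // Small values without Single_Segment fall through to 4 bytes.
        0..=0xFFFF_FFFF => 4,
        _ => 8,
    }
}

/// Field width in bytes -> 2-bit Frame_Content_Size_flag (8 => 3, not 3 => 8).
fn sketch_fcs_flag(field_size: usize) -> u8 {
    match field_size {
        1 => 0,
        2 => 1,
        4 => 2,
        8 => 3,
        other => panic!("invalid FCS field size: {}", other),
    }
}

/// Little-endian FCS bytes for a width chosen by `sketch_fcs_field_size`.
fn sketch_serialize_fcs(val: u64, field_size: usize) -> Vec<u8> {
    match field_size {
        1 => vec![val as u8],
        // Caller must guarantee val >= 256 here, or the subtraction underflows.
        2 => ((val - 256) as u16).to_le_bytes().to_vec(),
        4 => (val as u32).to_le_bytes().to_vec(),
        8 => val.to_le_bytes().to_vec(),
        other => panic!("invalid FCS field size: {}", other),
    }
}
```

Under these assumptions a 300-byte input takes the 2-byte form and is stored as 44 (300 minus 256), while a 100-byte input without single_segment needs the 4-byte form.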

Technical Details

The C reference defaults ZSTD_c_contentSizeFlag to 1 (enabled) and writes FCS whenever the pledged source size is known. With the size in the header, decoders can pre-allocate output buffers, which can improve decompression speed.

FrameCompressor: blocks are accumulated in an intermediate buffer instead of being flushed to the drain per-block. After all input is consumed, the correct header (with FCS) is written first, then the buffered blocks. This trades O(compressed_size) memory for automatic FCS detection without requiring callers to pledge.
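The buffering strategy can be sketched with a toy framing: an 8-byte length prefix, which is not the zstd header layout. `sketch_compress` and `encode_block` are hypothetical names for illustration; only `all_blocks` and `total_uncompressed` echo names mentioned in this PR.

```rust
use std::io::{self, Read, Write};

/// Toy framing, NOT the zstd format: an 8-byte little-endian total length,
/// followed by the buffered blocks. `encode_block` stands in for compression.
fn sketch_compress<R: Read, W: Write>(
    mut source: R,
    mut drain: W,
    block_size: usize,
    encode_block: impl Fn(&[u8]) -> Vec<u8>,
) -> io::Result<u64> {
    // A zero-sized read buffer would look like end of input immediately.
    assert!(block_size > 0, "block_size must be non-zero");

    let mut all_blocks = Vec::new(); // grows to O(compressed_size)
    let mut total_uncompressed = 0u64;
    let mut chunk = vec![0u8; block_size];

    loop {
        let n = source.read(&mut chunk)?;
        if n == 0 {
            break; // end of input: the total size is now known
        }
        total_uncompressed += n as u64;
        all_blocks.extend_from_slice(&encode_block(&chunk[..n]));
    }

    // Header first, then the blocks that were buffered while reading.
    drain.write_all(&total_uncompressed.to_le_bytes())?;
    drain.write_all(&all_blocks)?;
    Ok(total_uncompressed)
}
```

Nothing reaches the drain until the input is exhausted, which is the memory-for-convenience trade described above.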

StreamingEncoder: since the header must be written before data flows, FCS requires an explicit pledge via set_pledged_content_size(). Without a pledge, the header omits FCS (matching previous behavior).
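A minimal sketch of the pledge contract, assuming a plain `Write` wrapper rather than the real encoder. `PledgedWriter` is a hypothetical type; only `set_pledged_content_size`, `finish`, and `bytes_consumed` are names taken from this PR, and the actual implementation may behave differently in detail.

```rust
use std::io::{self, ErrorKind, Write};

/// Hypothetical wrapper illustrating pledge enforcement; not the crate's type.
struct PledgedWriter<W: Write> {
    inner: W,
    pledged: Option<u64>,
    bytes_consumed: u64,
    started: bool,
}

impl<W: Write> PledgedWriter<W> {
    fn new(inner: W) -> Self {
        PledgedWriter { inner, pledged: None, bytes_consumed: 0, started: false }
    }

    /// Must be called before the first write, because the header goes out then.
    fn set_pledged_content_size(&mut self, size: u64) -> io::Result<()> {
        if self.started {
            return Err(io::Error::new(ErrorKind::InvalidInput, "pledge after first write"));
        }
        self.pledged = Some(size);
        Ok(())
    }

    /// Verifies the pledge and hands back the inner writer.
    fn finish(self) -> io::Result<W> {
        match self.pledged {
            Some(p) if p != self.bytes_consumed => {
                Err(io::Error::new(ErrorKind::InvalidInput, "pledge mismatch"))
            }
            _ => Ok(self.inner),
        }
    }
}

impl<W: Write> Write for PledgedWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let mut len = buf.len();
        if let Some(p) = self.pledged {
            // checked_sub avoids overflow-prone `bytes_consumed + len` math.
            let remaining = p.checked_sub(self.bytes_consumed).unwrap_or(0);
            if remaining == 0 && len > 0 {
                return Err(io::Error::new(ErrorKind::InvalidInput, "write exceeds pledge"));
            }
            // Partial-write semantics: accept only up to the remaining pledge.
            len = remaining.min(len as u64) as usize;
        }
        self.started = true;
        let n = self.inner.write(&buf[..len])?;
        self.bytes_consumed += n as u64;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

With a pledge of 5, a single `write()` of 8 bytes reports 5 bytes accepted, and the next `write()` fails with `InvalidInput`; `write_all()` surfaces that same error because it retries the remainder.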

No single_segment: the C encoder sets single_segment when windowSize >= pledgedSrcSize, which makes the decoder use the FCS as the window size. Our compressed blocks are encoded against the matcher's actual window (128 KiB or larger), so setting single_segment with a small FCS would give the decoder a window too small for the blocks. We always include the window descriptor instead, at a cost of 1 byte per frame.
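To make the window argument concrete, here is a sketch of how a decoder derives its window under the Zstandard frame format. Both function names are hypothetical and the code is illustrative only; it is not taken from this crate's decoder.

```rust
/// Decode a Window_Descriptor byte into a window size in bytes
/// (exponent in bits 7-3, mantissa in bits 2-0).
fn sketch_window_size(descriptor: u8) -> u64 {
    let exponent = u64::from(descriptor >> 3);
    let mantissa = u64::from(descriptor & 0b111);
    let window_base = 1u64 << (10 + exponent);
    let window_add = (window_base / 8) * mantissa;
    window_base + window_add
}

/// With Single_Segment set there is no Window_Descriptor, so the decoder
/// falls back to the Frame Content Size as its window.
fn sketch_decoder_window(single_segment: bool, fcs: u64, window_descriptor: u8) -> u64 {
    if single_segment {
        fcs
    } else {
        sketch_window_size(window_descriptor)
    }
}
```

For a 1000-byte input, a descriptor of 0x38 yields a 128 KiB window, whereas single_segment would shrink the decoder's window to 1000 bytes.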

Test Plan

  • 185 lib tests pass (including 1000-iteration roundtrip suites)
  • 15 cross-validation tests (Rust encoder ↔ C decoder and vice versa)
  • New fcs_header_written_and_c_zstd_compatible — all 5 levels × 4 input sizes verified against C zstd
  • New pledged_content_size_written_in_header — FCS roundtrip via StreamingEncoder
  • New pledged_content_size_mismatch_returns_error — pledge verification
  • New pledged_content_size_after_write_returns_error — API misuse guard
  • New pledged_content_size_c_zstd_compatible — streaming with pledge decoded by C zstd
  • New no_pledged_size_omits_fcs_from_header — backward compatibility
  • New find_fcs_field_size unit tests for both single-segment and non-single-segment paths
  • Clippy clean, doc-tests pass

Closes #16

Summary by CodeRabbit

  • New Features

    • Non-streaming compressor now emits Frame Content Size (FCS) automatically; streaming encoder adds an API to pledge an upfront frame size and enforces it.
  • Bug Fixes

    • Corrected FCS field sizing/serialization for external decoder compatibility.
    • Improved empty-input handling and stricter pledge/size validation during streaming writes and finish.
  • Documentation

    • README updated to mark FCS implemented and note streamer pledge requirement.
  • Tests

    • Added tests for pledged-size behavior, round-trip decoding, compatibility, and edge cases.

- FrameCompressor now buffers compressed blocks and writes the frame
  header after all input is consumed, so the total uncompressed size
  is known and encoded as Frame_Content_Size (FCS)
- StreamingEncoder gains set_pledged_content_size() for callers who
  know the input size upfront; finish() verifies the pledge
- Fix FCS descriptor flag mapping bug (8 => 3, was 3 => 8)
- Add find_fcs_field_size() with correct +256 offset range for
  2-byte FCS fields (256–65791 vs find_min_size's 256–65535)
- Always include window descriptor alongside FCS (no single_segment)
  to avoid C zstd incompatibility with compressed blocks whose
  encoder window exceeds the declared content size

Closes #16
Copilot AI review requested due to automatic review settings April 3, 2026 19:05

coderabbitai Bot commented Apr 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f501afcf-4ddd-4e42-ae32-28218e90eaa9

📥 Commits

Reviewing files that changed from the base of the PR and between a734b37 and 3308ec2.

📒 Files selected for processing (2)
  • zstd/src/encoding/frame_compressor.rs
  • zstd/src/encoding/streaming_encoder.rs

📝 Walkthrough

Walkthrough

Buffers encoded blocks to compute and emit Frame Content Size (FCS) in headers; adds FCS sizing/serialization utilities; StreamingEncoder gains pledged-content-size API with write-time enforcement and finish-time validation; README updated and tests added for zstd compatibility.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Documentation | README.md | Marked "Frame Content Size (FCS)" implemented in the compression feature checklist. |
| Frame Buffering & Emission | zstd/src/encoding/frame_compressor.rs | Buffer encoded blocks into all_blocks, track total_uncompressed, defer FrameHeader serialization until after input is read, adjust empty-input handling, assert non-zero window, and write header then buffered blocks; added std-only compatibility test. |
| Streaming Encoder (pledge handling) | zstd/src/encoding/streaming_encoder.rs | Added pledged_content_size: Option<u64> and bytes_consumed; set_pledged_content_size() API with pre-write validation; writes enforce pledge (may accept/truncate); finish() validates pledge equals consumed bytes; added related tests. |
| Frame Header FCS Serialization | zstd/src/encoding/frame_header.rs | Reworked FCS encoding: use find_fcs_field_size(...) and serialize_fcs(...), update descriptor mapping, replace minify helper with direct serializers and assertions. |
| FCS Sizing Utility & Tests | zstd/src/encoding/util.rs | Added pub fn find_fcs_field_size(val: u64, single_segment: bool) -> usize implementing 1/2/4/8-byte selection with unit tests; minor doc tweak to find_min_size. |

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant SE as StreamingEncoder
    participant FC as FrameCompressor
    participant FH as FrameHeader
    participant W as Writer

    App->>SE: set_pledged_content_size(size)
    SE->>SE: store pledged_content_size

    App->>SE: write(data)
    SE->>SE: check pledge / accept or truncate write
    SE->>FC: compress_block(data)
    FC->>FC: append BlockHeader + payload to all_blocks (track total_uncompressed)
    FC-->>SE: return (buffered)

    App->>SE: finish()
    SE->>SE: if pledged -> validate bytes_consumed == pledged
    alt mismatch
        SE-->>App: Error(InvalidInput)
    else ok
        SE->>FH: compute field_size = find_fcs_field_size(total_uncompressed, single_segment)
        FH->>FH: serialize_fcs(total_uncompressed, field_size)
        FH->>W: write FrameHeader (header_buf)
        SE->>W: write buffered blocks (all_blocks)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I counted hops and every byte I ate,
I pledged a size and guarded every plate.
One, two, four or eight — the header keeps the score,
I buffered all my carrots and wrote them out in store. 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: writing frame content size in encoder output, which is the central objective of the PR. |
| Linked Issues check | ✅ Passed | The PR implements all core coding requirements from issue #16: FCS encoding with correct field sizing (1/2/4/8 bytes), pledged-size validation in StreamingEncoder, automatic FCS inference in FrameCompressor, fixed FCS descriptor flag mapping, correct 2-byte field handling with +256 offset, and C zstd compatibility. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing FCS support: FrameCompressor now buffers blocks to compute total size, StreamingEncoder adds pledged-size tracking and validation, frame_header.rs corrects FCS encoding logic, util.rs adds FCS field-sizing logic, and README documents the feature. No unrelated changes detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |



Copilot AI left a comment


Pull request overview

This PR enhances the encoder to emit Zstandard Frame Content Size (FCS) in the frame header when the uncompressed size is known, improving decoder pre-allocation and cross-compatibility with the C zstd implementation.

Changes:

  • Add correct FCS sizing/serialization utilities and fix the FCS descriptor flag mapping (8-byte FCS flag value).
  • Update FrameCompressor to buffer compressed blocks so the header can be written after total input size is known (and always include the window descriptor).
  • Add StreamingEncoder::set_pledged_content_size() and enforce pledge verification in finish(), plus new tests and README capability update.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| zstd/src/encoding/util.rs | Adds find_fcs_field_size() and unit tests for correct FCS field sizing rules. |
| zstd/src/encoding/frame_header.rs | Uses find_fcs_field_size() for descriptor + payload serialization and fixes the FCS flag mapping bug. |
| zstd/src/encoding/frame_compressor.rs | Buffers blocks to write the header with FCS after total input size is known; adds C-zstd compatibility test. |
| zstd/src/encoding/streaming_encoder.rs | Adds pledged content size API and verifies pledge at finish; adds streaming tests. |
| README.md | Documents support for Frame Content Size. |

Comment thread zstd/src/encoding/util.rs Outdated
Comment thread zstd/src/encoding/streaming_encoder.rs Outdated
Comment thread zstd/src/encoding/frame_header.rs Outdated
Comment thread zstd/src/encoding/frame_compressor.rs

codecov Bot commented Apr 3, 2026

Codecov Report

❌ Patch coverage is 94.93088% with 11 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| zstd/src/encoding/frame_compressor.rs | 91.93% | 5 Missing ⚠️ |
| zstd/src/encoding/frame_header.rs | 76.92% | 3 Missing ⚠️ |
| zstd/src/encoding/streaming_encoder.rs | 97.36% | 3 Missing ⚠️ |



@sw-release-bot sw-release-bot Bot left a comment


⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

| Benchmark suite | Current: 3308ec2 | Previous: 98c1be0 | Ratio |
|---|---|---|---|
| compress/default/small-1k-random/matrix/pure_rust | 6.012 ms | 4.311 ms | 1.39 |
| compress/best/small-1k-random/matrix/pure_rust | 0.312 ms | 0.243 ms | 1.28 |
| compress/best/small-1k-random/matrix/c_ffi | 0.837 ms | 0.378 ms | 2.21 |
| compress/default/small-10k-random/matrix/pure_rust | 7.707 ms | 6.017 ms | 1.28 |
| compress/default/small-10k-random/matrix/c_ffi | 0.028 ms | 0.022 ms | 1.27 |
| compress/better/small-10k-random/matrix/c_ffi | 0.113 ms | 0.096 ms | 1.18 |
| compress/best/small-10k-random/matrix/pure_rust | 0.863 ms | 0.622 ms | 1.39 |
| compress/best/small-10k-random/matrix/c_ffi | 0.787 ms | 0.425 ms | 1.85 |
| compress/default/small-4k-log-lines/matrix/pure_rust | 5.615 ms | 4.611 ms | 1.22 |
| compress/best/small-4k-log-lines/matrix/c_ffi | 0.714 ms | 0.303 ms | 2.36 |
| compress/better/decodecorpus-z000033/matrix/pure_rust | 83.996 ms | 56.997 ms | 1.47 |
| compress/best/decodecorpus-z000033/matrix/pure_rust | 100.077 ms | 66.517 ms | 1.50 |
| compress/best/decodecorpus-z000033/matrix/c_ffi | 23.067 ms | 19.446 ms | 1.19 |
| compress/better/high-entropy-1m/matrix/pure_rust | 125.318 ms | 66.849 ms | 1.87 |
| compress/best/high-entropy-1m/matrix/pure_rust | 145.539 ms | 56.907 ms | 2.56 |
| compress/best/high-entropy-1m/matrix/c_ffi | 1.789 ms | 0.92 ms | 1.94 |
| compress/default/low-entropy-1m/matrix/pure_rust | 11.966 ms | 9.692 ms | 1.23 |
| compress/best/low-entropy-1m/matrix/pure_rust | 5.569 ms | 4.59 ms | 1.21 |
| compress/best/low-entropy-1m/matrix/c_ffi | 1.72 ms | 1.193 ms | 1.44 |
| compress/fastest/large-log-stream/matrix/c_ffi | 2.763 ms | 2.282 ms | 1.21 |
| decompress/best/high-entropy-1m/c_stream/matrix/c_ffi | 0.168 ms | 0.026 ms | 6.46 |
| compress-dict/better/small-10k-random/matrix/c_ffi_with_dict | 0.122 ms | 0.103 ms | 1.18 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

polaz added 2 commits April 3, 2026 22:42
…ions

- Fix find_fcs_field_size doc: only 1-byte encoding is gated on
  single_segment; 2-byte range (256–65791) works regardless
- Fix set_pledged_content_size doc: remove stale Single_Segment
  claim since encoder always includes the window descriptor
- Add debug_assert guards to serialize_fcs against underflow
  when field_size==2 and val<256, reject unexpected field_size
  values with unreachable!()
Splice the small header (max 14 bytes) into the front of all_blocks
instead of writing header and blocks as two separate drain calls,
eliminating the second full-size buffer allocation.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Line 50: Update the README checkbox/line about "Frame Content Size" to clearly
describe the two FCS code paths: state that FrameCompressor automatically learns
and exposes FCS by buffering the entire frame (trading memory for automatic
FCS), whereas StreamingEncoder does not provide FCS unless the caller calls
set_pledged_content_size() before the first write() (allowing streaming with no
buffering); mention the memory tradeoff and behavior so callers know which API
gives FCS automatically and which requires an explicit pledged size.

In `@zstd/src/encoding/frame_compressor.rs`:
- Around line 218-223: The compress() path should reject matchers reporting
window_size() == 0 before attempting any reads: add the same guard used in
StreamingEncoder to check self.state.matcher.window_size() and return an error
(or the same failure variant) from compress() when it equals 0, preventing reads
into an empty block buffer and avoiding emitting an empty frame for non-empty
input; locate the check near the start of frame compressor logic (around the
existing let window_size = self.state.matcher.window_size() usage) and mirror
the error handling behavior used by StreamingEncoder.

In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 88-95: The method set_pledged_content_size currently checks
frame_started before verifying the encoder's sticky error state; call
ensure_open() at the start of set_pledged_content_size() and propagate any error
it returns before doing other checks, so that a prior I/O failure is honored;
then keep the existing frame_started check and only set pledged_content_size
when ensure_open() succeeds and frame has not started.

📥 Commits

Reviewing files that changed from the base of the PR and between 98c1be0 and 978262f.

📒 Files selected for processing (5)
  • README.md
  • zstd/src/encoding/frame_compressor.rs
  • zstd/src/encoding/frame_header.rs
  • zstd/src/encoding/streaming_encoder.rs
  • zstd/src/encoding/util.rs

Comment thread README.md Outdated
Comment thread zstd/src/encoding/frame_compressor.rs
Comment thread zstd/src/encoding/streaming_encoder.rs
…e API

- Add assert for window_size == 0 in FrameCompressor::compress(),
  matching the existing guard in StreamingEncoder
- Call ensure_open() in set_pledged_content_size() to honor sticky
  error state before accepting a pledge
- Clarify README FCS line: auto in FrameCompressor, pledge-based
  in StreamingEncoder

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comment thread zstd/src/encoding/streaming_encoder.rs
Comment thread zstd/src/encoding/streaming_encoder.rs
Comment thread zstd/src/encoding/streaming_encoder.rs Outdated
Comment thread zstd/src/encoding/frame_compressor.rs Outdated
… header

- Move pledge validation in finish() before ensure_frame_started() to
  avoid emitting a header with incorrect FCS on mismatch
- Reject writes exceeding pledged content size in write() rather than
  only detecting at finish()
- Rename bytes_written → bytes_consumed to clarify it tracks
  uncompressed input, not drain output
- Revert splice prepend to two write_all calls — splice is O(n)
  memmove vs two cheap sequential writes
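The trade-off behind reverting the splice can be sketched as below. Both helper names are hypothetical, and the sketch only contrasts the two emission strategies; it does not reproduce the code in frame_compressor.rs.

```rust
use std::io::{self, Write};

/// Prepend the header into the block buffer, then emit once.
fn emit_with_splice<W: Write>(
    drain: &mut W,
    header: &[u8],
    mut all_blocks: Vec<u8>,
) -> io::Result<()> {
    // Inserting at index 0 shifts every buffered byte: an O(n) memmove.
    all_blocks.splice(0..0, header.iter().copied());
    drain.write_all(&all_blocks)
}

/// Emit the header and the blocks as two sequential writes, no shifting.
fn emit_with_two_writes<W: Write>(
    drain: &mut W,
    header: &[u8],
    all_blocks: &[u8],
) -> io::Result<()> {
    drain.write_all(header)?;
    drain.write_all(all_blocks)
}
```

Both produce identical bytes at the drain; the second avoids moving the buffered blocks at the cost of one extra write call.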

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread zstd/src/encoding/streaming_encoder.rs Outdated
The comment claimed validation happens "before emitting the frame
header", but the header is already written on the first write() via
ensure_frame_started(). Reword to accurately describe that validation
happens before finalizing the frame.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 936-943: The test pledged_content_size_mismatch_returns_error uses
the literal "only 10 bytes" which is misleading because its length is 13; update
the test in StreamingEncoder::new / set_pledged_content_size / write_all /
finish to use a clearer literal or comment—e.g., replace the string with "short
string" or update the message/comment to state the actual byte length—so the
test name and data are not confusing to future maintainers.

📥 Commits

Reviewing files that changed from the base of the PR and between d2ddcf7 and e69754b.

📒 Files selected for processing (1)
  • zstd/src/encoding/streaming_encoder.rs

Comment thread zstd/src/encoding/streaming_encoder.rs
Replace "only 10 bytes" (actually 13 bytes) with "short payload"
and add a clarifying comment.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 936-951: Add a new unit test that verifies writes which would
exceed a previously set pledge are rejected: create a StreamingEncoder via
StreamingEncoder::new(...), call set_pledged_content_size(5) on it, then call
write_all with a buffer longer than 5 and assert the call returns an Err whose
kind() is ErrorKind::InvalidInput; name the test
write_exceeding_pledge_returns_error to match existing tests and place it
alongside the other pledged-content-size tests.

📥 Commits

Reviewing files that changed from the base of the PR and between e69754b and db737b5.

📒 Files selected for processing (1)
  • zstd/src/encoding/streaming_encoder.rs

Comment thread zstd/src/encoding/streaming_encoder.rs

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/encoding/streaming_encoder.rs
Comment thread zstd/src/encoding/frame_compressor.rs
…ial writes

- Replace overflow-prone addition with checked_sub + truncation to
  honor Write partial-write semantics (accept up to remaining pledge)
- Add write_exceeding_pledge_returns_error test
- Update compress() docstring to document O(compressed_size) buffering

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
zstd/src/encoding/streaming_encoder.rs (1)

936-951: ⚠️ Potential issue | 🟡 Minor

Add a dedicated test for write-time pledge overflow rejection.

Coverage still lacks a direct test for the write() guard path (Line 405-410) where a write exceeding the pledge must fail with InvalidInput.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/encoding/streaming_encoder.rs` around lines 936 - 951, Add a test
that sets a pledged content size on a StreamingEncoder and then calls write (the
path guarded at the write() method) with a payload that would overflow the
pledge so the write itself returns an error; specifically, create a
StreamingEncoder via StreamingEncoder::new, call
set_pledged_content_size(n).unwrap(), then call write (or write_all) with more
than n bytes and assert the returned error is ErrorKind::InvalidInput (matching
the write-time guard in write()). Ensure the test targets the write-time pledge
overflow case rather than the finish-time check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 978-989: The test no_pledged_size_omits_fcs_from_header currently
asserts header.frame_content_size() == 0, which is ambiguous because an FCS
field holding 0 and an omitted FCS field both read as 0. Instead, after reading
the header via crate::decoding::frame::read_frame_header(...), assert on the
Frame Header Descriptor itself (the descriptor/flags on the returned header
object rather than frame_content_size) that no FCS field is present, which
proves the FCS was omitted.


📥 Commits

Reviewing files that changed from the base of the PR and between e69754b and db737b5.

📒 Files selected for processing (1)
  • zstd/src/encoding/streaming_encoder.rs

Comment thread zstd/src/encoding/streaming_encoder.rs

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment thread zstd/src/encoding/streaming_encoder.rs Outdated
Comment thread zstd/src/encoding/frame_compressor.rs Outdated
Comment thread zstd/src/encoding/streaming_encoder.rs
… tests

- Move pledge exhaustion check before ensure_frame_started() so that
  misuse like pledge(0) + write(data) doesn't emit a partial header
- Assert descriptor.frame_content_size_bytes() in FCS tests to
  distinguish "FCS present with value 0" from "FCS omitted"
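The distinction between "present with value 0" and "omitted" lives in the Frame_Header_Descriptor byte, not in the decoded size. A sketch under the Zstandard frame format follows; `sketch_fcs_bytes` is a hypothetical stand-in and is not claimed to match the crate's `frame_content_size_bytes()`.

```rust
/// Number of FCS bytes implied by a Frame_Header_Descriptor byte.
fn sketch_fcs_bytes(descriptor: u8) -> u8 {
    let fcs_flag = descriptor >> 6; // bits 7-6
    let single_segment = (descriptor & 0b0010_0000) != 0; // bit 5
    match fcs_flag {
        // Flag 0 means a 1-byte field only when Single_Segment is set.
        0 if single_segment => 1,
        0 => 0, // FCS omitted entirely
        1 => 2,
        2 => 4,
        _ => 8,
    }
}
```

A frame with no pledge would carry flag 0 without single_segment (0 FCS bytes), whereas an empty frame with FCS written and no single_segment would use the 4-byte form, so checking the byte count separates the two cases.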
Member Author

polaz commented Apr 4, 2026

@coderabbitai re-review

@polaz polaz requested a review from Copilot April 4, 2026 07:09

coderabbitai Bot commented Apr 4, 2026

@polaz Sure, I'll re-review the PR now!


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-03T17:39:58.596Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` intentionally does NOT reserve the frame-content-size (FCS) delta beyond `window_size`. Two cases cover the design space: (1) multi-segment frames where FCS >> window_size — the delta can reach ~99 MiB, and eagerly reserving it would penalize streaming callers; (2) single-segment frames where window_size == FCS — the delta is always 0 so there is nothing to reserve. Incremental growth during decoding is handled by the per-block `MAX_BLOCK_SIZE` (128 KiB) reserve in `execute_sequences`. Do not flag the absence of an FCS-delta reserve in `reset()` as a missing pre-allocation.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-03T16:36:03.209Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` appears asymmetric to `new()` in its pre-allocation, but the asymmetry is intentional: `decoder_scratch.reset(window_size)` internally calls `DecodeBuffer::reset()` which already reserves `window_size`. Therefore `reset()` only needs to call `scratch.buffer.reserve(frame_content_size)` on top of that for the FCS delta, rather than explicitly reserving `window_size` again as `new()` does. Do not re-flag this as a bug or asymmetry.
✅ Actions performed

Full review triggered.


Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/frame_compressor.rs`:
- Around line 450-498: The test fcs_header_written_and_c_zstd_compatible
currently exercises inputs that only produce 4-byte FCS; add a 2-byte FCS case
(e.g. a 300‑byte buffer such as vec![0xCD; 300]) to the inputs array so the test
covers the 256..=65_791 / -256 encoding path; update the inputs variable in that
function (fcs_header_written_and_c_zstd_compatible) to include the new &[u8]
reference for the 300-byte vector so the round-trip and FCS presence assertions
exercise the 2-byte FCS encoding.

In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 967-973: Add a new assertion that exercises the partial-progress
write() path instead of only write_all(): create the StreamingEncoder via
StreamingEncoder::new and set_pledged_content_size(5), then call write(...) with
a buffer longer than 5 and assert it returns Ok(n) where n equals the remaining
pledge (e.g. 5), and then call write(...) again and assert it returns an Err
with ErrorKind::InvalidInput; reference the existing test
write_exceeding_pledge_returns_error, StreamingEncoder::new,
set_pledged_content_size, write_all and write to locate where to add this
additional case.

📥 Commits

Reviewing files that changed from the base of the PR and between 98c1be0 and a734b37.

📒 Files selected for processing (5)
  • README.md
  • zstd/src/encoding/frame_compressor.rs
  • zstd/src/encoding/frame_header.rs
  • zstd/src/encoding/streaming_encoder.rs
  • zstd/src/encoding/util.rs

Comment thread zstd/src/encoding/frame_compressor.rs
Comment thread zstd/src/encoding/streaming_encoder.rs
… test

- Add 300-byte input to FCS interop matrix to exercise the 256–65791
  two-byte FCS encoding path with -256 offset
- Add write_straddling_pledge_reports_partial_progress test to verify
  write() returns Ok(n) at the pledge boundary instead of hard error
Member Author

polaz commented Apr 4, 2026

@coderabbitai re-review

@polaz polaz requested a review from Copilot April 4, 2026 07:34

coderabbitai Bot commented Apr 4, 2026

@polaz Sure, I'll re-review the PR now!


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1554-1559
Timestamp: 2026-04-02T12:03:04.619Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` has an early-return guard at `abs_pos >= u32::MAX` that disables all further HC insertions for streams >4 GiB, degrading `CompressionLevel::Better` to repcodes-only. This is a known limitation tracked in issue `#51`; the proper fix (table rebasing) is deferred. A TODO comment and a limitation note on `CompressionLevel::Better` doc were added in PR `#48`. Do not re-flag this as a new bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 44
File: zstd/src/encoding/frame_compressor.rs:0-0
Timestamp: 2026-03-28T22:55:01.577Z
Learning: In `zstd/src/encoding/frame_compressor.rs`, `FrameCompressor::set_dictionary` returns `Result<Option<Dictionary>, DictionaryDecodeError>` and enforces two fail-fast programmer-error contracts via `Err(...)` rather than panics: (1) `dictionary.id == 0` returns `DictionaryDecodeError::ZeroDictionaryId`; (2) any zero entry in `dictionary.offset_hist` returns `DictionaryDecodeError::ZeroRepeatOffsetInDictionary`. The fallible parsing paths (`Dictionary::decode_dict`, `Dictionary::from_raw_content`) apply the same rejections at input boundaries. Do not flag these as inconsistencies — the uniform `Result`-based contract is intentional.
✅ Actions performed

Full review triggered.


Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread zstd/src/encoding/frame_compressor.rs Outdated

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
