Conversation
- Block-level: calculate exact output size (sum of match lengths + literals buffer length) before executing sequences, issue a single reserve() call instead of per-sequence re-allocations - Frame-level: when frame_content_size is declared in the header, pre-allocate the decode buffer upfront to avoid incremental growth - RLE/Raw blocks: reserve decompressed_size before the push loop Eliminates repeated re-allocations in the hot decode path. Closes #20
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdded targeted pre-allocation calls across decoding: a public Changes
Sequence Diagram(s)sequenceDiagram
participant FrameDecoder
participant BlockDecoder
participant SequenceExecutor
participant DecoderScratch
participant Source
FrameDecoder->>DecoderScratch: validate(window_size)
FrameDecoder->>DecoderScratch: reserve(buffer, window_size) rgba(100,150,240,0.5)
FrameDecoder->>BlockDecoder: hand off block & workspace
BlockDecoder->>DecoderScratch: for RLE/Raw -> reserve(buffer, header.decompressed_size) rgba(120,200,140,0.5)
BlockDecoder->>SequenceExecutor: hand off sequences & scratch
SequenceExecutor->>DecoderScratch: reserve(buffer, MAX_BLOCK_SIZE) rgba(240,180,80,0.5)
SequenceExecutor->>Source: read literals/matches
SequenceExecutor->>DecoderScratch: write/push output bytes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
This PR improves decoder performance by reducing reallocations via pre-allocation of the decode buffer at the frame, block, and sequence-execution levels.
Changes:
- Pre-reserve decode buffer capacity per sequence block using match-length summation plus literals size.
- Pre-reserve decode buffer capacity at frame initialization/reset when
frame_content_sizeis declared (capped). - Pre-reserve decode buffer capacity for Raw/RLE blocks based on
decompressed_size, and adds aDecodeBuffer::reserve()helper.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
zstd/src/decoding/sequence_execution.rs |
Adds block-level reserve based on sequence analysis before executing sequences. |
zstd/src/decoding/frame_decoder.rs |
Adds frame-level reserve based on declared frame content size (or window size fallback). |
zstd/src/decoding/decode_buffer.rs |
Adds a reserve() API on DecodeBuffer to forward to the underlying ring buffer. |
zstd/src/decoding/block_decoder.rs |
Adds pre-reservation for Raw/RLE block decode paths using decompressed_size. |
- Enforce MAXIMUM_ALLOWED_WINDOW_SIZE in FrameDecoderState::new() - Clamp sequence pre-allocation to MAX_BLOCK_SIZE (128KB) - Document decode_block_content, FrameDecoderState::new/reset
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/decoding/frame_decoder.rs`:
- Around line 147-153: The reset() path currently only reserves
decoder_scratch.buffer when frame_header.frame_content_size() (fcs) is declared,
unlike new() which always reserves either fcs or window_size; to fix, modify
reset() (near method reset, using decoder_scratch.buffer.reserve and
frame_header.frame_content_size()) to mirror new() by reserving either fcs (if
>0) or window_size (capped by MAXIMUM_ALLOWED_WINDOW_SIZE) — or if the
reuse-of-existing-capacity behavior is intentional, add a concise comment in
reset() explaining that omitting a reserve when fcs==0 is deliberate to preserve
prior capacity and avoid unnecessary allocations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3ab32175-34ac-47df-bfdb-07c2aeddd12d
📒 Files selected for processing (3)
zstd/src/decoding/block_decoder.rszstd/src/decoding/frame_decoder.rszstd/src/decoding/sequence_execution.rs
- Replace per-block exact sum with constant MAX_BLOCK_SIZE reserve, eliminating extra iteration over sequences and overflow risk - Fix WindowSizeTooBig error message to report the enforced implementation limit (100 MiB) instead of the spec maximum - Make MAXIMUM_ALLOWED_WINDOW_SIZE pub(crate) with doc comment
|
@coderabbitai Re: reset() pre-allocation asymmetry — The asymmetry is expected. decoder_scratch.reset() internally calls DecodeBuffer::reset() which already reserves window_size. Added clarifying comment in reset(). |
|
Tip For best results, initiate chat on the files or code changes.
✏️ Learnings added
🧠 Learnings used |
- Remove FCS-based pre-allocation that could reserve up to 100 MiB even for streaming callers that drain incrementally - Keep window_size reservation in new() for initial capacity - Consolidate duplicate doc comment on MAXIMUM_ALLOWED_WINDOW_SIZE
- Relocate constant from frame_decoder to crate::common - Clarify decode_block_content and reset() doc strings - Reference shared constant from errors module
There was a problem hiding this comment.
♻️ Duplicate comments (1)
zstd/src/decoding/frame_decoder.rs (1)
121-126:⚠️ Potential issue | 🟠 Major
reset()is missing the frame-content-size delta reserve.After Line 139 resets to
window_size, there is no extra reserve whenframe_content_size()exceedswindow_size. That drops the reset-path pre-allocation optimization and can reintroduce growth during decode for larger frames.Proposed fix
pub fn reset(&mut self, source: impl Read) -> Result<(), FrameDecoderError> { let (frame_header, header_size) = frame::read_frame_header(source)?; let window_size = frame_header.window_size()?; + let frame_content_size = frame_header.frame_content_size(); if window_size > MAXIMUM_ALLOWED_WINDOW_SIZE { return Err(FrameDecoderError::WindowSizeTooBig { requested: window_size, }); @@ self.frame_finished = false; self.block_counter = 0; self.decoder_scratch.reset(window_size as usize); + if frame_content_size > window_size { + if let Ok(extra) = (frame_content_size - window_size).try_into() { + self.decoder_scratch.buffer.reserve(extra); + } + } self.bytes_read_counter = u64::from(header_size); self.check_sum = None; self.using_dict = None; Ok(()) }Based on learnings: In
zstd/src/decoding/frame_decoder.rs,FrameDecoderState::reset()should rely onDecodeBuffer::reset()forwindow_sizeand only addscratch.buffer.reserve(frame_content_size)for the FCS delta.Also applies to: 136-143
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/decoding/frame_decoder.rs` around lines 121 - 126, FrameDecoderState::reset() currently only relies on DecodeBuffer::reset() to reserve window_size and misses reserving the extra delta when frame_content_size() > window_size; update FrameDecoderState::reset() to call DecodeBuffer::reset(source) as before and then compute let fcs = self.header.frame_content_size().unwrap_or(0) and, if fcs > window_size, call self.scratch.buffer.reserve((fcs - window_size) as usize) (or reserve_exact as appropriate) so the scratch buffer is pre-allocated for the FCS delta and avoids growth during decode.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@zstd/src/decoding/frame_decoder.rs`:
- Around line 121-126: FrameDecoderState::reset() currently only relies on
DecodeBuffer::reset() to reserve window_size and misses reserving the extra
delta when frame_content_size() > window_size; update FrameDecoderState::reset()
to call DecodeBuffer::reset(source) as before and then compute let fcs =
self.header.frame_content_size().unwrap_or(0) and, if fcs > window_size, call
self.scratch.buffer.reserve((fcs - window_size) as usize) (or reserve_exact as
appropriate) so the scratch buffer is pre-allocated for the FCS delta and avoids
growth during decode.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 203d3f2d-74ce-4a7d-ad44-45ddef0273cb
📒 Files selected for processing (4)
zstd/src/common/mod.rszstd/src/decoding/block_decoder.rszstd/src/decoding/errors.rszstd/src/decoding/frame_decoder.rs
|
@coderabbitai Re: reset() FCS delta reserve (duplicate finding) -- Intentionally omitted. The FCS delta can be up to ~99 MiB for multi-segment frames (FCS=100MB, window=1MB), which is exactly the streaming use case where eager reservation is harmful. Block-level MAX_BLOCK_SIZE (128 KB) reserve per block handles growth during decoding without penalizing streaming callers. For single-segment frames, window_size == FCS so the delta is always 0. |
|
Tip For best results, initiate chat on the files or code changes.
Block-level I also want to flag that the earlier learning I recorded (ID ✏️ Learnings added
🧠 Learnings used |
|
@coderabbitai resume and full review |
|
🧠 Learnings used✅ Actions performedReviews resumed. Full review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/decoding/decode_buffer.rs`:
- Around line 63-69: Add the inline hint to the trivial forwarding method by
annotating the method (e.g., impl DecodeBuffer::reserve or the reserve(&mut
self, additional: usize) method) with #[inline] above its definition so the
compiler can eliminate the small call overhead in hot decode paths; keep the
implementation unchanged (still delegating to the underlying buffer's reserve).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8ee44b1d-2fd5-4edc-8387-0620a8f2e513
📒 Files selected for processing (6)
zstd/src/common/mod.rszstd/src/decoding/block_decoder.rszstd/src/decoding/decode_buffer.rszstd/src/decoding/errors.rszstd/src/decoding/frame_decoder.rszstd/src/decoding/sequence_execution.rs
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.
| Benchmark suite | Current: b6ddf29 | Previous: aa92d4f | Ratio |
|---|---|---|---|
compress/default/small-1k-random/matrix/pure_rust |
5.286 ms |
4.263 ms |
1.24 |
compress/default/small-4k-log-lines/matrix/pure_rust |
5.941 ms |
4.455 ms |
1.33 |
compress/better/low-entropy-1m/matrix/pure_rust |
6.252 ms |
4.882 ms |
1.28 |
compress/better/large-log-stream/matrix/pure_rust |
90.931 ms |
77.634 ms |
1.17 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
|
@coderabbitai Re: DecodeBuffer::reserve #[inline] hint -- Fixed, added #[inline] to the forwarding method. |
|
Tip For best results, initiate chat on the files or code changes.
🧠 Learnings used |
Summary
MAX_BLOCK_SIZE(128 KB) before executing sequences — the spec maximum for a single decoded block. Eliminates repeated re-allocations in the hot decode loopwindow_sizeis reserved innew()so the first block does not grow from zero capacity.reset()already handles this viaDecodeBuffer::resetdecompressed_sizebefore the push loop for non-compressed block typesMAXIMUM_ALLOWED_WINDOW_SIZE(100 MiB) in bothnew()andreset()paths; fixWindowSizeTooBigerror message to report the actual enforced limitTechnical Details
Block-level reservation uses the constant
MAX_BLOCK_SIZE(128 KB) — the zstd spec guarantees no single block's decoded output exceeds this. This is safe against corrupted inputs and avoids scanning the sequence vector.Frame-level reservation is limited to
window_sizeto preserve streaming-friendly memory behavior. Callers that drain output incrementally keep memory usage nearwindow_size, not the full frame content size.Test Plan
-D warningsCloses #20
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Documentation