
perf(decoding): pre-allocate decode buffer from sequence block analysis #59

Merged
polaz merged 6 commits into main from feat/#20-perf-decoder-pre-allocation-from-sequence-block-an
Apr 3, 2026

Conversation

Member

@polaz polaz commented Apr 3, 2026

Summary

  • Block-level pre-allocation: reserve MAX_BLOCK_SIZE (128 KiB) before executing sequences — the spec maximum for a single decoded block. This eliminates repeated reallocations in the hot decode loop
  • Frame-level: ensure window_size is reserved in new() so the first block does not grow from zero capacity. reset() already handles this via DecodeBuffer::reset
  • RLE/Raw block pre-allocation: reserve decompressed_size before the push loop for non-compressed block types
  • Safety: enforce MAXIMUM_ALLOWED_WINDOW_SIZE (100 MiB) in both new() and reset() paths; fix WindowSizeTooBig error message to report the actual enforced limit

Technical Details

Block-level reservation uses the constant MAX_BLOCK_SIZE (128 KiB) — the zstd spec guarantees that no single block's decoded output exceeds this. The reservation is therefore safe against corrupted inputs and avoids scanning the sequence vector for an exact sum.

Frame-level reservation is limited to window_size to preserve streaming-friendly memory behavior. Callers that drain output incrementally keep memory usage near window_size, not the full frame content size.
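The block-level strategy above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the crate's actual code: `DecodeBuffer` and `execute_sequences` here are stand-ins backed by a plain `Vec` rather than the real ring buffer, and the sequence representation is reduced to (length, byte) pairs.

```rust
// Spec maximum decoded size of a single block (128 KiB).
const MAX_BLOCK_SIZE: usize = 128 * 1024;

// Simplified stand-in for the crate's decode buffer.
struct DecodeBuffer {
    buffer: Vec<u8>,
}

impl DecodeBuffer {
    /// Forward a capacity reservation to the underlying storage.
    fn reserve(&mut self, amount: usize) {
        self.buffer.reserve(amount);
    }
}

fn execute_sequences(buf: &mut DecodeBuffer, sequences: &[(usize, u8)]) {
    // One reserve up front: the spec guarantees a decoded block never
    // exceeds MAX_BLOCK_SIZE, so this is safe even for corrupt input
    // and avoids scanning the sequence vector for an exact sum.
    buf.reserve(MAX_BLOCK_SIZE);
    for &(len, byte) in sequences {
        // Hot loop: pushes no longer trigger incremental reallocation.
        for _ in 0..len {
            buf.buffer.push(byte);
        }
    }
}

fn main() {
    let mut buf = DecodeBuffer { buffer: Vec::new() };
    execute_sequences(&mut buf, &[(3, b'a'), (2, b'b')]);
    assert_eq!(buf.buffer, b"aaabb");
    assert!(buf.buffer.capacity() >= MAX_BLOCK_SIZE);
}
```

The design trades a small fixed over-reservation (at most 128 KiB) for predictable behavior on adversarial inputs, since nothing in the reservation depends on attacker-controlled sizes.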

Test Plan

  • All 192 unit/integration tests pass
  • All 8 doc tests pass
  • Clippy clean with -D warnings
  • Cross-validation tests (rust↔C compress/decompress) pass
  • Roundtrip integrity tests (1000 iterations) pass

Closes #20

Summary by CodeRabbit

  • New Features

    • Added a public method to pre-allocate decode buffer capacity.
    • Introduced a 100 MiB maximum allowed window-size constant.
  • Bug Fixes

    • Frames requesting window sizes above 100 MiB are now rejected with clearer error messaging.
  • Refactor

    • Improved buffer pre-allocation across the decompression pipeline to reduce reallocations and improve performance.
  • Documentation

    • Expanded docs clarifying reservation and allocation behavior.

- Block-level: calculate exact output size (sum of match lengths +
  literals buffer length) before executing sequences, issue a single
  reserve() call instead of per-sequence re-allocations
- Frame-level: when frame_content_size is declared in the header,
  pre-allocate the decode buffer upfront to avoid incremental growth
- RLE/Raw blocks: reserve decompressed_size before the push loop

Eliminates repeated re-allocations in the hot decode path.

Closes #20
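The exact-size approach described in this commit message can be sketched as below; note that a later commit in this PR replaced it with a constant MAX_BLOCK_SIZE reserve. The `Sequence` struct and `block_output_size` helper are illustrative names, not the crate's API.

```rust
// Illustrative sequence shape: literals copied, then a match of this length.
struct Sequence {
    literals_length: usize,
    match_length: usize,
}

/// Exact decoded size of one block: every literal byte plus every match byte,
/// plus any literals left over after the last sequence.
fn block_output_size(sequences: &[Sequence], trailing_literals: usize) -> usize {
    let from_sequences: usize = sequences
        .iter()
        .map(|s| s.literals_length + s.match_length)
        .sum();
    from_sequences + trailing_literals
}

fn main() {
    let seqs = [
        Sequence { literals_length: 5, match_length: 8 },
        Sequence { literals_length: 0, match_length: 12 },
    ];
    let mut out: Vec<u8> = Vec::new();
    // Single reserve() call instead of per-sequence incremental growth.
    out.reserve(block_output_size(&seqs, 3));
    assert_eq!(block_output_size(&seqs, 3), 28);
}
```

The exact sum costs an extra pass over the sequence vector and, on hostile input, can be inflated by corrupt length fields; those two drawbacks motivated the later switch to the constant bound.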
Copilot AI review requested due to automatic review settings April 3, 2026 15:58

coderabbitai Bot commented Apr 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 01cd3b72-40ac-4705-a3cc-1578ab115e91

📥 Commits

Reviewing files that changed from the base of the PR and between 4d38ca7 and b6ddf29.

📒 Files selected for processing (1)
  • zstd/src/decoding/decode_buffer.rs

📝 Walkthrough

Added targeted pre-allocation calls across decoding: a public DecodeBuffer::reserve, a new MAXIMUM_ALLOWED_WINDOW_SIZE constant with validation at frame creation, pre-reserves for RLE/Raw blocks using header sizes, and a single reserve before sequence execution. Updated error display and expanded docs.

Changes

Cohort / File(s): Summary

  • Decode buffer API (zstd/src/decoding/decode_buffer.rs): Added pub fn reserve(&mut self, amount: usize) to forward capacity reservation to the internal ring buffer.
  • Common constants (zstd/src/common/mod.rs): Added pub const MAXIMUM_ALLOWED_WINDOW_SIZE: u64 = 1024 * 1024 * 100;.
  • Frame-level validation & reserve (zstd/src/decoding/frame_decoder.rs): FrameDecoderState::new now rejects window sizes > MAXIMUM_ALLOWED_WINDOW_SIZE, initializes DecoderScratch, and calls decoder_scratch.buffer.reserve(window_size as usize) before storing it; docs updated.
  • Block-level pre-allocation & docs (zstd/src/decoding/block_decoder.rs): Expanded the doc comment for decode_block_content; the RLE and Raw branches now pre-reserve workspace.buffer with header.decompressed_size as usize before reads/writes.
  • Sequence-level pre-allocation (zstd/src/decoding/sequence_execution.rs): execute_sequences() now calls scratch.buffer.reserve(MAX_BLOCK_SIZE as usize) once before iterating sequences to reduce reallocations.
  • Error messaging (zstd/src/decoding/errors.rs): FrameDecoderError::WindowSizeTooBig Display now references MAXIMUM_ALLOWED_WINDOW_SIZE as the allowed limit.
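The frame-level validation summarized above might look roughly like this sketch. The constant value and error variant name follow the summary; the function and surrounding types are simplified stand-ins for the crate's real structs, assumed for illustration only.

```rust
/// Implementation limit on the decoder window (100 MiB), mirroring the
/// constant described in the change summary.
const MAXIMUM_ALLOWED_WINDOW_SIZE: u64 = 1024 * 1024 * 100;

// Simplified stand-in for the crate's frame decoder error type.
#[derive(Debug, PartialEq)]
enum FrameDecoderError {
    WindowSizeTooBig { requested: u64 },
}

/// Validate a frame's declared window size before any buffer is reserved,
/// so a hostile header cannot trigger an oversized allocation.
fn validate_window(window_size: u64) -> Result<usize, FrameDecoderError> {
    if window_size > MAXIMUM_ALLOWED_WINDOW_SIZE {
        return Err(FrameDecoderError::WindowSizeTooBig {
            requested: window_size,
        });
    }
    // Safe to reserve: the cast cannot exceed the 100 MiB limit.
    Ok(window_size as usize)
}

fn main() {
    assert_eq!(validate_window(1024), Ok(1024));
    assert!(validate_window(MAXIMUM_ALLOWED_WINDOW_SIZE + 1).is_err());
}
```

Validating before reserving is the important ordering here: the error path reports the requested size without ever allocating for it.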

Sequence Diagram(s)

sequenceDiagram
    participant FrameDecoder
    participant BlockDecoder
    participant SequenceExecutor
    participant DecoderScratch
    participant Source

    FrameDecoder->>DecoderScratch: validate(window_size)
    FrameDecoder->>DecoderScratch: reserve(buffer, window_size)
    FrameDecoder->>BlockDecoder: hand off block & workspace
    BlockDecoder->>DecoderScratch: for RLE/Raw -> reserve(buffer, header.decompressed_size)
    BlockDecoder->>SequenceExecutor: hand off sequences & scratch
    SequenceExecutor->>DecoderScratch: reserve(buffer, MAX_BLOCK_SIZE)
    SequenceExecutor->>Source: read literals/matches
    SequenceExecutor->>DecoderScratch: write/push output bytes

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰
I nudged the buffer, soft and spry,
One tidy reserve before bytes fly.
No frantic grows, no frantic race,
Sequences settle into place.
A hop, a stash — memory's embrace.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title accurately describes the main change: performance optimization through pre-allocation of decode buffers based on sequence block analysis.
  • Linked Issues Check (✅ Passed): all acceptance criteria from issue #20 are met: a single reserve per block via sequence analysis, frame-level pre-allocation with MAXIMUM_ALLOWED_WINDOW_SIZE enforcement, and no correctness regressions confirmed by comprehensive testing.
  • Out of Scope Changes Check (✅ Passed): all changes directly address issue #20 objectives: buffer pre-allocation at block and frame levels, window-size validation with limits, and error-message corrections. No unrelated modifications detected.
  • Docstring Coverage (✅ Passed): docstring coverage is 100.00%, above the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



codecov Bot commented Apr 3, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 3 lines in your changes missing coverage. Please review.

Files with missing lines:
  • zstd/src/decoding/frame_decoder.rs: 62.50% patch coverage, 3 lines missing ⚠️


Copilot AI left a comment

Pull request overview

This PR improves decoder performance by reducing reallocations via pre-allocation of the decode buffer at the frame, block, and sequence-execution levels.

Changes:

  • Pre-reserve decode buffer capacity per sequence block using match-length summation plus literals size.
  • Pre-reserve decode buffer capacity at frame initialization/reset when frame_content_size is declared (capped).
  • Pre-reserve decode buffer capacity for Raw/RLE blocks based on decompressed_size, and adds a DecodeBuffer::reserve() helper.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
zstd/src/decoding/sequence_execution.rs Adds block-level reserve based on sequence analysis before executing sequences.
zstd/src/decoding/frame_decoder.rs Adds frame-level reserve based on declared frame content size (or window size fallback).
zstd/src/decoding/decode_buffer.rs Adds a reserve() API on DecodeBuffer to forward to the underlying ring buffer.
zstd/src/decoding/block_decoder.rs Adds pre-reservation for Raw/RLE block decode paths using decompressed_size.

Comment thread zstd/src/decoding/frame_decoder.rs Outdated
Comment thread zstd/src/decoding/sequence_execution.rs Outdated
- Enforce MAXIMUM_ALLOWED_WINDOW_SIZE in FrameDecoderState::new()
- Clamp sequence pre-allocation to MAX_BLOCK_SIZE (128KB)
- Document decode_block_content, FrameDecoderState::new/reset
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/frame_decoder.rs`:
- Around line 147-153: The reset() path currently only reserves
decoder_scratch.buffer when frame_header.frame_content_size() (fcs) is declared,
unlike new() which always reserves either fcs or window_size; to fix, modify
reset() (near method reset, using decoder_scratch.buffer.reserve and
frame_header.frame_content_size()) to mirror new() by reserving either fcs (if
>0) or window_size (capped by MAXIMUM_ALLOWED_WINDOW_SIZE) — or if the
reuse-of-existing-capacity behavior is intentional, add a concise comment in
reset() explaining that omitting a reserve when fcs==0 is deliberate to preserve
prior capacity and avoid unnecessary allocations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3ab32175-34ac-47df-bfdb-07c2aeddd12d

📥 Commits

Reviewing files that changed from the base of the PR and between 290200c and a7c66e0.

📒 Files selected for processing (3)
  • zstd/src/decoding/block_decoder.rs
  • zstd/src/decoding/frame_decoder.rs
  • zstd/src/decoding/sequence_execution.rs

Comment thread zstd/src/decoding/frame_decoder.rs Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread zstd/src/decoding/sequence_execution.rs Outdated
Comment thread zstd/src/decoding/sequence_execution.rs Outdated
Comment thread zstd/src/decoding/frame_decoder.rs
- Replace per-block exact sum with constant MAX_BLOCK_SIZE reserve,
  eliminating extra iteration over sequences and overflow risk
- Fix WindowSizeTooBig error message to report the enforced
  implementation limit (100 MiB) instead of the spec maximum
- Make MAXIMUM_ALLOWED_WINDOW_SIZE pub(crate) with doc comment
Member Author

polaz commented Apr 3, 2026

@coderabbitai Re: reset() pre-allocation asymmetry — The asymmetry is expected. decoder_scratch.reset() internally calls DecodeBuffer::reset() which already reserves window_size. Added clarifying comment in reset().
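The delegation chain described in this comment can be illustrated with a sketch. All types below are simplified stand-ins, assuming only that the window-size reservation happens inside DecodeBuffer::reset, as the comment states; none of this is the crate's literal code.

```rust
// Simplified stand-in for the crate's decode buffer.
struct DecodeBuffer {
    data: Vec<u8>,
}

impl DecodeBuffer {
    /// Reset for a new frame: the window_size reservation lives HERE,
    /// so callers do not need to reserve again.
    fn reset(&mut self, window_size: usize) {
        self.data.clear();
        self.data.reserve(window_size);
    }
}

struct DecoderScratch {
    buffer: DecodeBuffer,
}

impl DecoderScratch {
    /// Delegates to DecodeBuffer::reset, which already reserves
    /// window_size. This is why FrameDecoderState::reset() looks
    /// asymmetric to new() without actually being under-reserved.
    fn reset(&mut self, window_size: usize) {
        self.buffer.reset(window_size);
    }
}

fn main() {
    let mut scratch = DecoderScratch {
        buffer: DecodeBuffer { data: Vec::new() },
    };
    scratch.reset(4096);
    assert!(scratch.buffer.data.capacity() >= 4096);
}
```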


coderabbitai Bot commented Apr 3, 2026


@polaz Thanks for the clear explanation! That makes sense — since decoder_scratch.reset() internally delegates to DecodeBuffer::reset() which already handles the window_size reservation, the extra reserve in reset() only needs to cover the frame-content-size delta on top of what's already reserved. The clarifying comment will help future readers follow that chain of responsibility without having to trace through the internals.


✏️ Learnings added
Learnt from: polaz
URL: https://github.com/structured-world/structured-zstd/pull/59

Timestamp: 2026-04-03T16:36:03.209Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` appears asymmetric to `new()` in its pre-allocation, but the asymmetry is intentional: `decoder_scratch.reset(window_size)` internally calls `DecodeBuffer::reset()` which already reserves `window_size`. Therefore `reset()` only needs to call `scratch.buffer.reserve(frame_content_size)` on top of that for the FCS delta, rather than explicitly reserving `window_size` again as `new()` does. Do not re-flag this as a bug or asymmetry.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1554-1559
Timestamp: 2026-04-02T12:03:04.619Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` has an early-return guard at `abs_pos >= u32::MAX` that disables all further HC insertions for streams >4 GiB, degrading `CompressionLevel::Better` to repcodes-only. This is a known limitation tracked in issue `#51`; the proper fix (table rebasing) is deferred. A TODO comment and a limitation note on `CompressionLevel::Better` doc were added in PR `#48`. Do not re-flag this as a new bug.

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment thread zstd/src/decoding/frame_decoder.rs Outdated
Comment thread zstd/src/decoding/frame_decoder.rs Outdated
Comment thread zstd/src/decoding/frame_decoder.rs Outdated
- Remove FCS-based pre-allocation that could reserve up to 100 MiB
  even for streaming callers that drain incrementally
- Keep window_size reservation in new() for initial capacity
- Consolidate duplicate doc comment on MAXIMUM_ALLOWED_WINDOW_SIZE
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment thread zstd/src/decoding/frame_decoder.rs Outdated
Comment thread zstd/src/decoding/block_decoder.rs Outdated
Comment thread zstd/src/decoding/errors.rs
- Relocate constant from frame_decoder to crate::common
- Clarify decode_block_content and reset() doc strings
- Reference shared constant from errors module
@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
zstd/src/decoding/frame_decoder.rs (1)

121-126: ⚠️ Potential issue | 🟠 Major

reset() is missing the frame-content-size delta reserve.

After Line 139 resets to window_size, there is no extra reserve when frame_content_size() exceeds window_size. That drops the reset-path pre-allocation optimization and can reintroduce growth during decode for larger frames.

Proposed fix
 pub fn reset(&mut self, source: impl Read) -> Result<(), FrameDecoderError> {
     let (frame_header, header_size) = frame::read_frame_header(source)?;
     let window_size = frame_header.window_size()?;
+    let frame_content_size = frame_header.frame_content_size();

     if window_size > MAXIMUM_ALLOWED_WINDOW_SIZE {
         return Err(FrameDecoderError::WindowSizeTooBig {
             requested: window_size,
         });
@@
     self.frame_finished = false;
     self.block_counter = 0;
     self.decoder_scratch.reset(window_size as usize);
+    if frame_content_size > window_size {
+        if let Ok(extra) = (frame_content_size - window_size).try_into() {
+            self.decoder_scratch.buffer.reserve(extra);
+        }
+    }
     self.bytes_read_counter = u64::from(header_size);
     self.check_sum = None;
     self.using_dict = None;
     Ok(())
 }

Based on learnings: In zstd/src/decoding/frame_decoder.rs, FrameDecoderState::reset() should rely on DecodeBuffer::reset() for window_size and only add scratch.buffer.reserve(frame_content_size) for the FCS delta.

Also applies to: 136-143

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/decoding/frame_decoder.rs` around lines 121 - 126,
FrameDecoderState::reset() currently only relies on DecodeBuffer::reset() to
reserve window_size and misses reserving the extra delta when
frame_content_size() > window_size; update FrameDecoderState::reset() to call
DecodeBuffer::reset(source) as before and then compute let fcs =
self.header.frame_content_size().unwrap_or(0) and, if fcs > window_size, call
self.scratch.buffer.reserve((fcs - window_size) as usize) (or reserve_exact as
appropriate) so the scratch buffer is pre-allocated for the FCS delta and avoids
growth during decode.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@zstd/src/decoding/frame_decoder.rs`:
- Around line 121-126: FrameDecoderState::reset() currently only relies on
DecodeBuffer::reset() to reserve window_size and misses reserving the extra
delta when frame_content_size() > window_size; update FrameDecoderState::reset()
to call DecodeBuffer::reset(source) as before and then compute let fcs =
self.header.frame_content_size().unwrap_or(0) and, if fcs > window_size, call
self.scratch.buffer.reserve((fcs - window_size) as usize) (or reserve_exact as
appropriate) so the scratch buffer is pre-allocated for the FCS delta and avoids
growth during decode.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 203d3f2d-74ce-4a7d-ad44-45ddef0273cb

📥 Commits

Reviewing files that changed from the base of the PR and between c006a9b and 4d38ca7.

📒 Files selected for processing (4)
  • zstd/src/common/mod.rs
  • zstd/src/decoding/block_decoder.rs
  • zstd/src/decoding/errors.rs
  • zstd/src/decoding/frame_decoder.rs

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Member Author

polaz commented Apr 3, 2026

@coderabbitai Re: reset() FCS delta reserve (duplicate finding): intentionally omitted. The FCS delta can reach ~99 MiB for multi-segment frames (FCS = 100 MB, window = 1 MB), which is exactly the streaming use case where eager reservation is harmful. The block-level MAX_BLOCK_SIZE (128 KiB) reserve per block handles growth during decoding without penalizing streaming callers. For single-segment frames, window_size == FCS, so the delta is always 0.
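The two cases in this argument can be checked with a few lines of arithmetic. The helper name fcs_delta is hypothetical, used only to make the comparison concrete.

```rust
const MIB: u64 = 1024 * 1024;
// Spec maximum decoded size of a single block (128 KiB).
const MAX_BLOCK_SIZE: u64 = 128 * 1024;

/// How much extra an eager FCS-based reserve would add beyond window_size.
fn fcs_delta(frame_content_size: u64, window_size: u64) -> u64 {
    frame_content_size.saturating_sub(window_size)
}

fn main() {
    // Multi-segment streaming frame: FCS = 100 MiB, window = 1 MiB.
    // Eagerly reserving the delta would cost ~99 MiB up front.
    assert_eq!(fcs_delta(100 * MIB, MIB), 99 * MIB);

    // Single-segment frame: window_size == FCS, so the delta is zero.
    assert_eq!(fcs_delta(MIB, MIB), 0);

    // The per-block strategy instead bounds extra reservation at 128 KiB,
    // orders of magnitude below the worst-case eager delta.
    assert!(MAX_BLOCK_SIZE < MIB);
}
```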


coderabbitai Bot commented Apr 3, 2026


@polaz Thanks for the detailed rationale! That two-case argument is precise and important:

  • Multi-segment frames (FCS ≫ window_size): the delta can be tens of MiB; eagerly reserving it would negate the streaming-friendly memory profile that reset() is supposed to preserve.
  • Single-segment frames (window_size == FCS): the delta is always zero, so there is nothing extra to reserve.

Block-level MAX_BLOCK_SIZE (128 KiB) per-block reservation then covers incremental growth safely without over-committing upfront.

I also want to flag that the earlier learning I recorded (ID 12338eab) says "reset() only needs to call scratch.buffer.reserve(frame_content_size) on top of that for the FCS delta", which is now misleading — it implies a reserve call happens when in fact it is intentionally absent. Would you like me to remove that learning and replace it with the corrected one below? I can do so if you confirm.


✏️ Learnings added
Learnt from: polaz
URL: https://github.com/structured-world/structured-zstd/pull/59

Timestamp: 2026-04-03T17:39:58.596Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` intentionally does NOT reserve the frame-content-size (FCS) delta beyond `window_size`. Two cases cover the design space: (1) multi-segment frames where FCS >> window_size — the delta can reach ~99 MiB, and eagerly reserving it would penalize streaming callers; (2) single-segment frames where window_size == FCS — the delta is always 0 so there is nothing to reserve. Incremental growth during decoding is handled by the per-block `MAX_BLOCK_SIZE` (128 KiB) reserve in `execute_sequences`. Do not flag the absence of an FCS-delta reserve in `reset()` as a missing pre-allocation.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-03T16:36:03.209Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` appears asymmetric to `new()` in its pre-allocation, but the asymmetry is intentional: `decoder_scratch.reset(window_size)` internally calls `DecodeBuffer::reset()` which already reserves `window_size`. Therefore `reset()` only needs to call `scratch.buffer.reserve(frame_content_size)` on top of that for the FCS delta, rather than explicitly reserving `window_size` again as `new()` does. Do not re-flag this as a bug or asymmetry.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1554-1559
Timestamp: 2026-04-02T12:03:04.619Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` has an early-return guard at `abs_pos >= u32::MAX` that disables all further HC insertions for streams >4 GiB, degrading `CompressionLevel::Better` to repcodes-only. This is a known limitation tracked in issue `#51`; the proper fix (table rebasing) is deferred. A TODO comment and a limitation note on `CompressionLevel::Better` doc were added in PR `#48`. Do not re-flag this as a new bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1531-1535
Timestamp: 2026-04-01T15:33:17.229Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` intentionally casts `abs_pos: usize` to `u32` (wrapping on streams > 4 GiB). This is safe by design: (1) the chain table slot is computed as `(pos_u32 as usize) & chain_mask`, so only the low bits matter and wrapping leaves the addressed slot unchanged; (2) `chain_candidates` filters any recovered `cur as usize` value against `history_abs_start` — a wrapped (small) value is always less than the current sliding-window start and is discarded. Do not flag this cast as a bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:498-509
Timestamp: 2026-04-02T22:26:07.979Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), `best_level_does_not_regress_vs_better` uses a `<=` (not strict `<`) assertion because the `repeat_offset_fixture(b"HelloWorld", ...)` input is simple enough that HC saturates at both Better (16 candidates) and Best (32 candidates) search depths, producing identical compressed sizes (~30243 bytes). Strict `<` would be a false positive on this fixture. The strict `Best < Better` quality assertion lives in `cross_validation::best_level_beats_better_on_corpus_proxy` on the decodecorpus sample. Do not re-flag the `<=` as a weakened guard.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 44
File: zstd/src/encoding/frame_compressor.rs:0-0
Timestamp: 2026-03-28T22:55:01.577Z
Learning: In `zstd/src/encoding/frame_compressor.rs`, `FrameCompressor::set_dictionary` returns `Result<Option<Dictionary>, DictionaryDecodeError>` and enforces two fail-fast programmer-error contracts via `Err(...)` rather than panics: (1) `dictionary.id == 0` returns `DictionaryDecodeError::ZeroDictionaryId`; (2) any zero entry in `dictionary.offset_hist` returns `DictionaryDecodeError::ZeroRepeatOffsetInDictionary`. The fallible parsing paths (`Dictionary::decode_dict`, `Dictionary::from_raw_content`) apply the same rejections at input boundaries. Do not flag these as inconsistencies — the uniform `Result`-based contract is intentional.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1470-1490
Timestamp: 2026-04-02T07:27:15.870Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::pick_lazy_match` intentionally queries `find_best_match(abs_pos + 1, ...)` and `find_best_match(abs_pos + 2, ...)` *before* those positions are inserted into the hash-chain tables. This matches C zstd lazy evaluation semantics: pre-seeding those positions would allow a position to match against itself, producing invalid self-matches. Do not flag the absent pre-seeding as a bug.

Member Author

polaz commented Apr 3, 2026

@coderabbitai resume and full review


coderabbitai Bot commented Apr 3, 2026

@polaz Sure! Resuming automatic reviews and triggering a full review of PR #59 now.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-03T16:36:03.209Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` appears asymmetric to `new()` in its pre-allocation, but the asymmetry is intentional: `decoder_scratch.reset(window_size)` internally calls `DecodeBuffer::reset()` which already reserves `window_size`. Therefore `reset()` only needs to call `scratch.buffer.reserve(frame_content_size)` on top of that for the FCS delta, rather than explicitly reserving `window_size` again as `new()` does. Do not re-flag this as a bug or asymmetry.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-03T17:39:58.596Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` intentionally does NOT reserve the frame-content-size (FCS) delta beyond `window_size`. Two cases cover the design space: (1) multi-segment frames where FCS >> window_size — the delta can reach ~99 MiB, and eagerly reserving it would penalize streaming callers; (2) single-segment frames where window_size == FCS — the delta is always 0 so there is nothing to reserve. Incremental growth during decoding is handled by the per-block `MAX_BLOCK_SIZE` (128 KiB) reserve in `execute_sequences`. Do not flag the absence of an FCS-delta reserve in `reset()` as a missing pre-allocation.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1554-1559
Timestamp: 2026-04-02T12:03:04.619Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` has an early-return guard at `abs_pos >= u32::MAX` that disables all further HC insertions for streams >4 GiB, degrading `CompressionLevel::Better` to repcodes-only. This is a known limitation tracked in issue `#51`; the proper fix (table rebasing) is deferred. A TODO comment and a limitation note on `CompressionLevel::Better` doc were added in PR `#48`. Do not re-flag this as a new bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1470-1490
Timestamp: 2026-04-02T07:27:15.870Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::pick_lazy_match` intentionally queries `find_best_match(abs_pos + 1, ...)` and `find_best_match(abs_pos + 2, ...)` *before* those positions are inserted into the hash-chain tables. This matches C zstd lazy evaluation semantics: pre-seeding those positions would allow a position to match against itself, producing invalid self-matches. Do not flag the absent pre-seeding as a bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1531-1535
Timestamp: 2026-04-01T15:33:17.229Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` intentionally casts `abs_pos: usize` to `u32` (wrapping on streams > 4 GiB). This is safe by design: (1) the chain table slot is computed as `(pos_u32 as usize) & chain_mask`, so only the low bits matter and wrapping leaves the addressed slot unchanged; (2) `chain_candidates` filters any recovered `cur as usize` value against `history_abs_start` — a wrapped (small) value is always less than the current sliding-window start and is discarded. Do not flag this cast as a bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:498-509
Timestamp: 2026-04-02T22:26:07.979Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), `best_level_does_not_regress_vs_better` uses a `<=` (not strict `<`) assertion because the `repeat_offset_fixture(b"HelloWorld", ...)` input is simple enough that HC saturates at both Better (16 candidates) and Best (32 candidates) search depths, producing identical compressed sizes (~30243 bytes). Strict `<` would be a false positive on this fixture. The strict `Best < Better` quality assertion lives in `cross_validation::best_level_beats_better_on_corpus_proxy` on the decodecorpus sample. Do not re-flag the `<=` as a weakened guard.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 44
File: zstd/src/encoding/frame_compressor.rs:0-0
Timestamp: 2026-03-28T22:55:01.577Z
Learning: In `zstd/src/encoding/frame_compressor.rs`, `FrameCompressor::set_dictionary` returns `Result<Option<Dictionary>, DictionaryDecodeError>` and enforces two fail-fast programmer-error contracts via `Err(...)` rather than panics: (1) `dictionary.id == 0` returns `DictionaryDecodeError::ZeroDictionaryId`; (2) any zero entry in `dictionary.offset_hist` returns `DictionaryDecodeError::ZeroRepeatOffsetInDictionary`. The fallible parsing paths (`Dictionary::decode_dict`, `Dictionary::from_raw_content`) apply the same rejections at input boundaries. Do not flag these as inconsistencies — the uniform `Result`-based contract is intentional.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-01T06:49:34.233Z
Learning: In the `structured-zstd` codebase (`zstd/src/encoding/match_generator.rs`), `CompressionLevel::Default` uses the `DfastMatchGenerator` backend (with `pick_lazy_match` lazy evaluation), while `CompressionLevel::Better` uses the `HcMatchGenerator` (hash chain backend) which inserts positions before lazy evaluation inside its own `start_matching` loop. These two backends are independent; review comments about Dfast internals (e.g. position seeding order in `pick_lazy_match`) do not apply to the Better level's implementation path.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:336-339
Timestamp: 2026-04-02T12:03:03.856Z
Learning: In `zstd/src/encoding/match_generator.rs`, `min_primed_tail = 4` is correct for both `MatcherBackend::Dfast` and `MatcherBackend::HashChain` because `HcMatchGenerator::insert_position` unconditionally requires 4 bytes of hash lookahead (`if idx + 4 > concat.len() { return; }`). `backfill_boundary_positions` only re-inserts positions already fully within extended history; it cannot make a <4-byte dictionary tail fragment hashable. Do not suggest lowering `min_primed_tail` below 4 for the `HashChain` backend.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/encoding/frame_compressor.rs:280-287
Timestamp: 2026-04-02T16:31:58.167Z
Learning: In `structured-zstd`, `CompressionLevel::Best` intentionally uses `MatcherBackend::HashChain` (not a binary-tree/DUBT backend) due to memory constraints: a BT matcher for a 16 MiB window requires ~128–256 MB of tables (C zstd level 11 uses chainLog=25 → 256 MB), which is impractical. Best is distinguished from Better by deeper HC search (32 vs 16 candidates), larger hash/chain tables (2M/1M vs 1M/512K), and higher target match length (128 vs 48), keeping memory under 20 MB. It achieves a proportional compression gain (~0.27% better than Better on corpus) comparable to C zstd's L7→L11 delta (~0.14%). A proper BT backend is deferred as a future optimization. Do not flag Best's use of HashChain as missing a BT implementation.
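The memory figure cited above can be sanity-checked with a back-of-envelope sketch, assuming a binary-tree matcher stores two u32 links per table entry as C zstd's BT implementation does (the function name here is illustrative, not the crate's API):

```rust
// Hypothetical helper: bytes of BT match-finder tables for a given chainLog,
// assuming two u32 links (smaller/larger child) per position, as in C zstd.
fn bt_table_bytes(chain_log: u32) -> usize {
    (1usize << chain_log) * 2 * std::mem::size_of::<u32>()
}

fn main() {
    // C zstd level 11 uses chainLog = 25, which yields 256 MiB of tables,
    // matching the figure in the learning above.
    let bytes = bt_table_bytes(25);
    assert_eq!(bytes, 256 * 1024 * 1024);
    println!("{} MiB", bytes >> 20);
}
```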


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/decode_buffer.rs`:
- Around line 63-69: Add the inline hint to the trivial forwarding method by
annotating the method (e.g., impl DecodeBuffer::reserve or the reserve(&mut
self, additional: usize) method) with #[inline] above its definition so the
compiler can eliminate the small call overhead in hot decode paths; keep the
implementation unchanged (still delegating to the underlying buffer's reserve).


📥 Commits

Reviewing files that changed from the base of the PR and between aa92d4f and 4d38ca7.

📒 Files selected for processing (6)
  • zstd/src/common/mod.rs
  • zstd/src/decoding/block_decoder.rs
  • zstd/src/decoding/decode_buffer.rs
  • zstd/src/decoding/errors.rs
  • zstd/src/decoding/frame_decoder.rs
  • zstd/src/decoding/sequence_execution.rs

Comment thread zstd/src/decoding/decode_buffer.rs

@sw-release-bot sw-release-bot Bot left a comment


⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
The benchmark results for this commit are worse than the previous run, exceeding the 1.15 regression threshold.

| Benchmark suite | Current: b6ddf29 | Previous: aa92d4f | Ratio |
| --- | --- | --- | --- |
| compress/default/small-1k-random/matrix/pure_rust | 5.286 ms | 4.263 ms | 1.24 |
| compress/default/small-4k-log-lines/matrix/pure_rust | 5.941 ms | 4.455 ms | 1.33 |
| compress/better/low-entropy-1m/matrix/pure_rust | 6.252 ms | 4.882 ms | 1.28 |
| compress/better/large-log-stream/matrix/pure_rust | 90.931 ms | 77.634 ms | 1.17 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

@polaz
Member Author

polaz commented Apr 3, 2026

@coderabbitai Re: DecodeBuffer::reserve #[inline] hint -- Fixed, added #[inline] to the forwarding method.

@coderabbitai

coderabbitai Bot commented Apr 3, 2026


@polaz Thanks for the quick fix! Adding #[inline] to a thin forwarding method like DecodeBuffer::reserve is the right call — it gives the compiler a clear nudge to eliminate the call overhead since the body is a single delegate to the inner ring buffer's reserve.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-03T16:36:03.209Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` appears asymmetric to `new()` in its pre-allocation, but the asymmetry is intentional: `decoder_scratch.reset(window_size)` internally calls `DecodeBuffer::reset()` which already reserves `window_size`. Therefore `reset()` only needs to call `scratch.buffer.reserve(frame_content_size)` on top of that for the FCS delta, rather than explicitly reserving `window_size` again as `new()` does. Do not re-flag this as a bug or asymmetry.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-03T17:39:58.596Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` intentionally does NOT reserve the frame-content-size (FCS) delta beyond `window_size`. Two cases cover the design space: (1) multi-segment frames where FCS >> window_size — the delta can reach ~99 MiB, and eagerly reserving it would penalize streaming callers; (2) single-segment frames where window_size == FCS — the delta is always 0 so there is nothing to reserve. Incremental growth during decoding is handled by the per-block `MAX_BLOCK_SIZE` (128 KiB) reserve in `execute_sequences`. Do not flag the absence of an FCS-delta reserve in `reset()` as a missing pre-allocation.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1554-1559
Timestamp: 2026-04-02T12:03:04.619Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` has an early-return guard at `abs_pos >= u32::MAX` that disables all further HC insertions for streams >4 GiB, degrading `CompressionLevel::Better` to repcodes-only. This is a known limitation tracked in issue `#51`; the proper fix (table rebasing) is deferred. A TODO comment and a limitation note on `CompressionLevel::Better` doc were added in PR `#48`. Do not re-flag this as a new bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1531-1535
Timestamp: 2026-04-01T15:33:17.229Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` intentionally casts `abs_pos: usize` to `u32` (wrapping on streams > 4 GiB). This is safe by design: (1) the chain table slot is computed as `(pos_u32 as usize) & chain_mask`, so only the low bits matter and wrapping leaves the addressed slot unchanged; (2) `chain_candidates` filters any recovered `cur as usize` value against `history_abs_start` — a wrapped (small) value is always less than the current sliding-window start and is discarded. Do not flag this cast as a bug.
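The first half of this safety argument can be demonstrated with a small sketch (function name hypothetical; assumes a 64-bit target so positions past 4 GiB are representable): only the low `chain_log` bits select a chain-table slot, so a wrapping usize→u32 cast leaves the addressed slot unchanged.

```rust
// Hypothetical slot computation mirroring the pattern described above:
// the position is truncated to u32, then masked down to the table size.
fn chain_slot(abs_pos: usize, chain_mask: usize) -> usize {
    let pos_u32 = abs_pos as u32; // wrapping truncation for streams > 4 GiB
    (pos_u32 as usize) & chain_mask
}

fn main() {
    let chain_mask = (1usize << 20) - 1; // 1M-entry chain table
    let small = 123_456usize;
    let huge = small + (1usize << 32); // same position modulo 2^32
    // Truncation preserves the low 32 bits, and masking only reads the low
    // 20 of those, so both positions address the same slot.
    assert_eq!(chain_slot(small, chain_mask), chain_slot(huge, chain_mask));
}
```

The second half of the argument (stale wrapped values recovered from the table being discarded by the `history_abs_start` filter) happens at lookup time and is not shown here.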

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1470-1490
Timestamp: 2026-04-02T07:27:15.870Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::pick_lazy_match` intentionally queries `find_best_match(abs_pos + 1, ...)` and `find_best_match(abs_pos + 2, ...)` *before* those positions are inserted into the hash-chain tables. This matches C zstd lazy evaluation semantics: pre-seeding those positions would allow a position to match against itself, producing invalid self-matches. Do not flag the absent pre-seeding as a bug.



Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

@polaz polaz merged commit 98c1be0 into main Apr 3, 2026
13 of 15 checks passed
@polaz polaz deleted the feat/#20-perf-decoder-pre-allocation-from-sequence-block-an branch April 3, 2026 18:30
@sw-release-bot sw-release-bot Bot mentioned this pull request Apr 3, 2026


Development

Successfully merging this pull request may close these issues.

perf: decoder pre-allocation from sequence block analysis

2 participants