
perf(decoding): add shared dictionary handle#105

Merged
polaz merged 26 commits into main from perf/#86-ddict-handle on Apr 12, 2026

Conversation

polaz (Member) commented Apr 11, 2026

Summary

  • add DictionaryHandle reuse APIs and direct decode paths to avoid per-call dictionary parsing
  • wire FrameDecoder/StreamingDecoder to accept prepared handles or raw dict bytes
  • add criterion benchmark for repeated dict decode latency

Acceptance Criteria

  • prepared dictionary API is public and documented (DictionaryHandle + decoding module docs)
  • decode path accepts prepared handle directly (FrameDecoder::reset_with_dict_handle, decode_all_with_dict_handle, StreamingDecoder::new_with_dictionary_handle)
  • raw-bytes dictionary API retained (add_dict_from_bytes, decode_all_with_dict_bytes, new_with_dictionary_bytes)
  • existing dictionary decode tests pass
  • benchmark shows improvement for repeated dict decode
    • prepared handle: ~4.16 µs
    • raw dict per call: ~20.07 µs
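
For scale, the two timings above can be turned into a speedup factor and a relative latency reduction. The sketch below only performs that arithmetic; the function names are illustrative and are not part of the crate.

```rust
/// How many times faster the fast path is than the slow path.
fn speedup(slow_us: f64, fast_us: f64) -> f64 {
    slow_us / fast_us
}

/// Fraction of per-call latency removed by switching to the fast path.
fn latency_reduction(slow_us: f64, fast_us: f64) -> f64 {
    (slow_us - fast_us) / slow_us
}
```

With the reported numbers, `speedup(20.07, 4.16)` is roughly 4.8 and `latency_reduction(20.07, 4.16)` is roughly 0.79, which corresponds to about 15.9 µs saved per decode call.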

Testing

  • cargo check --workspace
  • cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings
  • cargo clippy -p structured-zstd --features hash,std,dict_builder,bench_internals --benches -- -D warnings
  • cargo nextest run -p structured-zstd
  • cargo nextest run --workspace
  • cargo test --doc --workspace
  • cargo bench --bench decode_dict_handle

Closes #86

Summary by CodeRabbit

  • New Features

    • Added a cloneable, shareable DictionaryHandle and APIs to register and use pre-parsed dictionaries; streaming and multi-frame decode helpers now accept handles or raw dictionary bytes.
  • Bug Fixes

    • Clearer dictionary-related errors (ID mismatch, duplicate registration); decoder prefers owned then shared dictionaries and reports missing dicts more informatively.
  • Performance

    • Added a benchmark comparing reuse of parsed handles vs decoding raw dictionary bytes each call.
  • Tests

    • Expanded coverage for handle semantics, sharing/cloning, decoder injection/reset, skippable frames, force/lookup behavior, and end-to-end decoding.
  • Documentation

    • README updated with dictionary-backed decompression examples and compression-strategy coverage.

Copilot AI review requested due to automatic review settings April 11, 2026 17:27

coderabbitai Bot commented Apr 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8fa56f60-c6fb-4442-a155-f436ac945e32

📥 Commits

Reviewing files that changed from the base of the PR and between b580b61 and 2ffa207.

📒 Files selected for processing (1)
  • zstd/src/decoding/dictionary.rs

📝 Walkthrough

Adds a cloneable, shareable DictionaryHandle (Arc/Rc-backed), integrates handle-based registration and initialization into FrameDecoder and StreamingDecoder, introduces dictionary validation and new FrameDecoder errors, adds tests and a Criterion benchmark, and re-exports DictionaryHandle from the decoding module.
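
As a rough illustration of the Arc/Rc-backed handle described in the walkthrough, here is a minimal sketch. It assumes a simplified `Dictionary` with only an id and content, and a hypothetical `shares_allocation_with` helper; the real type carries parsed entropy tables and its exact internals are not shown in this thread.

```rust
// Pick the shared pointer type per target: Arc where pointer-sized atomics
// exist, Rc otherwise (mirrors the "Arc on atomic targets, Rc otherwise" note).
#[cfg(target_has_atomic = "ptr")]
type Shared<T> = std::sync::Arc<T>;
#[cfg(not(target_has_atomic = "ptr"))]
type Shared<T> = std::rc::Rc<T>;

/// Simplified stand-in for a parsed dictionary (assumption for this sketch).
#[derive(Debug)]
pub struct Dictionary {
    pub id: u32,
    pub content: Vec<u8>,
}

/// Cheap-to-clone handle: cloning bumps a reference count, nothing is re-parsed.
#[derive(Clone, Debug)]
pub struct DictionaryHandle {
    inner: Shared<Dictionary>,
}

impl DictionaryHandle {
    pub fn from_dictionary(dict: Dictionary) -> Self {
        Self { inner: Shared::new(dict) }
    }

    pub fn id(&self) -> u32 {
        self.inner.id
    }

    pub fn as_dict(&self) -> &Dictionary {
        &self.inner
    }

    /// Hypothetical helper: true when both handles point at the same allocation.
    pub fn shares_allocation_with(&self, other: &Self) -> bool {
        Shared::ptr_eq(&self.inner, &other.inner)
    }
}

impl AsRef<Dictionary> for DictionaryHandle {
    fn as_ref(&self) -> &Dictionary {
        self.as_dict()
    }
}

impl From<Dictionary> for DictionaryHandle {
    fn from(dict: Dictionary) -> Self {
        Self::from_dictionary(dict)
    }
}
```

Cloning such a handle yields a second owner of the same parsed data, which is the property the clone-sharing unit test mentioned later in the thread checks.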

Changes

| Cohort / File(s) | Summary |
|---|---|
| Dictionary handle infra: zstd/src/decoding/dictionary.rs, zstd/src/decoding/mod.rs | Introduce DictionaryHandle wrapping a SharedDictionary (Arc on atomic targets, Rc otherwise); add constructors (from_dictionary, decode_dict), accessors (id, as_dict), AsRef/From impls, and Dictionary::into_handle(); re-export DictionaryHandle. |
| Frame decoder integration: zstd/src/decoding/frame_decoder.rs | Split dict storage into owned_dicts and cfg-enabled shared_dicts; add init_with_dict_handle/reset_with_dict_handle, add_dict_from_bytes, cfg add_dict_handle; validate dictionaries (no zero id, no zero repeat offsets); prefer owned lookup, detect duplicates across tiers; add decode_all_with_dict_handle/decode_all_with_dict_bytes and factor decode_all_impl. |
| Streaming decoder APIs: zstd/src/decoding/streaming_decoder.rs | Add new_with_dictionary_handle and new_with_dictionary_bytes constructors to initialize streaming decoder with a prepared handle or raw bytes. |
| Errors: zstd/src/decoding/errors.rs | Add FrameDecoderError::DictIdMismatch { expected, provided } and DictAlreadyRegistered { dict_id }; update Display message for DictNotProvided and add messages for new variants; tests updated to assert messages. |
| Tests & helpers: zstd/src/tests/dict_test.rs | Add tests for DictionaryHandle creation/clone-sharing, handle vs owned registration behavior, add_dict_from_bytes, reset_with_dict_handle validation/mismatch, decode_all behaviors (skippable frames, target-too-small), force_dict flows, and a test helper to load sample dict-backed frames. |
| Benchmarks & Cargo: zstd/benches/decode_dict_handle.rs, zstd/Cargo.toml | Add Criterion benchmark decode_dict_handle comparing pre-parsed handle vs raw-bytes decode paths and register new bench target in Cargo.toml. |
| Docs: README.md | Add "Dictionary-backed Decompression API" example using DictionaryHandle; minor wording and compression-strategy coverage edits. |
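
The two-tier storage and the new error variants summarized above could look roughly like the following. This is a sketch under assumptions: `DictRegistry`, `DictError`, `check_dict_id`, and all Display wording are hypothetical stand-ins, the shared tier always uses `Arc` here, and only the variant names and the owned-then-shared lookup order come from the summary.

```rust
use std::collections::BTreeMap;
use std::fmt;
use std::sync::Arc;

/// Hypothetical error type; the message text below is invented for illustration.
#[derive(Debug, PartialEq)]
pub enum DictError {
    ZeroDictionaryId,
    DictAlreadyRegistered { dict_id: u32 },
    DictIdMismatch { expected: u32, provided: u32 },
    DictNotProvided { dict_id: u32 },
}

impl fmt::Display for DictError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ZeroDictionaryId => write!(f, "dictionary id must be non-zero"),
            Self::DictAlreadyRegistered { dict_id } => {
                write!(f, "dictionary {} is already registered", dict_id)
            }
            Self::DictIdMismatch { expected, provided } => {
                write!(f, "frame expects dictionary {}, got {}", expected, provided)
            }
            Self::DictNotProvided { dict_id } => {
                write!(f, "dictionary {} was not provided", dict_id)
            }
        }
    }
}

/// Simplified stand-in for a parsed dictionary.
#[derive(Debug)]
pub struct Dictionary {
    pub id: u32,
    pub content: Vec<u8>,
}

#[derive(Default)]
pub struct DictRegistry {
    owned_dicts: BTreeMap<u32, Dictionary>,
    shared_dicts: BTreeMap<u32, Arc<Dictionary>>,
}

impl DictRegistry {
    /// Reject zero ids and ids already present in either tier.
    fn ensure_registrable(&self, id: u32) -> Result<(), DictError> {
        if id == 0 {
            return Err(DictError::ZeroDictionaryId);
        }
        if self.owned_dicts.contains_key(&id) || self.shared_dicts.contains_key(&id) {
            return Err(DictError::DictAlreadyRegistered { dict_id: id });
        }
        Ok(())
    }

    pub fn add_owned(&mut self, dict: Dictionary) -> Result<(), DictError> {
        self.ensure_registrable(dict.id)?;
        self.owned_dicts.insert(dict.id, dict);
        Ok(())
    }

    pub fn add_shared(&mut self, dict: Arc<Dictionary>) -> Result<(), DictError> {
        self.ensure_registrable(dict.id)?;
        self.shared_dicts.insert(dict.id, dict);
        Ok(())
    }

    /// Owned tier is consulted first, then the shared tier.
    pub fn lookup(&self, id: u32) -> Result<&Dictionary, DictError> {
        if let Some(dict) = self.owned_dicts.get(&id) {
            return Ok(dict);
        }
        match self.shared_dicts.get(&id) {
            Some(dict) => Ok(&**dict),
            None => Err(DictError::DictNotProvided { dict_id: id }),
        }
    }
}

/// Check that an explicitly supplied dictionary matches the id a frame declares.
pub fn check_dict_id(expected: u32, dict: &Dictionary) -> Result<(), DictError> {
    if dict.id == expected {
        Ok(())
    } else {
        Err(DictError::DictIdMismatch { expected, provided: dict.id })
    }
}
```

Rejecting duplicates at registration time keeps the lookup order unambiguous: an id can only ever live in one tier.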

Sequence Diagram(s)

sequenceDiagram
    rect rgba(200,200,255,0.5)
    participant Client
    end
    rect rgba(200,255,200,0.5)
    participant DH as DictionaryHandle
    end
    rect rgba(255,200,200,0.5)
    participant FD as FrameDecoder
    end
    rect rgba(255,255,200,0.5)
    participant DS as DecoderScratch
    end

    Client->>DH: DictionaryHandle::decode_dict(...) / from_dictionary(...)
    Client->>FD: add_dict_handle(DH) / add_dict_from_bytes(...)
    FD->>FD: store in shared_dicts / owned_dicts
    Client->>FD: decode_all_with_dict_handle(input, out, DH)
    FD->>FD: parse frame header
    FD->>DH: as_dict() -> &Dictionary
    FD->>DS: init_from_dict(&Dictionary)
    DS-->>FD: decode bytes / status
    FD-->>Client: return decoded size / error

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I parsed the bytes with gentle paws and care,
I hold a handle, shared and light as air.
Clone me softly—no reparse to bear,
Frames hop through quicker, nimble in my care,
Benchmarks nod and clap their tiny hare-y stare.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'perf(decoding): add shared dictionary handle' clearly and concisely summarizes the main change: introducing a shared dictionary handle for performance improvement in decoding. |
| Linked Issues check | ✅ Passed | All acceptance criteria from issue #86 are met: DictionaryHandle is public with documented API, prepared dictionary avoids per-call lookup on hot path, existing tests pass, and benchmarks validate latency improvement (4.16µs vs 20.07µs). |
| Out of Scope Changes check | ✅ Passed | Minor README updates (wording adjustment, new documentation sections) are incidental to the main feature and directly support its documentation goal; all code changes focus on dictionary handle introduction and related decode APIs. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |



codecov Bot commented Apr 11, 2026

Codecov Report

❌ Patch coverage is 98.49246% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| zstd/src/decoding/frame_decoder.rs | 97.74% | 3 Missing ⚠️ |



Copilot AI left a comment


Pull request overview

This PR introduces a reusable, shared dictionary wrapper (DictionaryHandle) to avoid repeatedly parsing the same zstd dictionary across decode paths, and wires FrameDecoder to accept/store these handles.

Changes:

  • Add DictionaryHandle (cloneable shared handle around a parsed Dictionary) and re-export it from decoding.
  • Update FrameDecoder to store dictionary handles and provide add_dict_handle, updating decode initialization to borrow via AsRef<Dictionary>.
  • Update and extend dictionary-related tests to cover handle cloning / sharing.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| zstd/src/decoding/dictionary.rs | Adds DictionaryHandle backed by shared ownership for reuse; adds a unit test asserting clones share the same inner allocation. |
| zstd/src/decoding/frame_decoder.rs | Switches decoder dictionary storage to handles; adds add_dict_handle and updates dict initialization to use borrowed dictionaries. |
| zstd/src/decoding/mod.rs | Re-exports DictionaryHandle from the decoding module. |
| zstd/src/tests/dict_test.rs | Updates decoding test to use DictionaryHandle and add_dict_handle. |

Comment thread zstd/src/decoding/dictionary.rs
Comment thread zstd/src/decoding/frame_decoder.rs Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/benches/decode_dict_handle.rs`:
- Around line 29-67: Add a one-time correctness preflight in
bench_decode_dict_handle: before creating the Criterion group, decode compressed
into a buffer using FrameDecoder::new() with decode_all_with_dict_handle (and
separately with decode_all_with_dict_bytes) and assert the output equals payload
(using output_len, compressed, handle, and dict_raw) so both paths produce
correct bytes; only proceed to the benchmark groups if these equality checks
succeed.
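
The preflight idea in this comment can be sketched independently of Criterion. The example below is an assumption-heavy illustration: `toy_decode` is a stand-in that merely copies bytes, and timing uses `std::time::Instant` instead of the benchmark harness; only the "decode once, assert equality, then measure" ordering reflects the suggestion.

```rust
use std::time::{Duration, Instant};

/// Stand-in for a real decode call: copies `input` into `output`.
fn toy_decode(input: &[u8], output: &mut [u8]) -> Result<usize, String> {
    if output.len() < input.len() {
        return Err("target too small".to_string());
    }
    output[..input.len()].copy_from_slice(input);
    Ok(input.len())
}

/// Run `decode` once and panic if the produced bytes differ from `expected`.
fn preflight<F>(label: &str, expected: &[u8], mut decode: F)
where
    F: FnMut(&mut [u8]) -> Result<usize, String>,
{
    let mut output = vec![0u8; expected.len()];
    let written = decode(&mut output).expect("preflight decode failed");
    assert_eq!(&output[..written], expected, "{} produced unexpected output", label);
}

/// Time `iterations` calls of `decode`; a crude substitute for a benchmark group.
fn time_iterations<F: FnMut()>(iterations: u32, mut decode: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        decode();
    }
    start.elapsed()
}

/// Correctness check first, measurement second.
fn run_bench(payload: &[u8]) -> Duration {
    preflight("toy_decode", payload, |out| toy_decode(payload, out));
    let mut output = vec![0u8; payload.len()];
    time_iterations(1_000, || {
        toy_decode(payload, &mut output).unwrap();
    })
}
```

Because the assertion runs before any timing loop, a broken decode path aborts the run instead of producing a misleading latency number.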

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e0b531f3-c99c-40b7-9b17-94a4e7239272

📥 Commits

Reviewing files that changed from the base of the PR and between e40887a and da01705.

📒 Files selected for processing (5)
  • zstd/Cargo.toml
  • zstd/benches/decode_dict_handle.rs
  • zstd/src/decoding/frame_decoder.rs
  • zstd/src/decoding/mod.rs
  • zstd/src/decoding/streaming_decoder.rs

Comment thread zstd/benches/decode_dict_handle.rs
polaz added 3 commits April 11, 2026 20:41
- use Rc on non-atomic targets for DictionaryHandle
- keep add_dict allocation-free and add dict decode tests
- add benchmark preflight correctness checks
@polaz polaz force-pushed the perf/#86-ddict-handle branch from da01705 to 102052e Compare April 11, 2026 17:53

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/frame_decoder.rs`:
- Around line 638-686: Extract the duplicated multi-frame decode loop from
decode_all_with_dict_handle into a single helper (e.g.,
decode_all_frames_with_optional_dict or decode_all_frames) that accepts the
input slice, output slice, and an Option<&DictionaryHandle>, and implements: the
outer loop with init_with_dict_handle/init (or a branch on the Option) including
SkipFrame handling, the inner loop calling
decode_blocks(BlockDecodingStrategy::UptoBytes(1024 * 1024)), draining via
self.read and advancing output/total_bytes_written, checking self.can_collect()
-> TargetTooSmall and self.is_finished(), and proper error propagation; then
replace the bodies of decode_all_with_dict_handle and decode_all (or
decode_all_with_dict_bytes) to call this helper (for the raw dictionary path
first decode DictionaryHandle::decode_dict then call helper) so the logic is
shared and behavior unchanged.
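
The refactor requested here, one private loop behind thin public wrappers, can be illustrated with a toy frame format. Everything below is hypothetical: the `[kind][len][payload]` layout, `ToyDict`, and the error enum are invented for the sketch and are not zstd's wire format or the crate's API; only the shape (a shared helper taking an optional dictionary, skippable-frame handling, and a target-too-small error) mirrors the comment.

```rust
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    TargetTooSmall,
    Truncated,
    DictRequired,
}

/// Toy dictionary: its bytes are emitted before the payload of dict frames.
pub struct ToyDict {
    pub prefix: Vec<u8>,
}

// Toy frame layout (NOT zstd): [kind: u8][len: u8][payload; len]
const SKIP: u8 = 0; // skippable frame, payload ignored
const DICT: u8 = 2; // frame that needs a dictionary; any other kind is plain data

/// Single shared loop; both public entry points delegate here.
fn decode_all_impl(
    mut input: &[u8],
    output: &mut [u8],
    dict: Option<&ToyDict>,
) -> Result<usize, DecodeError> {
    let mut written = 0;
    while !input.is_empty() {
        if input.len() < 2 {
            return Err(DecodeError::Truncated);
        }
        let (kind, len) = (input[0], input[1] as usize);
        let rest = &input[2..];
        if rest.len() < len {
            return Err(DecodeError::Truncated);
        }
        let (payload, tail) = rest.split_at(len);
        input = tail;
        if kind == SKIP {
            continue;
        }
        let prefix: &[u8] = match (kind, dict) {
            (DICT, Some(d)) => &d.prefix,
            (DICT, None) => return Err(DecodeError::DictRequired),
            _ => &[],
        };
        for chunk in [prefix, payload] {
            let end = written + chunk.len();
            if end > output.len() {
                return Err(DecodeError::TargetTooSmall);
            }
            output[written..end].copy_from_slice(chunk);
            written = end;
        }
    }
    Ok(written)
}

pub fn decode_all(input: &[u8], output: &mut [u8]) -> Result<usize, DecodeError> {
    decode_all_impl(input, output, None)
}

pub fn decode_all_with_dict(
    input: &[u8],
    output: &mut [u8],
    dict: &ToyDict,
) -> Result<usize, DecodeError> {
    decode_all_impl(input, output, Some(dict))
}
```

Keeping the loop in one place means a fix to skip handling or bounds checking applies to every entry point at once.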

In `@zstd/src/tests/dict_test.rs`:
- Around line 311-329: The fixture selection currently grabs the first .zst
under dict_tests/files (variables files and file_path), which may pick a
non-dictionary frame; update load_sample_dict_frame() to explicitly choose a
dictionary-specific fixture by filtering file names or stems for a dictionary
marker (e.g., contains "dict" or "dictionary" or matches a known dict fixture
pattern) instead of relying on sort order, e.g., replace the .find closure with
a filter that checks both extension == "zst" and the
path.file_stem()/to_string_lossy() contains the dict marker before calling
expect.
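
A name-based filter along the lines suggested here might look like the sketch below. The `"dict"` marker and both function names are assumptions taken from the suggestion's wording, not from the repository's fixture layout, and the candidates are passed in as a list so the example does not touch the file system.

```rust
use std::path::{Path, PathBuf};

/// True for `.zst` files whose stem contains the (assumed) "dict" marker.
fn is_dict_fixture(path: &Path) -> bool {
    let is_zst = path.extension().and_then(|ext| ext.to_str()) == Some("zst");
    let has_marker = path
        .file_stem()
        .map(|stem| stem.to_string_lossy().contains("dict"))
        .unwrap_or(false);
    is_zst && has_marker
}

/// Pick the first matching fixture in sorted order, if any.
fn pick_dict_fixture(mut candidates: Vec<PathBuf>) -> Option<PathBuf> {
    candidates.sort();
    candidates.into_iter().find(|path| is_dict_fixture(path))
}
```

A name check is only a heuristic; inspecting the frame header for a declared dictionary ID, as discussed further down the thread, is a stricter alternative.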

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8964170b-548f-4a35-85cd-973724733a7e

📥 Commits

Reviewing files that changed from the base of the PR and between da01705 and 102052e.

📒 Files selected for processing (7)
  • zstd/Cargo.toml
  • zstd/benches/decode_dict_handle.rs
  • zstd/src/decoding/dictionary.rs
  • zstd/src/decoding/frame_decoder.rs
  • zstd/src/decoding/mod.rs
  • zstd/src/decoding/streaming_decoder.rs
  • zstd/src/tests/dict_test.rs

Comment thread zstd/src/decoding/frame_decoder.rs
Comment thread zstd/src/tests/dict_test.rs Outdated

coderabbitai Bot commented Apr 11, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
gh api repos/structured-world/structured-zstd/pulls/105/comments \
  --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, body: .body, path: .path, line: .line}'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check PR reviews by coderabbitai
gh api repos/structured-world/structured-zstd/pulls/105/reviews \
  --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, state: .state, body: .body}'

echo "---ISSUE COMMENTS---"
gh api repos/structured-world/structured-zstd/issues/105/comments \
  --jq '.[] | {id: .id, user: .user.login, body: .body[0:200]}'

Length of output: 619


🏁 Script executed:

#!/bin/bash
cat zstd/benches/decode_dict_handle.rs

Length of output: 3031


@polaz Thanks for the heads-up and for routing around the reply restriction. I've read the benchmark file — the one-time correctness preflight is clearly in place:

  • decode_all_with_dict_handle — decodes into output before any iterations and asserts output == payload (message: "prepared_handle produced unexpected output").
  • output.fill(0) + decode_all_with_dict_bytes — resets the buffer, decodes, and asserts the same equality (message: "raw_dict_each_call produced unexpected output").

Both checks fire once at benchmark-setup time, so any regression in either decode path will abort the benchmark run immediately rather than silently timing incorrect output. The fix looks good — appreciate the proactive note.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 73
File: zstd/src/dictionary/mod.rs:337-345
Timestamp: 2026-04-05T21:31:56.797Z
Learning: In `zstd/src/decoding/dictionary.rs` (structured-world/structured-zstd), `Dictionary::decode_dict()` parses the full zstd dictionary wire format: magic (4 bytes) → dict_id (4 bytes) → Huffman table (variable, consumed via `build_decoder`) → OF FSE table → ML FSE table → LL FSE table → 3 × u32 repeat offsets → raw content. This matches the layout produced by `finalize_raw_dict()` in `zstd/src/dictionary/mod.rs`. Do not flag these as mismatched layouts.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 44
File: zstd/src/encoding/frame_compressor.rs:0-0
Timestamp: 2026-03-28T22:55:01.577Z
Learning: In `zstd/src/encoding/frame_compressor.rs`, `FrameCompressor::set_dictionary` returns `Result<Option<Dictionary>, DictionaryDecodeError>` and enforces two fail-fast programmer-error contracts via `Err(...)` rather than panics: (1) `dictionary.id == 0` returns `DictionaryDecodeError::ZeroDictionaryId`; (2) any zero entry in `dictionary.offset_hist` returns `DictionaryDecodeError::ZeroRepeatOffsetInDictionary`. The fallible parsing paths (`Dictionary::decode_dict`, `Dictionary::from_raw_content`) apply the same rejections at input boundaries. Do not flag these as inconsistencies — the uniform `Result`-based contract is intentional.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-09T23:17:57.736Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd), `MatchGeneratorDriver::prime_with_dictionary` uses a dedicated dense seeding path for the `DfastMatchGenerator` backend: `skip_matching_for_dictionary_priming` routes to `skip_matching_dense`, bypassing the incompressible heuristic and sparse step insertion (`insert_positions_with_step`). This ensures primed dictionary history is always densely seeded and cross-block matching invariants are preserved for dictionary-compressed streams. Do not flag `prime_with_dictionary` as potentially using sparse/incompressible stepping.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-05T08:17:53.856Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd), `set_source_size_hint` applies only to the payload (uncompressed source) bytes, not to dictionary priming. Dictionary priming is separate and does not inflate the advertised frame window size. Do not suggest including dictionary bytes in the size hint passed to `set_source_size_hint`.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/frame_compressor.rs:779-805
Timestamp: 2026-04-11T07:48:58.281Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), a temporary compatibility guard exists for sub-256-byte compressed payloads: the 1-byte-FCS single-segment form is not yet fully C-FFI compatible for payloads below 256 bytes, and a compat guard was added to handle this case while the path is hardened. Do not flag this guard as unnecessary.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:56:17.795Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), after commit 46927db, hash-mix kernel dispatch was moved from the global `HASH_MIX_KERNEL` AtomicU8 hot-path read into a per-matcher instance field `hash_mix_kernel` stored directly in `DfastMatchGenerator` and `RowMatchGenerator`. The hot path no longer performs an atomic load or global branch; kernel selection happens once at matcher initialisation and is subsequently a plain struct-field read. Do not flag the hot path as doing a per-call atomic load — that pattern was superseded by commit 46927db.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-09T15:16:10.603Z
Learning: In structured-world/structured-zstd, the `fuzz_exports` API concern was resolved via "Option B": internal decode/encode helpers remain `pub(crate)` by default, and explicit public shims are exposed only behind a dedicated feature flag (e.g., `fuzz`) in a separate file. Do not flag these internal symbols as needing a wider public API — the feature-gated shim layer is the intentional design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/frame_compressor.rs:779-805
Timestamp: 2026-04-11T07:48:58.281Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), hinted tiny frames (`source_size_hint <= 2^14` and `total_uncompressed <= 2^14`) are intentionally emitted as single-segment across ALL compression levels, including levels that produce compressed (non-raw) blocks. The `hinted_small_compressible_frames_use_single_segment_across_levels` test asserts `single_segment_flag() == true` for compressible hinted inputs (e.g. 4 KiB repeated log lines). Do not re-suggest asserting `single_segment_flag() == false` for compressible hinted frames — the donor-parity design requires single-segment for all hinted tiny frames.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-06T01:40:24.378Z
Learning: In `zstd/benches/compare_ffi.rs` (structured-world/structured-zstd), Rust FastCOVER trains with the post-finalization content budget in both the `REPORT_DICT_TRAIN` emission path (around lines 208-225) and the Criterion benchmark path (around lines 266-280). Both paths were aligned in commit 8622344. Do not flag these ranges as using inconsistent budget values.
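
The learnings above describe the dictionary wire format as starting with a 4-byte magic followed by a 4-byte dict_id, and note that a zero id is rejected. A minimal sketch of reading just that 8-byte prefix follows; `parse_dict_header` and `HeaderError` are hypothetical names, the magic value is the standard zstd dictionary magic, and the entropy tables and repeat offsets that follow are deliberately not parsed.

```rust
/// Standard zstd dictionary magic number (stored little-endian on the wire).
const DICT_MAGIC: u32 = 0xEC30_A437;

#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic(u32),
    ZeroDictionaryId,
}

/// Parse magic + dict_id and return the id with the remaining bytes
/// (Huffman/FSE tables, repeat offsets, raw content), which stay unparsed here.
fn parse_dict_header(raw: &[u8]) -> Result<(u32, &[u8]), HeaderError> {
    if raw.len() < 8 {
        return Err(HeaderError::TooShort);
    }
    let magic = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
    if magic != DICT_MAGIC {
        return Err(HeaderError::BadMagic(magic));
    }
    let id = u32::from_le_bytes([raw[4], raw[5], raw[6], raw[7]]);
    if id == 0 {
        return Err(HeaderError::ZeroDictionaryId);
    }
    Ok((id, &raw[8..]))
}
```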

- reuse shared multi-frame decode helper
- ensure sample dict fixture requires a dictionary
@polaz
Member Author

polaz commented Apr 11, 2026

@coderabbitai Fixed both: extracted a shared multi-frame decode helper for decode_all/with_dict_handle, and hardened the dict fixture by asserting the selected .zst frame declares a dictionary id. Replying here since I lack permission to reply on review comments.


coderabbitai Bot commented Apr 11, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat zstd/benches/decode_dict_handle.rs
echo "---DICT_TEST---"
cat zstd/src/tests/dict_test.rs

Length of output: 14526


🏁 Script executed:

#!/bin/bash
# Look for any shared internal decode helper in frame_decoder.rs
rg -n "decode_all_with_dict" zstd/src/decoding/frame_decoder.rs -A 30

echo "---HELPER SEARCH---"
# Search for any private/internal multi-frame helper
rg -n "fn decode_all" zstd/src/decoding/frame_decoder.rs

echo "---LOAD_SAMPLE_DICT_FRAME FULL---"
# Re-read the full load_sample_dict_frame including any dict-id assertion
rg -n "dict_id\|dict_frame\|declares\|assert\|frame_header\|DictId\|dict_frame_id" zstd/src/tests/dict_test.rs

echo "---FRAME HEADER PARSING---"
# Check if there's any frame-header read or dict_id check in load_sample_dict_frame
ast-grep --pattern 'fn load_sample_dict_frame() {
  $$$
}'

Length of output: 3392


@polaz Thanks for the update — here's what I can verify from the current branch:

Fix #1 — shared multi-frame decode helper ✅

Confirmed in frame_decoder.rs (lines 678–685). decode_all_with_dict_bytes is now a two-line delegating wrapper that parses the raw bytes into a DictionaryHandle and immediately hands off to decode_all_with_dict_handle:

pub fn decode_all_with_dict_bytes(/* &mut self, input, output, raw_dictionary: parameter list elided in this excerpt */) -> Result<usize, FrameDecoderError> {
    let dict = DictionaryHandle::decode_dict(raw_dictionary)?;
    self.decode_all_with_dict_handle(input, output, &dict)
}

The full multi-frame skip/decode loop lives only once inside decode_all_with_dict_handle — no duplication.

Fix #2 — dict fixture hardened with dictionary-id assertion ⚠️ not yet visible

In the current branch snapshot load_sample_dict_frame() still only filters on the .zst extension; I don't see a frame-header peek or a dict_id != 0 assertion. It's possible the commit hasn't landed in the sandbox clone yet, or the check uses a code path my search didn't catch. Could you point me to where the dictionary-ID assertion was added (file + rough line range)? Once I can see it I'll mark this thread resolved.
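
For reference, a frame-header peek of the kind being asked about can be written against the zstd frame format directly. The helper below is a hypothetical sketch, not the test code from this PR: it reads the frame magic, the frame header descriptor, skips the window descriptor unless the single-segment flag is set, and decodes the optional dictionary ID field.

```rust
/// Standard zstd frame magic number (stored little-endian on the wire).
const ZSTD_FRAME_MAGIC: u32 = 0xFD2F_B528;

/// Return the dictionary ID declared by a zstd frame header, or `None` when the
/// input is not a zstd frame, is truncated, or declares no (non-zero) ID.
fn frame_dict_id(frame: &[u8]) -> Option<u32> {
    let m = frame.get(0..4)?;
    if u32::from_le_bytes([m[0], m[1], m[2], m[3]]) != ZSTD_FRAME_MAGIC {
        return None;
    }
    let descriptor = *frame.get(4)?;
    // Bit 5: Single_Segment_flag; when set, the Window_Descriptor byte is absent.
    let single_segment = (descriptor & 0b0010_0000) != 0;
    // Bits 1-0: Dictionary_ID_flag selects a field size of 0, 1, 2, or 4 bytes.
    let id_len = match descriptor & 0b11 {
        0 => 0,
        1 => 1,
        2 => 2,
        _ => 4,
    };
    let start = if single_segment { 5 } else { 6 };
    let bytes = frame.get(start..start + id_len)?;
    let mut le = [0u8; 4];
    le[..id_len].copy_from_slice(bytes);
    let id = u32::from_le_bytes(le);
    if id == 0 {
        None
    } else {
        Some(id)
    }
}
```

A fixture helper could then assert `frame_dict_id(&compressed).is_some()` before treating a file as a dictionary-backed sample.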


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-06T01:40:24.378Z
Learning: In `zstd/benches/compare_ffi.rs` (structured-world/structured-zstd), Rust FastCOVER trains with the post-finalization content budget in both the `REPORT_DICT_TRAIN` emission path (around lines 208-225) and the Criterion benchmark path (around lines 266-280). Both paths were aligned in commit 8622344. Do not flag these ranges as using inconsistent budget values.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 44
File: zstd/src/encoding/frame_compressor.rs:0-0
Timestamp: 2026-03-28T22:55:01.577Z
Learning: In `zstd/src/encoding/frame_compressor.rs`, `FrameCompressor::set_dictionary` returns `Result<Option<Dictionary>, DictionaryDecodeError>` and enforces two fail-fast programmer-error contracts via `Err(...)` rather than panics: (1) `dictionary.id == 0` returns `DictionaryDecodeError::ZeroDictionaryId`; (2) any zero entry in `dictionary.offset_hist` returns `DictionaryDecodeError::ZeroRepeatOffsetInDictionary`. The fallible parsing paths (`Dictionary::decode_dict`, `Dictionary::from_raw_content`) apply the same rejections at input boundaries. Do not flag these as inconsistencies — the uniform `Result`-based contract is intentional.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/frame_compressor.rs:779-805
Timestamp: 2026-04-11T07:48:58.281Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), hinted tiny frames (`source_size_hint <= 2^14` and `total_uncompressed <= 2^14`) are intentionally emitted as single-segment across ALL compression levels, including levels that produce compressed (non-raw) blocks. The `hinted_small_compressible_frames_use_single_segment_across_levels` test asserts `single_segment_flag() == true` for compressible hinted inputs (e.g. 4 KiB repeated log lines). Do not re-suggest asserting `single_segment_flag() == false` for compressible hinted frames — the donor-parity design requires single-segment for all hinted tiny frames.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 73
File: zstd/src/dictionary/mod.rs:337-345
Timestamp: 2026-04-05T21:31:56.797Z
Learning: In `zstd/src/decoding/dictionary.rs` (structured-world/structured-zstd), `Dictionary::decode_dict()` parses the full zstd dictionary wire format: magic (4 bytes) → dict_id (4 bytes) → Huffman table (variable, consumed via `build_decoder`) → OF FSE table → ML FSE table → LL FSE table → 3 × u32 repeat offsets → raw content. This matches the layout produced by `finalize_raw_dict()` in `zstd/src/dictionary/mod.rs`. Do not flag these as mismatched layouts.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/frame_compressor.rs:779-805
Timestamp: 2026-04-11T07:48:58.281Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), a temporary compatibility guard exists for sub-256-byte compressed payloads: the 1-byte-FCS single-segment form is not yet fully C-FFI compatible for payloads below 256 bytes, and a compat guard was added to handle this case while the path is hardened. Do not flag this guard as unnecessary.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T21:51:51.926Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test suite for `copy_bytes_overshooting` includes a misaligned extra-capacity case that exercises the rounded-up wildcopy path (where `copy_multiple > copy_at_least`) and validates prefix correctness for that scenario. Do not flag the absence of a misaligned/overshooting extra-capacity sub-case as missing test coverage.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T08:52:47.911Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), `hinted_small_compressible_frames_use_single_segment_across_levels` asserts `single_segment_flag() == true` for hinted 4 KiB compressible frames across all levels (Fastest, Default, Better, Best, Level(0), Level(3), Level(4), Level(11)). A helper `assert_not_raw_for_level` additionally asserts the first block is not `BlockType::Raw`, that `compressed.len() < data.len()`, and that FFI roundtrip decodes correctly. Do not flag the test as missing size-reduction or non-Raw block assertions.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T21:51:56.626Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test `copy_bytes_overshooting_preserves_prefix_for_runtime_chunk_lengths` (or equivalent) includes a misaligned extra-capacity case that uses `add(1)`-offset pointers and a `chunk_plus_one` length to exercise the rounded-up wildcopy overshoot path (where `copy_multiple > copy_at_least`) and validates prefix correctness. Do not flag the absence of a misaligned-pointer case in this test — it is present by design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:498-509
Timestamp: 2026-04-02T22:26:07.979Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), `best_level_does_not_regress_vs_better` uses a `<=` (not strict `<`) assertion because the `repeat_offset_fixture(b"HelloWorld", ...)` input is simple enough that HC saturates at both Better (16 candidates) and Best (32 candidates) search depths, producing identical compressed sizes (~30243 bytes). Strict `<` would be a false positive on this fixture. The strict `Best < Better` quality assertion lives in `cross_validation::best_level_beats_better_on_corpus_proxy` on the decodecorpus sample. Do not re-flag the `<=` as a weakened guard.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-05T08:17:53.856Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd), `set_source_size_hint` applies only to the payload (uncompressed source) bytes, not to dictionary priming. Dictionary priming is separate and does not inflate the advertised frame window size. Do not suggest including dictionary bytes in the size hint passed to `set_source_size_hint`.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-09T23:17:57.736Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd), `MatchGeneratorDriver::prime_with_dictionary` uses a dedicated dense seeding path for the `DfastMatchGenerator` backend: `skip_matching_for_dictionary_priming` routes to `skip_matching_dense`, bypassing the incompressible heuristic and sparse step insertion (`insert_positions_with_step`). This ensures primed dictionary history is always densely seeded and cross-block matching invariants are preserved for dictionary-compressed streams. Do not flag `prime_with_dictionary` as potentially using sparse/incompressible stepping.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:56:17.795Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), after commit 46927db, hash-mix kernel dispatch was moved from the global `HASH_MIX_KERNEL` AtomicU8 hot-path read into a per-matcher instance field `hash_mix_kernel` stored directly in `DfastMatchGenerator` and `RowMatchGenerator`. The hot path no longer performs an atomic load or global branch; kernel selection happens once at matcher initialisation and is subsequently a plain struct-field read. Do not flag the hot path as doing a per-call atomic load — that pattern was superseded by commit 46927db.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-09T15:16:10.603Z
Learning: In structured-world/structured-zstd, the `fuzz_exports` API concern was resolved via "Option B": internal decode/encode helpers remain `pub(crate)` by default, and explicit public shims are exposed only behind a dedicated feature flag (e.g., `fuzz`) in a separate file. Do not flag these internal symbols as needing a wider public API — the feature-gated shim layer is the intentional design.

- cover decode_all skip and target-too-small paths
- exercise DictionaryHandle conversions

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread zstd/src/decoding/frame_decoder.rs

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
zstd/src/tests/dict_test.rs (1)

386-405: 🧹 Nitpick | 🔵 Trivial

Select a dictionary-backed fixture during discovery, not after first pick.

The helper still picks the first .zst and only then asserts dictionary ID presence. If a non-dictionary .zst sorts first, this fails despite valid dictionary fixtures being present.

Suggested hardening diff
-    let file_path = files
+    let (file_path, compressed) = files
         .into_iter()
         .filter_map(Result::ok)
         .map(|entry| entry.path())
-        .find(|path| {
-            path.extension()
-                .and_then(|ext| ext.to_str())
-                .map(|ext| ext == "zst")
-                .unwrap_or(false)
+        .find_map(|path| {
+            let is_zst = path
+                .extension()
+                .and_then(|ext| ext.to_str())
+                .map(|ext| ext == "zst")
+                .unwrap_or(false);
+            if !is_zst {
+                return None;
+            }
+
+            let compressed = fs::read(&path).ok()?;
+            let mut header_src = compressed.as_slice();
+            let (header, _) = crate::decoding::frame::read_frame_header(&mut header_src).ok()?;
+            if header.dictionary_id().is_some() {
+                Some((path, compressed))
+            } else {
+                None
+            }
         })
-        .expect("expected at least one .zst file in dict_tests/files");
-
-    let compressed = fs::read(&file_path).expect("compressed data should load");
-    let mut header_src = compressed.as_slice();
-    let (header, _) = crate::decoding::frame::read_frame_header(&mut header_src)
-        .expect("sample frame header should parse");
-    assert!(
-        header.dictionary_id().is_some(),
-        "sample fixture must require a dictionary"
-    );
+        .expect("expected at least one dictionary-backed .zst file in dict_tests/files");
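For illustration, the same "filter during discovery" idea can be sketched without the crate's own header parser. This is a minimal, self-contained sketch: `declares_dictionary` and `pick_dict_fixture` are hypothetical names, and the header check only inspects the Zstandard magic number and the two `Dictionary_ID_flag` bits of the frame header descriptor rather than calling `read_frame_header`.

```rust
use std::path::PathBuf;

/// Zstandard frame magic number (0xFD2FB528) in little-endian byte order.
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xB5, 0x2F, 0xFD];

/// Hypothetical helper: true when the frame header descriptor's
/// Dictionary_ID_flag (lowest two bits) is non-zero.
fn declares_dictionary(frame: &[u8]) -> bool {
    frame.len() >= 5 && frame[..4] == ZSTD_MAGIC && (frame[4] & 0b11) != 0
}

/// Hypothetical helper: pick the first `.zst` candidate that already
/// declares a dictionary ID, instead of the first `.zst` of any kind.
fn pick_dict_fixture(candidates: Vec<(PathBuf, Vec<u8>)>) -> Option<(PathBuf, Vec<u8>)> {
    candidates.into_iter().find(|(path, bytes)| {
        let is_zst = path.extension().and_then(|ext| ext.to_str()) == Some("zst");
        is_zst && declares_dictionary(bytes)
    })
}
```

With this shape, a non-dictionary `.zst` that sorts first is skipped rather than failing the setup.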
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/tests/dict_test.rs` around lines 386 - 405, The code currently finds
the first .zst file into file_path and only then checks header.dictionary_id(),
which fails if that first file lacks a dictionary; instead, modify the discovery
so the iterator selects the first .zst whose frame header reports a
dictionary_id. Inside the closure used by find (or by using filter_map before
find), open and read each candidate file into a buffer (as the existing
compressed and header_src logic does), call
crate::decoding::frame::read_frame_header and only return the path when
header.dictionary_id().is_some(); skip files that fail to read or parse so
non-dictionary or unreadable .zst fixtures are ignored. Ensure you still call
fs::read and read_frame_header (the compressed, header_src, read_frame_header,
and header.dictionary_id symbols) for each candidate during discovery rather
than after picking the first file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ecf8cc0a-55ef-4217-8c44-cc4283ce4946

📥 Commits

Reviewing files that changed from the base of the PR and between 17cf973 and 83682af.

📒 Files selected for processing (1)
  • zstd/src/tests/dict_test.rs


Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread zstd/src/decoding/frame_decoder.rs

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
zstd/src/tests/dict_test.rs (1)

386-396: 🧹 Nitpick | 🔵 Trivial

Fixture discovery still depends on first .zst ordering.

Line 386-Line 396 still selects the first .zst file, and only later (Line 402-Line 405) asserts dictionary usage. Adding a non-dictionary .zst that sorts first will fail setup even when valid dictionary fixtures exist. Prefer selecting a .zst fixture that already satisfies the dictionary-id condition during discovery.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/tests/dict_test.rs` around lines 386 - 396, The current discovery
picks the first .zst by extension (using files.into_iter()...find(...) ->
file_path) which can pick a non-dictionary fixture; change the finder to only
select .zst files that already satisfy the dictionary-id condition used later by
the test: inside the find closure, after verifying extension == "zst", open/peek
the file and inspect its Zstd frame header or use the existing helper that
extracts the dictionary ID (e.g., a read_dict_id_from_zst or parse_frame_header
helper) and only return true when the dict-id matches the expected dictionary
marker; keep the expect(...) if none match so setup fails clearly when no
appropriate dictionary-backed .zst is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/tests/dict_test.rs`:
- Around line 347-356: The test currently ignores the usize returned by
FrameDecoder::new().decode_all_with_dict_handle and decode_all_with_dict_bytes,
which can mask short writes; capture the returned lengths (e.g., len1 and len2)
and assert they equal original.len(), then compare the output buffer up to that
returned length (or assert full buffer equals original after verifying length)
to ensure partial writes don't produce false positives; update the assertions
around decode_all_with_dict_handle, decode_all_with_dict_bytes, output, and
original accordingly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 379c5dc1-8f47-4cf8-abdc-a851eaf5f6df

📥 Commits

Reviewing files that changed from the base of the PR and between 17cf973 and 83682af.

📒 Files selected for processing (1)
  • zstd/src/tests/dict_test.rs

Comment thread zstd/src/tests/dict_test.rs Outdated
- cover add_dict_from_bytes and mismatched dict handle reset
- exercise dictionary handle as_ref via into_handle
- document decode_all skipping skippable frames
@polaz
Member Author

polaz commented Apr 11, 2026

@coderabbitai Fixed — dict fixture selection now chooses dictionary-backed .zst files during discovery.

@coderabbitai

coderabbitai Bot commented Apr 11, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n "load_sample_dict_frame\|dict_id\|dictionary_id\|frame_header\|DictionaryId\|Magic\|has_dict" zstd/src/tests/dict_test.rs -A 3 -B 1

Length of output: 0


🏁 Script executed:

#!/bin/bash
cat zstd/src/tests/dict_test.rs

Length of output: 14217


@polaz Got it — fix confirmed. load_sample_dict_frame now reads the frame header and asserts header.dictionary_id().is_some() before returning the fixture, so any .zst file that doesn't declare a dictionary ID will cause the test to fail immediately with a clear message. Both items are verified:

  • Fix #1 — shared multi-frame decode helper in decode_all_with_dict_bytes
  • Fix #2 — load_sample_dict_frame asserts header.dictionary_id().is_some() after parsing the frame header ✅

✏️ Learnings added
Learnt from: polaz
URL: https://github.com/structured-world/structured-zstd/pull/105

Timestamp: 2026-04-11T18:31:00.002Z
Learning: In `zstd/src/tests/dict_test.rs` (structured-world/structured-zstd, PR `#105`), the test-only helper `load_sample_dict_frame()` selects the first sorted `.zst` file from `./dict_tests/files`, parses its frame header via `crate::decoding::frame::read_frame_header`, and asserts `header.dictionary_id().is_some()` to guarantee the fixture requires a dictionary. Do not flag the absence of a dict-id guard in this helper — the assertion is present by design.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-06T01:40:24.378Z
Learning: In `zstd/benches/compare_ffi.rs` (structured-world/structured-zstd), Rust FastCOVER trains with the post-finalization content budget in both the `REPORT_DICT_TRAIN` emission path (around lines 208-225) and the Criterion benchmark path (around lines 266-280). Both paths were aligned in commit 8622344. Do not flag these ranges as using inconsistent budget values.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T21:51:51.926Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test suite for `copy_bytes_overshooting` includes a misaligned extra-capacity case that exercises the rounded-up wildcopy path (where `copy_multiple > copy_at_least`) and validates prefix correctness for that scenario. Do not flag the absence of a misaligned/overshooting extra-capacity sub-case as missing test coverage.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 44
File: zstd/src/encoding/frame_compressor.rs:0-0
Timestamp: 2026-03-28T22:55:01.577Z
Learning: In `zstd/src/encoding/frame_compressor.rs`, `FrameCompressor::set_dictionary` returns `Result<Option<Dictionary>, DictionaryDecodeError>` and enforces two fail-fast programmer-error contracts via `Err(...)` rather than panics: (1) `dictionary.id == 0` returns `DictionaryDecodeError::ZeroDictionaryId`; (2) any zero entry in `dictionary.offset_hist` returns `DictionaryDecodeError::ZeroRepeatOffsetInDictionary`. The fallible parsing paths (`Dictionary::decode_dict`, `Dictionary::from_raw_content`) apply the same rejections at input boundaries. Do not flag these as inconsistencies — the uniform `Result`-based contract is intentional.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T08:52:47.911Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), `hinted_small_compressible_frames_use_single_segment_across_levels` asserts `single_segment_flag() == true` for hinted 4 KiB compressible frames across all levels (Fastest, Default, Better, Best, Level(0), Level(3), Level(4), Level(11)). A helper `assert_not_raw_for_level` additionally asserts the first block is not `BlockType::Raw`, that `compressed.len() < data.len()`, and that FFI roundtrip decodes correctly. Do not flag the test as missing size-reduction or non-Raw block assertions.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 73
File: zstd/src/dictionary/mod.rs:337-345
Timestamp: 2026-04-05T21:31:56.797Z
Learning: In `zstd/src/decoding/dictionary.rs` (structured-world/structured-zstd), `Dictionary::decode_dict()` parses the full zstd dictionary wire format: magic (4 bytes) → dict_id (4 bytes) → Huffman table (variable, consumed via `build_decoder`) → OF FSE table → ML FSE table → LL FSE table → 3 × u32 repeat offsets → raw content. This matches the layout produced by `finalize_raw_dict()` in `zstd/src/dictionary/mod.rs`. Do not flag these as mismatched layouts.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T21:51:56.626Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test `copy_bytes_overshooting_preserves_prefix_for_runtime_chunk_lengths` (or equivalent) includes a misaligned extra-capacity case that uses `add(1)`-offset pointers and a `chunk_plus_one` length to exercise the rounded-up wildcopy overshoot path (where `copy_multiple > copy_at_least`) and validates prefix correctness. Do not flag the absence of a misaligned-pointer case in this test — it is present by design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:498-509
Timestamp: 2026-04-02T22:26:07.979Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), `best_level_does_not_regress_vs_better` uses a `<=` (not strict `<`) assertion because the `repeat_offset_fixture(b"HelloWorld", ...)` input is simple enough that HC saturates at both Better (16 candidates) and Best (32 candidates) search depths, producing identical compressed sizes (~30243 bytes). Strict `<` would be a false positive on this fixture. The strict `Best < Better` quality assertion lives in `cross_validation::best_level_beats_better_on_corpus_proxy` on the decodecorpus sample. Do not re-flag the `<=` as a weakened guard.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/frame_compressor.rs:779-805
Timestamp: 2026-04-11T07:48:58.281Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), hinted tiny frames (`source_size_hint <= 2^14` and `total_uncompressed <= 2^14`) are intentionally emitted as single-segment across ALL compression levels, including levels that produce compressed (non-raw) blocks. The `hinted_small_compressible_frames_use_single_segment_across_levels` test asserts `single_segment_flag() == true` for compressible hinted inputs (e.g. 4 KiB repeated log lines). Do not re-suggest asserting `single_segment_flag() == false` for compressible hinted frames — the donor-parity design requires single-segment for all hinted tiny frames.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/match_generator.rs:457-463
Timestamp: 2026-04-11T07:48:12.353Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#99`), `MatchGeneratorDriver::skip_matching_for_dictionary_priming` calls `row_matcher_mut().skip_matching_with_hint(Some(false))` for the `MatcherBackend::Row` case — explicitly requesting dense seeding, consistent with Simple (`skip_matching_with_hint(Some(false))`) and HashChain (`skip_matching(Some(false))`) backends. Do not flag this as using an implicit/unhinted path or suggest reverting to `skip_matching()`.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-09T23:17:57.736Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd), `MatchGeneratorDriver::prime_with_dictionary` uses a dedicated dense seeding path for the `DfastMatchGenerator` backend: `skip_matching_for_dictionary_priming` routes to `skip_matching_dense`, bypassing the incompressible heuristic and sparse step insertion (`insert_positions_with_step`). This ensures primed dictionary history is always densely seeded and cross-block matching invariants are preserved for dictionary-compressed streams. Do not flag `prime_with_dictionary` as potentially using sparse/incompressible stepping.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-05T08:17:53.856Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd), `set_source_size_hint` applies only to the payload (uncompressed source) bytes, not to dictionary priming. Dictionary priming is separate and does not inflate the advertised frame window size. Do not suggest including dictionary bytes in the size hint passed to `set_source_size_hint`.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/frame_compressor.rs:779-805
Timestamp: 2026-04-11T07:48:58.281Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), a temporary compatibility guard exists for sub-256-byte compressed payloads: the 1-byte-FCS single-segment form is not yet fully C-FFI compatible for payloads below 256 bytes, and a compat guard was added to handle this case while the path is hardened. Do not flag this guard as unnecessary.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:56:17.795Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), after commit 46927db, hash-mix kernel dispatch was moved from the global `HASH_MIX_KERNEL` AtomicU8 hot-path read into a per-matcher instance field `hash_mix_kernel` stored directly in `DfastMatchGenerator` and `RowMatchGenerator`. The hot path no longer performs an atomic load or global branch; kernel selection happens once at matcher initialisation and is subsequently a plain struct-field read. Do not flag the hot path as doing a per-call atomic load — that pattern was superseded by commit 46927db.

- assert decode_all_* returns original length
- guard against zero-filled false positives
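
The two checks named in this commit can be sketched in isolation. This is a minimal sketch under stated assumptions: `fake_decode` is a hypothetical stand-in for a decode call that returns the number of bytes written (it is not the crate's `decode_all_*` API), and `decoded_matches` is a hypothetical helper showing why the returned length has to be asserted before comparing buffers.

```rust
/// Hypothetical stand-in for a decode call: copies `src` into `out`
/// and reports how many bytes were written.
fn fake_decode(src: &[u8], out: &mut [u8]) -> Result<usize, &'static str> {
    if out.len() < src.len() {
        return Err("target too small");
    }
    out[..src.len()].copy_from_slice(src);
    Ok(src.len())
}

/// True only when the reported length equals the original length and the
/// written prefix matches. Comparing buffers alone would accept a short
/// write whenever the original happens to equal the buffer's initial fill.
fn decoded_matches(written: usize, output: &[u8], original: &[u8]) -> bool {
    written == original.len() && output.get(..written) == Some(original)
}
```

An all-zero original decoded into a zero-initialised buffer is the case the length check protects against: the bytes compare equal even if nothing was written.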

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Comment thread zstd/src/tests/dict_test.rs Outdated
Comment thread zstd/src/tests/dict_test.rs Outdated
Comment thread zstd/src/decoding/dictionary.rs Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/frame_decoder.rs`:
- Around line 749-757: Add the same handle-path warning to
decode_all_with_dict_bytes that exists on decode_all_with_dict_handle: document
that this thin wrapper will initialize every frame with the provided dictionary
(even if frame headers omit a dict ID) and thus carries the same
silent-corruption risk; update the function docstring for
decode_all_with_dict_bytes to mirror the warning and reference that it calls
DictionaryHandle::decode_dict and forwards to decode_all_with_dict_handle so
callers are aware of the precondition.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a0fd9dee-c2dc-4343-ab01-d8c304196fae

📥 Commits

Reviewing files that changed from the base of the PR and between 6690ff6 and 4d0b8cb.

📒 Files selected for processing (1)
  • zstd/src/decoding/frame_decoder.rs

Comment thread zstd/src/decoding/frame_decoder.rs
Comment thread zstd/src/decoding/frame_decoder.rs

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (2)
zstd/src/decoding/frame_decoder.rs (2)

749-757: ⚠️ Potential issue | 🟡 Minor

Document the raw-bytes footgun too.

This wrapper parses once and then forwards directly to decode_all_with_dict_handle(), so it inherits the same “apply this dictionary to every frame, even when the header omits a dict ID” precondition. Please mirror the warning here so callers do not assume the raw-bytes API is safer.

📘 Suggested doc update
 /// Decode multiple frames into the output slice using a serialized dictionary.
+///
+/// # Warning
+///
+/// Each decoded frame is initialized with the parsed dictionary, even when a
+/// frame header omits the optional dictionary ID. Callers must only use this
+/// API when they already know the input frames were encoded with that
+/// dictionary; otherwise decoded output can be silently corrupted.
 pub fn decode_all_with_dict_bytes(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/decoding/frame_decoder.rs` around lines 749 - 757, The doc for
decode_all_with_dict_bytes should warn that, like decode_all_with_dict_handle,
it parses the serialized dictionary via DictionaryHandle::decode_dict and then
applies that dictionary to every frame (even when a frame header omits a dict
ID); update the function-level documentation for decode_all_with_dict_bytes to
mirror the existing footgun warning from decode_all_with_dict_handle so callers
understand the raw-bytes API is not safer and must only be used when the same
dictionary should be applied to all frames.

313-319: ⚠️ Potential issue | 🟠 Major

Reject invalid dict_id == 0 values at registration time.

Dictionary is public and its id contract says it must not be zero, but these entry points only guard duplicate IDs. A manually constructed Dictionary or DictionaryHandle::from_dictionary(...) can still register 0, and force_dict(0) can then prime the decoder with an impossible dictionary ID.

Also applies to: 336-342
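
A minimal sketch of the registration-time guard, assuming a simplified registry: `Dict`, `Registry`, and `RegisterError` are hypothetical stand-ins for illustration, not the crate's `Dictionary`, `FrameDecoder`, or `FrameDecoderError` types. The zero-ID check runs before the duplicate check so an invalid ID is never stored.

```rust
use std::collections::BTreeMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum RegisterError {
    ZeroDictId,
    AlreadyRegistered { dict_id: u32 },
}

/// Hypothetical simplified dictionary: only the fields the guard needs.
struct Dict {
    id: u32,
    content: Vec<u8>,
}

#[derive(Default)]
struct Registry {
    dicts: BTreeMap<u32, Dict>,
}

impl Registry {
    /// Reject id == 0 first, then reject duplicates, then store.
    fn add_dict(&mut self, dict: Dict) -> Result<(), RegisterError> {
        if dict.id == 0 {
            return Err(RegisterError::ZeroDictId);
        }
        if self.dicts.contains_key(&dict.id) {
            return Err(RegisterError::AlreadyRegistered { dict_id: dict.id });
        }
        self.dicts.insert(dict.id, dict);
        Ok(())
    }

    /// Length of the stored content for a registered ID, if any.
    fn content_len(&self, id: u32) -> Option<usize> {
        self.dicts.get(&id).map(|d| d.content.len())
    }
}
```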

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/decoding/frame_decoder.rs` around lines 313 - 319, Reject
registration of dictionaries with id == 0 by adding an explicit check in the
registration paths: in FrameDecoder::add_dict (and the equivalent method shown
at 336-342), validate let dict_id = dict.id; if dict_id == 0 { return
Err(FrameDecoderError::InvalidDictId { dict_id }); } before checking duplicates;
similarly ensure any public entry that accepts a Dictionary (or
DictionaryHandle::from_dictionary) performs the same guard so force_dict(0)
cannot prime the decoder with an invalid ID. Ensure you return an appropriate
FrameDecoderError variant (or add one) for the zero id case and update
callers/tests accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 117cd9f6-d760-4820-aec0-8c890e006a75

📥 Commits

Reviewing files that changed from the base of the PR and between 6690ff6 and 4d0b8cb.

📒 Files selected for processing (1)
  • zstd/src/decoding/frame_decoder.rs


Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
zstd/src/tests/dict_test.rs (1)

93-128: 🧹 Nitpick | 🔵 Trivial

Keep test_dict_decoding running on no-atomic targets.

The new cfg at Line 93 skips this end-to-end fixture loop on the exact targets that exercise the Rc fallback. The owned-dictionary path still exists there, so it would be better to cfg only the registration step (add_dict_handle vs add_dict) and keep the rest of the corpus coverage intact.

Possible adjustment
-#[cfg(target_has_atomic = "ptr")]
 #[test]
 fn test_dict_decoding() {
@@
-    let dict = crate::decoding::DictionaryHandle::decode_dict(&dict).unwrap();
-    frame_dec.add_dict_handle(dict).unwrap();
+    #[cfg(target_has_atomic = "ptr")]
+    {
+        let dict = crate::decoding::DictionaryHandle::decode_dict(&dict).unwrap();
+        frame_dec.add_dict_handle(dict).unwrap();
+    }
+    #[cfg(not(target_has_atomic = "ptr"))]
+    {
+        let dict = crate::decoding::dictionary::Dictionary::decode_dict(&dict).unwrap();
+        frame_dec.add_dict(dict).unwrap();
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/tests/dict_test.rs` around lines 93 - 128, The test
test_dict_decoding is being entirely skipped on targets lacking pointer atomics;
instead, keep the test body but only conditionally choose the registration call:
after creating FrameDecoder and decoding the dictionary via
DictionaryHandle::decode_dict, use cfg(target_has_atomic = "ptr") to call
frame_dec.add_dict_handle(dict).unwrap() and in the else branch call the
owned-dictionary registration (frame_dec.add_dict(...) or the existing add_dict
API) so the rest of the loop and assertions still run; adjust references to
FrameDecoder, DictionaryHandle::decode_dict, add_dict_handle and add_dict
accordingly so only the registration line is cfg-gated rather than the whole
test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/dictionary.rs`:
- Around line 177-180: into_handle()/from_dictionary() currently wrap a
Dictionary verbatim allowing callers to mutate public fields (Dictionary::id,
Dictionary::offset_hist) and bypass validation; make handle construction enforce
the same invariants by either making into_handle/from_dictionary fallible
(return Result and validate id != 0 and offset_hist != 0/invalid) or by
re-running the existing checks before a handle is used (e.g., call the same
validation logic before reset_with_dict_handle()/init_from_dict() proceeds);
reference Dictionary::into_handle, DictionaryHandle::from_dictionary,
reset_with_dict_handle, init_from_dict, add_dict, and add_dict_handle when
adding the validation so invalid dictionaries are rejected at handle creation or
immediately before use.
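
One possible shape for the fallible option, sketched with hypothetical types: `Dict`, `Handle`, `DictError`, and `try_from_dict` are illustrative names and do not reflect the crate's actual `Dictionary` or `DictionaryHandle` signatures. The idea is simply that the invariants are checked once, at the point where the dictionary becomes shared and immutable.

```rust
use std::sync::Arc;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DictError {
    ZeroId,
    ZeroRepeatOffset,
}

/// Hypothetical dictionary with public, caller-mutable fields.
#[derive(Debug)]
struct Dict {
    pub id: u32,
    pub offset_hist: [u32; 3],
}

/// Hypothetical shared handle; cloning only bumps a reference count.
#[derive(Clone, Debug)]
struct Handle(Arc<Dict>);

impl Handle {
    /// Validate at construction so an invalid dictionary never becomes a handle.
    fn try_from_dict(dict: Dict) -> Result<Self, DictError> {
        if dict.id == 0 {
            return Err(DictError::ZeroId);
        }
        if dict.offset_hist.contains(&0) {
            return Err(DictError::ZeroRepeatOffset);
        }
        Ok(Handle(Arc::new(dict)))
    }

    fn id(&self) -> u32 {
        self.0.id
    }
}
```

The alternative named above, re-validating immediately before use, keeps construction infallible at the cost of a check on every reset.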


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 79c6dbbe-aa0f-4b1f-ab1a-6f276ab3072a

📥 Commits

Reviewing files that changed from the base of the PR and between 4d0b8cb and 0d0d99b.

📒 Files selected for processing (3)
  • zstd/src/decoding/dictionary.rs
  • zstd/src/decoding/frame_decoder.rs
  • zstd/src/tests/dict_test.rs

Comment thread zstd/src/decoding/dictionary.rs
@polaz
Member Author

polaz commented Apr 12, 2026

@coderabbitai Addressed the outside-diff nitpick: test_dict_decoding is no longer skipped on non-atomic targets. The test now always runs and only cfg-gates the dictionary registration step (add_dict_handle on atomic targets, add_dict on non-atomic targets). Included in commit 83d07c9.

@polaz
Member Author

polaz commented Apr 12, 2026

@coderabbitai Re outside-diff item (dict_test.rs:93-128): this is fixed in commit 83d07c9 (test now runs on no-atomic targets with cfg only on registration branch). Please mark this as resolved on next review pass.

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
grep -n "test_dict_decoding\|cfg.*target_has_atomic\|add_dict_handle\|add_dict\b" zstd/src/tests/dict_test.rs | head -50
echo "---FULL TEST---"
sed -n '1,50p' zstd/src/tests/dict_test.rs
# Find the test_dict_decoding function
ast-grep --pattern 'fn test_dict_decoding() {
  $$$
}'

Length of output: 15632


@polaz Fix confirmed. test_dict_decoding (Line 94) carries no #[cfg(...)] guard on the function itself, so it runs unconditionally on all targets. The only cfg-gated sections are the two registration branches:

  • Lines 130–134 (target_has_atomic = "ptr"): DictionaryHandle::decode_dict + add_dict_handle
  • Lines 135–139 (not(target_has_atomic = "ptr")): Dictionary::decode_dict + add_dict

All other decode, assert, and summary logic is shared. The outside-diff nitpick is fully addressed. ✅


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T21:51:56.626Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test `copy_bytes_overshooting_preserves_prefix_for_runtime_chunk_lengths` (or equivalent) includes a misaligned extra-capacity case that uses `add(1)`-offset pointers and a `chunk_plus_one` length to exercise the rounded-up wildcopy overshoot path (where `copy_multiple > copy_at_least`) and validates prefix correctness. Do not flag the absence of a misaligned-pointer case in this test — it is present by design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T18:31:00.002Z
Learning: In `zstd/src/tests/dict_test.rs` (structured-world/structured-zstd, PR `#105`), the test-only helper `load_sample_dict_frame()` selects the first sorted `.zst` file from `./dict_tests/files`, parses its frame header via `crate::decoding::frame::read_frame_header`, and asserts `header.dictionary_id().is_some()` to guarantee the fixture requires a dictionary. Do not flag the absence of a dict-id guard in this helper — the assertion is present by design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T21:51:51.926Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test suite for `copy_bytes_overshooting` includes a misaligned extra-capacity case that exercises the rounded-up wildcopy path (where `copy_multiple > copy_at_least`) and validates prefix correctness for that scenario. Do not flag the absence of a misaligned/overshooting extra-capacity sub-case as missing test coverage.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T20:26:15.902Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test `copy_bytes_overshooting_covers_all_copy_strategies` derives copy lengths from a helper `active_chunk_size_for_tests()` (not hardcoded values) so that single-chunk, multi-chunk, and sub-chunk (fallback) cases adapt to whichever SIMD backend width is active at runtime (AVX-512/AVX2/SSE2/NEON/scalar). Do not flag hardcoded test lengths as missing coverage for other backends — the test is backend-adaptive by design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T21:44:05.799Z
Learning: In `zstd/src/decoding/frame_decoder.rs` (structured-world/structured-zstd, PR `#105`), `reset_with_dict_handle` always applies the explicit dictionary: it calls `state.decoder_scratch.init_from_dict(dict.as_ref())` and sets `state.using_dict = Some(dict.id())` unconditionally after an optional mismatch guard. The mismatch guard only fires when `frame_header.dictionary_id()` is `Some` and differs from `dict.id()`; when the frame header omits a dictionary_id (`None`), the guard is skipped and the dictionary is still applied. Do not flag `reset_with_dict_handle` as ignoring the explicit dictionary for frames without a declared dict-id.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-06T01:40:24.378Z
Learning: In `zstd/benches/compare_ffi.rs` (structured-world/structured-zstd), Rust FastCOVER trains with the post-finalization content budget in both the `REPORT_DICT_TRAIN` emission path (around lines 208-225) and the Criterion benchmark path (around lines 266-280). Both paths were aligned in commit 8622344. Do not flag these ranges as using inconsistent budget values.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/frame_compressor.rs:779-805
Timestamp: 2026-04-11T07:48:58.281Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), hinted tiny frames (`source_size_hint <= 2^14` and `total_uncompressed <= 2^14`) are intentionally emitted as single-segment across ALL compression levels, including levels that produce compressed (non-raw) blocks. The `hinted_small_compressible_frames_use_single_segment_across_levels` test asserts `single_segment_flag() == true` for compressible hinted inputs (e.g. 4 KiB repeated log lines). Do not re-suggest asserting `single_segment_flag() == false` for compressible hinted frames — the donor-parity design requires single-segment for all hinted tiny frames.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 105
File: zstd/src/decoding/frame_decoder.rs:327-358
Timestamp: 2026-04-12T09:12:31.790Z
Learning: In `zstd/src/decoding/frame_decoder.rs` (structured-world/structured-zstd), `FrameDecoder::add_dict()` and `add_dict_handle()` both call the private `validate_registered_dictionary()` helper, which rejects `dictionary.id == 0` with `DictionaryDecodeError::ZeroDictionaryId` and any zero entry in `dictionary.offset_hist` with `DictionaryDecodeError::ZeroRepeatOffsetInDictionary`. This mirrors the encoder's `set_dictionary` validation. Do not flag these decoder registration paths as missing zero-id or zero-offset checks.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T08:52:47.911Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), `hinted_small_compressible_frames_use_single_segment_across_levels` asserts `single_segment_flag() == true` for hinted 4 KiB compressible frames across all levels (Fastest, Default, Better, Best, Level(0), Level(3), Level(4), Level(11)). A helper `assert_not_raw_for_level` additionally asserts the first block is not `BlockType::Raw`, that `compressed.len() < data.len()`, and that FFI roundtrip decodes correctly. Do not flag the test as missing size-reduction or non-Raw block assertions.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/frame_compressor.rs:779-805
Timestamp: 2026-04-11T07:48:58.281Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), a temporary compatibility guard exists for sub-256-byte compressed payloads: the 1-byte-FCS single-segment form is not yet fully C-FFI compatible for payloads below 256 bytes, and a compat guard was added to handle this case while the path is hardened. Do not flag this guard as unnecessary.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:498-509
Timestamp: 2026-04-02T22:26:07.979Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), `best_level_does_not_regress_vs_better` uses a `<=` (not strict `<`) assertion because the `repeat_offset_fixture(b"HelloWorld", ...)` input is simple enough that HC saturates at both Better (16 candidates) and Best (32 candidates) search depths, producing identical compressed sizes (~30243 bytes). Strict `<` would be a false positive on this fixture. The strict `Best < Better` quality assertion lives in `cross_validation::best_level_beats_better_on_corpus_proxy` on the decodecorpus sample. Do not re-flag the `<=` as a weakened guard.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/match_generator.rs:457-463
Timestamp: 2026-04-11T07:48:12.353Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#99`), `MatchGeneratorDriver::skip_matching_for_dictionary_priming` calls `row_matcher_mut().skip_matching_with_hint(Some(false))` for the `MatcherBackend::Row` case — explicitly requesting dense seeding, consistent with Simple (`skip_matching_with_hint(Some(false))`) and HashChain (`skip_matching(Some(false))`) backends. Do not flag this as using an implicit/unhinted path or suggest reverting to `skip_matching()`.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 44
File: zstd/src/encoding/frame_compressor.rs:0-0
Timestamp: 2026-03-28T22:55:01.577Z
Learning: In `zstd/src/encoding/frame_compressor.rs`, `FrameCompressor::set_dictionary` returns `Result<Option<Dictionary>, DictionaryDecodeError>` and enforces two fail-fast programmer-error contracts via `Err(...)` rather than panics: (1) `dictionary.id == 0` returns `DictionaryDecodeError::ZeroDictionaryId`; (2) any zero entry in `dictionary.offset_hist` returns `DictionaryDecodeError::ZeroRepeatOffsetInDictionary`. The fallible parsing paths (`Dictionary::decode_dict`, `Dictionary::from_raw_content`) apply the same rejections at input boundaries. Do not flag these as inconsistencies — the uniform `Result`-based contract is intentional.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/match_generator.rs:1675-1701
Timestamp: 2026-04-10T07:20:51.122Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd), `DfastMatchGenerator::skip_matching` (the dense/non-sparse path, i.e. when `used_sparse` is false) backfills the previous-slice tail (`current_abs_start.saturating_sub(DFAST_MIN_MATCH_LEN + 3) .. current_abs_start`) before calling `insert_positions(current_abs_start, current_abs_end)`. This mirrors the `skip_matching_dense` backfill and `HcMatchGenerator::backfill_boundary_positions` patterns, ensuring positions at block boundaries become hashable as soon as the next slice extends history. A regression test `dfast_dense_skip_matching_backfills_previous_tail_for_next_block` covers immediate cross-boundary matching after a dense skip. Do not flag the absence of this backfill as a missing step.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-10T12:52:14.220Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd), `HcMatchGenerator::skip_matching`'s dense-tail reseed (the `Some(true)` / sparse incompressible path) iterates position-by-position over `tail_start..current_abs_end` and skips positions where `(pos - current_abs_start) % INCOMPRESSIBLE_SKIP_STEP == 0`. This prevents sparse-grid positions already indexed in the sparse pass from being reinserted and creating self-loop chain entries. A regression test `hc_sparse_skip_matching_does_not_reinsert_sparse_tail_positions` (line ~4086) covers this invariant. Do not flag the absence of a bulk `insert_positions(tail_start, current_abs_end)` call in the sparse-hint dense-tail path as a missing optimization.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/match_generator.rs:1682-1687
Timestamp: 2026-04-09T23:59:08.161Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd), `DfastMatchGenerator::skip_matching_dense` backfills the previous-slice tail (`max(current_abs_start - 3, history_abs_start) .. current_abs_start`) before seeding the current slice via `insert_positions`. This ensures that positions at internal priming-slice boundaries become hashable as soon as the next slice extends history, keeping cross-slice dictionary priming fully dense. The pattern mirrors `HcMatchGenerator::backfill_boundary_positions`. Do not flag the absence of this backfill as a missing step in `skip_matching_dense`.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-09T23:17:57.736Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd), `MatchGeneratorDriver::prime_with_dictionary` uses a dedicated dense seeding path for the `DfastMatchGenerator` backend: `skip_matching_for_dictionary_priming` routes to `skip_matching_dense`, bypassing the incompressible heuristic and sparse step insertion (`insert_positions_with_step`). This ensures primed dictionary history is always densely seeded and cross-block matching invariants are preserved for dictionary-compressed streams. Do not flag `prime_with_dictionary` as potentially using sparse/incompressible stepping.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:56:17.795Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), after commit 46927db, hash-mix kernel dispatch was moved from the global `HASH_MIX_KERNEL` AtomicU8 hot-path read into a per-matcher instance field `hash_mix_kernel` stored directly in `DfastMatchGenerator` and `RowMatchGenerator`. The hot path no longer performs an atomic load or global branch; kernel selection happens once at matcher initialisation and is subsequently a plain struct-field read. Do not flag the hot path as doing a per-call atomic load — that pattern was superseded by commit 46927db.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T07:15:56.578Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#99`), `RowMatchGenerator::skip_matching_with_hint(Some(true))` performs sparse-prefix insertion via `insert_positions_with_step(current_abs_start, current_abs_end, INCOMPRESSIBLE_SKIP_STEP)` followed by a dense-tail reseed over `ROW_MIN_MATCH_LEN + INCOMPRESSIBLE_SKIP_STEP` positions, skipping sparse-grid positions (`!(pos - current_abs_start).is_multiple_of(INCOMPRESSIBLE_SKIP_STEP)`). The `MatcherBackend::Row` dispatch in `MatchGeneratorDriver::skip_matching_with_hint` now passes the hint through to the Row backend. A regression test `row_skip_matching_with_incompressible_hint_uses_sparse_prefix` covers this. Do not flag Row as ignoring incompressible hints or defaulting to dense-only skip.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:31:54.265Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), the test `row_hash_and_row_extracts_high_bits` acquires `HASH_MIX_KERNEL_TEST_LOCK` before any `hash_mix_u64` call (fixed in commit 904d2b0). This lock guards all tests that force a specific `HashMixKernel` via `with_forced_hash_mix_kernel(...)` to prevent cross-test kernel races caused by the global `HASH_MIX_KERNEL` atomic. Do not flag the absence of this lock as missing — it is acquired at the top of each affected test.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:31:46.064Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), the test-only `with_forced_hash_mix_kernel(...)` helper forces the global `HASH_MIX_KERNEL` atomic to a specific kernel for coverage. Tests that call this helper must acquire a shared mutex lock *before* the `with_forced_hash_mix_kernel` call (not only around the assertion) to prevent cross-test kernel races when `cargo nextest` runs tests in parallel. This pattern was locked in commit 904d2b0. Do not flag the lock acquisition as unnecessarily early.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 73
File: zstd/src/dictionary/mod.rs:337-345
Timestamp: 2026-04-05T21:31:56.797Z
Learning: In `zstd/src/decoding/dictionary.rs` (structured-world/structured-zstd), `Dictionary::decode_dict()` parses the full zstd dictionary wire format: magic (4 bytes) → dict_id (4 bytes) → Huffman table (variable, consumed via `build_decoder`) → OF FSE table → ML FSE table → LL FSE table → 3 × u32 repeat offsets → raw content. This matches the layout produced by `finalize_raw_dict()` in `zstd/src/dictionary/mod.rs`. Do not flag these as mismatched layouts.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-05T08:17:53.856Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd), `set_source_size_hint` applies only to the payload (uncompressed source) bytes, not to dictionary priming. Dictionary priming is separate and does not inflate the advertised frame window size. Do not suggest including dictionary bytes in the size hint passed to `set_source_size_hint`.

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
sed -n '85,140p' zstd/src/tests/dict_test.rs

Length of output: 1819


@polaz Fix confirmed from the current branch. test_dict_decoding now has no #[cfg(...)] gate on the #[test] attribute itself — it runs on all targets. The cfg is applied only to the two inner registration branches:

#[cfg(target_has_atomic = "ptr")]
{
    let dict = DictionaryHandle::decode_dict(&dict).unwrap();
    frame_dec.add_dict_handle(dict).unwrap();
}
#[cfg(not(target_has_atomic = "ptr"))]
{
    let dict = Dictionary::decode_dict(&dict).unwrap();
    frame_dec.add_dict(dict).unwrap();
}

The outside-diff item is fully resolved. ✅
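
The registration-only gating can also be tried in isolation. The following is a minimal, self-contained sketch of that pattern, not the crate's code: `Dictionary`, `DictionaryHandle`, and `FrameDecoder` here are hypothetical stand-ins, the 4-byte-id dictionary format is a toy, and wrapping the handle in `Arc` is an assumption made only for illustration.

```rust
// Hypothetical stand-ins; these are NOT the real structured-zstd types.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Dictionary {
    id: u32,
    content: Vec<u8>,
}

impl Dictionary {
    // Toy format (assumption): 4-byte little-endian id followed by raw content.
    #[allow(dead_code)]
    fn decode_dict(raw: &[u8]) -> Result<Self, String> {
        if raw.len() < 4 {
            return Err("dictionary too short".to_string());
        }
        let id = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
        if id == 0 {
            return Err("zero dictionary id".to_string());
        }
        Ok(Self { id, content: raw[4..].to_vec() })
    }
}

// The shared handle only exists where pointer-sized atomics (and thus Arc) do.
#[cfg(target_has_atomic = "ptr")]
#[derive(Debug, Clone)]
struct DictionaryHandle(std::sync::Arc<Dictionary>);

#[cfg(target_has_atomic = "ptr")]
impl DictionaryHandle {
    fn decode_dict(raw: &[u8]) -> Result<Self, String> {
        Dictionary::decode_dict(raw).map(|d| Self(std::sync::Arc::new(d)))
    }
}

#[derive(Default)]
struct FrameDecoder {
    dict_ids: Vec<u32>,
}

#[allow(dead_code)]
impl FrameDecoder {
    // Owned-dictionary registration, available on every target.
    fn add_dict(&mut self, dict: Dictionary) -> Result<(), String> {
        self.register(dict.id)
    }

    // Shared-handle registration, only on targets with pointer atomics.
    #[cfg(target_has_atomic = "ptr")]
    fn add_dict_handle(&mut self, handle: DictionaryHandle) -> Result<(), String> {
        self.register(handle.0.id)
    }

    fn register(&mut self, id: u32) -> Result<(), String> {
        if self.dict_ids.contains(&id) {
            return Err(format!("dictionary {id} already registered"));
        }
        self.dict_ids.push(id);
        Ok(())
    }

    fn has_dict(&self, id: u32) -> bool {
        self.dict_ids.contains(&id)
    }
}

// Only the registration step is cfg-gated; callers and assertions stay shared.
#[allow(dead_code)]
fn register_dictionary(frame_dec: &mut FrameDecoder, raw: &[u8]) -> Result<(), String> {
    #[cfg(target_has_atomic = "ptr")]
    {
        let dict = DictionaryHandle::decode_dict(raw)?;
        frame_dec.add_dict_handle(dict)?;
    }
    #[cfg(not(target_has_atomic = "ptr"))]
    {
        let dict = Dictionary::decode_dict(raw)?;
        frame_dec.add_dict(dict)?;
    }
    Ok(())
}
```

Because exactly one of the two blocks is compiled on any given target, a test that calls `register_dictionary` and then asserts on the decoder can run everywhere without its own `#[cfg(...)]` attribute.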


✏️ Learnings added
Learnt from: polaz
URL: https://github.com/structured-world/structured-zstd/pull/105

Timestamp: 2026-04-12T09:45:02.645Z
Learning: In `zstd/src/tests/dict_test.rs` (structured-world/structured-zstd, PR `#105`), `test_dict_decoding` runs unconditionally on all targets (no `#[cfg(...)]` gate on the `#[test]` function). Only the dictionary registration step is cfg-gated: `add_dict_handle(DictionaryHandle)` on `target_has_atomic = "ptr"` targets and `add_dict(Dictionary)` on non-atomic targets. Do not flag `test_dict_decoding` as skipped or unavailable on no-atomic targets.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 105
File: zstd/src/decoding/dictionary.rs:177-180
Timestamp: 2026-04-12T09:44:15.051Z
Learning: In `zstd/src/decoding/frame_decoder.rs` (structured-world/structured-zstd, PR `#105`), `reset_with_dict_handle()` calls the shared `validate_registered_dictionary()` helper before `state.decoder_scratch.init_from_dict(dict.as_ref())`, rejecting handles with `dictionary.id == 0` (`DictionaryDecodeError::ZeroDictionaryId`) or any zero entry in `dictionary.offset_hist` (`DictionaryDecodeError::ZeroRepeatOffsetInDictionary`). A regression test `test_reset_with_dict_handle_rejects_invalid_handle_invariants` covers both cases. Do not flag `reset_with_dict_handle` as bypassing invariant validation.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T20:26:15.902Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test `copy_bytes_overshooting_covers_all_copy_strategies` derives copy lengths from a helper `active_chunk_size_for_tests()` (not hardcoded values) so that single-chunk, multi-chunk, and sub-chunk (fallback) cases adapt to whichever SIMD backend width is active at runtime (AVX-512/AVX2/SSE2/NEON/scalar). Do not flag hardcoded test lengths as missing coverage for other backends — the test is backend-adaptive by design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T21:44:05.799Z
Learning: In `zstd/src/decoding/frame_decoder.rs` (structured-world/structured-zstd, PR `#105`), `reset_with_dict_handle` always applies the explicit dictionary: it calls `state.decoder_scratch.init_from_dict(dict.as_ref())` and sets `state.using_dict = Some(dict.id())` unconditionally after an optional mismatch guard. The mismatch guard only fires when `frame_header.dictionary_id()` is `Some` and differs from `dict.id()`; when the frame header omits a dictionary_id (`None`), the guard is skipped and the dictionary is still applied. Do not flag `reset_with_dict_handle` as ignoring the explicit dictionary for frames without a declared dict-id.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-06T01:40:24.378Z
Learning: In `zstd/benches/compare_ffi.rs` (structured-world/structured-zstd), Rust FastCOVER trains with the post-finalization content budget in both the `REPORT_DICT_TRAIN` emission path (around lines 208-225) and the Criterion benchmark path (around lines 266-280). Both paths were aligned in commit 8622344. Do not flag these ranges as using inconsistent budget values.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/frame_compressor.rs:779-805
Timestamp: 2026-04-11T07:48:58.281Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), hinted tiny frames (`source_size_hint <= 2^14` and `total_uncompressed <= 2^14`) are intentionally emitted as single-segment across ALL compression levels, including levels that produce compressed (non-raw) blocks. The `hinted_small_compressible_frames_use_single_segment_across_levels` test asserts `single_segment_flag() == true` for compressible hinted inputs (e.g. 4 KiB repeated log lines). Do not re-suggest asserting `single_segment_flag() == false` for compressible hinted frames — the donor-parity design requires single-segment for all hinted tiny frames.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 105
File: zstd/src/decoding/frame_decoder.rs:327-358
Timestamp: 2026-04-12T09:12:31.790Z
Learning: In `zstd/src/decoding/frame_decoder.rs` (structured-world/structured-zstd), `FrameDecoder::add_dict()` and `add_dict_handle()` both call the private `validate_registered_dictionary()` helper, which rejects `dictionary.id == 0` with `DictionaryDecodeError::ZeroDictionaryId` and any zero entry in `dictionary.offset_hist` with `DictionaryDecodeError::ZeroRepeatOffsetInDictionary`. This mirrors the encoder's `set_dictionary` validation. Do not flag these decoder registration paths as missing zero-id or zero-offset checks.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T08:52:47.911Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), `hinted_small_compressible_frames_use_single_segment_across_levels` asserts `single_segment_flag() == true` for hinted 4 KiB compressible frames across all levels (Fastest, Default, Better, Best, Level(0), Level(3), Level(4), Level(11)). A helper `assert_not_raw_for_level` additionally asserts the first block is not `BlockType::Raw`, that `compressed.len() < data.len()`, and that FFI roundtrip decodes correctly. Do not flag the test as missing size-reduction or non-Raw block assertions.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:498-509
Timestamp: 2026-04-02T22:26:07.979Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), `best_level_does_not_regress_vs_better` uses a `<=` (not strict `<`) assertion because the `repeat_offset_fixture(b"HelloWorld", ...)` input is simple enough that HC saturates at both Better (16 candidates) and Best (32 candidates) search depths, producing identical compressed sizes (~30243 bytes). Strict `<` would be a false positive on this fixture. The strict `Best < Better` quality assertion lives in `cross_validation::best_level_beats_better_on_corpus_proxy` on the decodecorpus sample. Do not re-flag the `<=` as a weakened guard.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/match_generator.rs:457-463
Timestamp: 2026-04-11T07:48:12.353Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#99`), `MatchGeneratorDriver::skip_matching_for_dictionary_priming` calls `row_matcher_mut().skip_matching_with_hint(Some(false))` for the `MatcherBackend::Row` case — explicitly requesting dense seeding, consistent with Simple (`skip_matching_with_hint(Some(false))`) and HashChain (`skip_matching(Some(false))`) backends. Do not flag this as using an implicit/unhinted path or suggest reverting to `skip_matching()`.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 44
File: zstd/src/encoding/frame_compressor.rs:0-0
Timestamp: 2026-03-28T22:55:01.577Z
Learning: In `zstd/src/encoding/frame_compressor.rs`, `FrameCompressor::set_dictionary` returns `Result<Option<Dictionary>, DictionaryDecodeError>` and enforces two fail-fast programmer-error contracts via `Err(...)` rather than panics: (1) `dictionary.id == 0` returns `DictionaryDecodeError::ZeroDictionaryId`; (2) any zero entry in `dictionary.offset_hist` returns `DictionaryDecodeError::ZeroRepeatOffsetInDictionary`. The fallible parsing paths (`Dictionary::decode_dict`, `Dictionary::from_raw_content`) apply the same rejections at input boundaries. Do not flag these as inconsistencies — the uniform `Result`-based contract is intentional.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/match_generator.rs:1675-1701
Timestamp: 2026-04-10T07:20:51.122Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd), `DfastMatchGenerator::skip_matching` (the dense/non-sparse path, i.e. when `used_sparse` is false) backfills the previous-slice tail (`current_abs_start.saturating_sub(DFAST_MIN_MATCH_LEN + 3) .. current_abs_start`) before calling `insert_positions(current_abs_start, current_abs_end)`. This mirrors the `skip_matching_dense` backfill and `HcMatchGenerator::backfill_boundary_positions` patterns, ensuring positions at block boundaries become hashable as soon as the next slice extends history. A regression test `dfast_dense_skip_matching_backfills_previous_tail_for_next_block` covers immediate cross-boundary matching after a dense skip. Do not flag the absence of this backfill as a missing step.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-10T12:52:14.220Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd), `HcMatchGenerator::skip_matching`'s dense-tail reseed (the `Some(true)` / sparse incompressible path) iterates position-by-position over `tail_start..current_abs_end` and skips positions where `(pos - current_abs_start) % INCOMPRESSIBLE_SKIP_STEP == 0`. This prevents sparse-grid positions already indexed in the sparse pass from being reinserted and creating self-loop chain entries. A regression test `hc_sparse_skip_matching_does_not_reinsert_sparse_tail_positions` (line ~4086) covers this invariant. Do not flag the absence of a bulk `insert_positions(tail_start, current_abs_end)` call in the sparse-hint dense-tail path as a missing optimization.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/match_generator.rs:1682-1687
Timestamp: 2026-04-09T23:59:08.161Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd), `DfastMatchGenerator::skip_matching_dense` backfills the previous-slice tail (`max(current_abs_start - 3, history_abs_start) .. current_abs_start`) before seeding the current slice via `insert_positions`. This ensures that positions at internal priming-slice boundaries become hashable as soon as the next slice extends history, keeping cross-slice dictionary priming fully dense. The pattern mirrors `HcMatchGenerator::backfill_boundary_positions`. Do not flag the absence of this backfill as a missing step in `skip_matching_dense`.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-09T23:17:57.736Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd), `MatchGeneratorDriver::prime_with_dictionary` uses a dedicated dense seeding path for the `DfastMatchGenerator` backend: `skip_matching_for_dictionary_priming` routes to `skip_matching_dense`, bypassing the incompressible heuristic and sparse step insertion (`insert_positions_with_step`). This ensures primed dictionary history is always densely seeded and cross-block matching invariants are preserved for dictionary-compressed streams. Do not flag `prime_with_dictionary` as potentially using sparse/incompressible stepping.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:56:17.795Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), after commit 46927db, hash-mix kernel dispatch was moved from the global `HASH_MIX_KERNEL` AtomicU8 hot-path read into a per-matcher instance field `hash_mix_kernel` stored directly in `DfastMatchGenerator` and `RowMatchGenerator`. The hot path no longer performs an atomic load or global branch; kernel selection happens once at matcher initialisation and is subsequently a plain struct-field read. Do not flag the hot path as doing a per-call atomic load — that pattern was superseded by commit 46927db.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:31:54.265Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), the test `row_hash_and_row_extracts_high_bits` acquires `HASH_MIX_KERNEL_TEST_LOCK` before any `hash_mix_u64` call (fixed in commit 904d2b0). This lock guards all tests that force a specific `HashMixKernel` via `with_forced_hash_mix_kernel(...)` to prevent cross-test kernel races caused by the global `HASH_MIX_KERNEL` atomic. Do not flag the absence of this lock as missing — it is acquired at the top of each affected test.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:31:46.064Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), the test-only `with_forced_hash_mix_kernel(...)` helper forces the global `HASH_MIX_KERNEL` atomic to a specific kernel for coverage. Tests that call this helper must acquire a shared mutex lock *before* the `with_forced_hash_mix_kernel` call (not only around the assertion) to prevent cross-test kernel races when `cargo nextest` runs tests in parallel. This pattern was locked in commit 904d2b0. Do not flag the lock acquisition as unnecessarily early.
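
A minimal sketch of the locking order this learning describes, assuming a global `AtomicU8` and a `Mutex<()>` used purely as a test lock. All names here (`KERNEL`, `KERNEL_TEST_LOCK`, `with_forced_kernel`, `observe_under_forced_kernel`) are hypothetical and only mirror the shape of the helper named above.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Global kernel selector (assumed shape; 0 means "default").
static KERNEL: AtomicU8 = AtomicU8::new(0);

/// Serialises every test that forces the global kernel.
static KERNEL_TEST_LOCK: Mutex<()> = Mutex::new(());

fn lock_kernel() -> MutexGuard<'static, ()> {
    // A panicking test poisons the mutex; the guarded value is `()`,
    // so recovering the guard is harmless.
    KERNEL_TEST_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

fn current_kernel() -> u8 {
    KERNEL.load(Ordering::SeqCst)
}

/// Forces the global kernel while `f` runs, then restores the previous value.
/// Note: this sketch does not restore the value if `f` panics.
fn with_forced_kernel<R>(kernel: u8, f: impl FnOnce() -> R) -> R {
    let previous = KERNEL.swap(kernel, Ordering::SeqCst);
    let result = f();
    KERNEL.store(previous, Ordering::SeqCst);
    result
}

/// Shape of a test body: the lock is taken BEFORE the kernel is forced, so no
/// other locked test can overwrite the global between the force and the read.
fn observe_under_forced_kernel(kernel: u8) -> u8 {
    let _guard = lock_kernel();
    with_forced_kernel(kernel, current_kernel)
}
```

If the guard were taken only around the final assertion, two parallel tests could interleave their `swap` calls and one of them would observe the other's kernel.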

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T07:15:56.578Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#99`), `RowMatchGenerator::skip_matching_with_hint(Some(true))` performs sparse-prefix insertion via `insert_positions_with_step(current_abs_start, current_abs_end, INCOMPRESSIBLE_SKIP_STEP)` followed by a dense-tail reseed over `ROW_MIN_MATCH_LEN + INCOMPRESSIBLE_SKIP_STEP` positions, skipping sparse-grid positions (`!(pos - current_abs_start).is_multiple_of(INCOMPRESSIBLE_SKIP_STEP)`). The `MatcherBackend::Row` dispatch in `MatchGeneratorDriver::skip_matching_with_hint` now passes the hint through to the Row backend. A regression test `row_skip_matching_with_incompressible_hint_uses_sparse_prefix` covers this. Do not flag Row as ignoring incompressible hints or defaulting to dense-only skip.
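
To make the sparse-prefix plus dense-tail scheme concrete, here is a small sketch that only computes which positions would be inserted. It is an illustration under assumptions: the function name is hypothetical, `step` and `min_match_len` stand in for `INCOMPRESSIBLE_SKIP_STEP` and `ROW_MIN_MATCH_LEN` (whose real values are not stated here), and the real matcher inserts into hash rows rather than returning a list.

```rust
/// Returns the positions inserted for an incompressible block `[start, end)`:
/// a sparse pass over the whole block, then a dense reseed of the tail that
/// skips positions already covered by the sparse grid.
pub fn sparse_prefix_dense_tail(
    start: usize,
    end: usize,
    step: usize,
    min_match_len: usize,
) -> Vec<usize> {
    assert!(step > 0 && start <= end);

    // Sparse pass: start, start + step, start + 2 * step, ... below `end`.
    let mut positions: Vec<usize> = (start..end).step_by(step).collect();

    // Dense tail: the last `min_match_len + step` positions of the block,
    // clamped so the tail never begins before `start`.
    let tail_start = end.saturating_sub(min_match_len + step).max(start);

    // Skip positions on the sparse grid, since the first pass inserted them.
    positions.extend((tail_start..end).filter(|pos| (pos - start) % step != 0));
    positions
}
```

With `start = 100`, `end = 120`, `step = 4`, and `min_match_len = 2`, the sparse pass yields 100, 104, 108, 112, 116 and the dense tail adds 114, 115, 117, 118, 119, so no position is inserted twice.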

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 73
File: zstd/src/dictionary/mod.rs:337-345
Timestamp: 2026-04-05T21:31:56.797Z
Learning: In `zstd/src/decoding/dictionary.rs` (structured-world/structured-zstd), `Dictionary::decode_dict()` parses the full zstd dictionary wire format: magic (4 bytes) → dict_id (4 bytes) → Huffman table (variable, consumed via `build_decoder`) → OF FSE table → ML FSE table → LL FSE table → 3 × u32 repeat offsets → raw content. This matches the layout produced by `finalize_raw_dict()` in `zstd/src/dictionary/mod.rs`. Do not flag these as mismatched layouts.
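
As a rough illustration of the first two fields in that layout, the sketch below reads the 4-byte magic and the 4-byte `dict_id`, both little-endian, and hands back the remaining bytes. `parse_dict_header`, `DictHeader`, and `HeaderError` are hypothetical names, not the crate's API; the entropy tables, repeat offsets, and raw content that follow are left unparsed in `rest`.

```rust
/// Magic number that opens a zstd dictionary (little-endian on the wire).
const DICT_MAGIC: u32 = 0xEC30_A437;

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    /// Fewer than the 8 bytes needed for magic + dict_id.
    TooShort,
    /// The first four bytes did not match the dictionary magic.
    BadMagic(u32),
}

#[derive(Debug, PartialEq, Eq)]
pub struct DictHeader<'a> {
    pub id: u32,
    /// Everything after the id: Huffman table, FSE tables, repeat offsets,
    /// and raw content, none of which this sketch decodes.
    pub rest: &'a [u8],
}

pub fn parse_dict_header(raw: &[u8]) -> Result<DictHeader<'_>, HeaderError> {
    if raw.len() < 8 {
        return Err(HeaderError::TooShort);
    }
    let magic = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
    if magic != DICT_MAGIC {
        return Err(HeaderError::BadMagic(magic));
    }
    let id = u32::from_le_bytes([raw[4], raw[5], raw[6], raw[7]]);
    Ok(DictHeader { id, rest: &raw[8..] })
}
```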

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-05T08:17:53.856Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd), `set_source_size_hint` applies only to the payload (uncompressed source) bytes, not to dictionary priming. Dictionary priming is separate and does not inflate the advertised frame window size. Do not suggest including dictionary bytes in the size hint passed to `set_source_size_hint`.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/frame_decoder.rs`:
- Around line 294-322: In reset_with_dict_handle ensure consistent
DictionaryHandle accessor usage: replace dict.as_ref() with dict.as_dict() so
the call to initialize decoder_scratch uses the same accessor pattern used
elsewhere (see reset_with_dict_handle and other usages of
DictionaryHandle::as_dict) — update the call to
state.decoder_scratch.init_from_dict to pass dict.as_dict() and keep the rest of
the method unchanged.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 07286ab3-b60f-44fc-b7f9-668a8bdaaa35

📥 Commits

Reviewing files that changed from the base of the PR and between 0d0d99b and 83d07c9.

📒 Files selected for processing (2)
  • zstd/src/decoding/frame_decoder.rs
  • zstd/src/tests/dict_test.rs

Comment thread zstd/src/decoding/frame_decoder.rs
Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/dictionary.rs`:
- Around line 317-323: The test
dictionary_handle_from_raw_content_supports_as_ref currently only calls
handle.as_dict(), so it doesn't exercise the public AsRef<Dictionary> impl;
update the test to directly call AsRef by asserting via handle.as_ref() (or
replace as_dict() assertions with assertions on handle.as_ref().id and
handle.as_ref().dict_content.as_slice()) after creating the Dictionary with
Dictionary::from_raw_content and into_handle(), ensuring the AsRef<Dictionary>
implementation is validated.
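
For context, the distinction between the inherent accessor and the trait impl can be sketched like this. The types below are simplified stand-ins that reuse names from the comment above; the `Arc` wrapper, the public fields, and the `dict_id_of` helper are assumptions made for illustration, not the crate's actual definitions.

```rust
use std::sync::Arc;

/// Simplified stand-in for the parsed dictionary.
pub struct Dictionary {
    pub id: u32,
    pub dict_content: Vec<u8>,
}

/// Cheaply cloneable handle; sharing via `Arc` is an assumption of this sketch.
#[derive(Clone)]
pub struct DictionaryHandle {
    inner: Arc<Dictionary>,
}

impl Dictionary {
    pub fn into_handle(self) -> DictionaryHandle {
        DictionaryHandle { inner: Arc::new(self) }
    }
}

impl DictionaryHandle {
    /// Inherent accessor: unambiguous at call sites, no type inference needed.
    pub fn as_dict(&self) -> &Dictionary {
        &self.inner
    }
}

/// Trait impl: lets generic code accept anything that can lend a `Dictionary`.
impl AsRef<Dictionary> for DictionaryHandle {
    fn as_ref(&self) -> &Dictionary {
        self.as_dict()
    }
}

/// Hypothetical generic consumer that exercises only the `AsRef` path.
pub fn dict_id_of(dict: &impl AsRef<Dictionary>) -> u32 {
    dict.as_ref().id
}
```

A test that calls only `as_dict()` would still pass if the `AsRef<Dictionary>` impl were removed or returned a different dictionary, which is why the comment asks for an assertion that goes through `as_ref()` itself.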
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6c0e5e43-6aec-43d5-99ee-930850568a17

📥 Commits

Reviewing files that changed from the base of the PR and between 83d07c9 and b580b61.

📒 Files selected for processing (2)
  • zstd/src/decoding/dictionary.rs
  • zstd/src/decoding/frame_decoder.rs

Comment thread zstd/src/decoding/dictionary.rs
Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

@polaz polaz merged commit 495a38d into main Apr 12, 2026
13 checks passed
@polaz polaz deleted the perf/#86-ddict-handle branch April 12, 2026 10:14
@sw-release-bot sw-release-bot Bot mentioned this pull request Apr 12, 2026
Development

Successfully merging this pull request may close these issues.

perf(decoding): pre-parsed dictionary handle for repeated dict decode

2 participants