perf(decoding): add runtime-dispatched simd wildcopy#85
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (1)
📝 Walkthrough
Added a new internal SIMD-aware copy module.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant RingBuffer as "RingBuffer (caller)"
    participant SIMD as "decoding::simd_copy"
    participant Memory as "src/dst memory"
    RingBuffer->>SIMD: copy_bytes_overshooting((src_ptr, src_cap), (dst_ptr, dst_cap), copy_at_least)
    SIMD->>SIMD: select strategy & chunk size (cfg/runtime)
    alt min(src_cap, dst_cap) >= rounded_chunk
        SIMD->>Memory: perform chunked loads/stores (SIMD or scalar chunked loops)
    else
        SIMD->>Memory: call copy_from_nonoverlapping for exact copy_at_least
    end
    SIMD-->>RingBuffer: return (void), debug-assert checked
```
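The branch in the diagram above can be sketched in safe, scalar Rust. This is an illustrative model of the overshooting-copy contract only, not the crate's actual implementation: `CHUNK = 16` is an assumed width, whereas the real module selects the chunk size at runtime per backend.

```rust
// Assumed chunk width for this scalar sketch (the real code picks it at runtime).
const CHUNK: usize = 16;

/// Copies at least `copy_at_least` bytes from `src` to `dst`. The chunked
/// path may write past `copy_at_least` ("overshoot"), but never past either
/// buffer's capacity; when there is no room to overshoot, it falls back to
/// an exact-length copy.
fn copy_bytes_overshooting(src: &[u8], dst: &mut [u8], copy_at_least: usize) {
    let rounded = copy_at_least.div_ceil(CHUNK) * CHUNK;
    if src.len().min(dst.len()) >= rounded {
        // Chunked path: whole chunks only; the final chunk may overshoot.
        for i in (0..rounded).step_by(CHUNK) {
            dst[i..i + CHUNK].copy_from_slice(&src[i..i + CHUNK]);
        }
    } else {
        // Fallback: exact copy, mirroring copy_from_nonoverlapping.
        dst[..copy_at_least].copy_from_slice(&src[..copy_at_least]);
    }
    debug_assert_eq!(&dst[..copy_at_least], &src[..copy_at_least]);
}
```

The real SIMD variants replace the inner loop with vector loads/stores; the capacity check and fallback structure are the same.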
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related issues
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Pull request overview
This PR introduces a new runtime-dispatched “overshooting” copy helper to speed up decoder hot-path copies across multiple SIMD backends, and routes ringbuffer match-copy logic through that helper.
Changes:
- Added decoding::simd_copy with runtime-selected AVX-512/AVX2/SSE2 (x86/x86_64), NEON (aarch64), and scalar fallback overshooting copy paths.
- Updated ringbuffer copy hot paths to use the new helper instead of the local implementation / direct copy_from_nonoverlapping in several cases.
- Wired the new module into decoding::mod.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| zstd/src/decoding/simd_copy.rs | New runtime-dispatched SIMD/scalar overshooting copy implementation. |
| zstd/src/decoding/ringbuffer.rs | Switched multiple internal copy sites to use simd_copy::copy_bytes_overshooting; removed the old local helper. |
| zstd/src/decoding/mod.rs | Added the new simd_copy module. |
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
zstd/src/decoding/ringbuffer.rs (1)
1030-1064: ⚠️ Potential issue | 🟡 Minor
Make this test derive its lengths from the active chunk size.
The fixed 16/32/17 cases no longer cover "all copy strategies" on wider backends: with AVX2, 32 only exercises a single chunk, and with AVX-512 both 16 and 32 fall back to copy_from_nonoverlapping. Please key these sizes off the runtime-selected chunk size, or move the test next to simd_copy.rs so the SIMD loop stays exercised on every supported backend.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/decoding/ringbuffer.rs` around lines 1030 - 1064, The test copy_bytes_overshooting_covers_all_copy_strategies uses hardcoded lengths (16/32/17) that don't exercise all SIMD paths on wider backends; change it to derive lengths from the runtime chunk size provided by simd_copy so the single-chunk, multi-chunk, and fallback cases always map to 1*chunk, 2*chunk, and chunk+1 respectively. Update the test to query the chunk size (use the public symbol in simd_copy such as a CHUNK/CHUNK_SIZE constant or a chunk_size() accessor) and compute single_len = chunk, multi_len = chunk * 2, fallback_len = chunk + 1, then use those lengths in the copy_bytes_overshooting calls in copy_bytes_overshooting_covers_all_copy_strategies.
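The suggested fix can be sketched as below. `chunk_size()` is a hypothetical stand-in for whatever symbol `simd_copy` actually exposes (a `CHUNK` constant or an accessor); the value 16 assumes the scalar/SSE2 width, where AVX2 would report 32 and AVX-512 64.

```rust
// Hypothetical accessor mirroring the runtime-selected chunk width.
fn chunk_size() -> usize {
    16 // scalar/SSE2 width assumed; AVX2 -> 32, AVX-512 -> 64
}

// Derive the three test lengths from the active chunk size so the
// single-chunk, multi-chunk, and fallback cases hold on every backend.
fn strategy_test_lengths() -> (usize, usize, usize) {
    let chunk = chunk_size();
    (chunk, chunk * 2, chunk + 1)
}
```

With the scalar/SSE2 assumption these evaluate to the original 16/32/17, but on wider backends they scale so the SIMD loop stays exercised.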
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/decoding/simd_copy.rs`:
- Around line 1-2: The import of the is_x86_feature_detected macro is from the
wrong module; replace the use of core::arch::is_x86_feature_detected with
std::arch::is_x86_feature_detected in the cfg block (the line currently under
#[cfg(all(feature = "std", any(target_arch = "x86", target_arch = "x86_64")))]
that brings in is_x86_feature_detected) so the code uses the macro provided by
std::arch.
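A minimal sketch of the corrected usage: `is_x86_feature_detected!` lives in `std::arch`, not `core::arch`, so any detection helper has to be gated on x86 targets (and, in the crate, on the `std` feature). The wrapper name below is illustrative.

```rust
// Illustrative helper: reports AVX2 availability, false on non-x86 targets.
fn has_avx2() -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    // The macro is provided by std::arch (runtime CPU-feature detection).
    return std::arch::is_x86_feature_detected!("avx2");
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    return false;
}
```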
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: aef01b78-5497-483b-a6de-767f87f1a793
📒 Files selected for processing (3)
zstd/src/decoding/mod.rs, zstd/src/decoding/ringbuffer.rs, zstd/src/decoding/simd_copy.rs
@coderabbitai Fixed outside-diff finding in zstd/src/decoding/ringbuffer.rs: copy_bytes_overshooting_covers_all_copy_strategies now derives lengths from runtime chunk size via active_chunk_size_for_tests(), so single/multi/fallback cases adapt to active backend width.
Force-pushed: 11bd090 to 7b46f00 (Compare)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/decoding/ringbuffer.rs`:
- Around line 587-614: The SIMD copy calls in the unsafe block
(simd_copy::copy_bytes_overshooting invocations using m1_in_f1, m2_in_f1,
m1_in_f2, m2_in_f2) pass the source/destination lengths equal to the requested
copy amounts, which prevents the routine from using safe overshoot
optimizations; compute and pass the actual remaining buffer capacities for each
tuple instead (e.g., remaining bytes from m1_ptr to its buffer end, from m2_ptr
to its buffer end, from f1_ptr/f2_ptr to their ring-buffer chunk ends) so the
tuples become (ptr, src_capacity) and (ptr, dst_capacity) while keeping the
third parameter as the copy length, and update the arguments for all four calls
in this block to enable SIMD overshooting.
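The capacity-threading suggestion boils down to passing the remaining distance from the copy offset to the end of each buffer, rather than the requested copy length. A slice-based sketch (the helper name is illustrative, not the crate's API):

```rust
// Remaining bytes from `offset` to the end of `buf`: this is the capacity
// an overshooting copier is allowed to use, as opposed to the copy length.
fn remaining_capacity(buf: &[u8], offset: usize) -> usize {
    debug_assert!(offset <= buf.len());
    buf.len() - offset
}
```

A call site would then build tuples like `(m1_ptr, remaining_capacity(...))` for source and destination while keeping the copy length as the third argument, giving the copier room to overshoot when the length is not chunk-aligned.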
In `@zstd/src/decoding/simd_copy.rs`:
- Line 99: Remove the needless explicit return when returning the result of
scalar_strategy(): replace the line that says `return scalar_strategy();` with a
bare final expression `scalar_strategy()` so the function returns by expression
rather than using the `return` keyword; this fixes the clippy::needless_return
lint in simd_copy.rs for the code path that returns the scalar_strategy() value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7104f346-6cee-41e5-ab6c-05c263c7e896
📒 Files selected for processing (3)
zstd/src/decoding/mod.rs, zstd/src/decoding/ringbuffer.rs, zstd/src/decoding/simd_copy.rs
@coderabbitai re-review
✅ Actions performed: Full review triggered.
Codecov Report
❌ Patch coverage is
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/decoding/simd_copy.rs`:
- Around line 282-316: Add a #[cfg(feature = "std")] guard to the x86 SIMD tests
so they don't reference std::arch::is_x86_feature_detected! in no_std builds;
specifically wrap the test functions copy_avx2_copies_full_chunk_when_available
and copy_avx512_copies_full_chunk_when_available (which call
is_x86_feature_detected!("avx2") and is_x86_feature_detected!("avx512f")) with
the same #[cfg(feature = "std")] used on
copy_sse2_copies_full_chunk_when_available so the tests only compile when the
standard library is available.
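The gating the review asks for looks roughly like this skeleton (test names are taken from the review; the body is elided, so this is a structural sketch rather than runnable test code):

```rust
#[cfg(all(test, any(target_arch = "x86", target_arch = "x86_64")))]
mod x86_tests {
    // Same `std` gate as copy_sse2_copies_full_chunk_when_available, so the
    // test only compiles when std::arch::is_x86_feature_detected! exists.
    #[test]
    #[cfg(feature = "std")]
    fn copy_avx2_copies_full_chunk_when_available() {
        if !std::arch::is_x86_feature_detected!("avx2") {
            return; // skip on hardware without AVX2
        }
        // ... exercise the AVX2 copy path here ...
    }
}
```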
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 71683fee-119b-4a36-8497-5de63d4144a3
📒 Files selected for processing (3)
zstd/src/decoding/mod.rs, zstd/src/decoding/ringbuffer.rs, zstd/src/decoding/simd_copy.rs
@coderabbitai re-review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/decoding/simd_copy.rs`:
- Around line 97-102: The SSE2 branch currently returns a CopyStrategy { chunk:
16, copy: copy_sse2 } without checking copy_at_least; add a minimum-threshold
guard (e.g., require copy_at_least >= 16) before selecting the SSE2 strategy so
small copies fall through to the generic/fallback handling
(copy_bytes_overshooting), mirroring the AVX2/AVX512/NEON checks; update the
conditional that references caps.sse2 to also test copy_at_least and only return
the SSE2 CopyStrategy when both caps.sse2 and copy_at_least >= 16 are true.
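The guarded dispatch the review describes can be sketched as a decision table. The `Kind`/`Caps` names and per-backend thresholds are illustrative (one chunk width per backend), not the crate's actual `CopyStrategy` API; note the scalar fallback is a bare tail expression, which also satisfies the clippy::needless_return fix from the earlier review round.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind { Avx512, Avx2, Sse2, Scalar }

// Illustrative capability flags, as runtime detection would report them.
struct Caps { avx512f: bool, avx2: bool, sse2: bool }

fn select_strategy(caps: &Caps, copy_at_least: usize) -> Kind {
    // Each SIMD strategy requires at least one full chunk to be worthwhile.
    if caps.avx512f && copy_at_least >= 64 {
        return Kind::Avx512;
    }
    if caps.avx2 && copy_at_least >= 32 {
        return Kind::Avx2;
    }
    if caps.sse2 && copy_at_least >= 16 {
        return Kind::Sse2;
    }
    // Small copies fall through to the scalar path (bare tail expression).
    Kind::Scalar
}
```

With this shape, an SSE2-capable machine asked to copy 8 bytes takes the scalar path instead of paying SIMD setup cost for a sub-chunk copy.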
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: cb7564dd-e4f7-4841-b09e-bd167a1b820a
📒 Files selected for processing (1)
zstd/src/decoding/simd_copy.rs
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI (x86_64-gnu)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.30.
| Benchmark suite | Current: 01b3e8d | Previous: e877c44 | Ratio |
|---|---|---|---|
| compress/level4-row/decodecorpus-z000033/matrix/pure_rust | 63.537 ms | 44.778 ms | 1.42 |
| compress/best/decodecorpus-z000033/matrix/pure_rust | 98.124 ms | 60.807 ms | 1.61 |
| compress/best/low-entropy-1m/matrix/c_ffi | 1.598 ms | 1.217 ms | 1.31 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
♻️ Duplicate comments (1)
zstd/src/decoding/ringbuffer.rs (1)
659-663: 🧹 Nitpick | 🔵 Trivial
Consider threading proper capacities for SIMD consistency (optional).
Unlike copy_with_checks (which now passes aggregate capacities), copy_with_nobranch_check passes the copy length as the capacity in each case, e.g., (m1_ptr, m1_in_f1) for both source capacity and copy amount. This prevents SIMD overshoot when the length isn't chunk-aligned.
Since this function is marked #[allow(dead_code)] and the branchless design prioritizes branch elimination over SIMD throughput, this is a low-priority consistency improvement. If this path is ever enabled for production, consider threading f1_len/f2_len through to enable SIMD benefits.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/decoding/ringbuffer.rs` around lines 659 - 663, The call in copy_with_nobranch_check uses m1_in_f1 as both the source capacity and copy length which can prevent SIMD overshoot optimization; update the simd_copy::copy_bytes_overshooting calls in copy_with_nobranch_check to pass the actual source capacities (e.g., f1_len/f2_len or the aggregate capacity used by copy_with_checks) instead of the copy length m1_in_f1, so the tuple arguments reflect (ptr, source_capacity) like the non-branchless path and enable proper SIMD behavior if this path is used in production.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3f86b8f0-ab67-41b2-85a6-d543e6eb20c1
📒 Files selected for processing (2)
zstd/src/decoding/ringbuffer.rszstd/src/decoding/simd_copy.rs
@coderabbitai Fixed duplicate note from review body (ringbuffer.rs 659-663) in commit 01b3e8d:
Summary
Validation
Benchmark note
On this branch, decode benchmarks did not show statistically significant gains yet (mostly noise / no-change on tested scenarios). This PR focuses on architectural cleanup and hot-path dispatch tightening; x86_64 AVX-512 behavior should be validated on hardware/runners that expose AVX-512.
Summary by CodeRabbit
Refactor
Tests