
perf(decoding): add runtime-dispatched simd wildcopy#85

Merged
polaz merged 12 commits into main from perf/#68-simd-wildcopy
Apr 9, 2026

Conversation

@polaz
Member

@polaz polaz commented Apr 8, 2026

Summary

  • add decoding::simd_copy with runtime-dispatched overshooting copy paths (AVX-512/AVX2/SSE2 on x86, NEON on aarch64, scalar fallback)
  • route ringbuffer match-copy hot paths through the new helper
  • cache x86 feature dispatch with OnceLock to avoid repeated hot-path feature checks
  • add explicit safety contract and debug verification in copy_bytes_overshooting
  • make ringbuffer copy-strategy test derive lengths from runtime active chunk size
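To illustrate the OnceLock caching bullet, here is a minimal sketch of the pattern, assuming the dispatch decision can be reduced to a single chunk width. `detect_chunk_size`, `cached_chunk_size` and the `DETECT_CALLS` counter are hypothetical names, and the detector is a stub that returns 16 instead of probing the CPU; this is not the code in the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the detection routine runs (illustration only).
static DETECT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for CPU feature probing. A real version would call
/// `is_x86_feature_detected!` here; this stub returns a fixed width so the
/// caching behaviour is observable on any platform.
fn detect_chunk_size() -> usize {
    DETECT_CALLS.fetch_add(1, Ordering::Relaxed);
    16
}

/// Returns the chunk width, running detection only on the first call.
fn cached_chunk_size() -> usize {
    static CHUNK: OnceLock<usize> = OnceLock::new();
    *CHUNK.get_or_init(detect_chunk_size)
}
```

`OnceLock::get_or_init` guarantees the initializer runs at most once even if several threads race on the first call, so later hot-path calls only read the already-initialized value.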

Validation

  • cargo fmt --all -- --check
  • cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings
  • cargo clippy -p structured-zstd --features hash,std,dict_builder,bench_internals --benches -- -D warnings
  • cargo nextest run --workspace
  • cargo test --doc --workspace

Benchmark note

On this branch, decode benchmarks did not show statistically significant gains yet (mostly noise / no-change on tested scenarios). This PR focuses on architectural cleanup and hot-path dispatch tightening; x86_64 AVX-512 behavior should be validated on hardware/runners that expose AVX-512.

Summary by CodeRabbit

  • Refactor

    • Consolidated internal byte-copy logic into a single runtime-dispatched implementation that selects a scalar or hardware-accelerated path when available, simplifying call sites and aiming at faster, more reliable copies.
  • Tests

    • Updated and added tests to validate correctness across fallback and optimized copy paths, verify chunk-size selection and overshoot behavior, and ensure prefix preservation in overshoot scenarios.

Copilot AI review requested due to automatic review settings April 8, 2026 20:06
@coderabbitai

coderabbitai Bot commented Apr 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 93c89fa5-bf90-4917-a88c-7f0a385a5559

📥 Commits

Reviewing files that changed from the base of the PR and between 28d8956 and 01b3e8d.

📒 Files selected for processing (1)
  • zstd/src/decoding/ringbuffer.rs

📝 Walkthrough

Walkthrough

Added a new internal SIMD-aware copy module simd_copy and replaced the ringbuffer's local overshooting copy with calls into simd_copy::copy_bytes_overshooting; tests updated to use the module's test helper.

Changes

Cohort / File(s) Summary
Decoding mod registration
zstd/src/decoding/mod.rs
Registered new private module: mod simd_copy;.
New SIMD copy implementation
zstd/src/decoding/simd_copy.rs
Added pub(crate) unsafe fn copy_bytes_overshooting((src_ptr, src_cap), (dst_ptr, dst_cap), copy_at_least) with strategy selection (scalar, SSE2, AVX2, AVX-512, NEON), runtime feature detection on x86 when std enabled, debug assertions, and active_chunk_size_for_tests(); includes unit tests and fallback to exact copy_from_nonoverlapping.
Ring buffer integration & tests
zstd/src/decoding/ringbuffer.rs
Removed file-local copy_bytes_overshooting; added use super::simd_copy;, replaced local/copy_from_nonoverlapping copy sites with simd_copy::copy_bytes_overshooting, adjusted capacity calculations for segmented copies, and updated unit tests to derive chunk sizes from the new test helper.
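As an illustration only, the sketch below is a scalar version with the same tuple-style signature described above. It assumes a fixed 16-byte chunk and a contract of non-overlapping regions with `copy_at_least <= min(src_cap, dst_cap)`. It is not the PR's implementation; real backends would replace the chunk loop with SSE2/AVX2/AVX-512/NEON loads and stores.

```rust
use core::ptr;
use core::slice;

/// Chunk width assumed by this sketch; SIMD backends would use 16/32/64.
const CHUNK: usize = 16;

/// Copies at least `copy_at_least` bytes from `src` to `dst`, possibly more.
///
/// # Safety (assumed contract for this sketch)
/// - `src` is valid for reads of `src_cap` bytes, `dst` for writes of `dst_cap` bytes.
/// - The two regions do not overlap.
/// - `copy_at_least <= min(src_cap, dst_cap)`.
unsafe fn copy_bytes_overshooting(
    (src, src_cap): (*const u8, usize),
    (dst, dst_cap): (*mut u8, usize),
    copy_at_least: usize,
) {
    debug_assert!(copy_at_least <= src_cap && copy_at_least <= dst_cap);

    // Round the request up to a whole number of chunks.
    let rounded = copy_at_least.div_ceil(CHUNK) * CHUNK;

    if rounded <= src_cap.min(dst_cap) {
        // Fast path: whole chunks, may write up to CHUNK - 1 extra bytes.
        let mut off = 0;
        while off < rounded {
            unsafe { ptr::copy_nonoverlapping(src.add(off), dst.add(off), CHUNK) };
            off += CHUNK;
        }
    } else {
        // Not enough slack on one side: copy exactly what was requested.
        unsafe { ptr::copy_nonoverlapping(src, dst, copy_at_least) };
    }

    // Debug-only verification that the requested prefix arrived intact.
    debug_assert_eq!(
        unsafe { slice::from_raw_parts(src, copy_at_least) },
        unsafe { slice::from_raw_parts(dst as *const u8, copy_at_least) },
        "overshooting copy lost requested bytes"
    );
}
```

The overshoot is bounded by `CHUNK - 1` bytes, which is why callers need to pass real remaining capacities rather than the exact copy length if they want the fast path to engage.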

Sequence Diagram(s)

sequenceDiagram
    participant RingBuffer as "RingBuffer (caller)"
    participant SIMD as "decoding::simd_copy"
    participant Memory as "src/dst memory"
    RingBuffer->>SIMD: copy_bytes_overshooting((src_ptr, src_cap), (dst_ptr, dst_cap), copy_at_least)
    SIMD->>SIMD: select strategy & chunk size (cfg/runtime)
    alt min(src_cap,dst_cap) >= rounded_chunk
        SIMD->>Memory: perform chunked loads/stores (SIMD or scalar chunked loops)
    else
        SIMD->>Memory: call copy_from_nonoverlapping for exact copy_at_least
    end
    SIMD-->>RingBuffer: return (void), debug-assert checked

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

"A rabbit hopped with nimble paws,
swapping bytes without a pause.
Chunks aligned and tests all bright,
I copied fast into the night. 🐇✨"

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main change: adding a runtime-dispatched SIMD wildcopy (overshooting copy) to the decoding module.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.





Copilot AI left a comment


Pull request overview

This PR introduces a new runtime-dispatched “overshooting” copy helper to speed up decoder hot-path copies across multiple SIMD backends, and routes ringbuffer match-copy logic through that helper.

Changes:

  • Added decoding::simd_copy with runtime-selected AVX-512/AVX2/SSE2 (x86/x86_64), NEON (aarch64), and scalar fallback overshooting copy paths.
  • Updated ringbuffer copy hot paths to use the new helper instead of the local implementation / direct copy_from_nonoverlapping in several cases.
  • Wired the new module into decoding::mod.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
zstd/src/decoding/simd_copy.rs New runtime-dispatched SIMD/scalar overshooting copy implementation.
zstd/src/decoding/ringbuffer.rs Switched multiple internal copy sites to use simd_copy::copy_bytes_overshooting; removed the old local helper.
zstd/src/decoding/mod.rs Added the new simd_copy module.

Comment thread zstd/src/decoding/simd_copy.rs
Comment thread zstd/src/decoding/simd_copy.rs Outdated
Comment thread zstd/src/decoding/simd_copy.rs

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
zstd/src/decoding/ringbuffer.rs (1)

1030-1064: ⚠️ Potential issue | 🟡 Minor

Make this test derive its lengths from the active chunk size.

The fixed 16/32/17 cases no longer cover “all copy strategies” on wider backends: with AVX2, 32 only exercises a single chunk, and with AVX-512 both 16 and 32 fall back to copy_from_nonoverlapping. Please key these sizes off the runtime-selected chunk size, or move the test next to simd_copy.rs so the SIMD loop stays exercised on every supported backend.
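To make the coverage argument concrete, the sketch below models only the branch decision (not the copy itself), assuming source and destination capacity both equal the copy length. `Path`, `path_for` and `lengths_for` are hypothetical names used for illustration.

```rust
/// Which branch a chunked overshooting copy would take (model only).
#[derive(Debug, PartialEq)]
enum Path {
    SingleChunk,
    MultiChunk,
    ExactFallback,
}

/// `cap` is the capacity available on both sides of the copy.
fn path_for(len: usize, cap: usize, chunk: usize) -> Path {
    let rounded = len.div_ceil(chunk) * chunk;
    if rounded > cap {
        Path::ExactFallback
    } else if rounded == chunk {
        Path::SingleChunk
    } else {
        Path::MultiChunk
    }
}

/// Test lengths derived from the active chunk width instead of 16/32/17.
fn lengths_for(chunk: usize) -> [(usize, usize, Path); 3] {
    [
        (chunk, chunk, Path::SingleChunk),           // exactly one chunk
        (chunk * 2, chunk * 2, Path::MultiChunk),    // two chunks
        (chunk + 1, chunk + 1, Path::ExactFallback), // rounding to 2*chunk exceeds capacity
    ]
}
```

With a 64-byte chunk the old fixed lengths 16 and 32 both land on the exact fallback, and with a 32-byte chunk the length 32 is only a single chunk, matching the observation above; the derived lengths hit each branch for every width.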

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/decoding/ringbuffer.rs` around lines 1030 - 1064, The test
copy_bytes_overshooting_covers_all_copy_strategies uses hardcoded lengths
(16/32/17) that don't exercise all SIMD paths on wider backends; change it to
derive lengths from the runtime chunk size provided by simd_copy so the
single-chunk, multi-chunk, and fallback cases always map to 1*chunk, 2*chunk,
and chunk+1 respectively. Update the test to query the chunk size (use the
public symbol in simd_copy such as a CHUNK/CHUNK_SIZE constant or a chunk_size()
accessor) and compute single_len = chunk, multi_len = chunk * 2, fallback_len =
chunk + 1, then use those lengths in the copy_bytes_overshooting calls in
copy_bytes_overshooting_covers_all_copy_strategies.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/simd_copy.rs`:
- Around line 1-2: The import of the is_x86_feature_detected macro is from the
wrong module; replace the use of core::arch::is_x86_feature_detected with
std::arch::is_x86_feature_detected in the cfg block (the line currently under
#[cfg(all(feature = "std", any(target_arch = "x86", target_arch = "x86_64")))]
that brings in is_x86_feature_detected) so the code uses the macro provided by
std::arch.
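A hedged sketch of the suggested import, paired with a small detection function for context. The chunk widths (64/32/16 and an 8-byte scalar fallback) and the name `detect_chunk_size` are assumptions for illustration, not taken from the PR.

```rust
// `is_x86_feature_detected!` is exported from `std::arch`, not `core::arch`.
// In a no_std crate this import would additionally need the `std` feature gate,
// e.g. #[cfg(all(feature = "std", any(target_arch = "x86", target_arch = "x86_64")))].
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
use std::arch::is_x86_feature_detected;

/// Hypothetical helper: picks a copy chunk width from runtime CPU features.
fn detect_chunk_size() -> usize {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx512f") {
            return 64;
        }
        if is_x86_feature_detected!("avx2") {
            return 32;
        }
        if is_x86_feature_detected!("sse2") {
            return 16;
        }
    }
    if cfg!(target_arch = "aarch64") {
        // NEON is part of the aarch64 baseline, so no runtime probe is needed.
        return 16;
    }
    // Scalar fallback written as a tail expression, which also keeps
    // clippy::needless_return quiet.
    8
}
```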


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: aef01b78-5497-483b-a6de-767f87f1a793

📥 Commits

Reviewing files that changed from the base of the PR and between 9f905ad and 283b72c.

📒 Files selected for processing (3)
  • zstd/src/decoding/mod.rs
  • zstd/src/decoding/ringbuffer.rs
  • zstd/src/decoding/simd_copy.rs

Comment thread zstd/src/decoding/simd_copy.rs Outdated
@polaz
Member Author

polaz commented Apr 8, 2026

@coderabbitai Fixed outside-diff finding in zstd/src/decoding/ringbuffer.rs: copy_bytes_overshooting_covers_all_copy_strategies now derives lengths from runtime chunk size via active_chunk_size_for_tests(), so single/multi/fallback cases adapt to active backend width.

@coderabbitai

coderabbitai Bot commented Apr 8, 2026


@polaz Thanks for the update! Using active_chunk_size_for_tests() to derive lengths at runtime is a solid approach — it keeps the test meaningful across all SIMD backends (AVX-512/AVX2/SSE2/NEON/scalar) without hardcoding widths that would only be correct for one specific backend. The single-chunk, multi-chunk, and sub-chunk (fallback) cases will now correctly probe the boundaries of whichever path is active at test time.


✏️ Learnings added
Learnt from: polaz
URL: https://github.com/structured-world/structured-zstd/pull/85

Timestamp: 2026-04-08T20:26:15.902Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test `copy_bytes_overshooting_covers_all_copy_strategies` derives copy lengths from a helper `active_chunk_size_for_tests()` (not hardcoded values) so that single-chunk, multi-chunk, and sub-chunk (fallback) cases adapt to whichever SIMD backend width is active at runtime (AVX-512/AVX2/SSE2/NEON/scalar). Do not flag hardcoded test lengths as missing coverage for other backends — the test is backend-adaptive by design.



🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-06T01:40:24.378Z
Learning: In `zstd/benches/compare_ffi.rs` (structured-world/structured-zstd), Rust FastCOVER trains with the post-finalization content budget in both the `REPORT_DICT_TRAIN` emission path (around lines 208-225) and the Criterion benchmark path (around lines 266-280). Both paths were aligned in commit 8622344. Do not flag these ranges as using inconsistent budget values.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T16:00:30.438Z
Learning: In `zstd/src/encoding/match_generator.rs`, the `RowMatchGenerator` (used for `CompressionLevel::Level(4)` / `MatcherBackend::Row`) uses a stable 4-byte hash key, consistent with the 4-byte lookahead constraint shared by `HcMatchGenerator`. Previous-block tail positions are backfilled into the row tables before matching/skip begins (analogous to `backfill_boundary_positions` for the HC backend), and a regression test covers cross-boundary tail reuse. Do not flag missing backfill or key-width issues for the Row backend.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-05T07:19:26.176Z
Learning: In `cli/src/main.rs` (structured-world/structured-zstd), path-based compression calls `set_source_size_hint(source_size as u64)` (NOT `set_pledged_content_size`) for the filesystem metadata size. This is intentional: `metadata().len()` is unreliable for pseudo-files (`/proc/*`), FIFOs, and growing inputs, so the hint must remain advisory (only scaling matcher/window parameters) and must not impose a hard byte contract on the encoder. Do not flag the absence of `set_pledged_content_size` in path-based compression as a missing feature.

Learnt from: CR
Repo: structured-world/structured-zstd PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-26T15:15:32.318Z
Learning: Avoid suggesting architectural changes that diverge too far from upstream, as this is a fork of ruzstd (KillingSpark/zstd-rs)

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:498-509
Timestamp: 2026-04-02T22:26:07.979Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), `best_level_does_not_regress_vs_better` uses a `<=` (not strict `<`) assertion because the `repeat_offset_fixture(b"HelloWorld", ...)` input is simple enough that HC saturates at both Better (16 candidates) and Best (32 candidates) search depths, producing identical compressed sizes (~30243 bytes). Strict `<` would be a false positive on this fixture. The strict `Best < Better` quality assertion lives in `cross_validation::best_level_beats_better_on_corpus_proxy` on the decodecorpus sample. Do not re-flag the `<=` as a weakened guard.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-03T16:36:03.248Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` appears asymmetric to `new()` in its pre-allocation, but the asymmetry is intentional: `decoder_scratch.reset(window_size)` internally calls `DecodeBuffer::reset()` which already reserves `window_size`. Therefore `reset()` only needs to call `scratch.buffer.reserve(frame_content_size)` on top of that for the FCS delta, rather than explicitly reserving `window_size` again as `new()` does. Do not re-flag this as a bug or asymmetry.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1554-1559
Timestamp: 2026-04-02T12:03:04.619Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` has an early-return guard at `abs_pos >= u32::MAX` that disables all further HC insertions for streams >4 GiB, degrading `CompressionLevel::Better` to repcodes-only. This is a known limitation tracked in issue `#51`; the proper fix (table rebasing) is deferred. A TODO comment and a limitation note on `CompressionLevel::Better` doc were added in PR `#48`. Do not re-flag this as a new bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-03T17:39:58.627Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` intentionally does NOT reserve the frame-content-size (FCS) delta beyond `window_size`. Two cases cover the design space: (1) multi-segment frames where FCS >> window_size — the delta can reach ~99 MiB, and eagerly reserving it would penalize streaming callers; (2) single-segment frames where window_size == FCS — the delta is always 0 so there is nothing to reserve. Incremental growth during decoding is handled by the per-block `MAX_BLOCK_SIZE` (128 KiB) reserve in `execute_sequences`. Do not flag the absence of an FCS-delta reserve in `reset()` as a missing pre-allocation.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-05T08:17:53.856Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd), `set_source_size_hint` applies only to the payload (uncompressed source) bytes, not to dictionary priming. Dictionary priming is separate and does not inflate the advertised frame window size. Do not suggest including dictionary bytes in the size hint passed to `set_source_size_hint`.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1531-1535
Timestamp: 2026-04-01T15:33:17.229Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` intentionally casts `abs_pos: usize` to `u32` (wrapping on streams > 4 GiB). This is safe by design: (1) the chain table slot is computed as `(pos_u32 as usize) & chain_mask`, so only the low bits matter and wrapping leaves the addressed slot unchanged; (2) `chain_candidates` filters any recovered `cur as usize` value against `history_abs_start` — a wrapped (small) value is always less than the current sliding-window start and is discarded. Do not flag this cast as a bug.


Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread zstd/src/decoding/simd_copy.rs
Comment thread zstd/src/decoding/simd_copy.rs Outdated
Comment thread zstd/src/decoding/ringbuffer.rs Outdated
@polaz polaz force-pushed the perf/#68-simd-wildcopy branch from 11bd090 to 7b46f00 Compare April 8, 2026 20:27

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/ringbuffer.rs`:
- Around line 587-614: The SIMD copy calls in the unsafe block
(simd_copy::copy_bytes_overshooting invocations using m1_in_f1, m2_in_f1,
m1_in_f2, m2_in_f2) pass the source/destination lengths equal to the requested
copy amounts, which prevents the routine from using safe overshoot
optimizations; compute and pass the actual remaining buffer capacities for each
tuple instead (e.g., remaining bytes from m1_ptr to its buffer end, from m2_ptr
to its buffer end, from f1_ptr/f2_ptr to their ring-buffer chunk ends) so the
tuples become (ptr, src_capacity) and (ptr, dst_capacity) while keeping the
third parameter as the copy length, and update the arguments for all four calls
in this block to enable SIMD overshooting.

In `@zstd/src/decoding/simd_copy.rs`:
- Line 99: Remove the needless explicit return when returning the result of
scalar_strategy(): replace the line that says `return scalar_strategy();` with a
bare final expression `scalar_strategy()` so the function returns by expression
rather than using the `return` keyword; this fixes the clippy::needless_return
lint in simd_copy.rs for the code path that returns the scalar_strategy() value.
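For the ringbuffer capacity finding, the arithmetic can be sketched on lengths alone. Assume a match split into source fragments `m1`/`m2` that lands in free destination fragments `f1`/`f2`; each sub-copy then reports the bytes remaining in its own fragment as capacity, instead of the exact copy length, which leaves room for the rounded-up overshoot. `split_copy` and `plan` are hypothetical helpers, and using fragment ends for reads is a conservative assumption (reads could arguably extend to the allocation end).

```rust
/// Lengths of the four sub-copies, in execution order:
/// (m1_in_f1, m2_in_f1, m1_in_f2, m2_in_f2).
fn split_copy(m1: usize, m2: usize, f1: usize, f2: usize) -> (usize, usize, usize, usize) {
    assert!(m1 + m2 <= f1 + f2, "destination too small");
    let m1_in_f1 = m1.min(f1);
    let m2_in_f1 = m2.min(f1 - m1_in_f1);
    let m1_in_f2 = m1 - m1_in_f1;
    let m2_in_f2 = m2 - m2_in_f1;
    (m1_in_f1, m2_in_f1, m1_in_f2, m2_in_f2)
}

/// (copy_len, src_cap, dst_cap) per sub-copy; capacities run to the end of
/// the fragment each pointer starts in.
fn plan(m1: usize, m2: usize, f1: usize, f2: usize) -> [(usize, usize, usize); 4] {
    let (a, b, c, d) = split_copy(m1, m2, f1, f2);
    [
        (a, m1, f1),         // head of m1 -> start of f1
        (b, m2, f1 - a),     // head of m2 -> rest of f1
        (c, m1 - a, f2),     // tail of m1 -> start of f2
        (d, m2 - b, f2 - c), // tail of m2 -> rest of f2
    ]
}
```

Writes stop at the fragment end because, in a ring buffer, the bytes past a free fragment are either live history or outside the allocation; overshoot inside a fragment is later overwritten by the next sub-copy or stays in free space.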

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7104f346-6cee-41e5-ab6c-05c263c7e896

📥 Commits

Reviewing files that changed from the base of the PR and between 283b72c and 7b46f00.

📒 Files selected for processing (3)
  • zstd/src/decoding/mod.rs
  • zstd/src/decoding/ringbuffer.rs
  • zstd/src/decoding/simd_copy.rs

Comment thread zstd/src/decoding/ringbuffer.rs
Comment thread zstd/src/decoding/simd_copy.rs Outdated
@polaz
Member Author

polaz commented Apr 8, 2026

@coderabbitai re-review

@polaz polaz requested a review from Copilot April 8, 2026 20:34
@coderabbitai

coderabbitai Bot commented Apr 8, 2026

@polaz Sure, kicking off a full re-review of the PR now!


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-05T22:29:06.406Z
Learning: In `zstd/src/dictionary/fastcover.rs` (structured-world/structured-zstd), `FastCoverTuned.accel` is correctly populated in `train_fastcover_internal` using `accel: params.accel`, and `normalize_fastcover_params` clamps `accel` to `1..=10`. Do not flag `FastCoverTuned.accel` as unclamped or incorrectly set.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1470-1490
Timestamp: 2026-04-02T07:27:15.870Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::pick_lazy_match` intentionally queries `find_best_match(abs_pos + 1, ...)` and `find_best_match(abs_pos + 2, ...)` *before* those positions are inserted into the hash-chain tables. This matches C zstd lazy evaluation semantics: pre-seeding those positions would allow a position to match against itself, producing invalid self-matches. Do not flag the absent pre-seeding as a bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 73
File: zstd/src/dictionary/mod.rs:337-345
Timestamp: 2026-04-05T21:31:56.797Z
Learning: In `zstd/src/decoding/dictionary.rs` (structured-world/structured-zstd), `Dictionary::decode_dict()` parses the full zstd dictionary wire format: magic (4 bytes) → dict_id (4 bytes) → Huffman table (variable, consumed via `build_decoder`) → OF FSE table → ML FSE table → LL FSE table → 3 × u32 repeat offsets → raw content. This matches the layout produced by `finalize_raw_dict()` in `zstd/src/dictionary/mod.rs`. Do not flag these as mismatched layouts.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 58
File: .github/workflows/ci.yml:31-32
Timestamp: 2026-04-03T11:24:17.021Z
Learning: In `structured-world/structured-zstd`, the CI Clippy lint job (`.github/workflows/ci.yml`) uses two separate steps: (1) the required gate: `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings`, and (2) a separate bench-internals step: `cargo clippy -p structured-zstd --features hash,std,dict_builder,bench_internals -- -D warnings`. The `bench_internals` feature must NOT replace the primary gate — it runs as an additional step only.
✅ Actions performed

Full review triggered.

@codecov

codecov Bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 93.39623% with 21 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
zstd/src/decoding/simd_copy.rs 86.27% 21 Missing ⚠️



Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread zstd/src/decoding/simd_copy.rs
@polaz
Member Author

polaz commented Apr 8, 2026

@coderabbitai full review

@polaz polaz requested a review from Copilot April 8, 2026 21:32
@coderabbitai

coderabbitai Bot commented Apr 8, 2026

✅ Actions performed

Full review triggered.


Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/decoding/simd_copy.rs Outdated
Comment thread zstd/src/decoding/simd_copy.rs
@coderabbitai

coderabbitai Bot commented Apr 8, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/simd_copy.rs`:
- Around line 282-316: Add a #[cfg(feature = "std")] guard to the x86 SIMD tests
so they don't reference std::arch::is_x86_feature_detected! in no_std builds;
specifically wrap the test functions copy_avx2_copies_full_chunk_when_available
and copy_avx512_copies_full_chunk_when_available (which call
is_x86_feature_detected!("avx2") and is_x86_feature_detected!("avx512f")) with
the same #[cfg(feature = "std")] used on
copy_sse2_copies_full_chunk_when_available so the tests only compile when the
standard library is available.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 71683fee-119b-4a36-8497-5de63d4144a3

📥 Commits

Reviewing files that changed from the base of the PR and between e877c44 and f1ada71.

📒 Files selected for processing (3)
  • zstd/src/decoding/mod.rs
  • zstd/src/decoding/ringbuffer.rs
  • zstd/src/decoding/simd_copy.rs

Comment thread zstd/src/decoding/simd_copy.rs Outdated
@polaz
Member Author

polaz commented Apr 8, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Apr 8, 2026

@polaz Sure, kicking off a full re-review now!


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-05T07:19:26.176Z
Learning: In `cli/src/main.rs` (structured-world/structured-zstd), path-based compression calls `set_source_size_hint(source_size as u64)` (NOT `set_pledged_content_size`) for the filesystem metadata size. This is intentional: `metadata().len()` is unreliable for pseudo-files (`/proc/*`), FIFOs, and growing inputs, so the hint must remain advisory (only scaling matcher/window parameters) and must not impose a hard byte contract on the encoder. Do not flag the absence of `set_pledged_content_size` in path-based compression as a missing feature.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T21:51:56.626Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test `copy_bytes_overshooting_preserves_prefix_for_runtime_chunk_lengths` (or equivalent) includes a misaligned extra-capacity case that uses `add(1)`-offset pointers and a `chunk_plus_one` length to exercise the rounded-up wildcopy overshoot path (where `copy_multiple > copy_at_least`) and validates prefix correctness. Do not flag the absence of a misaligned-pointer case in this test — it is present by design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T21:51:51.926Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test suite for `copy_bytes_overshooting` includes a misaligned extra-capacity case that exercises the rounded-up wildcopy path (where `copy_multiple > copy_at_least`) and validates prefix correctness for that scenario. Do not flag the absence of a misaligned/overshooting extra-capacity sub-case as missing test coverage.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T20:26:15.902Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test `copy_bytes_overshooting_covers_all_copy_strategies` derives copy lengths from a helper `active_chunk_size_for_tests()` (not hardcoded values) so that single-chunk, multi-chunk, and sub-chunk (fallback) cases adapt to whichever SIMD backend width is active at runtime (AVX-512/AVX2/SSE2/NEON/scalar). Do not flag hardcoded test lengths as missing coverage for other backends — the test is backend-adaptive by design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-06T01:40:24.378Z
Learning: In `zstd/benches/compare_ffi.rs` (structured-world/structured-zstd), Rust FastCOVER trains with the post-finalization content budget in both the `REPORT_DICT_TRAIN` emission path (around lines 208-225) and the Criterion benchmark path (around lines 266-280). Both paths were aligned in commit 8622344. Do not flag these ranges as using inconsistent budget values.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:498-509
Timestamp: 2026-04-02T22:26:07.979Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), `best_level_does_not_regress_vs_better` uses a `<=` (not strict `<`) assertion because the `repeat_offset_fixture(b"HelloWorld", ...)` input is simple enough that HC saturates at both Better (16 candidates) and Best (32 candidates) search depths, producing identical compressed sizes (~30243 bytes). Strict `<` would be a false positive on this fixture. The strict `Best < Better` quality assertion lives in `cross_validation::best_level_beats_better_on_corpus_proxy` on the decodecorpus sample. Do not re-flag the `<=` as a weakened guard.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-05T08:17:53.856Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd), `set_source_size_hint` applies only to the payload (uncompressed source) bytes, not to dictionary priming. Dictionary priming is separate and does not inflate the advertised frame window size. Do not suggest including dictionary bytes in the size hint passed to `set_source_size_hint`.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-05T22:29:06.406Z
Learning: In `zstd/src/dictionary/fastcover.rs` (structured-world/structured-zstd), `FastCoverTuned.accel` is correctly populated in `train_fastcover_internal` using `accel: params.accel`, and `normalize_fastcover_params` clamps `accel` to `1..=10`. Do not flag `FastCoverTuned.accel` as unclamped or incorrectly set.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1554-1559
Timestamp: 2026-04-02T12:03:04.619Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` has an early-return guard at `abs_pos >= u32::MAX` that disables all further HC insertions for streams >4 GiB, degrading `CompressionLevel::Better` to repcodes-only. This is a known limitation tracked in issue `#51`; the proper fix (table rebasing) is deferred. A TODO comment and a limitation note on `CompressionLevel::Better` doc were added in PR `#48`. Do not re-flag this as a new bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 61
File: zstd/src/encoding/mod.rs:169-179
Timestamp: 2026-04-04T20:45:28.506Z
Learning: In `zstd/src/encoding/mod.rs`, `Matcher::set_source_size_hint()` has a default no-op implementation intentionally. Only the built-in `MatchGeneratorDriver` overrides it to apply the hint during level resolution. `NoDictionaryMatcher` and `TinyMatcher` are test stubs and are explicitly exempt from implementing this hook. Do not flag the absent override in non-driver matchers as a bug.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 58
File: .github/workflows/ci.yml:31-32
Timestamp: 2026-04-03T11:24:17.021Z
Learning: In `structured-world/structured-zstd`, the CI Clippy lint job (`.github/workflows/ci.yml`) uses two separate steps: (1) the required gate: `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings`, and (2) a separate bench-internals step: `cargo clippy -p structured-zstd --features hash,std,dict_builder,bench_internals -- -D warnings`. The `bench_internals` feature must NOT replace the primary gate — it runs as an additional step only.

Learnt from: CR
Repo: structured-world/structured-zstd PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-26T15:15:32.318Z
Learning: Applies to **/*.rs : Code must pass `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings` (excluding `rustc-dep-of-std` and `fuzz_exports` features)

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 33
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-26T00:05:14.600Z
Learning: In structured-world/structured-zstd, the Clippy requirement is: `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings`. The `rustc-dep-of-std` feature is excluded (internal feature for Rust stdlib builds only) and `fuzz_exports` is excluded (fuzzing-specific entry points validated separately from the regular lint gate). The `--all-features` flag should NOT be used.
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.
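The ring-buffer test learnings above describe copy lengths that adapt to the runtime chunk width (AVX-512/AVX2/SSE2/NEON/scalar), plus a `chunk_plus_one` case that forces a rounded-up overshoot. A minimal sketch of that derivation, assuming the active width is passed in as a parameter; `CaseLengths`, `case_lengths`, and the multi-chunk factor of 4 are hypothetical and stand in for the `active_chunk_size_for_tests()` helper named in the learnings.

```rust
/// Hypothetical set of backend-adaptive test lengths.
#[derive(Debug, PartialEq, Eq)]
pub struct CaseLengths {
    pub sub_chunk: Option<usize>, // None when no shorter length exists (chunk == 1)
    pub single_chunk: usize,
    pub multi_chunk: usize,
    pub chunk_plus_one: usize, // rounds up to two chunks, exercising overshoot
}

pub fn case_lengths(chunk: usize) -> CaseLengths {
    assert!(chunk > 0, "chunk width must be non-zero");
    CaseLengths {
        sub_chunk: if chunk > 1 { Some(chunk - 1) } else { None },
        single_chunk: chunk,
        multi_chunk: chunk * 4,
        chunk_plus_one: chunk + 1,
    }
}
```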

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/decoding/simd_copy.rs`:
- Around lines 97-102: The SSE2 branch currently returns a CopyStrategy { chunk:
16, copy: copy_sse2 } without checking copy_at_least; add a minimum-threshold
guard (e.g., require copy_at_least >= 16) before selecting the SSE2 strategy so
small copies fall through to the generic/fallback handling
(copy_bytes_overshooting), mirroring the AVX2/AVX512/NEON checks; update the
conditional that references caps.sse2 to also test copy_at_least and only return
the SSE2 CopyStrategy when both caps.sse2 and copy_at_least >= 16 are true.
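A minimal sketch of the selection order with the requested SSE2 threshold, assuming chunk widths of 64/32/16 bytes for AVX-512/AVX2/SSE2 and a capability probe that runs once and is cached in a `OnceLock`. `Caps`, `probe_caps`, `detected_caps`, and `select_chunk` are hypothetical names, not the PR's actual API; `None` stands for falling through to the exact (non-SIMD) copy.

```rust
use std::sync::OnceLock;

/// Hypothetical capability snapshot for the x86 SIMD tiers.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Caps {
    pub avx512f: bool,
    pub avx2: bool,
    pub sse2: bool,
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn probe_caps() -> Caps {
    Caps {
        avx512f: std::arch::is_x86_feature_detected!("avx512f"),
        avx2: std::arch::is_x86_feature_detected!("avx2"),
        sse2: std::arch::is_x86_feature_detected!("sse2"),
    }
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn probe_caps() -> Caps {
    Caps::default() // no x86 SIMD tiers on other targets
}

/// Probes CPU features once; later calls read the cached value.
pub fn detected_caps() -> Caps {
    static CAPS: OnceLock<Caps> = OnceLock::new();
    *CAPS.get_or_init(probe_caps)
}

/// Returns the chunk width to use, or `None` to fall back to an exact copy.
/// Every tier, SSE2 included, requires `copy_at_least` to fill one chunk.
pub fn select_chunk(caps: Caps, copy_at_least: usize) -> Option<usize> {
    if caps.avx512f && copy_at_least >= 64 {
        Some(64)
    } else if caps.avx2 && copy_at_least >= 32 {
        Some(32)
    } else if caps.sse2 && copy_at_least >= 16 {
        Some(16) // the guard requested in the review
    } else {
        None
    }
}
```

In this sketch a 40-byte copy on an AVX-512 machine drops to the AVX2 tier rather than straight to the fallback; whether the real dispatcher cascades the same way is not stated in this thread.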

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: cb7564dd-e4f7-4841-b09e-bd167a1b820a

📥 Commits

Reviewing files that changed from the base of the PR and between f1ada71 and 89736dc.

📒 Files selected for processing (1)
  • zstd/src/decoding/simd_copy.rs

Comment thread zstd/src/decoding/simd_copy.rs Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/decoding/simd_copy.rs
Comment thread zstd/src/decoding/ringbuffer.rs Outdated

@sw-release-bot sw-release-bot Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'structured-zstd vs C FFI (x86_64-gnu)'.
The benchmark result of this commit is worse than the previous result by more than the 1.30 threshold.

| Benchmark suite | Current: 01b3e8d | Previous: e877c44 | Ratio |
|---|---|---|---|
| compress/level4-row/decodecorpus-z000033/matrix/pure_rust | 63.537 ms | 44.778 ms | 1.42 |
| compress/best/decodecorpus-z000033/matrix/pure_rust | 98.124 ms | 60.807 ms | 1.61 |
| compress/best/low-entropy-1m/matrix/c_ffi | 1.598 ms | 1.217 ms | 1.31 |
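The ratios in this alert can be re-derived as current time divided by previous time, with a row flagged once that ratio exceeds 1.30. A minimal check, assuming that is how the alert computes its ratio; `ratio` and `round2` are hypothetical helpers and the inputs are the millisecond values from the table.

```rust
/// Hypothetical helper: slowdown ratio of a "lower is better" timing.
pub fn ratio(current_ms: f64, previous_ms: f64) -> f64 {
    current_ms / previous_ms
}

/// Rounds to two decimals, matching how the alert table displays ratios.
pub fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```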

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
zstd/src/decoding/ringbuffer.rs (1)

659-663: 🧹 Nitpick | 🔵 Trivial

Consider threading proper capacities for SIMD consistency (optional).

Unlike copy_with_checks (which now passes aggregate capacities), copy_with_nobranch_check passes the copy length as the capacity in each case, e.g., (m1_ptr, m1_in_f1) for both source capacity and copy amount. With capacity equal to length there is no room for overshoot, so the SIMD path is skipped whenever the length isn't chunk-aligned.

Since this function is marked #[allow(dead_code)] and the branchless design prioritizes branch elimination over SIMD throughput, this is a low-priority consistency improvement. If this path is ever enabled for production, consider threading f1_len/f2_len through to enable SIMD benefits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/decoding/ringbuffer.rs` around lines 659-663, the call in
copy_with_nobranch_check uses m1_in_f1 as both the source capacity and copy
length which can prevent SIMD overshoot optimization; update the
simd_copy::copy_bytes_overshooting calls in copy_with_nobranch_check to pass the
actual source capacities (e.g., f1_len/f2_len or the aggregate capacity used by
copy_with_checks) instead of the copy length m1_in_f1, so the tuple arguments
reflect (ptr, source_capacity) like the non-branchless path and enable proper
SIMD behavior if this path is used in production.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3f86b8f0-ab67-41b2-85a6-d543e6eb20c1

📥 Commits

Reviewing files that changed from the base of the PR and between 89736dc and 28d8956.

📒 Files selected for processing (2)
  • zstd/src/decoding/ringbuffer.rs
  • zstd/src/decoding/simd_copy.rs

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@polaz
Member Author

polaz commented Apr 9, 2026

@coderabbitai Fixed duplicate note from review body (ringbuffer.rs 659-663) in commit 01b3e8d: copy_with_nobranch_check now threads aggregate source/destination capacities (m1_src_cap/m2_src_cap/f1_dst_cap/f2_dst_cap) into copy_bytes_overshooting, aligning it with copy_with_checks for SIMD overshoot consistency.

@coderabbitai

coderabbitai Bot commented Apr 9, 2026

@polaz Great, that's the right fix — now both copy_with_checks and copy_with_nobranch_check pass the aggregate buffer capacities into copy_bytes_overshooting, so the SIMD eligibility gate (min(src_len, dst_len) >= copy_multiple) applies uniformly across both hot-path code paths. Aligning the two eliminates the earlier asymmetry where copy_with_nobranch_check would unconditionally fall back to scalar copy_from_nonoverlapping even when both the source and destination regions had sufficient surplus capacity to absorb the overshoot.
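A safe-slice model of the eligibility gate described above, assuming `copy_multiple` is `copy_at_least` rounded up to a whole number of chunks. `copy_overshooting` is hypothetical: the real `copy_bytes_overshooting` works on raw pointers, and the bytes it writes past `copy_at_least` are overshoot the caller treats as unused, whereas here the overshoot simply copies the next source bytes.

```rust
/// Hypothetical model of an overshooting ("wild") copy.
/// Returns how many bytes were actually written to `dst`.
pub fn copy_overshooting(src: &[u8], dst: &mut [u8], copy_at_least: usize, chunk: usize) -> usize {
    assert!(chunk > 0, "chunk width must be non-zero");
    assert!(copy_at_least <= src.len() && copy_at_least <= dst.len());
    let copy_multiple = copy_at_least.div_ceil(chunk) * chunk;
    if src.len().min(dst.len()) >= copy_multiple {
        // Wide path: whole chunks only, possibly writing past copy_at_least.
        for (d, s) in dst[..copy_multiple]
            .chunks_exact_mut(chunk)
            .zip(src[..copy_multiple].chunks_exact(chunk))
        {
            d.copy_from_slice(s);
        }
        copy_multiple
    } else {
        // Not enough surplus capacity on one side: exact copy, no overshoot.
        dst[..copy_at_least].copy_from_slice(&src[..copy_at_least]);
        copy_at_least
    }
}
```

The second case in the test mirrors the asymmetry fixed in 01b3e8d: when the capacity passed in equals a non-chunk-aligned length (20 bytes with 16-byte chunks), the gate fails and the copy falls back to an exact copy.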


✏️ Learnings added
Learnt from: polaz
URL: https://github.com/structured-world/structured-zstd/pull/85

Timestamp: 2026-04-09T06:17:40.408Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), both `copy_with_checks` and `copy_with_nobranch_check` thread aggregate source/destination capacities (`m1_src_cap`, `m2_src_cap`, `f1_dst_cap`, `f2_dst_cap`) into `simd_copy::copy_bytes_overshooting` so the SIMD eligibility check (`min(src_len, dst_len) >= copy_multiple`) is applied uniformly. Do not flag the capacity parameters in `copy_with_nobranch_check` as unnecessary or inconsistent with `copy_with_checks`.

🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 53
File: zstd/src/tests/roundtrip_integrity.rs:479-523
Timestamp: 2026-04-02T19:19:44.750Z
Learning: In `structured-zstd` (`zstd/src/tests/roundtrip_integrity.rs`), the per-level roundtrip test suites (Better, Best) share a common `level_roundtrip_suite!` macro that expands the same 7-test matrix (compressible, random, multi-block, streaming, edge cases, repeat offsets, large literals) for each level via module-scoped macro invocations. Do not suggest re-inlining these as separate flat test functions — the macro was added intentionally to eliminate cross-level drift.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-03T17:39:58.627Z
Learning: In `zstd/src/decoding/frame_decoder.rs`, `FrameDecoderState::reset()` intentionally does NOT reserve the frame-content-size (FCS) delta beyond `window_size`. Two cases cover the design space: (1) multi-segment frames where FCS >> window_size — the delta can reach ~99 MiB, and eagerly reserving it would penalize streaming callers; (2) single-segment frames where window_size == FCS — the delta is always 0 so there is nothing to reserve. Incremental growth during decoding is handled by the per-block `MAX_BLOCK_SIZE` (128 KiB) reserve in `execute_sequences`. Do not flag the absence of an FCS-delta reserve in `reset()` as a missing pre-allocation.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 48
File: zstd/src/encoding/match_generator.rs:1531-1535
Timestamp: 2026-04-01T15:33:17.229Z
Learning: In `zstd/src/encoding/match_generator.rs`, `HcMatchGenerator::insert_position` intentionally casts `abs_pos: usize` to `u32` (wrapping on streams > 4 GiB). This is safe by design: (1) the chain table slot is computed as `(pos_u32 as usize) & chain_mask`, so only the low bits matter and wrapping leaves the addressed slot unchanged; (2) `chain_candidates` filters any recovered `cur as usize` value against `history_abs_start` — a wrapped (small) value is always less than the current sliding-window start and is discarded. Do not flag this cast as a bug.
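The wrapping-cast claim in the last learning can be checked with plain integer arithmetic: masking with a `chain_mask` that fits in 32 bits reads only low bits, and truncation to `u32` preserves those. `slot_via_u32` is a hypothetical helper, and `u64` stands in for `usize` so the check also holds on 32-bit hosts.

```rust
/// Hypothetical helper: chain-table slot computed after truncating to u32.
pub fn slot_via_u32(abs_pos: u64, chain_mask: u64) -> u64 {
    debug_assert!(chain_mask <= u32::MAX as u64, "mask must fit in 32 bits");
    let pos_u32 = abs_pos as u32; // wraps for positions past 4 GiB
    (pos_u32 as u64) & chain_mask
}
```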

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
