perf(bench): add wildcopy candidate research bench and dashboard range filter by polaz · Pull Request #107 · structured-world/structured-zstd

polaz · 2026-04-12T10:33:02Z

Summary

add benchmark dashboard usability fixes in gh-pages source:
- taller chart canvas so legend/series toggles remain visible
- From/To snapshot selectors to focus on recent measurement points
add a dedicated Criterion bench wildcopy_candidates (feature-gated by bench_internals) for the research in issue #87 (research(decoding): evaluate advanced SIMD wildcopy paths beyond current baseline)
expose a bench-only internal entrypoint for production wildcopy kernel comparison

Issue linkage

Related: #87, research(decoding): evaluate advanced SIMD wildcopy paths beyond current baseline (research task)
This PR does not close #87.

Bench findings (local)

cargo bench --bench wildcopy_candidates -p structured-zstd --features bench_internals -- --output-format bencher

baseline/64: 3 ns
candidate/64: 2 ns
baseline/256: 7 ns
candidate/256: 4 ns
baseline/1024: 28 ns
candidate/1024: 14 ns
baseline/4096: 94 ns
candidate/4096: 58 ns
baseline/16384: 347 ns
candidate/16384: 268 ns
baseline/65536: 1368 ns
candidate/65536: 1121 ns
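
For reading the numbers above as ratios, here is a minimal sketch that divides each baseline time by the matching candidate time. The constant and function names are illustrative and not part of the PR. The bencher output format reports whole nanoseconds, so the ratios for the smallest sizes are coarse.

```rust
/// (length in bytes, baseline ns, candidate ns), copied from the local run above.
const SAMPLES: [(usize, f64, f64); 6] = [
    (64, 3.0, 2.0),
    (256, 7.0, 4.0),
    (1024, 28.0, 14.0),
    (4096, 94.0, 58.0),
    (16384, 347.0, 268.0),
    (65536, 1368.0, 1121.0),
];

/// Speedup of the candidate relative to the baseline; values above 1.0 favor the candidate.
fn speedup(baseline_ns: f64, candidate_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}

fn main() {
    for (len, baseline_ns, candidate_ns) in SAMPLES {
        println!("{len:>6} B: {:.2}x", speedup(baseline_ns, candidate_ns));
    }
}
```

On these samples the ratio stays between about 1.2x and 2.0x, with the largest relative gain at 1024 bytes and the smallest at 65536 bytes.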

Validation

cargo fmt -- --check
cargo check --workspace
cargo clippy --all-targets -- -D warnings
cargo check -p structured-zstd --features bench_internals --benches
cargo nextest run --workspace
cargo test --doc --workspace

Summary by CodeRabbit

New Features
- Added date range (From/To) controls and inclusive time-window filtering to the dashboard.
UI
- Chart now fills its container with fixed visual height; time-window select lists populate dynamically, retain/validate selections, and update with filter changes.
Tests
- Added an x86/x86_64 benchmark comparing baseline and candidate copy strategies and registered a new benchmark target.
Documentation
- Added benchmark research notes with measurements and a provisional GO recommendation.
Refactor
- Exposed bench-only internals behind a feature gate to support benchmarking.

coderabbitai · 2026-04-12T10:33:15Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2d982636-19cc-4751-a9b2-54f49cef79c8

📥 Commits

Reviewing files that changed from the base of the PR and between 7fb9592 and 70e7a5c.

📒 Files selected for processing (1)

zstd/src/decoding/simd_copy.rs

📝 Walkthrough

Walkthrough

Adds a bench-only Criterion benchmark and feature-gated wrappers to exercise simd_copy wildcopy candidates; re-exports a bench entrypoint; and enhances the bench-dashboard with From/To time-window selectors and Chart.js sizing/layout adjustments.

Changes

Cohort / File(s)	Summary
Bench dashboard `.github/bench-dashboard/index.html`	Added `From`/`To` time-point selects; split filtering into core vs time-window (`matchesCoreFilters`, `applyTimeWindow`, `updateTimeWindowOptions`); added `renderPointOptions`; adjusted chart sizing and `maintainAspectRatio: false`.
Bench target & new benchmark `zstd/Cargo.toml`, `zstd/benches/wildcopy_candidates.rs`	Added feature-gated `wildcopy_candidates` bench target and new Criterion benchmark comparing SIMD baselines (AVX-512/AVX2/SSE2/scalar) vs an AVX2 `unroll2` candidate; x86/x86_64-only kernels with noop `main` on other targets.
Decoder simd visibility & bench wrappers `zstd/src/decoding/mod.rs`, `zstd/src/decoding/simd_copy.rs`, `zstd/src/lib.rs`	Exposed a bench-gated crate-scoped re-export and added bench-only wrapper functions `copy_bytes_overshooting_for_bench` to call existing `copy_bytes_overshooting(...)` for benchmarks (`feature = "bench_internals"`).
Docs `BENCHMARKS.md`	Appended "Wildcopy Candidates" research section with links, measurement table, checklist, and provisional GO decision for the AVX2 unroll2 candidate.

Sequence Diagram(s)

sequenceDiagram
  participant Browser as Browser (bench-dashboard)
  participant Runner as Bench Runner (cargo + Criterion)
  participant BenchLib as zstd::testing (bench_internals)
  participant Simd as simd_copy impl
  participant CPU as CPU (ISA/features)

  Browser->>Runner: request benchmark data / trigger run (filtered by From/To)
  Runner->>BenchLib: execute bench_wildcopy_candidates
  BenchLib->>Simd: call copy_bytes_overshooting_for_bench(src,dst,len)
  Simd->>CPU: invoke ISA-specific kernel (AVX-512/AVX2/SSE2/scalar)
  CPU-->>Simd: return copied bytes
  Simd-->>BenchLib: provide output for verification
  BenchLib-->>Runner: report throughput & correctness
  Runner-->>Browser: update dashboard charts (respecting time window)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related issues

perf(decoding): integrate AVX2 unroll2 wildcopy candidate #108 — Benchmarks and exposes the AVX2 unroll2 candidate and bench-only wrapper required to evaluate integration into the production wildcopy path.
#87: research(decoding): evaluate advanced SIMD wildcopy paths beyond current baseline — This PR provides the benchmark harness and candidate kernels needed to fulfill the research objectives.

Possibly related PRs

perf(bench): multi-arch relative Rust-vs-FFI dashboard #78 — Overlaps dashboard filtering/rendering and time-window UI changes.
perf(decoding): add runtime-dispatched simd wildcopy #85 — Prior work on simd_copy / overshooting copy that this PR exposes via a bench-only wrapper.
perf(decoding): branchless bitstream reader with mask table and BMI2 support #58 — Related feature-gated testing/benchmark gating and re-export patterns.

Poem

🐇
I hopped through lanes of AVX and SSE,
unrolled the bytes so wild and free,
From and To keep charts in line,
benches hum, the kernels shine,
rabbit claps — the bytes agree!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the two main changes: adding a wildcopy candidate research benchmark and a dashboard range filter feature.
Linked Issues check	✅ Passed	The PR implements all coding requirements from `#87`: benchmarks a wildcopy candidate (AVX2 unroll2) vs baseline, validates correctness with existing tests, provides reproducible performance gains (roughly 1.2-2x in the local measurements posted in the PR description), and documents the go/no-go decision.
Out of Scope Changes check	✅ Passed	All changes directly support `#87` research objectives: benchmark code/infrastructure, dashboard improvements for visualizing results, and documentation. No unrelated modifications detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Copilot

Pull request overview

Adds benchmarking tooling for researching alternative decode wildcopy kernels (issue #87) and improves the benchmark dashboard UX by making the chart more readable and allowing users to filter to a snapshot range.

Changes:

Add a feature-gated Criterion bench (wildcopy_candidates) and a bench-only entrypoint to call the production overshooting wildcopy implementation.
Register the new bench in zstd/Cargo.toml behind bench_internals.
Improve the gh-pages benchmark dashboard with a taller chart and From/To snapshot selectors to focus on a time window.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

File	Description
`zstd/src/lib.rs`	Re-export bench-only wildcopy entrypoint via `structured_zstd::testing` for benchmark use.
`zstd/src/decoding/simd_copy.rs`	Add `copy_bytes_overshooting_for_bench` wrapper behind `bench_internals`.
`zstd/src/decoding/mod.rs`	Re-export the bench-only entrypoint from `decoding` under `bench_internals`.
`zstd/Cargo.toml`	Register new `wildcopy_candidates` Criterion bench gated by `bench_internals`.
`zstd/benches/wildcopy_candidates.rs`	New microbench comparing production wildcopy vs an AVX2 unroll candidate.
`.github/bench-dashboard/index.html`	Dashboard layout improvements + snapshot range filtering (`From`/`To`).

codecov · 2026-04-12T11:42:26Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/benches/wildcopy_candidates.rs`:
- Around line 68-126: The benchmark is comparing different call paths: baseline
uses copy_bytes_overshooting_for_bench (which does strategy selection, capacity
checks, and rounding) while the candidate calls candidate_copy_kernel directly
and all tested lengths are 64-byte aligned—this produces an optimistic and
non-equivalent comparison. Fix by making the two measured paths equivalent:
either (A) call the selected kernel directly for both sides (invoke the same
kernel function used by copy_bytes_overshooting_for_bench for the baseline
instead of copy_bytes_overshooting_for_bench), or (B) wrap the candidate
invocation in the same dispatch/rounding logic (call
copy_bytes_overshooting_for_bench for both) so both pay the same
selection/rounding costs; also update the lengths array to include at least one
non-multiple-of-64 value (e.g., 65) so tail/rounded paths are exercised. Ensure
you modify the benchmark code that constructs lengths, the baseline invocation
using copy_bytes_overshooting_for_bench, and the candidate invocation using
candidate_copy_kernel so both paths are symmetric.

In `@zstd/src/decoding/mod.rs`:
- Line 24: Replace the broad module export with a private module and expose only
the bench shim: change the declaration `pub(crate) mod simd_copy;` to `mod
simd_copy;` inside `decoding`, then add a targeted re-export like `pub(crate)
use self::simd_copy::copy_bytes_overshooting_for_bench;` so only that function
is visible across the crate; finally update zstd/src/lib.rs to reference
`crate::decoding::copy_bytes_overshooting_for_bench` instead of importing the
entire `simd_copy` module.
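
Option (B) from the first inline comment can be sketched as follows: both kernels go through the same round-up wrapper, and the length list includes a value that is not a multiple of 64. This is a std-only illustration under assumed names. `Kernel`, `copy_rounded`, `time_path` and the two stand-in kernels are hypothetical, and the real bench uses Criterion and the crate's own kernels.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical kernel shape: copy the first `len` bytes of `src` into `dst`.
type Kernel = fn(&[u8], &mut [u8], usize);

/// Stand-in for the baseline kernel (plain byte loop).
fn kernel_bytewise(src: &[u8], dst: &mut [u8], len: usize) {
    for i in 0..len {
        dst[i] = src[i];
    }
}

/// Stand-in for the candidate kernel (slice copy).
fn kernel_slice_copy(src: &[u8], dst: &mut [u8], len: usize) {
    dst[..len].copy_from_slice(&src[..len]);
}

/// Shared wrapper: both sides pay the same round-up and capacity check.
/// Returns the number of bytes actually copied (a multiple of `chunk`).
fn copy_rounded(kernel: Kernel, chunk: usize, src: &[u8], dst: &mut [u8], len: usize) -> usize {
    let rounded = (len + chunk - 1) / chunk * chunk;
    assert!(src.len() >= rounded && dst.len() >= rounded, "buffers must allow overshoot");
    kernel(src, dst, rounded);
    rounded
}

/// Crude wall-clock timing; Criterion would replace this in a real bench.
fn time_path(name: &str, kernel: Kernel, len: usize) {
    let src = vec![0xA5u8; len + 64];
    let mut dst = vec![0u8; len + 64];
    let start = Instant::now();
    for _ in 0..10_000 {
        copy_rounded(kernel, 64, black_box(&src), black_box(&mut dst), black_box(len));
    }
    println!("{name}/{len}: {:?} per 10k iterations", start.elapsed());
}

fn main() {
    // 65 is deliberately not a multiple of 64, so the rounded tail path runs.
    for len in [64, 65, 256] {
        time_path("baseline", kernel_bytewise, len);
        time_path("candidate", kernel_slice_copy, len);
    }
}
```

Because the wrapper is shared, any difference between the two timings comes from the kernels themselves rather than from dispatch or rounding.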

ℹ️ Review info

Run ID: fcd3516a-4962-40fe-98c2-9a2fe3bc9523

📥 Commits

Reviewing files that changed from the base of the PR and between d405446 and 3d5a9c8.

📒 Files selected for processing (5)

BENCHMARKS.md
zstd/benches/wildcopy_candidates.rs
zstd/src/decoding/mod.rs
zstd/src/decoding/simd_copy.rs
zstd/src/lib.rs

Copilot

Pull request overview

This PR improves the benchmarking workflow for structured-zstd by adding a new research-focused Criterion benchmark for SIMD “wildcopy” candidate evaluation and enhancing the benchmark dashboard UI with a selectable snapshot window.

Changes:

Added a new wildcopy_candidates Criterion benchmark (gated behind bench_internals) to compare copy kernels against the production overshooting wildcopy.
Exposed a bench-only internal entrypoint for the production wildcopy implementation and adjusted module visibility to support it.
Improved the benchmark dashboard with a larger chart area and From/To snapshot selectors to filter the displayed time window.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

File	Description
`zstd/src/lib.rs`	Re-exports the bench-only wildcopy entrypoint under the existing `bench_internals`-gated `testing` module.
`zstd/src/decoding/simd_copy.rs`	Adds a `bench_internals`-gated wrapper around the production overshooting copy routine for benchmarking.
`zstd/src/decoding/mod.rs`	Makes `simd_copy` visible within the crate to enable the bench-only re-export path.
`zstd/Cargo.toml`	Registers the new `wildcopy_candidates` bench and gates it behind `bench_internals`.
`zstd/benches/wildcopy_candidates.rs`	Implements the new Criterion benchmark comparing baseline vs a candidate AVX2 unroll2 kernel (with scalar fallback).
`BENCHMARKS.md`	Documents the Issue #87 research mapping, local sample measurements, and decision/follow-up linkage.
`.github/bench-dashboard/index.html`	Adds snapshot range selectors and adjusts layout/chart sizing to improve dashboard usability.

sw-release-bot

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'structured-zstd vs C FFI (x86_64-gnu)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.30.

Benchmark suite	Current: `70e7a5c`	Previous: `051fed2`	Ratio
`compress/fastest/decodecorpus-z000033/matrix/c_ffi`	`3.736` ms	`2.766` ms	`1.35`

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

zstd/src/decoding/mod.rs (1)
25-26: 🛠️ Refactor suggestion | 🟠 Major

Keep the bench shim off the public decoding surface.

pub use here makes the helper reachable as structured_zstd::decoding::copy_bytes_overshooting_for_bench, so the hidden testing module is no longer the sole bench-only escape hatch. Keep this re-export non-public from decoding and let testing remain the only public facade for bench builds.

Based on learnings, internal decode/encode helpers remain pub(crate) by default, and explicit public shims are exposed only behind a dedicated feature flag in a separate file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/decoding/mod.rs` around lines 25 - 26, The public re-export exposes
the bench helper on the decoding public API; change the visibility of the
re-export so it is not public (e.g., replace `pub use
self::simd_copy::copy_bytes_overshooting_for_bench;` with a non-public re-export
such as `pub(crate) use` or remove the re-export and let the `testing` module
remain the only bench-facing facade), ensuring the symbol
copy_bytes_overshooting_for_bench from module simd_copy is only reachable via
the bench-only testing shim.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/benches/wildcopy_candidates.rs`:
- Around line 91-115: select_baseline_copy_kernel currently only picks AVX2 vs
scalar and therefore diverges from the production dispatcher (copy_strategy)
which may choose AVX-512 or SSE2 based on features and length; update
select_baseline_copy_kernel to use the same dispatch policy as production by
delegating to or replicating copy_strategy logic so that when AVX-512 or SSE2
are selected in production the bench uses the same baseline kernel name, chunk
and kernel function (e.g., return the same BenchPath values used for AVX-512,
AVX2, SSE2 and scalar in copy_strategy), honoring the same feature checks and
length thresholds (len >= 64 for AVX-512, len >= 16 for SSE2, etc.) so bench
timings match production behavior.
- Line 151: The benchmark's lengths array (variable names: lengths in
wildcopy_candidates.rs) starts at 64 and 65 so it omits small-block cases;
update the lengths array to include at least a couple of sub-64 sizes (for
example 1, 8, 32 or similar small values) before 64 so the bench measures the
small-block regime called out in issue `#87` and can catch chunk-threshold
performance changes; ensure the new small sizes are inserted into the existing
lengths array literal used by the benchmark loop so all tests run as before but
now include those sub-64 cases.

---

Duplicate comments:
In `@zstd/src/decoding/mod.rs`:
- Around line 25-26: The public re-export exposes the bench helper on the
decoding public API; change the visibility of the re-export so it is not public
(e.g., replace `pub use self::simd_copy::copy_bytes_overshooting_for_bench;`
with a non-public re-export such as `pub(crate) use` or remove the re-export and
let the `testing` module remain the only bench-facing facade), ensuring the
symbol copy_bytes_overshooting_for_bench from module simd_copy is only reachable
via the bench-only testing shim.
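
The dispatch-policy comment above asks the bench baseline to follow the same feature and length checks as production. Here is a hedged sketch of such a policy. The `len >= 64` (AVX-512) and `len >= 16` (SSE2) thresholds come from the comment; the `len >= 32` threshold for AVX2 is an assumption, and `Strategy`, `CpuCaps`, `detect_caps` and `pick_strategy` are illustrative names rather than the crate's `copy_strategy` API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Avx512,
    Avx2,
    Sse2,
    Scalar,
}

/// Runtime CPU capabilities relevant to the copy kernels.
#[derive(Debug, Clone, Copy)]
struct CpuCaps {
    avx512f: bool,
    avx2: bool,
    sse2: bool,
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detect_caps() -> CpuCaps {
    CpuCaps {
        avx512f: std::arch::is_x86_feature_detected!("avx512f"),
        avx2: std::arch::is_x86_feature_detected!("avx2"),
        sse2: std::arch::is_x86_feature_detected!("sse2"),
    }
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detect_caps() -> CpuCaps {
    // This sketch models x86 only; other targets fall through to scalar.
    CpuCaps { avx512f: false, avx2: false, sse2: false }
}

/// Widest kernel whose feature is present and whose chunk fits in `len`.
fn pick_strategy(caps: CpuCaps, len: usize) -> Strategy {
    if caps.avx512f && len >= 64 {
        Strategy::Avx512
    } else if caps.avx2 && len >= 32 {
        // Assumed threshold: one 256-bit vector.
        Strategy::Avx2
    } else if caps.sse2 && len >= 16 {
        Strategy::Sse2
    } else {
        Strategy::Scalar
    }
}

fn main() {
    let caps = detect_caps();
    // Sub-64 sizes are included so the small-block regime is visible.
    for len in [1usize, 8, 32, 64, 65] {
        println!("len {len:>3}: {:?}", pick_strategy(caps, len));
    }
}
```

If the bench and the decoder both call one selector like this, the benchmark's length list and the production thresholds cannot drift apart silently.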

ℹ️ Review info

Run ID: 6bbbdb5f-3697-4c07-9250-db86c02a248e

📥 Commits

Reviewing files that changed from the base of the PR and between 3d5a9c8 and e9e073e.

📒 Files selected for processing (3)

zstd/benches/wildcopy_candidates.rs
zstd/src/decoding/mod.rs
zstd/src/lib.rs

Copilot

Pull request overview

Adds bench-only tooling and UI improvements to support research/iteration on decode wildcopy performance (issue #87), while improving the benchmark dashboard’s usability when inspecting recent runs.

Changes:

Adds a feature-gated internal entrypoint exposing the production overshooting copy for benchmarks (bench_internals).
Introduces a new Criterion benchmark (wildcopy_candidates) to compare copy kernels across payload sizes.
Updates the GitHub Pages benchmark dashboard to improve chart sizing and add From/To snapshot range filtering.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

File	Description
`zstd/src/lib.rs`	Re-exports the new bench-only copy entrypoint via the `testing` module.
`zstd/src/decoding/simd_copy.rs`	Adds a `bench_internals`-gated wrapper around the production overshooting copy routine.
`zstd/src/decoding/mod.rs`	Re-exports the bench-only copy wrapper from the decoding module.
`zstd/Cargo.toml`	Registers the new `wildcopy_candidates` Criterion bench (gated by `bench_internals`).
`zstd/benches/wildcopy_candidates.rs`	Implements the baseline vs candidate copy-kernel benchmark with correctness checks vs production output.
`BENCHMARKS.md`	Documents the issue #87 research mapping, sample measurements, and follow-up link.
`.github/bench-dashboard/index.html`	Improves chart layout and adds From/To selectors with inclusive time-window filtering.

polaz · 2026-04-12T12:45:11Z

@coderabbitai Duplicate body item about pub use self::simd_copy::copy_bytes_overshooting_for_bench in decoding is now fixed in commit f507c7e: decoding no longer publicly re-exports the bench shim, and testing re-exports directly from decoding::simd_copy.

coderabbitai · 2026-04-12T12:45:29Z

@polaz: Thanks for the update! The fix in commit f507c7e looks correct — removing the intermediate pub use from decoding/mod.rs and having testing re-export directly from decoding::simd_copy eliminates the duplicate path cleanly. The bench shim is now accessible via a single, unambiguous route.
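
The resulting visibility layout can be sketched in a single file. The module names mirror the PR, but the function bodies and the simplified `(src, dst, len)` signature are assumptions. In the real crate the shim and the `testing` module would also carry `#[cfg(feature = "bench_internals")]`, and the re-export would use a `crate::` path.

```rust
mod decoding {
    // Crate-visible only: nothing under `decoding` becomes part of the public surface.
    pub(crate) mod simd_copy {
        /// Stand-in for the production routine (hypothetical simplified signature).
        pub(crate) fn copy_bytes_overshooting(src: &[u8], dst: &mut [u8], len: usize) {
            dst[..len].copy_from_slice(&src[..len]);
        }

        /// Bench-only shim; `pub` so that `testing` can re-export it publicly.
        /// The real crate would gate this behind the `bench_internals` feature.
        pub fn copy_bytes_overshooting_for_bench(src: &[u8], dst: &mut [u8], len: usize) {
            copy_bytes_overshooting(src, dst, len);
        }
    }
}

/// The only public facade for the shim (feature-gated in the real crate).
pub mod testing {
    pub use super::decoding::simd_copy::copy_bytes_overshooting_for_bench;
}

fn main() {
    let src = [1u8, 2, 3, 4];
    let mut dst = [0u8; 4];
    testing::copy_bytes_overshooting_for_bench(&src, &mut dst, 3);
    println!("{dst:?}");
}
```

Since `decoding` has no `pub use` of the shim, the single public route to it is through `testing`, which matches the intent of the fix in f507c7e.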

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T20:26:15.902Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test `copy_bytes_overshooting_covers_all_copy_strategies` derives copy lengths from a helper `active_chunk_size_for_tests()` (not hardcoded values) so that single-chunk, multi-chunk, and sub-chunk (fallback) cases adapt to whichever SIMD backend width is active at runtime (AVX-512/AVX2/SSE2/NEON/scalar). Do not flag hardcoded test lengths as missing coverage for other backends — the test is backend-adaptive by design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-09T06:17:40.408Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), both `copy_with_checks` and `copy_with_nobranch_check` thread aggregate source/destination capacities (`m1_src_cap`, `m2_src_cap`, `f1_dst_cap`, `f2_dst_cap`) into `simd_copy::copy_bytes_overshooting` so the SIMD eligibility check (`min(src_len, dst_len) >= copy_multiple`) is applied uniformly. Do not flag the capacity parameters in `copy_with_nobranch_check` as unnecessary or inconsistent with `copy_with_checks`.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T21:51:56.626Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test `copy_bytes_overshooting_preserves_prefix_for_runtime_chunk_lengths` (or equivalent) includes a misaligned extra-capacity case that uses `add(1)`-offset pointers and a `chunk_plus_one` length to exercise the rounded-up wildcopy overshoot path (where `copy_multiple > copy_at_least`) and validates prefix correctness. Do not flag the absence of a misaligned-pointer case in this test — it is present by design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T21:51:51.926Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test suite for `copy_bytes_overshooting` includes a misaligned extra-capacity case that exercises the rounded-up wildcopy path (where `copy_multiple > copy_at_least`) and validates prefix correctness for that scenario. Do not flag the absence of a misaligned/overshooting extra-capacity sub-case as missing test coverage.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:56:17.795Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), after commit 46927db, hash-mix kernel dispatch was moved from the global `HASH_MIX_KERNEL` AtomicU8 hot-path read into a per-matcher instance field `hash_mix_kernel` stored directly in `DfastMatchGenerator` and `RowMatchGenerator`. The hot path no longer performs an atomic load or global branch; kernel selection happens once at matcher initialisation and is subsequently a plain struct-field read. Do not flag the hot path as doing a per-call atomic load — that pattern was superseded by commit 46927db.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-06T01:40:24.378Z
Learning: In `zstd/benches/compare_ffi.rs` (structured-world/structured-zstd), Rust FastCOVER trains with the post-finalization content budget in both the `REPORT_DICT_TRAIN` emission path (around lines 208-225) and the Criterion benchmark path (around lines 266-280). Both paths were aligned in commit 8622344. Do not flag these ranges as using inconsistent budget values.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/frame_compressor.rs:779-805
Timestamp: 2026-04-11T07:48:58.281Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), a temporary compatibility guard exists for sub-256-byte compressed payloads: the 1-byte-FCS single-segment form is not yet fully C-FFI compatible for payloads below 256 bytes, and a compat guard was added to handle this case while the path is hardened. Do not flag this guard as unnecessary.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-09T15:16:10.603Z
Learning: In structured-world/structured-zstd, the `fuzz_exports` API concern was resolved via "Option B": internal decode/encode helpers remain `pub(crate)` by default, and explicit public shims are exposed only behind a dedicated feature flag (e.g., `fuzz`) in a separate file. Do not flag these internal symbols as needing a wider public API — the feature-gated shim layer is the intentional design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:31:46.064Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), the test-only `with_forced_hash_mix_kernel(...)` helper forces the global `HASH_MIX_KERNEL` atomic to a specific kernel for coverage. Tests that call this helper must acquire a shared mutex lock *before* the `with_forced_hash_mix_kernel` call (not only around the assertion) to prevent cross-test kernel races when `cargo nextest` runs tests in parallel. This pattern was locked in commit 904d2b0. Do not flag the lock acquisition as unnecessarily early.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-09T15:16:17.570Z
Learning: In `zstd/src/huff0/huff0_decoder.rs` (structured-world/structured-zstd), `decode_symbol` and `next_state` are intentionally private (not `pub`). Public fuzz-only shims `fuzz_decode_symbol` and `fuzz_next_state` are exposed exclusively behind `#[cfg(feature = "fuzz_exports")]`. Do not flag `decode_symbol` or `next_state` as needing public visibility — the fuzz shim pattern is the intended API contract.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 58
File: .github/workflows/ci.yml:31-32
Timestamp: 2026-04-03T11:24:17.021Z
Learning: In `structured-world/structured-zstd`, the CI Clippy lint job (`.github/workflows/ci.yml`) uses two separate steps: (1) the required gate: `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings`, and (2) a separate bench-internals step: `cargo clippy -p structured-zstd --features hash,std,dict_builder,bench_internals -- -D warnings`. The `bench_internals` feature must NOT replace the primary gate — it runs as an additional step only.
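
Several learnings above refer to the eligibility rule `min(src_len, dst_len) >= copy_multiple` and to the overshoot case `copy_multiple > copy_at_least`. A small sketch of that arithmetic follows, assuming `copy_multiple` is `copy_at_least` rounded up to the active chunk width; the function names and parameter layout are illustrative, not the crate's.

```rust
/// Round `copy_at_least` up to the next multiple of `chunk`.
fn round_up_to_chunk(copy_at_least: usize, chunk: usize) -> usize {
    assert!(chunk > 0, "chunk width must be non-zero");
    (copy_at_least + chunk - 1) / chunk * chunk
}

/// The wide copy is only allowed when both buffers can absorb the overshoot.
fn simd_eligible(copy_at_least: usize, chunk: usize, src_cap: usize, dst_cap: usize) -> bool {
    let copy_multiple = round_up_to_chunk(copy_at_least, chunk);
    src_cap.min(dst_cap) >= copy_multiple
}

fn main() {
    // A "chunk plus one" request with a 32-byte chunk overshoots to 64 bytes.
    let (copy_at_least, chunk) = (33, 32);
    println!("copy_multiple = {}", round_up_to_chunk(copy_at_least, chunk));
    println!("eligible with 64/80 capacity: {}", simd_eligible(copy_at_least, chunk, 64, 80));
    println!("eligible with 64/63 capacity: {}", simd_eligible(copy_at_least, chunk, 64, 63));
}
```

The second call fails the check because the smaller capacity (63) cannot hold the rounded-up 64 bytes, so a dispatcher built on this rule would fall back to a narrower path.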

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/benches/wildcopy_candidates.rs`:
- Around line 191-199: Extract the candidate construction logic into
select_candidate_copy_kernel by returning a BenchPath (instead of just kernel)
so the chunk size and name are chosen in one place; update
select_candidate_copy_kernel to return BenchPath (containing name, chunk,
kernel) and change bench_wildcopy_candidates to use that returned BenchPath
rather than computing chunk via string comparison on candidate_name (remove the
special-case for "candidate_avx2_unroll2" there); ensure you update all call
sites to use the new BenchPath return type and keep symbols like BenchPath,
select_candidate_copy_kernel, bench_wildcopy_candidates, candidate_avx2_unroll2,
candidate_name, and candidate_copy_kernel in mind when locating the changes.
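
A rough sketch of the suggested shape, where the selector returns a complete `BenchPath` so name, chunk and kernel are chosen in one place. The field layout, the `candidate_scalar` name, the scalar chunk of 1 and the stand-in kernel body are assumptions; only the identifiers `BenchPath`, `select_candidate_copy_kernel`, `copy_candidate_scalar` and `candidate_avx2_unroll2` come from the review comments.

```rust
/// Hypothetical kernel shape: copy the first `len` bytes of `src` into `dst`.
type Kernel = fn(&[u8], &mut [u8], usize);

#[derive(Clone, Copy)]
struct BenchPath {
    name: &'static str,
    chunk: usize,
    kernel: Kernel,
}

fn copy_candidate_scalar(src: &[u8], dst: &mut [u8], len: usize) {
    dst[..len].copy_from_slice(&src[..len]);
}

/// Portable stand-in for the AVX2 unroll2 kernel: two 32-byte halves per 64-byte step.
fn copy_candidate_unroll2_standin(src: &[u8], dst: &mut [u8], len: usize) {
    let mut off = 0;
    while off + 64 <= len {
        dst[off..off + 32].copy_from_slice(&src[off..off + 32]);
        dst[off + 32..off + 64].copy_from_slice(&src[off + 32..off + 64]);
        off += 64;
    }
    // Remaining bytes (fewer than 64) are copied without unrolling.
    dst[off..len].copy_from_slice(&src[off..len]);
}

/// Name, chunk and kernel are decided together, so callers never compare strings.
fn select_candidate_copy_kernel(avx2_available: bool) -> BenchPath {
    if avx2_available {
        BenchPath { name: "candidate_avx2_unroll2", chunk: 64, kernel: copy_candidate_unroll2_standin }
    } else {
        BenchPath { name: "candidate_scalar", chunk: 1, kernel: copy_candidate_scalar }
    }
}

fn main() {
    let path = select_candidate_copy_kernel(true);
    let src = vec![7u8; 128];
    let mut dst = vec![0u8; 128];
    (path.kernel)(&src, &mut dst, 128);
    println!("{} (chunk {}) copied {} bytes", path.name, path.chunk, dst.len());
}
```

With this shape, the benchmark body reads `path.chunk` directly and the special case keyed on the candidate name disappears.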

ℹ️ Review info

Run ID: 2ab1af44-9f8e-40ab-873d-f62c7450d8f8

📥 Commits

Reviewing files that changed from the base of the PR and between e9e073e and f507c7e.

📒 Files selected for processing (3)

zstd/benches/wildcopy_candidates.rs
zstd/src/decoding/mod.rs
zstd/src/lib.rs

Copilot

Pull request overview

This PR improves performance research and benchmarking ergonomics by (1) adding a focused Criterion bench to compare wildcopy kernels and (2) enhancing the GitHub Pages benchmark dashboard with a selectable snapshot window.

Changes:

Added a wildcopy_candidates Criterion benchmark (feature-gated by bench_internals) and a bench-only internal entrypoint to call the production overshooting copy routine.
Updated the benchmark dashboard UI to use a taller chart area and added From / To snapshot selectors with inclusive time-window filtering.
Documented Issue #87 research mapping and local sample measurements in BENCHMARKS.md.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

File	Description
zstd/src/lib.rs	Re-exports a bench-only wildcopy entrypoint via `structured_zstd::testing` behind `bench_internals`.
zstd/src/decoding/simd_copy.rs	Adds `copy_bytes_overshooting_for_bench` wrapper (cfg-gated) to expose the production wildcopy for benchmarks.
zstd/src/decoding/mod.rs	Makes `simd_copy` visible within the crate (`pub(crate)`) to support the bench-only re-export path.
zstd/Cargo.toml	Registers the new `wildcopy_candidates` bench gated by `bench_internals`.
zstd/benches/wildcopy_candidates.rs	New Criterion bench comparing baseline vs candidate copy kernels across sizes with correctness checks vs production output.
BENCHMARKS.md	Adds Issue #87 research mapping + local sample results + follow-up linkage.
.github/bench-dashboard/index.html	Improves chart layout and introduces snapshot window filtering (`From`/`To`) for relative dashboard points.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/benches/wildcopy_candidates.rs`:
- Around line 116-129: select_candidate_copy_kernel currently chooses the AVX2
64-byte chunk path solely by CPU feature, which forces small lengths to be
rounded up to 64 and prevents benchmarking the scalar fallback; update
select_candidate_copy_kernel (and the duplicate at the other location) to take
the requested length (or a max_chunk parameter) and choose the candidate based
on both CPU feature and requested length (e.g., if avx2 is available but
requested length < 64 select the scalar/smaller-chunk BenchPath or a smaller
unrolled kernel), referencing the existing symbols select_candidate_copy_kernel,
copy_candidate_unroll2_avx2, and copy_candidate_scalar so the function returns
the appropriate chunk and kernel for small lengths instead of always returning
chunk: 64 on AVX2.
- Around line 170-176: The bench currently hardcodes baseline_scalar for all
non-x86 targets in select_baseline_copy_kernel (returning a BenchPath with name
"baseline_scalar" and kernel copy_candidate_scalar), which misrepresents the
production baseline; either (A) change select_baseline_copy_kernel to mirror the
decoder's runtime strategy selection by calling the same production dispatcher
(the runtime CPU/feature probe used elsewhere) and return the matching BenchPath
so non-x86 targets use their actual backend (NEON/scalar/etc.), or (B) restrict
this benchmark to only the architectures the bench models by cfg-gating the
entire bench to the same target_arch set as the production dispatcher; update
the implementation of select_baseline_copy_kernel and its returned BenchPath
accordingly so results compare candidates against the real runtime baseline
rather than always "baseline_scalar".
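
To illustrate the first comment, here is a minimal sketch of candidate selection keyed on both the CPU feature and the requested length, compared with a selection cached once for the wide path. The enum, the helper names and the scalar chunk of 1 are assumptions; the 64-byte AVX2 chunk comes from the comment.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Candidate {
    Avx2Unroll2,
    Scalar,
}

impl Candidate {
    /// Bytes processed per step; the scalar value is an assumption.
    fn chunk(self) -> usize {
        match self {
            Candidate::Avx2Unroll2 => 64,
            Candidate::Scalar => 1,
        }
    }
}

/// Pick the wide kernel only when the feature exists and one full chunk fits.
fn select_candidate_for_len(avx2_available: bool, len: usize) -> Candidate {
    if avx2_available && len >= Candidate::Avx2Unroll2.chunk() {
        Candidate::Avx2Unroll2
    } else {
        Candidate::Scalar
    }
}

/// Length actually copied once rounded up to the candidate's chunk.
fn rounded_len(candidate: Candidate, len: usize) -> usize {
    let chunk = candidate.chunk();
    (len + chunk - 1) / chunk * chunk
}

fn main() {
    let avx2 = true; // pretend the host supports AVX2
    let cached = select_candidate_for_len(avx2, 64); // chosen once, before the loop
    for len in [17usize, 33, 63, 64, 65] {
        let per_len = select_candidate_for_len(avx2, len); // chosen per length
        println!(
            "len {len:>2}: cached copies {:>3} B, per-length {:?} copies {:>3} B",
            rounded_len(cached, len),
            per_len,
            rounded_len(per_len, len),
        );
    }
}
```

With the cached selection, every sub-64 length is rounded up to 64 bytes, so the small-block points measure the wide path. Selecting per length keeps those points on the small-block path.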

ℹ️ Review info

Run ID: cb87a0f9-19de-45c7-a93f-c7dfbf4747ab

📥 Commits

Reviewing files that changed from the base of the PR and between f507c7e and 9aa0c69.

📒 Files selected for processing (1)

zstd/benches/wildcopy_candidates.rs

coderabbitai

♻️ Duplicate comments (2)

zstd/benches/wildcopy_candidates.rs (2)
170-176: ⚠️ Potential issue | 🟠 Major

Non-x86 runs still benchmark scalar, not the production baseline.

Line 171 hardwires baseline_scalar for every non-x86 target. On ARM/NEON runners this bench would compare against a path the decoder does not actually use, so those results are not representative for issue #87. Either mirror the non-x86 dispatcher here or cfg-gate the benchmark to the architectures it actually models. Based on learnings, copy_bytes_overshooting strategy coverage adapts across AVX-512/AVX2/SSE2/NEON/scalar, so the production dispatcher is not scalar-only off x86.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/benches/wildcopy_candidates.rs` around lines 170 - 176, The benchmark
currently forces select_baseline_copy_kernel to return the scalar BenchPath
(name "baseline_scalar", kernel copy_candidate_scalar) for all non-x86 targets;
update it to mirror the production dispatcher used by copy_bytes_overshooting so
ARM/NEON runs pick the NEON baseline instead of scalar, or alternatively
cfg-gate the entire benchmark to only run on the architectures modeled
(x86/x86_64/arm/aarch64) to match production; modify the
select_baseline_copy_kernel implementation (or add matching #[cfg] arms) so the
returned BenchPath name and kernel match the real dispatcher choices (e.g., NEON
when NEON is available) rather than always returning baseline_scalar.
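
A minimal sketch of the "matching #[cfg] arms" option follows. It is not the bench's actual code: the `BenchPath` shape, the function name `select_baseline_path`, the path names other than `baseline_scalar` and `baseline_avx512`, and the chunk widths (taken here as plain register widths) are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for the bench's `BenchPath`; the real struct may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BenchPath {
    pub name: &'static str,
    /// Bytes copied per step (assumed to equal the vector register width).
    pub chunk: usize,
}

/// Scalar fallback shared by every architecture arm.
fn scalar_path() -> BenchPath {
    BenchPath { name: "baseline_scalar", chunk: std::mem::size_of::<usize>() }
}

/// x86/x86_64: probe CPU features at runtime, widest first.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn select_baseline_path() -> BenchPath {
    if std::arch::is_x86_feature_detected!("avx512f") {
        BenchPath { name: "baseline_avx512", chunk: 64 }
    } else if std::arch::is_x86_feature_detected!("avx2") {
        BenchPath { name: "baseline_avx2", chunk: 32 }
    } else if std::arch::is_x86_feature_detected!("sse2") {
        BenchPath { name: "baseline_sse2", chunk: 16 }
    } else {
        scalar_path()
    }
}

/// aarch64: prefer NEON when the runtime probe reports it.
#[cfg(target_arch = "aarch64")]
pub fn select_baseline_path() -> BenchPath {
    if std::arch::is_aarch64_feature_detected!("neon") {
        BenchPath { name: "baseline_neon", chunk: 16 }
    } else {
        scalar_path()
    }
}

/// Every other architecture: scalar only.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
pub fn select_baseline_path() -> BenchPath {
    scalar_path()
}
```

Calling the production dispatcher directly, as the finding also suggests, would avoid keeping a second copy of these thresholds in the bench.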
116-129: ⚠️ Potential issue | 🟠 Major

Drive candidate selection from len, not only CPU caps.

Line 202 caches a single candidate_path before the loop, so on AVX2 hosts the new 17/33/63 cases are all rounded up to 64 by copy_with_overshoot_policy(). That means copy_candidate_unroll2_avx2() never exercises its < 64 scalar branch, and the small-block points are still reporting the wide AVX2 path rather than candidate small-block behavior. Select the candidate inside the for len in lengths loop and key it off len as well as runtime features.

Also applies to: 202-203
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/benches/wildcopy_candidates.rs` around lines 116 - 129, The candidate
path is chosen once by select_candidate_copy_kernel so AVX2 hosts always pick
the 64-byte path; change the logic to select the BenchPath inside the loop over
lengths (where `len` is available) and pick based on both runtime CPU features
(std::arch::is_x86_feature_detected!("avx2")) and the current `len` so small
lengths choose `copy_candidate_scalar` (or a smaller chunk) while larger lengths
use `copy_candidate_unroll2_avx2`; update any use of `candidate_path`/BenchPath
so selection occurs per-iteration before calling `copy_with_overshoot_policy()`
to ensure the scalar branch of `copy_candidate_unroll2_avx2` and small-block
candidates are exercised.
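
The per-length selection described above can be sketched as follows. This is an illustration under stated assumptions: `CandidateKernel`, `select_candidate_kernel`, and `plan` are hypothetical names, the 64-byte threshold is taken from the review text, and the AVX2 capability is passed in as a flag so the choice can be checked on any host.

```rust
/// Hypothetical labels for the two candidate kernels named in the review.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CandidateKernel {
    Scalar,
    Avx2Unroll2,
}

/// Picks a kernel from both the CPU capability and the current length.
/// In a real bench `has_avx2` would come from
/// `std::arch::is_x86_feature_detected!("avx2")`.
pub fn select_candidate_kernel(len: usize, has_avx2: bool) -> CandidateKernel {
    // Assumed: the unrolled AVX2 kernel moves 64 bytes per step.
    const WIDE_STEP: usize = 64;
    if has_avx2 && len >= WIDE_STEP {
        CandidateKernel::Avx2Unroll2
    } else {
        CandidateKernel::Scalar
    }
}

/// Selection happens inside the loop over lengths, not once before it.
pub fn plan(lengths: &[usize], has_avx2: bool) -> Vec<(usize, CandidateKernel)> {
    lengths
        .iter()
        .map(|&len| (len, select_candidate_kernel(len, has_avx2)))
        .collect()
}
```

With this shape, the 17/33/63 points report the small-block path even on an AVX2 host, while 64 and above report the wide path.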

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 21277d7e-7f8c-49b7-aaf6-a0c90abe2e7f

📥 Commits

Reviewing files that changed from the base of the PR and between f507c7e and 9aa0c69.

📒 Files selected for processing (1)

zstd/benches/wildcopy_candidates.rs

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/benches/wildcopy_candidates.rs`:
- Around line 123-136: When an AVX2 host is detected but len < 64, the scalar
fallback uses chunk = size_of::<usize>() which changes overshoot behavior; in
select_candidate_copy_kernel ensure the fallback candidate uses the same
production chunk as the AVX2 path so both paths pay the same overshoot (i.e.,
return BenchPath with chunk = 64 even when kernel = copy_candidate_scalar) or
alternatively skip returning a candidate when no drop-in candidate exists;
update the BenchPath construction in select_candidate_copy_kernel (and analogous
spots) so chunk matches the production copy size used by
copy_with_overshoot_policy.
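
To see why the chunk value matters, here is a small arithmetic sketch. It assumes that the overshoot policy rounds the requested length up to the next multiple of the path's chunk, as the review implies; `overshoot_span` and `overshoot_bytes` are hypothetical helper names, not functions from the bench.

```rust
/// Bytes actually written when `len` is rounded up to a multiple of `chunk`.
/// `chunk` must be non-zero.
pub fn overshoot_span(len: usize, chunk: usize) -> usize {
    len.next_multiple_of(chunk)
}

/// Extra bytes written past the requested length.
pub fn overshoot_bytes(len: usize, chunk: usize) -> usize {
    overshoot_span(len, chunk) - len
}
```

For `len = 17`, a chunk of 8 (the size of `usize` on a 64-bit host) writes 24 bytes, while a chunk of 64 writes 64 bytes. A scalar fallback that keeps `chunk = 64` therefore pays the same 47-byte overshoot as the AVX2 path, so the two timings stay comparable.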

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 97b84742-3f90-4903-a723-eca1d3f077db

📥 Commits

Reviewing files that changed from the base of the PR and between 9aa0c69 and c6e741b.

📒 Files selected for processing (1)

zstd/benches/wildcopy_candidates.rs

Copilot

Pull request overview

This PR improves benchmark research and visualization tooling for structured-zstd, adding a focused wildcopy candidate benchmark (for issue #87 research) and enhancing the bench dashboard UX with a selectable snapshot range.

Changes:

- Add wildcopy_candidates Criterion benchmark (feature-gated by bench_internals) to compare copy-kernel candidates vs the production wildcopy behavior.
- Expose a bench-only entrypoint via structured_zstd::testing to call the production overshooting copy function from benches.
- Improve the benchmark dashboard layout and add From/To snapshot selectors to filter the displayed time window.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `zstd/src/lib.rs` | Re-export bench-only internal entrypoint under `testing` (feature-gated). |
| `zstd/src/decoding/simd_copy.rs` | Add bench-only wrapper for the production overshooting wildcopy helper. |
| `zstd/src/decoding/mod.rs` | Make `simd_copy` accessible within the crate for bench re-exports. |
| `zstd/Cargo.toml` | Register the new `wildcopy_candidates` benchmark (requires `bench_internals`). |
| `zstd/benches/wildcopy_candidates.rs` | New benchmark suite comparing baseline/candidate copy kernels and validating output vs production. |
| `BENCHMARKS.md` | Document issue #87 mapping, sample results, and follow-up linkage. |
| `.github/bench-dashboard/index.html` | Increase chart height and add inclusive `From`/`To` snapshot range filtering. |

Copilot

Pull request overview

This PR improves benchmarking support for structured-zstd by (1) adding a Criterion research bench for evaluating alternative overshooting copy kernels (wildcopy candidates) and (2) improving the GitHub Pages benchmark dashboard usability with a selectable snapshot range and better chart sizing.

Changes:

- Add a bench_internals-gated wildcopy_candidates Criterion benchmark to compare baseline vs candidate copy kernels.
- Expose a bench_internals-gated bench-only entrypoint for the production overshooting copy routine via structured_zstd::testing.
- Enhance the benchmark dashboard with From/To snapshot selectors and a taller, responsive chart area.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `zstd/src/lib.rs` | Re-export the bench-only wildcopy entrypoint from `structured_zstd::testing` (feature-gated). |
| `zstd/src/decoding/simd_copy.rs` | Add `copy_bytes_overshooting_for_bench` behind `bench_internals` to call the production implementation. |
| `zstd/src/decoding/mod.rs` | Make `simd_copy` `pub(crate)` so it can be re-exported for benches. |
| `zstd/Cargo.toml` | Register the new `wildcopy_candidates` bench behind `bench_internals`. |
| `zstd/benches/wildcopy_candidates.rs` | New x86/x86_64-only benchmark comparing baseline vs candidate copy kernels against production output. |
| `BENCHMARKS.md` | Document issue #87 research mapping, sample numbers, and follow-up link. |
| `.github/bench-dashboard/index.html` | Add From/To filters and update chart sizing / aspect ratio behavior. |

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (2)

zstd/benches/wildcopy_candidates.rs (1)

240-267: ⚠️ Potential issue | 🟠 Major

Validate the rounded wildcopy span, not only the requested prefix.

These assertions stop at [..len], so a candidate that preserves the first len bytes but diverges in the rounded-up overshoot region would still pass here. Since issue #87 explicitly calls out preserving the wildcopy overshoot contract, the check should cover the full copy_multiple span whenever the buffers are sized for it.

🧪 Minimal tightening

+        let check_len = len.next_multiple_of(baseline_path.chunk);
         assert_eq!(
-            &dst_baseline[..len],
-            &dst_production[..len],
+            &dst_baseline[..check_len],
+            &dst_production[..check_len],
             "baseline path must match production wildcopy for len={len}"
         );
         assert_eq!(
-            &dst_candidate[..len],
-            &dst_production[..len],
+            &dst_candidate[..check_len],
+            &dst_production[..check_len],
             "candidate path must match production wildcopy for len={len}"
         );

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@zstd/benches/wildcopy_candidates.rs` around lines 240 - 267, The assertions
only compare the first len bytes, missing differences in the rounded/overshoot
region; compute the full copied span (e.g. let check_len =
len.next_multiple_of(baseline_path.chunk), or whichever expression the bench
uses to round wildcopy lengths) and assert equality over
&dst_baseline[..check_len], &dst_candidate[..check_len] against
&dst_production[..check_len] after the unsafe calls to
copy_bytes_overshooting_for_bench and copy_with_overshoot_policy so the test
validates the entire wildcopy/overshoot contract rather than just the requested
prefix (use the existing symbols copy_bytes_overshooting_for_bench,
copy_with_overshoot_policy, dst_production, dst_baseline, dst_candidate, len,
and baseline_path.chunk).
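
The gap between the two checks can be shown with a safe-slice sketch. It is illustrative only: `copy_overshooting` and `copy_prefix_only` are hypothetical kernels, not the bench's functions, and the second one is deliberately faulty in that it never writes the overshoot region.

```rust
/// Reference kernel: copies `len` rounded up to a multiple of `chunk`.
pub fn copy_overshooting(src: &[u8], dst: &mut [u8], len: usize, chunk: usize) {
    let span = len.next_multiple_of(chunk);
    dst[..span].copy_from_slice(&src[..span]);
}

/// Hypothetical faulty candidate: writes the exact prefix and nothing more.
pub fn copy_prefix_only(src: &[u8], dst: &mut [u8], len: usize) {
    dst[..len].copy_from_slice(&src[..len]);
}

/// Returns (prefix check passes, rounded-span check passes).
pub fn compare(len: usize, chunk: usize) -> (bool, bool) {
    // Non-zero source bytes so untouched (zeroed) destination bytes stand out.
    let src: Vec<u8> = (0..128u32).map(|i| (i as u8) | 0x80).collect();
    let mut reference = vec![0u8; 128];
    let mut candidate = vec![0u8; 128];

    copy_overshooting(&src, &mut reference, len, chunk);
    copy_prefix_only(&src, &mut candidate, len);

    let span = len.next_multiple_of(chunk);
    (
        reference[..len] == candidate[..len],
        reference[..span] == candidate[..span],
    )
}
```

With `len = 17` and `chunk = 16`, the prefix comparison passes while the 32-byte span comparison fails, which is the divergence a `[..len]` assertion would miss.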

zstd/src/decoding/mod.rs (1)

24-24: 🧹 Nitpick | 🔵 Trivial

Narrow the visibility back to the bench shim.

pub(crate) mod simd_copy; exposes the whole decoder-internal module across the crate again, but this PR still only needs one bench-only entrypoint. Re-exporting just the shim keeps the surface tighter.
♻️ Possible tightening
- pub(crate) mod simd_copy;
+ mod simd_copy;
+ #[cfg(feature = "bench_internals")]
+ pub(crate) use self::simd_copy::copy_bytes_overshooting_for_bench;
Based on learnings, internal decode/encode helpers remain pub(crate) by default, and explicit public shims are exposed only behind a dedicated feature flag.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/decoding/mod.rs` at line 24, Change the module from being
crate-visible to private and only re-export the bench entrypoint: replace
"pub(crate) mod simd_copy;" with "mod simd_copy;" and add a focused re-export
for the bench shim ("pub(crate) use
self::simd_copy::copy_bytes_overshooting_for_bench;") placed behind the
dedicated bench feature gate (cfg(feature = "bench_internals")) so only the shim
is exposed rather than the entire simd_copy module.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@BENCHMARKS.md`:
- Around line 72-74: Update the decision label to clarify scope by changing the
line starting with "Decision: **GO** for production integration of AVX2 unroll2
candidate." to a microbench-specific, provisional recommendation—for example,
prefix with "Microbench recommendation / provisional GO:" or "Provisional GO
(microbench only):" and mention that this conclusion is based on local
`wildcopy_candidates` microbench data so it does not close issue `#87`; keep the
follow-up implementation issue reference [`#108`] as-is.

In `@zstd/benches/wildcopy_candidates.rs`:
- Around line 142-156: The benchmark incorrectly compares an AVX2 candidate
against an AVX-512 baseline: update select_candidate_copy_kernel to detect when
the chosen baseline is the AVX-512 path (the same detection used by
select_baseline_copy_kernel which returns baseline_avx512 for len >= 64 on
AVX-512 hosts) and either skip returning candidate_avx2_unroll2 or instead
return an AVX-512-equivalent candidate; specifically, when baseline_avx512 is
active do not select candidate_avx2_unroll2 (or add a candidate_avx512_unroll2
and return it), otherwise keep the existing AVX2 and scalar fallback choices
(refer to select_candidate_copy_kernel, select_baseline_copy_kernel,
baseline_avx512, and candidate_avx2_unroll2).
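
One way to express the "skip the candidate" option is to make the selector return an `Option`. The sketch below is an assumption-laden illustration: the `Isa` enum and the function name `select_candidate` are hypothetical, and only the path names `candidate_avx2_unroll2` and the 64-byte threshold come from the review text.

```rust
/// Hypothetical summary of the widest ISA the host reports.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Isa {
    Avx512,
    Avx2,
    Sse2,
    Scalar,
}

/// Returns the candidate path name, or `None` when no like-for-like
/// candidate exists for the baseline that would be active.
pub fn select_candidate(isa: Isa, len: usize) -> Option<&'static str> {
    match isa {
        // Baseline would be AVX-512 here; an AVX2 candidate is not comparable.
        Isa::Avx512 if len >= 64 => None,
        Isa::Avx2 if len >= 64 => Some("candidate_avx2_unroll2"),
        // Small blocks and narrower hosts keep the scalar fallback.
        _ => Some("candidate_scalar"),
    }
}
```

A bench loop could then skip the candidate measurement whenever the selector yields `None`, so AVX-512 hosts do not report a cross-ISA comparison.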

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d18239bf-9656-4ab7-9943-cf9e592f94b3

📥 Commits

Reviewing files that changed from the base of the PR and between 17207ec and 2b871a7.

📒 Files selected for processing (7)

.github/bench-dashboard/index.html
BENCHMARKS.md
zstd/Cargo.toml
zstd/benches/wildcopy_candidates.rs
zstd/src/decoding/mod.rs
zstd/src/decoding/simd_copy.rs
zstd/src/lib.rs

Copilot

Pull request overview

Adds bench-focused infrastructure for researching alternative “wildcopy” kernels (issue #87), plus usability improvements to the benchmark dashboard used for visualizing relative performance trends.

Changes:

- Added a new wildcopy_candidates Criterion benchmark (feature-gated by bench_internals) to compare candidate copy kernels vs the production overshooting copy.
- Exposed a bench-only entrypoint to call the production overshooting copy implementation for parity checking in benchmarks.
- Improved the benchmark dashboard with a taller chart area and From/To snapshot range filtering.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `zstd/src/lib.rs` | Re-exports the new bench-only overshooting copy entrypoint under `structured_zstd::testing` (feature-gated). |
| `zstd/src/decoding/simd_copy.rs` | Adds `copy_bytes_overshooting_for_bench` behind `bench_internals` to reach production wildcopy from benches. |
| `zstd/src/decoding/mod.rs` | Makes `simd_copy` visible within the crate (`pub(crate)`) so the bench re-export can reference it. |
| `zstd/Cargo.toml` | Registers the new `wildcopy_candidates` bench target and gates it with `bench_internals`. |
| `zstd/benches/wildcopy_candidates.rs` | New benchmark comparing baseline vs candidate kernels with correctness parity checks against production wildcopy. |
| `BENCHMARKS.md` | Documents the issue #87 research mapping, sample measurements, and follow-up link. |
| `.github/bench-dashboard/index.html` | Adds snapshot `From`/`To` selectors and adjusts chart sizing to keep legend/toggles visible. |

polaz · 2026-04-12T15:53:37Z

@coderabbitai Duplicate review-body findings were addressed in the latest commit as well: zstd/src/decoding/mod.rs now keeps simd_copy private with a bench-gated focused re-export, and wildcopy validation now checks the rounded copy span.

coderabbitai · 2026-04-12T15:53:55Z

@polaz: Thanks for the follow-up! Both fixes look right:

- simd_copy visibility: keeping the module private in decoding/mod.rs and routing the bench-only symbol through a narrower, feature-gated re-export is the cleaner approach; it avoids widening the internal module surface just for bench access.
- Rounded-span validation: checking the rounded copy span (i.e. the actual bytes written, including the overshoot) rather than only the first len bytes correctly captures the full invariant that copy_bytes_overshooting guarantees.

Both are good hardening steps.

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T21:51:51.957Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test suite for `copy_bytes_overshooting` includes a misaligned extra-capacity case that exercises the rounded-up wildcopy path (where `copy_multiple > copy_at_least`) and validates prefix correctness for that scenario. Do not flag the absence of a misaligned/overshooting extra-capacity sub-case as missing test coverage.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T20:26:15.924Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test `copy_bytes_overshooting_covers_all_copy_strategies` derives copy lengths from a helper `active_chunk_size_for_tests()` (not hardcoded values) so that single-chunk, multi-chunk, and sub-chunk (fallback) cases adapt to whichever SIMD backend width is active at runtime (AVX-512/AVX2/SSE2/NEON/scalar). Do not flag hardcoded test lengths as missing coverage for other backends — the test is backend-adaptive by design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-08T21:51:56.652Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), the test `copy_bytes_overshooting_preserves_prefix_for_runtime_chunk_lengths` (or equivalent) includes a misaligned extra-capacity case that uses `add(1)`-offset pointers and a `chunk_plus_one` length to exercise the rounded-up wildcopy overshoot path (where `copy_multiple > copy_at_least`) and validates prefix correctness. Do not flag the absence of a misaligned-pointer case in this test — it is present by design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-09T06:17:40.433Z
Learning: In `zstd/src/decoding/ringbuffer.rs` (structured-world/structured-zstd), both `copy_with_checks` and `copy_with_nobranch_check` thread aggregate source/destination capacities (`m1_src_cap`, `m2_src_cap`, `f1_dst_cap`, `f2_dst_cap`) into `simd_copy::copy_bytes_overshooting` so the SIMD eligibility check (`min(src_len, dst_len) >= copy_multiple`) is applied uniformly. Do not flag the capacity parameters in `copy_with_nobranch_check` as unnecessary or inconsistent with `copy_with_checks`.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-06T01:40:24.378Z
Learning: In `zstd/benches/compare_ffi.rs` (structured-world/structured-zstd), Rust FastCOVER trains with the post-finalization content budget in both the `REPORT_DICT_TRAIN` emission path (around lines 208-225) and the Criterion benchmark path (around lines 266-280). Both paths were aligned in commit 8622344. Do not flag these ranges as using inconsistent budget values.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 99
File: zstd/src/encoding/frame_compressor.rs:779-805
Timestamp: 2026-04-11T07:49:03.699Z
Learning: In `zstd/src/encoding/frame_compressor.rs` (structured-world/structured-zstd, PR `#99`), a temporary compatibility guard exists for sub-256-byte compressed payloads: the 1-byte-FCS single-segment form is not yet fully C-FFI compatible for payloads below 256 bytes, and a compat guard was added to handle this case while the path is hardened. Do not flag this guard as unnecessary.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:56:17.814Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), after commit 46927db, hash-mix kernel dispatch was moved from the global `HASH_MIX_KERNEL` AtomicU8 hot-path read into a per-matcher instance field `hash_mix_kernel` stored directly in `DfastMatchGenerator` and `RowMatchGenerator`. The hot path no longer performs an atomic load or global branch; kernel selection happens once at matcher initialisation and is subsequently a plain struct-field read. Do not flag the hot path as doing a per-call atomic load — that pattern was superseded by commit 46927db.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-09T15:16:10.624Z
Learning: In structured-world/structured-zstd, the `fuzz_exports` API concern was resolved via "Option B": internal decode/encode helpers remain `pub(crate)` by default, and explicit public shims are exposed only behind a dedicated feature flag (e.g., `fuzz`) in a separate file. Do not flag these internal symbols as needing a wider public API — the feature-gated shim layer is the intentional design.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:31:46.105Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), the test-only `with_forced_hash_mix_kernel(...)` helper forces the global `HASH_MIX_KERNEL` atomic to a specific kernel for coverage. Tests that call this helper must acquire a shared mutex lock *before* the `with_forced_hash_mix_kernel` call (not only around the assertion) to prevent cross-test kernel races when `cargo nextest` runs tests in parallel. This pattern was locked in commit 904d2b0. Do not flag the lock acquisition as unnecessarily early.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 0
File: :0-0
Timestamp: 2026-04-11T11:31:54.292Z
Learning: In `zstd/src/encoding/match_generator.rs` (structured-world/structured-zstd, PR `#102`), the test `row_hash_and_row_extracts_high_bits` acquires `HASH_MIX_KERNEL_TEST_LOCK` before any `hash_mix_u64` call (fixed in commit 904d2b0). This lock guards all tests that force a specific `HashMixKernel` via `with_forced_hash_mix_kernel(...)` to prevent cross-test kernel races caused by the global `HASH_MIX_KERNEL` atomic. Do not flag the absence of this lock as missing — it is acquired at the top of each affected test.

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 58
File: .github/workflows/ci.yml:31-32
Timestamp: 2026-04-03T11:24:17.021Z
Learning: In `structured-world/structured-zstd`, the CI Clippy lint job (`.github/workflows/ci.yml`) uses two separate steps: (1) the required gate: `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings`, and (2) a separate bench-internals step: `cargo clippy -p structured-zstd --features hash,std,dict_builder,bench_internals -- -D warnings`. The `bench_internals` feature must NOT replace the primary gate — it runs as an additional step only.

Copilot

Pull request overview

Adds bench-only infrastructure and a focused Criterion microbench to evaluate alternative “wildcopy” kernels for issue #87, plus improves the benchmark dashboard UX by resizing the chart and adding snapshot range filters.

Changes:

- Expose a bench-only entrypoint for the production copy_bytes_overshooting implementation (feature-gated by bench_internals).
- Add a new Criterion benchmark target (wildcopy_candidates) to compare baseline vs candidate copy kernels and validate byte-exact parity.
- Enhance the GH Pages benchmark dashboard with a taller chart and From/To snapshot selectors to filter the visible time window.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `zstd/src/lib.rs` | Re-export the new bench-only copy entrypoint from `structured_zstd::testing` behind `bench_internals`. |
| `zstd/src/decoding/simd_copy.rs` | Add a bench-only wrapper entrypoint for the production overshooting copy routine. |
| `zstd/src/decoding/mod.rs` | Re-export the bench-only entrypoint within the decoding module (crate-visible, feature-gated). |
| `zstd/Cargo.toml` | Register the new `wildcopy_candidates` Criterion bench (requires `bench_internals`). |
| `zstd/benches/wildcopy_candidates.rs` | New microbench comparing baseline vs candidate kernels with correctness assertions and ISA dispatch. |
| `BENCHMARKS.md` | Document the issue #87 research mapping, sample results, and provisional recommendation. |
| `.github/bench-dashboard/index.html` | UI/layout updates plus `From`/`To` range filtering for benchmark snapshots. |

Copilot

Pull request overview

This PR enhances the project’s benchmarking and performance-research workflow by (1) adding a new Criterion microbench to evaluate decoder wildcopy copy-kernel candidates for issue #87 research, and (2) improving the benchmark dashboard UX with a selectable snapshot range window.

Changes:

- Expose a bench-only shim (structured_zstd::testing::copy_bytes_overshooting_for_bench) for benchmarking the production wildcopy implementation under bench_internals.
- Add a new Criterion benchmark target wildcopy_candidates (feature-gated) to compare candidate copy kernels vs the production overshooting wildcopy behavior.
- Improve the gh-pages benchmark dashboard layout and add From/To snapshot selectors to filter the displayed time window.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `zstd/src/lib.rs` | Adds a `bench_internals`-gated public testing facade exposing a wildcopy bench shim. |
| `zstd/src/decoding/simd_copy.rs` | Adds a `bench_internals` entrypoint delegating to the production overshooting copy implementation. |
| `zstd/src/decoding/mod.rs` | Re-exports the bench-only copy entrypoint (crate-visible) from the decoding module. |
| `zstd/Cargo.toml` | Registers the new `wildcopy_candidates` Criterion bench gated behind `bench_internals`. |
| `zstd/benches/wildcopy_candidates.rs` | Introduces the x86/x86_64 microbench comparing production wildcopy vs baseline/candidate kernels. |
| `BENCHMARKS.md` | Documents issue #87 research mapping, local microbench results, and a provisional recommendation. |
| `.github/bench-dashboard/index.html` | Updates chart sizing and adds From/To snapshot selectors + filtering logic for the dashboard. |

Copilot AI review requested due to automatic review settings April 12, 2026 10:33

Copilot started reviewing on behalf of polaz April 12, 2026 10:33

Copilot AI reviewed Apr 12, 2026

Comment thread zstd/src/decoding/mod.rs

Comment thread zstd/benches/wildcopy_candidates.rs

Comment thread zstd/benches/wildcopy_candidates.rs Outdated

polaz mentioned this pull request Apr 12, 2026

research(decoding): evaluate advanced SIMD wildcopy paths beyond current baseline #87

Closed

4 tasks

coderabbitai Bot reviewed Apr 12, 2026

Comment thread zstd/benches/wildcopy_candidates.rs Outdated

Comment thread zstd/src/decoding/mod.rs Outdated

polaz requested a review from Copilot April 12, 2026 11:49

Copilot AI reviewed Apr 12, 2026

sw-release-bot Bot reviewed Apr 12, 2026

coderabbitai Bot reviewed Apr 12, 2026

Comment thread zstd/benches/wildcopy_candidates.rs Outdated

Comment thread zstd/benches/wildcopy_candidates.rs Outdated

polaz requested a review from Copilot April 12, 2026 12:37

Copilot started reviewing on behalf of polaz April 12, 2026 12:37

Copilot AI reviewed Apr 12, 2026

Comment thread zstd/src/decoding/mod.rs Outdated

Comment thread zstd/src/lib.rs Outdated

polaz requested a review from Copilot April 12, 2026 12:49

Copilot started reviewing on behalf of polaz April 12, 2026 12:49

coderabbitai Bot reviewed Apr 12, 2026

Comment thread zstd/benches/wildcopy_candidates.rs Outdated

Copilot AI reviewed Apr 12, 2026

coderabbitai Bot reviewed Apr 12, 2026

Comment thread zstd/benches/wildcopy_candidates.rs Outdated

Comment thread zstd/benches/wildcopy_candidates.rs Outdated

polaz requested a review from Copilot April 12, 2026 13:29

coderabbitai Bot reviewed Apr 12, 2026

Copilot started reviewing on behalf of polaz April 12, 2026 13:29

coderabbitai Bot reviewed Apr 12, 2026

Comment thread zstd/benches/wildcopy_candidates.rs Outdated

Copilot AI reviewed Apr 12, 2026

Comment thread zstd/src/decoding/simd_copy.rs

Comment thread zstd/benches/wildcopy_candidates.rs

Comment thread zstd/benches/wildcopy_candidates.rs

polaz requested a review from Copilot April 12, 2026 14:41

Copilot started reviewing on behalf of polaz April 12, 2026 14:42

Copilot AI reviewed Apr 12, 2026

Comment thread zstd/benches/wildcopy_candidates.rs

Comment thread zstd/benches/wildcopy_candidates.rs Outdated

feat(bench-dashboard): add snapshot range filter and taller chart

abffecf

polaz added 8 commits April 12, 2026 17:54

perf(bench): add wildcopy candidate benchmark harness

9de75f3

fix(bench): address review feedback and CI lint

af2d144

- keep bench internals exposed only via structured_zstd::testing
- select candidate kernel once and harden benchmark against copy elision
- document #87 decision mapping and link follow-up issue #108

fix(bench): align candidate benchmark paths

4ab88c1

- benchmark baseline and candidate with symmetric overshoot policy
- add non-aligned length coverage and tighten bench shim visibility

fix(bench): align baseline dispatcher with production

d4eeeae

- baseline selector now mirrors avx512/avx2/sse2/scalar thresholds
- include sub-64 lengths and keep bench shim only under testing facade

refactor(bench): return BenchPath from candidate selector

e2699c6

fix(bench): select candidate by len and gate x86 bench

d58a610

fix(bench): align candidate overshoot with baseline policy

44e8752

fix(bench): add explicit unsafe blocks in wildcopy bench

2b871a7

polaz force-pushed the perf/#87-wildcopy-research branch from 17207ec to 2b871a7 Compare April 12, 2026 15:05

coderabbitai Bot reviewed Apr 12, 2026


Comment thread BENCHMARKS.md Outdated

Comment thread zstd/benches/wildcopy_candidates.rs Outdated

polaz requested a review from Copilot April 12, 2026 15:28

Copilot started reviewing on behalf of polaz April 12, 2026 15:29

Copilot AI reviewed Apr 12, 2026


fix(bench): address wildcard review findings

5779bd2

polaz requested a review from Copilot April 12, 2026 15:55

Copilot started reviewing on behalf of polaz April 12, 2026 15:56

fix(bench): keep bench shim public without re-export leak

7fb9592

Copilot AI reviewed Apr 12, 2026


polaz requested a review from Copilot April 12, 2026 16:10

Copilot started reviewing on behalf of polaz April 12, 2026 16:11

Copilot AI reviewed Apr 12, 2026


Comment thread zstd/src/decoding/simd_copy.rs

fix(bench): narrow bench shim visibility in simd copy

70e7a5c

polaz merged commit fe27fc6 into main Apr 12, 2026
14 checks passed

polaz deleted the perf/#87-wildcopy-research branch April 12, 2026 18:11

sw-release-bot Bot mentioned this pull request Apr 12, 2026

chore: release v0.0.19 #109

Merged


codecov Bot commented Apr 12, 2026

Codecov Report


sw-release-bot Bot left a comment

⚠️ Performance Alert ⚠️
