perf(decoding): integrate AVX2 unroll2 wildcopy candidate #108

@polaz

Context

Issue #87 completed the research stage for decode wildcopy candidates. Local Criterion microbenchmarks show consistent speed gains for the AVX2 unroll2 candidate over the current production wildcopy path.

Observed local samples, current path -> AVX2 unroll2 candidate (cargo bench --bench wildcopy_candidates -p structured-zstd --features bench_internals -- --output-format bencher):

  • 64B: 3ns -> 2ns
  • 256B: 7ns -> 4ns
  • 1024B: 28ns -> 14ns
  • 4096B: 94ns -> 58ns
  • 16384B: 347ns -> 268ns
  • 65536B: 1368ns -> 1121ns
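
To make the samples above easier to compare, the sketch below turns each pair into a speedup factor (current time divided by candidate time). The constant and function names are made up for illustration and are not part of the bench harness. The 64B and 256B figures are reported at 1ns resolution, so their ratios are coarse.

```rust
/// (copy size in bytes, current path ns, AVX2 unroll2 candidate ns),
/// copied from the local samples listed above.
const SAMPLES: [(u32, f64, f64); 6] = [
    (64, 3.0, 2.0),
    (256, 7.0, 4.0),
    (1024, 28.0, 14.0),
    (4096, 94.0, 58.0),
    (16384, 347.0, 268.0),
    (65536, 1368.0, 1121.0),
];

/// Speedup factor: how many times faster the candidate is than the baseline.
fn speedup(baseline_ns: f64, candidate_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}

/// One formatted line per sample, e.g. "1024B: 2.00x".
fn speedup_report() -> Vec<String> {
    SAMPLES
        .iter()
        .map(|&(size, base, cand)| format!("{}B: {:.2}x", size, speedup(base, cand)))
        .collect()
}
```

On these numbers the gain is largest around 1024B (2.00x) and narrows to roughly 1.2x to 1.3x at 16384B and 65536B.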

Goal

Integrate the AVX2 unroll2 candidate into the production decode wildcopy strategy without interoperability or correctness regressions.

Implementation plan

  1. Wire AVX2 candidate into copy_strategy() for x86/x86_64 runtime-dispatched path (std + no_std configs where applicable).
  2. Keep donor-compatible wildcopy semantics (overshoot contract and copy safety invariants).
  3. Add regression tests for boundary/tail behavior and parity vs current implementation.
  4. Run full benchmark matrix (compare_ffi) and candidate microbench to validate throughput/ratio impact.
  5. Document go/no-go decision in BENCHMARKS with benchmark evidence and donor path reference.
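
As a rough illustration of steps 1 and 2, here is a minimal sketch of a runtime-dispatched copy with an AVX2 "unroll2" kernel (two 32-byte vectors per iteration). It is not the crate's code: `wildcopy`, `padded_len`, `CHUNK`, and `wildcopy_avx2_unroll2` are hypothetical names, the 64-byte overshoot bound is an assumption of this sketch rather than the donor contract, and overlapping match copies are not handled. It uses std's `is_x86_feature_detected!`; a no_std build would need a different detection mechanism.

```rust
// Sketch only: every name here is hypothetical, not the crate's real API.

/// Bytes written per loop iteration: two 32-byte AVX2 vectors ("unroll2").
pub const CHUNK: usize = 64;

/// Smallest multiple of CHUNK that is >= len. Callers must provide this much
/// readable source and writable destination (the overshoot contract assumed
/// by this sketch).
pub fn padded_len(len: usize) -> usize {
    (len + CHUNK - 1) / CHUNK * CHUNK
}

/// Copies at least `len` bytes in 64-byte steps; may touch up to
/// `padded_len(len)` bytes of both buffers.
///
/// Safety: `src` and `dst` must each be valid for `padded_len(len)` bytes,
/// must not overlap, and the CPU must support AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn wildcopy_avx2_unroll2(src: *const u8, dst: *mut u8, len: usize) {
    #[cfg(target_arch = "x86")]
    use core::arch::x86::{__m256i, _mm256_loadu_si256, _mm256_storeu_si256};
    #[cfg(target_arch = "x86_64")]
    use core::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_storeu_si256};

    let mut off = 0;
    while off < len {
        unsafe {
            // Two unaligned 32-byte loads followed by two unaligned stores.
            let lo = _mm256_loadu_si256(src.add(off) as *const __m256i);
            let hi = _mm256_loadu_si256(src.add(off + 32) as *const __m256i);
            _mm256_storeu_si256(dst.add(off) as *mut __m256i, lo);
            _mm256_storeu_si256(dst.add(off + 32) as *mut __m256i, hi);
        }
        off += CHUNK;
    }
}

/// Safe entry point: checks the slack requirement, then picks a path at runtime.
pub fn wildcopy(src: &[u8], dst: &mut [u8], len: usize) {
    let padded = padded_len(len);
    assert!(
        src.len() >= padded && dst.len() >= padded,
        "buffers must include overshoot slack"
    );

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: both slices hold at least `padded` bytes, distinct
            // `&` / `&mut` slices cannot overlap, and AVX2 was just detected.
            unsafe { wildcopy_avx2_unroll2(src.as_ptr(), dst.as_mut_ptr(), len) };
            return;
        }
    }

    // Portable fallback: exact copy, no overshoot.
    dst[..len].copy_from_slice(&src[..len]);
}
```

Only the first `len` destination bytes are meaningful after the call; anything between `len` and `padded_len(len)` may or may not have been overwritten, depending on which path ran.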

Acceptance criteria

  • Runtime decoder uses integrated AVX2 unroll2 candidate on AVX2-capable CPUs.
  • No correctness regressions (cargo nextest run --workspace, cargo test --doc --workspace, cross-validation green).
  • No interoperability regressions with C zstd streams.
  • Benchmark evidence recorded and shows reproducible gain on target workloads.
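
One way the correctness criterion could be made concrete for boundary and tail behavior is a canary-based check: fill the destination with a sentinel byte, run the copy, then verify that the payload matches a reference copy and that nothing past the allowed overshoot was written. The sketch below is an assumption about how such a test might look, not the project's test suite. `copy_chunked16` is a hypothetical stand-in for a candidate, and `check_contract`, `copy_reference`, and `CANARY` are illustrative names.

```rust
/// Sentinel value used to detect stray writes. The source pattern below
/// never produces this byte, so any overwritten guard byte is detectable.
const CANARY: u8 = 0xA5;

/// Hypothetical stand-in for a wildcopy candidate: copies in 16-byte chunks
/// and may therefore write up to 15 bytes past `len`.
fn copy_chunked16(src: &[u8], dst: &mut [u8], len: usize) {
    let mut off = 0;
    while off < len {
        dst[off..off + 16].copy_from_slice(&src[off..off + 16]);
        off += 16;
    }
}

/// Reference behavior: exact copy with no overshoot.
fn copy_reference(src: &[u8], dst: &mut [u8], len: usize) {
    dst[..len].copy_from_slice(&src[..len]);
}

/// Checks payload parity against the reference copy and that no byte beyond
/// `len + max_overshoot` was modified. `guard` is the number of canary bytes
/// placed after the permitted overshoot region.
fn check_contract(
    copy: fn(&[u8], &mut [u8], usize),
    len: usize,
    max_overshoot: usize,
    guard: usize,
) -> Result<(), String> {
    let total = len + max_overshoot + guard;
    // Values 0..=127 only, so the source never contains CANARY (0xA5).
    let src: Vec<u8> = (0..total).map(|i| (i % 128) as u8).collect();
    let mut dst = vec![CANARY; total];
    let mut expected = vec![CANARY; total];

    copy(&src, &mut dst, len);
    copy_reference(&src, &mut expected, len);

    if dst[..len] != expected[..len] {
        return Err(format!("payload mismatch at len {}", len));
    }
    if dst[len + max_overshoot..].iter().any(|&b| b != CANARY) {
        return Err(format!("write past allowed overshoot at len {}", len));
    }
    Ok(())
}
```

Sweeping `len` over values around each chunk boundary (for example 15, 16, 17) exercises the tail handling. Passing a `max_overshoot` smaller than what the candidate really needs should make the check fail, which is a quick way to confirm the guard region is being tested at all.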

Estimate

1d 6h

  • 1d: implementation + tests
  • 4h: benchmark runs + analysis
  • 2h: documentation and PR polishing

Related: #87

Metadata

Assignees

No one assigned

    Labels

    P2-medium: Medium priority, important improvement
    enhancement: New feature or request
    performance: Performance optimization
