Skip to content

perf(decoding): branchless bitstream reader with mask table and BMI2 support#58

Merged
polaz merged 12 commits intomainfrom
feat/#13-perf-bitstream-reader--branchless-operations-and-b
Apr 3, 2026
Merged

perf(decoding): branchless bitstream reader with mask table and BMI2 support#58
polaz merged 12 commits intomainfrom
feat/#13-perf-bitstream-reader--branchless-operations-and-b

Conversation

@polaz
Copy link
Copy Markdown
Member

@polaz polaz commented Apr 3, 2026

Summary

Optimizes BitReaderReversed — the foundation of all entropy decoding (Huffman + FSE) — with branchless operations and a pre-computed mask table.

  • Pre-computed BIT_MASK[65] lookup table replaces per-call (1u64 << n) - 1 computation, eliminating a shift + subtract on every symbol decode
  • BMI2 fast path (_bzhi_u64) behind #[cfg(target_feature = "bmi2")] — single-cycle bit masking on supported x86-64 CPUs, with automatic fallback to the mask table elsewhere
  • Branchless peek_bits() — removed the if n == 0 guard using wrapping_shr + zero mask, eliminating a branch from the hottest inner loop
  • Branchless peek_bits_triple() — same pattern applied to the triple-read variant used in FSE sequence decoding; inner field extraction uses wrapping_shr for shift-by-64 safety
  • Conditional refill in get_bits_triple() — replaced unconditional refill() with ensure_bits(), skipping redundant work when the bit container already holds enough bits
  • bench_internals feature gate — benchmark re-exports are gated behind a non-default feature so normal builds keep BitReaderReversed fully crate-private
  • Bitstream microbenchmark (criterion, requires --features bench_internals) covering sequential reads, FSE-pattern triple reads, and ensure+unchecked batched reads

Bounds checking strategy

mask_lower_bits uses debug_assert!(n <= 64)not a release-mode assert! — because it sits on the hot decode path (called per symbol). In release builds:

  • Non-BMI2 path: out-of-range n panics naturally via BIT_MASK[n] index-out-of-bounds
  • BMI2 path: _bzhi_u64 would silently truncate, but this path is only reachable with compile-time -C target-feature=+bmi2, and all callers guarantee n <= 56 (max zstd symbol width)

This is a deliberate performance vs. safety tradeoff documented in the function's doc comment.

Test plan

  • All 203 existing tests pass (unit + integration + cross-validation)
  • 9 new unit tests: mask table correctness (including n=64 boundary), mask_lower_bits edge cases, peek_bits(0) invariant across all states (including bits_consumed == 0), get_bits_triple equivalence (refill, no-refill, and mixed zero-width paths)
  • debug_assert guards on branchless operations: n <= 64 bounds, bits_consumed + n <= 64 contract, sum == n1 + n2 + n3 consistency
  • Clippy clean (0 warnings, including bench_internals feature with --benches)
  • Doc tests pass
  • Benchmark compiles and runs (cargo bench --bench bitstream --features bench_internals -- --test)
  • No regression on non-BMI2 platforms (BMI2 path is compile-time gated)

Closes #13

Summary by CodeRabbit

  • Chores

    • Optimized bit-reading performance with internal improvements.
  • Tests

    • Added comprehensive internal benchmarking infrastructure for performance testing.

…support

- Add pre-computed BIT_MASK[65] lookup table replacing per-call
  (1u64 << n) - 1 computation on the hot decode path
- Add mask_lower_bits() helper with #[cfg(target_feature = "bmi2")]
  fast path using _bzhi_u64 single-cycle instruction
- Make peek_bits() branchless: remove n==0 guard, use wrapping_shr
  to handle the bits_consumed=0 edge case without branching
- Make peek_bits_triple() branchless: same wrapping_shr pattern,
  mask table for all three sub-values
- Fix get_bits_triple() to use conditional ensure_bits() instead of
  unconditional refill(), avoiding redundant work when the bit
  container already holds enough bits
- Add bitstream microbenchmark (criterion) covering sequential reads,
  FSE-pattern triple reads, and ensure+unchecked batched reads
- Add unit tests for mask table correctness, mask_lower_bits edge
  cases, peek_bits(0) invariant, and get_bits_triple equivalence

Closes #13
Copilot AI review requested due to automatic review settings April 3, 2026 09:22
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a384aa04-9b50-4709-b7dc-b30a7b8dd563

📥 Commits

Reviewing files that changed from the base of the PR and between 249fa62 and 9fc72f5.

📒 Files selected for processing (1)
  • zstd/src/bit_io/bit_reader_reverse.rs

📝 Walkthrough

Walkthrough

Adds a feature-gated benchmark target and testing re-export, introduces a precomputed BIT_MASK with an optional BMI2 fast path, refactors bit-extraction to use wrapping shifts and a mask helper, adjusts triple-read refill behavior, expands unit tests, and runs Clippy with the feature enabled in CI.

Changes

Cohort / File(s) Summary
Cargo & bench config
zstd/Cargo.toml
Adds bench_internals = [] feature and a [[bench]] name = "bitstream" target gated on that feature.
Benchmarks
zstd/benches/bitstream.rs
New Criterion microbench exercising BitReaderReversed hot paths (get_bits, get_bits_triple, ensure_bits + unchecked reads) with deterministic input.
Bit reader internals
zstd/src/bit_io/bit_reader_reverse.rs
Adds private BIT_MASK: [u64;65] and mask_lower_bits() (BMI2 _bzhi_u64 fast path); refactors peek_bits/peek_bits_triple to use wrapping shifts + mask helper; get_bits_triple fast path uses ensure_bits(sum); adds unit tests for masks and edge cases.
Module exports / test access
zstd/src/bit_io/mod.rs, zstd/src/lib.rs
Narrows re-export to BitReaderReversed and conditionally exposes it via a feature-gated #[doc(hidden)] pub mod testing when bench_internals is enabled so benchmarks can access internals.
CI linting
.github/workflows/ci.yml
Adds a Clippy step that runs with the bench_internals feature and --benches, treating warnings as errors.

Sequence Diagram(s)

sequenceDiagram
    participant Bench as Criterion bench
    participant Testing as crate::testing
    participant BitReader as BitReaderReversed
    participant Mask as mask_lower_bits

    Bench->>Testing: invoke benchmark (feature-gated)
    Testing->>BitReader: construct BitReaderReversed with buffer
    loop per iteration
        BitReader->>BitReader: ensure_bits / peek_bits / get_bits_triple / get_bits_unchecked
        BitReader->>Mask: request lower-n-bit mask
        Mask-->>BitReader: return masked value (rgba(0,128,0,0.5) BMI2 or rgba(0,0,128,0.5) table)
        BitReader-->>Testing: extracted values
    end
    Testing-->>Bench: return accumulated results
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • structured-world/structured-zstd#54 — Modifies BitReaderReversed internals and visibility; strong API overlap.
  • structured-world/structured-zstd#55 — Alters bitstream internals and batched/unchecked read patterns exercised by the new benchmarks.

Poem

🐰 I hopped through bytes both near and far,
I masked and shifted like a tiny star,
BMI2 hummed as bits slipped free,
Benches counted fast — one, two, three,
Hop, hop, hooray — a carrot for me!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main optimization: branchless bitstream reader with pre-computed mask table and BMI2 support, which aligns with the changeset's primary focus on BitReaderReversed performance improvements.
Linked Issues check ✅ Passed The PR fully addresses all coding requirements from issue #13: branchless peek/get operations via wrapping shifts and zero masks, pre-computed BIT_MASK[65] lookup table, BMI2 fast path via _bzhi_u64 with fallback, conditional ensure_bits for bulk consumption, retained CTZ/CLZ usage, compile-time gating for non-BMI2 safety, and Criterion benchmarks for sequential/triple/batched reads.
Out of Scope Changes check ✅ Passed All changes are in-scope: bit_reader_reverse.rs optimizations, bench_internals feature gating, Criterion benchmark file, CI Clippy additions, and module re-exports align with issue #13 objectives; no unrelated refactoring or cross-cutting changes detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/#13-perf-bitstream-reader--branchless-operations-and-b

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/benches/bitstream.rs`:
- Around line 6-7: Reorder the import items in the use statement so rustfmt
passes: change the criterion import list in bitstream.rs (the use
criterion::{...} line) to list items in the expected alphabetical/ASCII order —
put Criterion and Throughput first, then criterion_group and criterion_main
(i.e., Criterion, Throughput, criterion_group, criterion_main) while keeping the
separate use std::hint::black_box; line as-is.

In `@zstd/src/bit_io/bit_reader_reverse.rs`:
- Around line 26-36: mask_lower_bits can index BIT_MASK out of bounds if called
with n > 64; add a debug-only check to ensure n <= 64 (e.g., debug_assert!(n <=
64)) at the top of the function so misuse is caught in debug builds, and keep
the existing BMI2 path and fallback that use BIT_MASK unchanged; reference the
function name mask_lower_bits and the BIT_MASK constant when adding the
assertion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ac72feb9-0a33-4f25-bdba-1736ef796d9e

📥 Commits

Reviewing files that changed from the base of the PR and between 7115945 and 9973f72.

📒 Files selected for processing (5)
  • zstd/Cargo.toml
  • zstd/benches/bitstream.rs
  • zstd/src/bit_io/bit_reader_reverse.rs
  • zstd/src/bit_io/mod.rs
  • zstd/src/lib.rs

Comment thread zstd/benches/bitstream.rs Outdated
Comment thread zstd/src/bit_io/bit_reader_reverse.rs
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Optimizes the reversed bitstream reader (BitReaderReversed) used across entropy decoding (Huffman/FSE) by reducing branches and improving bit-masking efficiency, and adds a targeted microbenchmark to measure the hot-path operations.

Changes:

  • Add a precomputed BIT_MASK table plus an x86-64 BMI2 (bzhi) fast path for masking lower bits.
  • Make peek_bits / peek_bits_triple branchless and avoid unconditional refills in get_bits_triple by using ensure_bits.
  • Expose BitReaderReversed for benchmarks and add a new Criterion benchmark (benches/bitstream.rs).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
zstd/src/lib.rs Exposes bit_io and adds a testing module to re-export BitReaderReversed for benchmarks.
zstd/src/bit_io/mod.rs Changes re-export visibility of reverse bit reader items.
zstd/src/bit_io/bit_reader_reverse.rs Implements mask table + BMI2 path; makes peek operations branchless; updates triple-read refill strategy; adds unit tests.
zstd/Cargo.toml Registers the new bitstream benchmark target.
zstd/benches/bitstream.rs Adds Criterion microbench for sequential, triple, and ensure+unchecked read patterns.

Comment thread zstd/src/lib.rs Outdated
Comment thread zstd/src/lib.rs Outdated
Comment thread zstd/src/bit_io/mod.rs Outdated
Comment thread zstd/src/bit_io/bit_reader_reverse.rs
Comment thread zstd/src/bit_io/bit_reader_reverse.rs
Comment thread zstd/benches/bitstream.rs Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

- Revert `bit_io` to private module — only `BitReaderReversed` is
  re-exported through `testing` for benchmarks, nothing else leaks
- Add `debug_assert!(n <= 64)` to `mask_lower_bits` for consistent
  bounds checking across BMI2 and fallback paths
- Add `debug_assert!(bits_consumed + n <= 64)` to `peek_bits` and
  `peek_bits_triple` so invalid caller state is caught in debug
  builds despite the branchless wrapping arithmetic
- Switch benchmark throughput from `Throughput::Bytes` to
  `Throughput::Elements` to accurately report actual reads per
  iteration rather than overstating by counting unread trailing bits
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment thread zstd/src/lib.rs
Comment thread zstd/src/bit_io/mod.rs Outdated
Comment thread zstd/src/bit_io/bit_reader_reverse.rs
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/bit_io/bit_reader_reverse.rs`:
- Around line 365-385: Extend the test to exercise the no-refill fast path by
priming both readers so ensure_bits(sum) already has enough cached bits: create
the two BitReaderReversed instances as before, call a small get_bits (e.g.
get_bits(1)) on each to force an initial refill/primal state, then perform the
triple read (get_bits_triple(8,9,9)) on one and three individual get_bits(8),
get_bits(9), get_bits(9) on the other and assert equality and bits_remaining()
as before; this ensures the code path in ensure_bits that skips refill is
covered.

In `@zstd/src/lib.rs`:
- Around line 69-75: The testing module (pub mod testing) currently exports
BitReaderReversed publicly in all builds; change it to be compiled only when a
non-default bench feature is enabled (e.g., add #[cfg(feature = "bench")] on the
module) so downstream crates can't reach
structured_zstd::testing::BitReaderReversed in normal builds; update
zstd/Cargo.toml to declare a non-default "bench" feature and make the bitstream
bench opt into that feature (enable the feature for the bench target) so
Criterion benches can still access BitReaderReversed and its get_bits_unchecked
helpers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c8d64fc4-27fd-4163-b833-3edd3c29e359

📥 Commits

Reviewing files that changed from the base of the PR and between 9973f72 and 1207792.

📒 Files selected for processing (4)
  • zstd/benches/bitstream.rs
  • zstd/src/bit_io/bit_reader_reverse.rs
  • zstd/src/bit_io/mod.rs
  • zstd/src/lib.rs

Comment thread zstd/src/bit_io/bit_reader_reverse.rs
Comment thread zstd/src/lib.rs
polaz added 2 commits April 3, 2026 12:45
- Add `bench_internals` feature flag (default off) to Cargo.toml
- Gate `pub mod testing` and `pub use BitReaderReversed` behind
  `#[cfg(feature = "bench_internals")]` so normal builds keep the
  type fully crate-private
- Add `required-features = ["bench_internals"]` to the bitstream
  bench target
- Add `debug_assert_eq!(sum, n1 + n2 + n3)` in `peek_bits_triple`
  to catch mismatched width arguments in debug builds
Prime both readers with 8 bits so the subsequent triple read (26 bits)
fits within the already-loaded container. This exercises the
ensure_bits() skip-refill path introduced by the conditional refill
change.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 3, 2026

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
zstd/Cargo.toml (1)

40-59: 🧹 Nitpick | 🔵 Trivial

Add a CI compile check for the feature-gated bench.

required-features = ["bench_internals"] keeps bitstream out of the default build matrix, so this target can silently rot unless CI compiles it explicitly with the feature enabled.

As per coding guidelines, code must pass cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings, which does not exercise the new bench_internals bench path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/Cargo.toml` around lines 40 - 59, CI doesn't currently compile the
feature-gated bench "bitstream" (it has required-features =
["bench_internals"]), so add a CI job/step that compiles the crate with that
feature enabled; update your workflow to run a build/lint for the package
(structured-zstd) with --features bench_internals (for example extend the
existing cargo clippy/cargo build invocation to include --features
"hash,std,dict_builder,bench_internals" or add a separate matrix entry) so the
bench_internals path is compiled by CI and won't rot.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/bit_io/bit_reader_reverse.rs`:
- Around line 353-372: The test doesn't actually hit the bits_consumed == 0
branch; force that state before the final assert by explicitly setting the
reader into the exhausted/post-refill state. In the test function
peek_bits_zero_is_always_zero, after the loop that calls br.get_bits(8) set
br.bits_consumed = 0 (or otherwise drive the reader to bits_consumed == 0) and
then assert_eq!(br.peek_bits(0), 0); this ensures
BitReaderReversed::peek_bits(0) exercises the shift-by-64 edge.

---

Outside diff comments:
In `@zstd/Cargo.toml`:
- Around line 40-59: CI doesn't currently compile the feature-gated bench
"bitstream" (it has required-features = ["bench_internals"]), so add a CI
job/step that compiles the crate with that feature enabled; update your workflow
to run a build/lint for the package (structured-zstd) with --features
bench_internals (for example extend the existing cargo clippy/cargo build
invocation to include --features "hash,std,dict_builder,bench_internals" or add
a separate matrix entry) so the bench_internals path is compiled by CI and won't
rot.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 53a5d25d-90f7-4946-b208-18e1b6df01a6

📥 Commits

Reviewing files that changed from the base of the PR and between 1207792 and 50de8f2.

📒 Files selected for processing (4)
  • zstd/Cargo.toml
  • zstd/src/bit_io/bit_reader_reverse.rs
  • zstd/src/bit_io/mod.rs
  • zstd/src/lib.rs

Comment thread zstd/src/bit_io/bit_reader_reverse.rs
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread zstd/src/bit_io/bit_reader_reverse.rs
Copy link
Copy Markdown

@sw-release-bot sw-release-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

Benchmark suite Current: 9fc72f5 Previous: 7115945 Ratio
compress/fastest/small-1k-random/matrix/c_ffi 0.005 ms 0.004 ms 1.25
compress/best/small-1k-random/matrix/c_ffi 0.495 ms 0.32 ms 1.55
compress/better/small-10k-random/matrix/c_ffi 0.123 ms 0.096 ms 1.28
compress/best/small-10k-random/matrix/c_ffi 0.457 ms 0.296 ms 1.54
compress/better/decodecorpus-z000033/matrix/pure_rust 75.718 ms 57.701 ms 1.31
compress/best/decodecorpus-z000033/matrix/pure_rust 99.245 ms 63.85 ms 1.55
compress/best/decodecorpus-z000033/matrix/c_ffi 22.365 ms 18.702 ms 1.20
compress/fastest/high-entropy-1m/matrix/c_ffi 0.604 ms 0.3 ms 2.01
compress/default/high-entropy-1m/matrix/c_ffi 0.628 ms 0.339 ms 1.85
compress/better/high-entropy-1m/matrix/pure_rust 90.001 ms 70.105 ms 1.28
compress/better/high-entropy-1m/matrix/c_ffi 0.97 ms 0.651 ms 1.49
compress/best/high-entropy-1m/matrix/pure_rust 114.138 ms 60.547 ms 1.89
compress/best/high-entropy-1m/matrix/c_ffi 1.892 ms 1.021 ms 1.85
compress/default/low-entropy-1m/matrix/pure_rust 11.138 ms 9.02 ms 1.23
compress/best/low-entropy-1m/matrix/c_ffi 1.77 ms 1.171 ms 1.51
decompress/fastest/high-entropy-1m/c_stream/matrix/c_ffi 0.038 ms 0.026 ms 1.46
decompress/default/high-entropy-1m/c_stream/matrix/c_ffi 0.037 ms 0.026 ms 1.42
decompress/best/high-entropy-1m/c_stream/matrix/c_ffi 0.038 ms 0.026 ms 1.46

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

- Set bits_consumed = 0 directly instead of looping via get_bits
- Document mask_lower_bits caller contract (n <= 56)
- Enable bench_internals feature in CI clippy step
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

zstd/src/bit_io/bit_reader_reverse.rs:248

  • get_bits_triple computes sum as n1 + n2 + n3 in u8. In release builds, u8 addition wraps on overflow, so extreme/misused inputs could make sum appear small (and <= 56) even when the true total exceeds 56, sending execution down the fast path with an incorrect sum and potentially invalid shifts/reads. To make this robust (even under bench_internals where this is publicly reachable), compute the sum in a wider type (e.g. u16) and/or use checked/saturating addition with an explicit bounds check before taking the fast path.
    pub fn get_bits_triple(&mut self, n1: u8, n2: u8, n3: u8) -> (u64, u64, u64) {
        let sum = n1 + n2 + n3;
        if sum <= 56 {
            self.ensure_bits(sum);

            let triple = self.peek_bits_triple(sum, n1, n2, n3);
            self.consume(sum);
            return triple;

Comment thread zstd/src/bit_io/bit_reader_reverse.rs Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 31-32: The CI Clippy step currently replaces the mandated lint
command with a bench-only feature set; restore the original Clippy gate by
keeping the step named "Clippy" (or the command invoked for Clippy) to run:
cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings,
and if you still want to lint the bench-only path add a new separate workflow
step (e.g., "Clippy (bench_internals)") that runs cargo clippy -p
structured-zstd --features hash,std,dict_builder,bench_internals -- -D warnings
so the primary gate enforces the standard feature set while bench coverage is
checked separately.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ee273291-048c-4638-a73c-55fd6ef04d9c

📥 Commits

Reviewing files that changed from the base of the PR and between 1207792 and cdc7b77.

📒 Files selected for processing (5)
  • .github/workflows/ci.yml
  • zstd/Cargo.toml
  • zstd/src/bit_io/bit_reader_reverse.rs
  • zstd/src/bit_io/mod.rs
  • zstd/src/lib.rs

Comment thread .github/workflows/ci.yml Outdated
… CI clippy

- Replace debug_assert!(n <= 64) with assert! for consistent panic
  behavior across BMI2 and non-BMI2 targets in release builds
- Restore original CI clippy gate (hash,std,dict_builder only)
- Add separate "Clippy (bench_internals)" step for feature-gated code
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/bit_io/bit_reader_reverse.rs`:
- Around line 349-356: Add a test asserting mask_lower_bits behavior when n ==
64 in mask_lower_bits_edge_cases: call super::mask_lower_bits(u64::MAX, 64) and
assert it equals u64::MAX to cover the special BIT_MASK[64]/_bzhi_u64(..., 64)
path; update the mask_lower_bits_edge_cases test to include this boundary case
so the implementation’s special-case for width 64 is explicitly pinned.
- Around line 381-418: Add a mixed zero-width triple test to ensure
get_bits_triple/peek_bits_triple handles zero-width fields correctly: extend the
test get_bits_triple_matches_individual in BitReaderReversed to include a case
like (5, 0, 4) where one of the widths is zero, and verify the triple read
(t1,t2,t3) matches three individual get_bits calls and that bits_remaining()
stays equal; this will exercise the branchless zero-mask behavior in
peek_bits_triple (and peek_bits(0)) rather than only the all-nonzero path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 54573f70-e782-4b8d-94fd-ff114a4ed7d6

📥 Commits

Reviewing files that changed from the base of the PR and between cdc7b77 and 2be24a6.

📒 Files selected for processing (2)
  • .github/workflows/ci.yml
  • zstd/src/bit_io/bit_reader_reverse.rs

Comment thread zstd/src/bit_io/bit_reader_reverse.rs
Comment thread zstd/src/bit_io/bit_reader_reverse.rs
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/bit_io/bit_reader_reverse.rs
Comment thread zstd/src/bit_io/bit_reader_reverse.rs
…2 dead_code

- Revert assert! back to debug_assert! — callers guarantee n <= 56,
  and the BIT_MASK table panics on OOB in release anyway. The assert!
  added a branch to every peek_bits call on the hot decode path.
- Add cfg_attr(allow(dead_code)) on BIT_MASK for BMI2 target builds
  where the table is only referenced by tests.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/bit_io/bit_reader_reverse.rs Outdated
Comment thread .github/workflows/ci.yml Outdated
…in CI

- Add doc comment specifying sum must equal n1+n2+n3
- Add --benches flag to CI bench_internals clippy step so the
  bitstream benchmark is actually compiled and linted
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread zstd/src/bit_io/bit_reader_reverse.rs
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/bit_io/bit_reader_reverse.rs
Comment thread zstd/src/bit_io/bit_reader_reverse.rs Outdated
n1 + n2 + n3 in u8 can wrap in release builds (debug builds panic).
Use u16 for addition in get_bits_triple and peek_bits_triple's
debug_assert_eq, cast to u8 only after confirming sum <= 56.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: bitstream reader — branchless operations and bulk reload

2 participants