Conversation
…support

- Add pre-computed `BIT_MASK[65]` lookup table replacing per-call `(1u64 << n) - 1` computation on the hot decode path
- Add `mask_lower_bits()` helper with a `#[cfg(target_feature = "bmi2")]` fast path using the single-cycle `_bzhi_u64` instruction
- Make `peek_bits()` branchless: remove the `n == 0` guard, use `wrapping_shr` to handle the `bits_consumed = 0` edge case without branching
- Make `peek_bits_triple()` branchless: same `wrapping_shr` pattern, mask table for all three sub-values
- Fix `get_bits_triple()` to use a conditional `ensure_bits()` instead of an unconditional `refill()`, avoiding redundant work when the bit container already holds enough bits
- Add bitstream microbenchmark (Criterion) covering sequential reads, FSE-pattern triple reads, and ensure+unchecked batched reads
- Add unit tests for mask table correctness, `mask_lower_bits` edge cases, the `peek_bits(0)` invariant, and `get_bits_triple` equivalence

Closes #13
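The mask-table strategy described in this commit can be sketched as follows. This is a minimal illustration assuming a 65-entry table (widths 0..=64) and callers that keep `n <= 64`; names mirror the commit message (`BIT_MASK`, `mask_lower_bits`) but this is not the crate's exact code.

```rust
/// BIT_MASK[n] has the lowest n bits set; entry 64 is special-cased because
/// `1u64 << 64` would overflow.
const BIT_MASK: [u64; 65] = {
    let mut table = [0u64; 65];
    let mut n = 0;
    while n < 64 {
        table[n] = (1u64 << n) - 1;
        n += 1;
    }
    table[64] = u64::MAX;
    table
};

/// Keep only the lowest `n` bits of `value`. Callers must pass n <= 64.
#[inline(always)]
fn mask_lower_bits(value: u64, n: u8) -> u64 {
    debug_assert!(n <= 64, "mask width must be <= 64");
    #[cfg(target_feature = "bmi2")]
    unsafe {
        // SAFETY: bzhi is available because +bmi2 is enabled at compile time.
        return core::arch::x86_64::_bzhi_u64(value, u32::from(n));
    }
    #[cfg(not(target_feature = "bmi2"))]
    {
        // Table fallback: one load + one AND, no shift/subtract per call.
        value & BIT_MASK[n as usize]
    }
}

fn main() {
    assert_eq!(mask_lower_bits(0xFFFF, 8), 0xFF);
    assert_eq!(mask_lower_bits(u64::MAX, 0), 0);
    assert_eq!(mask_lower_bits(u64::MAX, 64), u64::MAX);
}
```

The `n == 64` entry matters because a branchless caller may legitimately request the full-width mask, and computing it with a shift would overflow.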
|
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (1)
📝 Walkthrough

Adds a feature-gated benchmark target and testing re-export, introduces a precomputed BIT_MASK with an optional BMI2 fast path, refactors bit-extraction to use wrapping shifts and a mask helper, adjusts triple-read refill behavior, expands unit tests, and runs Clippy with the feature enabled in CI.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Bench as Criterion bench
    participant Testing as crate::testing
    participant BitReader as BitReaderReversed
    participant Mask as mask_lower_bits
    Bench->>Testing: invoke benchmark (feature-gated)
    Testing->>BitReader: construct BitReaderReversed with buffer
    loop per iteration
        BitReader->>BitReader: ensure_bits / peek_bits / get_bits_triple / get_bits_unchecked
        BitReader->>Mask: request lower-n-bit mask
        Mask-->>BitReader: return masked value (BMI2 fast path or table fallback)
        BitReader-->>Testing: extracted values
    end
    Testing-->>Bench: return accumulated results
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/benches/bitstream.rs`:
- Around line 6-7: Reorder the import items in the use statement so rustfmt
passes: change the criterion import list in bitstream.rs (the use
criterion::{...} line) to list items in the expected alphabetical/ASCII order —
put Criterion and Throughput first, then criterion_group and criterion_main
(i.e., Criterion, Throughput, criterion_group, criterion_main) while keeping the
separate use std::hint::black_box; line as-is.
In `@zstd/src/bit_io/bit_reader_reverse.rs`:
- Around line 26-36: mask_lower_bits can index BIT_MASK out of bounds if called
with n > 64; add a debug-only check to ensure n <= 64 (e.g., debug_assert!(n <=
64)) at the top of the function so misuse is caught in debug builds, and keep
the existing BMI2 path and fallback that use BIT_MASK unchanged; reference the
function name mask_lower_bits and the BIT_MASK constant when adding the
assertion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ac72feb9-0a33-4f25-bdba-1736ef796d9e
📒 Files selected for processing (5)
zstd/Cargo.toml, zstd/benches/bitstream.rs, zstd/src/bit_io/bit_reader_reverse.rs, zstd/src/bit_io/mod.rs, zstd/src/lib.rs
Pull request overview
Optimizes the reversed bitstream reader (BitReaderReversed) used across entropy decoding (Huffman/FSE) by reducing branches and improving bit-masking efficiency, and adds a targeted microbenchmark to measure the hot-path operations.
Changes:
- Add a precomputed `BIT_MASK` table plus an x86-64 BMI2 (`bzhi`) fast path for masking lower bits.
- Make `peek_bits`/`peek_bits_triple` branchless and avoid unconditional refills in `get_bits_triple` by using `ensure_bits`.
- Expose `BitReaderReversed` for benchmarks and add a new Criterion benchmark (`benches/bitstream.rs`).
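A minimal sketch of the branchless peek the review summarizes, assuming a reversed reader that serves bits from the high end of a cached `u64`. `bit_container` and `bits_consumed` are assumed names; the real `BitReaderReversed` also tracks a source buffer and refill logic.

```rust
const BIT_MASK: [u64; 65] = {
    let mut table = [0u64; 65];
    let mut n = 0;
    while n < 64 {
        table[n] = (1u64 << n) - 1;
        n += 1;
    }
    table[64] = u64::MAX;
    table
};

struct Reader {
    bit_container: u64, // bits are consumed from the high end downward
    bits_consumed: u8,
}

impl Reader {
    /// Branchless peek: no `n == 0` guard. When `bits_consumed + n == 0`,
    /// the shift amount would be 64, which is UB for a plain `>>` on u64;
    /// `wrapping_shr` reduces it mod 64 and `BIT_MASK[0] == 0` zeroes the
    /// result regardless of what the shift produced.
    fn peek_bits(&self, n: u8) -> u64 {
        debug_assert!(u32::from(self.bits_consumed) + u32::from(n) <= 64);
        let shift = 64u32.wrapping_sub(u32::from(self.bits_consumed) + u32::from(n));
        self.bit_container.wrapping_shr(shift) & BIT_MASK[n as usize]
    }
}

fn main() {
    let r = Reader { bit_container: 0xABCD << 48, bits_consumed: 0 };
    assert_eq!(r.peek_bits(0), 0); // shift-by-64 edge neutralized by the zero mask
    assert_eq!(r.peek_bits(8), 0xAB);
    assert_eq!(r.peek_bits(16), 0xABCD);
}
```

The design choice is to trade the removed branch for one extra AND, which pipelines well on the per-symbol hot path.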
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file

| File | Description |
|---|---|
| zstd/src/lib.rs | Exposes bit_io and adds a testing module to re-export BitReaderReversed for benchmarks. |
| zstd/src/bit_io/mod.rs | Changes re-export visibility of reverse bit reader items. |
| zstd/src/bit_io/bit_reader_reverse.rs | Implements mask table + BMI2 path; makes peek operations branchless; updates triple-read refill strategy; adds unit tests. |
| zstd/Cargo.toml | Registers the new bitstream benchmark target. |
| zstd/benches/bitstream.rs | Adds Criterion microbench for sequential, triple, and ensure+unchecked read patterns. |
Codecov Report: ✅ All modified and coverable lines are covered by tests.
|
- Revert `bit_io` to private module — only `BitReaderReversed` is re-exported through `testing` for benchmarks, nothing else leaks
- Add `debug_assert!(n <= 64)` to `mask_lower_bits` for consistent bounds checking across BMI2 and fallback paths
- Add `debug_assert!(bits_consumed + n <= 64)` to `peek_bits` and `peek_bits_triple` so invalid caller state is caught in debug builds despite the branchless wrapping arithmetic
- Switch benchmark throughput from `Throughput::Bytes` to `Throughput::Elements` to accurately report actual reads per iteration rather than overstating by counting unread trailing bits
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/bit_io/bit_reader_reverse.rs`:
- Around line 365-385: Extend the test to exercise the no-refill fast path by
priming both readers so ensure_bits(sum) already has enough cached bits: create
the two BitReaderReversed instances as before, call a small get_bits (e.g.
get_bits(1)) on each to force an initial refill/primal state, then perform the
triple read (get_bits_triple(8,9,9)) on one and three individual get_bits(8),
get_bits(9), get_bits(9) on the other and assert equality and bits_remaining()
as before; this ensures the code path in ensure_bits that skips refill is
covered.
In `@zstd/src/lib.rs`:
- Around line 69-75: The testing module (pub mod testing) currently exports
BitReaderReversed publicly in all builds; change it to be compiled only when a
non-default bench feature is enabled (e.g., add #[cfg(feature = "bench")] on the
module) so downstream crates can't reach
structured_zstd::testing::BitReaderReversed in normal builds; update
zstd/Cargo.toml to declare a non-default "bench" feature and make the bitstream
bench opt into that feature (enable the feature for the bench target) so
Criterion benches can still access BitReaderReversed and its get_bits_unchecked
helpers.
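The feature-gating pattern suggested in this comment can be sketched as below. The module layout and the `bits_remaining` field are illustrative assumptions, not the crate's actual code; the point is that without `--features bench_internals` the `testing` re-export does not exist at all.

```rust
mod bit_io {
    // Crate-internal reader; never part of the public API in normal builds.
    pub struct BitReaderReversed {
        pub(crate) bits_remaining: isize,
    }

    impl BitReaderReversed {
        pub fn new() -> Self {
            BitReaderReversed { bits_remaining: 0 }
        }
    }
}

// Compiled only with the non-default `bench_internals` Cargo feature;
// Criterion benches opt in via `required-features`, while default builds
// never expose the type to downstream crates.
#[cfg(feature = "bench_internals")]
pub mod testing {
    pub use crate::bit_io::BitReaderReversed;
}

fn main() {
    // Inside the crate the reader stays reachable either way.
    let reader = bit_io::BitReaderReversed::new();
    assert_eq!(reader.bits_remaining, 0);
}
```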
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c8d64fc4-27fd-4163-b833-3edd3c29e359
📒 Files selected for processing (4)
zstd/benches/bitstream.rs, zstd/src/bit_io/bit_reader_reverse.rs, zstd/src/bit_io/mod.rs, zstd/src/lib.rs
- Add `bench_internals` feature flag (default off) to Cargo.toml
- Gate `pub mod testing` and `pub use BitReaderReversed` behind `#[cfg(feature = "bench_internals")]` so normal builds keep the type fully crate-private
- Add `required-features = ["bench_internals"]` to the bitstream bench target
- Add `debug_assert_eq!(sum, n1 + n2 + n3)` in `peek_bits_triple` to catch mismatched width arguments in debug builds
Prime both readers with 8 bits so the subsequent triple read (26 bits) fits within the already-loaded container. This exercises the ensure_bits() skip-refill path introduced by the conditional refill change.
|
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
zstd/Cargo.toml (1)
40-59: 🧹 Nitpick | 🔵 Trivial: Add a CI compile check for the feature-gated bench.

`required-features = ["bench_internals"]` keeps `bitstream` out of the default build matrix, so this target can silently rot unless CI compiles it explicitly with the feature enabled. As per coding guidelines, code must pass `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings`, which does not exercise the new `bench_internals` bench path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/Cargo.toml` around lines 40 - 59, CI doesn't currently compile the feature-gated bench "bitstream" (it has required-features = ["bench_internals"]), so add a CI job/step that compiles the crate with that feature enabled; update your workflow to run a build/lint for the package (structured-zstd) with --features bench_internals (for example extend the existing cargo clippy/cargo build invocation to include --features "hash,std,dict_builder,bench_internals" or add a separate matrix entry) so the bench_internals path is compiled by CI and won't rot.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/bit_io/bit_reader_reverse.rs`:
- Around line 353-372: The test doesn't actually hit the bits_consumed == 0
branch; force that state before the final assert by explicitly setting the
reader into the exhausted/post-refill state. In the test function
peek_bits_zero_is_always_zero, after the loop that calls br.get_bits(8) set
br.bits_consumed = 0 (or otherwise drive the reader to bits_consumed == 0) and
then assert_eq!(br.peek_bits(0), 0); this ensures
BitReaderReversed::peek_bits(0) exercises the shift-by-64 edge.
---
Outside diff comments:
In `@zstd/Cargo.toml`:
- Around line 40-59: CI doesn't currently compile the feature-gated bench
"bitstream" (it has required-features = ["bench_internals"]), so add a CI
job/step that compiles the crate with that feature enabled; update your workflow
to run a build/lint for the package (structured-zstd) with --features
bench_internals (for example extend the existing cargo clippy/cargo build
invocation to include --features "hash,std,dict_builder,bench_internals" or add
a separate matrix entry) so the bench_internals path is compiled by CI and won't
rot.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 53a5d25d-90f7-4946-b208-18e1b6df01a6
📒 Files selected for processing (4)
zstd/Cargo.toml, zstd/src/bit_io/bit_reader_reverse.rs, zstd/src/bit_io/mod.rs, zstd/src/lib.rs
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
This commit's benchmark results are worse than the previous run's, exceeding the 1.15 regression threshold.
| Benchmark suite | Current: 9fc72f5 | Previous: 7115945 | Ratio |
|---|---|---|---|
| compress/fastest/small-1k-random/matrix/c_ffi | 0.005 ms | 0.004 ms | 1.25 |
| compress/best/small-1k-random/matrix/c_ffi | 0.495 ms | 0.32 ms | 1.55 |
| compress/better/small-10k-random/matrix/c_ffi | 0.123 ms | 0.096 ms | 1.28 |
| compress/best/small-10k-random/matrix/c_ffi | 0.457 ms | 0.296 ms | 1.54 |
| compress/better/decodecorpus-z000033/matrix/pure_rust | 75.718 ms | 57.701 ms | 1.31 |
| compress/best/decodecorpus-z000033/matrix/pure_rust | 99.245 ms | 63.85 ms | 1.55 |
| compress/best/decodecorpus-z000033/matrix/c_ffi | 22.365 ms | 18.702 ms | 1.20 |
| compress/fastest/high-entropy-1m/matrix/c_ffi | 0.604 ms | 0.3 ms | 2.01 |
| compress/default/high-entropy-1m/matrix/c_ffi | 0.628 ms | 0.339 ms | 1.85 |
| compress/better/high-entropy-1m/matrix/pure_rust | 90.001 ms | 70.105 ms | 1.28 |
| compress/better/high-entropy-1m/matrix/c_ffi | 0.97 ms | 0.651 ms | 1.49 |
| compress/best/high-entropy-1m/matrix/pure_rust | 114.138 ms | 60.547 ms | 1.89 |
| compress/best/high-entropy-1m/matrix/c_ffi | 1.892 ms | 1.021 ms | 1.85 |
| compress/default/low-entropy-1m/matrix/pure_rust | 11.138 ms | 9.02 ms | 1.23 |
| compress/best/low-entropy-1m/matrix/c_ffi | 1.77 ms | 1.171 ms | 1.51 |
| decompress/fastest/high-entropy-1m/c_stream/matrix/c_ffi | 0.038 ms | 0.026 ms | 1.46 |
| decompress/default/high-entropy-1m/c_stream/matrix/c_ffi | 0.037 ms | 0.026 ms | 1.42 |
| decompress/best/high-entropy-1m/c_stream/matrix/c_ffi | 0.038 ms | 0.026 ms | 1.46 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
- Set `bits_consumed = 0` directly instead of looping via `get_bits`
- Document `mask_lower_bits` caller contract (`n <= 56`)
- Enable `bench_internals` feature in CI clippy step
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
zstd/src/bit_io/bit_reader_reverse.rs:248
`get_bits_triple` computes `sum` as `n1 + n2 + n3` in `u8`. In release builds, `u8` addition wraps on overflow, so extreme/misused inputs could make `sum` appear small (and `<= 56`) even when the true total exceeds 56, sending execution down the fast path with an incorrect `sum` and potentially invalid shifts/reads. To make this robust (even under `bench_internals`, where this is publicly reachable), compute the sum in a wider type (e.g. `u16`) and/or use checked/saturating addition with an explicit bounds check before taking the fast path.

```rust
pub fn get_bits_triple(&mut self, n1: u8, n2: u8, n3: u8) -> (u64, u64, u64) {
    let sum = n1 + n2 + n3;
    if sum <= 56 {
        self.ensure_bits(sum);
        let triple = self.peek_bits_triple(sum, n1, n2, n3);
        self.consume(sum);
        return triple;
```
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 31-32: The CI Clippy step currently replaces the mandated lint
command with a bench-only feature set; restore the original Clippy gate by
keeping the step named "Clippy" (or the command invoked for Clippy) to run:
cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings,
and if you still want to lint the bench-only path add a new separate workflow
step (e.g., "Clippy (bench_internals)") that runs cargo clippy -p
structured-zstd --features hash,std,dict_builder,bench_internals -- -D warnings
so the primary gate enforces the standard feature set while bench coverage is
checked separately.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ee273291-048c-4638-a73c-55fd6ef04d9c
📒 Files selected for processing (5)
.github/workflows/ci.yml, zstd/Cargo.toml, zstd/src/bit_io/bit_reader_reverse.rs, zstd/src/bit_io/mod.rs, zstd/src/lib.rs
… CI clippy

- Replace `debug_assert!(n <= 64)` with `assert!` for consistent panic behavior across BMI2 and non-BMI2 targets in release builds
- Restore original CI clippy gate (hash,std,dict_builder only)
- Add separate "Clippy (bench_internals)" step for feature-gated code
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/bit_io/bit_reader_reverse.rs`:
- Around line 349-356: Add a test asserting mask_lower_bits behavior when n ==
64 in mask_lower_bits_edge_cases: call super::mask_lower_bits(u64::MAX, 64) and
assert it equals u64::MAX to cover the special BIT_MASK[64]/_bzhi_u64(..., 64)
path; update the mask_lower_bits_edge_cases test to include this boundary case
so the implementation’s special-case for width 64 is explicitly pinned.
- Around line 381-418: Add a mixed zero-width triple test to ensure
get_bits_triple/peek_bits_triple handles zero-width fields correctly: extend the
test get_bits_triple_matches_individual in BitReaderReversed to include a case
like (5, 0, 4) where one of the widths is zero, and verify the triple read
(t1,t2,t3) matches three individual get_bits calls and that bits_remaining()
stays equal; this will exercise the branchless zero-mask behavior in
peek_bits_triple (and peek_bits(0)) rather than only the all-nonzero path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 54573f70-e782-4b8d-94fd-ff114a4ed7d6
📒 Files selected for processing (2)
.github/workflows/ci.yml, zstd/src/bit_io/bit_reader_reverse.rs
…2 dead_code

- Revert `assert!` back to `debug_assert!` — callers guarantee `n <= 56`, and the `BIT_MASK` table panics on OOB in release anyway. The `assert!` added a branch to every `peek_bits` call on the hot decode path.
- Add `cfg_attr(allow(dead_code))` on `BIT_MASK` for BMI2 target builds where the table is only referenced by tests.
…in CI

- Add doc comment specifying `sum` must equal `n1 + n2 + n3`
- Add `--benches` flag to CI bench_internals clippy step so the bitstream benchmark is actually compiled and linted
n1 + n2 + n3 in u8 can wrap in release builds (debug builds panic). Use u16 for addition in get_bits_triple and peek_bits_triple's debug_assert_eq, cast to u8 only after confirming sum <= 56.
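The wrap hazard this commit fixes can be shown with a small sketch. `triple_sum_fits_fast_path` is a hypothetical helper illustrating the widened check, not the crate's actual function.

```rust
/// Widen before adding, compare against the 56-bit fast-path limit, and
/// narrow back to u8 only after the bound is confirmed.
fn triple_sum_fits_fast_path(n1: u8, n2: u8, n3: u8) -> Option<u8> {
    // Max possible sum is 3 * 255 = 765, well within u16: cannot wrap.
    let sum = u16::from(n1) + u16::from(n2) + u16::from(n3);
    if sum <= 56 {
        Some(sum as u8)
    } else {
        None
    }
}

fn main() {
    // Typical FSE-style request: 8 + 9 + 9 = 26 bits, fast path applies.
    assert_eq!(triple_sum_fits_fast_path(8, 9, 9), Some(26));

    // Misuse case: 120 + 100 + 92 = 312. In u8 the release-mode wrap gives
    // 312 - 256 = 56, which would wrongly pass a `sum <= 56` check; the
    // u16 sum rejects it.
    assert_eq!(triple_sum_fits_fast_path(120, 100, 92), None);
}
```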
Summary
Optimizes `BitReaderReversed` — the foundation of all entropy decoding (Huffman + FSE) — with branchless operations and a pre-computed mask table.

- `BIT_MASK[65]` lookup table replaces per-call `(1u64 << n) - 1` computation, eliminating a shift + subtract on every symbol decode
- BMI2 fast path (`_bzhi_u64`) behind `#[cfg(target_feature = "bmi2")]`: single-cycle bit masking on supported x86-64 CPUs, with automatic fallback to the mask table elsewhere
- `peek_bits()`: removed the `if n == 0` guard using `wrapping_shr` + a zero mask, eliminating a branch from the hottest inner loop
- `peek_bits_triple()`: same pattern applied to the triple-read variant used in FSE sequence decoding; inner field extraction uses `wrapping_shr` for shift-by-64 safety
- `get_bits_triple()`: replaced unconditional `refill()` with `ensure_bits()`, skipping redundant work when the bit container already holds enough bits
- `bench_internals` feature gate: benchmark re-exports are gated behind a non-default feature so normal builds keep `BitReaderReversed` fully crate-private
- Criterion microbenchmark (`--features bench_internals`) covering sequential reads, FSE-pattern triple reads, and ensure+unchecked batched reads

Bounds checking strategy

`mask_lower_bits` uses `debug_assert!(n <= 64)`, not a release-mode `assert!`, because it sits on the hot decode path (called per symbol). In release builds:

- an out-of-range `n` panics naturally via `BIT_MASK[n]` index-out-of-bounds on the fallback path
- `_bzhi_u64` would silently truncate, but this path is only reachable with compile-time `-C target-feature=+bmi2`, and all callers guarantee `n <= 56` (max zstd symbol width)

This is a deliberate performance vs. safety tradeoff documented in the function's doc comment.
Test plan
- Unit tests cover mask table correctness, `mask_lower_bits` edge cases, the `peek_bits(0)` invariant across all states (including `bits_consumed == 0`), and `get_bits_triple` equivalence (refill, no-refill, and mixed zero-width paths)
- `debug_assert` guards on branchless operations: `n <= 64` bounds, the `bits_consumed + n <= 64` contract, and `sum == n1 + n2 + n3` consistency
- Clippy lints the `bench_internals` feature with `--benches`
- Benchmark smoke run (`cargo bench --bench bitstream --features bench_internals -- --test`)

Closes #13