perf(encoding): complete ARM histogram path for #71 (#104)
Conversation
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough

Adds a private histogram module with scalar and striped-parallel byte counters and a dispatcher.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Huff as HuffmanEncoder
    participant FSE as FSE encoder
    participant Hist as histogram::count_bytes
    participant Counts as Counts[256]
    Huff->>Hist: count_bytes(weights, Counts)
    Hist-->>Huff: (max_symbol, largest_count)
    Huff->>FSE: build_table_from_bytes(weights, max_log, avoid_0_numbit)
    FSE->>Hist: count_bytes(weights, Counts)
    Hist-->>FSE: (max_symbol, largest_count)
    FSE->>FSE: build_table_from_counts(&counts[..=max_symbol], max_log, avoid_0_numbit)
    FSE-->>Huff: FSETable
```
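The flow above implies a small contract for `count_bytes`: fill a 256-entry count table and return `(max_symbol, largest_count)` so callers can slice `counts[..=max_symbol]` when building an FSE table. A minimal scalar sketch of that contract (illustrative, not the crate's actual implementation):

```rust
// Scalar sketch of the count_bytes contract implied by the diagram:
// fills the 256-entry table and returns (max_symbol, largest_count).
fn count_bytes(data: &[u8], counts: &mut [usize; 256]) -> (usize, usize) {
    for &byte in data {
        counts[byte as usize] += 1;
    }
    let max_symbol = data.iter().copied().max().unwrap_or(0) as usize;
    let largest_count = counts.iter().copied().max().unwrap_or(0);
    (max_symbol, largest_count)
}
```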
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
Codecov Report: ❌ Patch coverage is …
Pull request overview
This PR introduces a shared byte-frequency histogram implementation and wires it into entropy-table construction paths (Huffman and FSE) to enable ARM/AArch64-optimized counting with scalar fallback, as part of the broader ARM optimization work for #71.
Changes:
- Add a new `histogram` module providing `count_bytes()` with scalar + striped ("donor-style") counting and an AArch64 SVE2-gated variant.
- Use the shared histogram counter when building Huffman symbol counts and when building FSE tables from byte slices.
- Replace iterator-based `build_table_from_data(...)` usage in non-test code with a new slice-based `build_table_from_bytes(...)` helper.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `zstd/src/lib.rs` | Registers the new `histogram` module in the crate. |
| `zstd/src/histogram.rs` | Implements shared histogram counting + tests; adds SVE2-gated variant. |
| `zstd/src/huff0/huff0_encoder.rs` | Switches Huffman counting to `histogram::count_bytes` and updates the FSE weight-table build to the slice-based API. |
| `zstd/src/fse/fse_encoder.rs` | Adds `build_table_from_bytes()` using the shared histogram and gates the iterator-based builder to tests/fuzz. |
| `zstd/src/dictionary/mod.rs` | Updates dictionary FSE table serialization to use `build_table_from_bytes()`. |
Actionable comments posted: 2
Inline comments:
In `@zstd/src/fse/fse_encoder.rs`:
- Around lines 316-320: `build_table_from_bytes` currently calls `histogram::count_bytes` on empty input, which yields (0, 0) and later causes a deep panic. Add an explicit precondition check at the start of `build_table_from_bytes` that rejects empty slices (e.g. `assert!` or a panic with a clear message) before calling `histogram::count_bytes`, so callers get an immediate, descriptive failure. Keep the rest of the logic (the call to `histogram::count_bytes` and the subsequent `build_table_from_counts(&counts[..=max_symbol], max_log, avoid_0_numbit)`) unchanged.
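The suggested precondition can be sketched as follows. Hedged: `count_bytes` here is a scalar stand-in for `histogram::count_bytes`, and the `_checked` name is illustrative, not the crate's:

```rust
// Scalar stand-in for histogram::count_bytes: fills the table and returns
// (max_symbol, largest_count).
fn count_bytes(data: &[u8], counts: &mut [usize; 256]) -> (usize, usize) {
    for &b in data {
        counts[b as usize] += 1;
    }
    let max_symbol = data.iter().copied().max().unwrap_or(0) as usize;
    let largest_count = counts.iter().copied().max().unwrap_or(0);
    (max_symbol, largest_count)
}

fn build_table_from_bytes_checked(data: &[u8]) -> (usize, usize) {
    // Fail fast with a descriptive message instead of letting empty input
    // produce (0, 0) and panic deep inside table construction.
    assert!(
        !data.is_empty(),
        "build_table_from_bytes: input slice must not be empty"
    );
    let mut counts = [0usize; 256];
    count_bytes(data, &mut counts)
    // The real function would continue with
    // build_table_from_counts(&counts[..=max_symbol], max_log, avoid_0_numbit).
}
```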
In `@zstd/src/histogram.rs`:
- Around lines 137-148: The test `count_bytes_handles_small_input_with_tail` never exercises the parallel path. Update the test so it triggers `count_bytes()`'s parallel branch by using a data length greater than `PARALLEL_COUNT_THRESHOLD` that is not a multiple of 16, or alternatively call `count_bytes_parallel(&data, &mut fast)` directly from the test; either way, still compare the results against `count_bytes_scalar(&data, &mut scalar)` and assert that both the histogram arrays and the returned metadata match.
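The comparison the comment asks for can be sketched self-contained, using simplified stand-ins for the module's counters. The threshold value and the 4-byte stripe width here are assumptions, not taken from the actual `zstd/src/histogram.rs`:

```rust
// Assumed threshold; the real PARALLEL_COUNT_THRESHOLD lives in histogram.rs.
const PARALLEL_COUNT_THRESHOLD: usize = 64;

// Simple scalar counter, standing in for count_bytes_scalar.
fn count_bytes_scalar(data: &[u8], counts: &mut [u32; 256]) {
    for &b in data {
        counts[b as usize] += 1;
    }
}

// Striped counter, standing in for count_bytes_parallel: four independent
// bucket arrays are updated round-robin, then merged per symbol.
fn count_bytes_parallel(data: &[u8], counts: &mut [u32; 256]) {
    let mut lanes = [[0u32; 256]; 4];
    let mut chunks = data.chunks_exact(4);
    for chunk in &mut chunks {
        for (lane, &b) in lanes.iter_mut().zip(chunk) {
            lane[b as usize] += 1;
        }
    }
    // Scalar tail for the bytes that don't fill a full stripe.
    for &b in chunks.remainder() {
        lanes[0][b as usize] += 1;
    }
    for symbol in 0..256 {
        counts[symbol] = lanes.iter().map(|l| l[symbol]).sum();
    }
}

// Dispatcher mirroring count_bytes(): parallel only above the threshold.
fn count_bytes_dispatch(data: &[u8], counts: &mut [u32; 256]) {
    if data.len() > PARALLEL_COUNT_THRESHOLD {
        count_bytes_parallel(data, counts);
    } else {
        count_bytes_scalar(data, counts);
    }
}

// The property the review wants tested: scalar and parallel histograms agree
// on inputs long enough to actually take the parallel branch.
fn histograms_match(data: &[u8]) -> bool {
    let mut scalar = [0u32; 256];
    let mut fast = [0u32; 256];
    count_bytes_scalar(data, &mut scalar);
    count_bytes_parallel(data, &mut fast);
    scalar == fast
}
```

A length like 1003 satisfies both conditions: above the threshold and not a multiple of 16, so the stripe loop and the tail are both exercised.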
ℹ️ Review info
⚙️ Run configuration: Path: `.coderabbit.yaml` | Review profile: ASSERTIVE | Plan: Pro | Run ID: 7f80c42a-b0f5-47a6-8608-0dfdb190e956
📒 Files selected for processing (5): `zstd/src/dictionary/mod.rs`, `zstd/src/fse/fse_encoder.rs`, `zstd/src/histogram.rs`, `zstd/src/huff0/huff0_encoder.rs`, `zstd/src/lib.rs`
Actionable comments posted: 1
Inline comments:
In `@zstd/src/histogram.rs`:
- Around lines 29-74: `count_bytes_parallel` can overflow when merging the four `u32` bucket arrays, because the code sums them as `u32` and then casts. Change the merge to widen each term first (e.g. `let value = counting1[symbol] as usize + counting2[symbol] as usize + counting3[symbol] as usize + counting4[symbol] as usize; counts[symbol] = value;`). Also add a large-input fallback: if `data.len() > u32::MAX as usize`, or any `countingN[symbol]` reaches `u32::MAX` (for `counting1`..`counting4`), switch to a safe accumulation path that uses a single `usize` (or `u64`) accumulator per symbol and recomputes the counts to avoid wraparound.
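A minimal sketch of the widened merge this comment describes; the lane names follow the review, while the surrounding loop structure is assumed:

```rust
// Merge four u32 bucket arrays into usize counts, widening each term before
// the addition so the sum cannot wrap in u32 arithmetic.
fn merge_lanes_widened(
    counting1: &[u32; 256],
    counting2: &[u32; 256],
    counting3: &[u32; 256],
    counting4: &[u32; 256],
    counts: &mut [usize; 256],
) {
    for symbol in 0..256 {
        let value = counting1[symbol] as usize
            + counting2[symbol] as usize
            + counting3[symbol] as usize
            + counting4[symbol] as usize;
        counts[symbol] = value;
    }
}
```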
ℹ️ Review info — Run ID: e266b924-9ee6-4cbc-8fbe-838873e6abb5 | 📒 Files selected for processing (2): `zstd/src/fse/fse_encoder.rs`, `zstd/src/histogram.rs`
Actionable comments posted: 2
Inline comments:
In `@zstd/src/histogram.rs`:
- Around lines 205-209: The test `merge_lane_counts_widens_before_sum` assumes `usize` is 64-bit and overflows on 32-bit targets. Either gate the test with `#[cfg(target_pointer_width = "64")]` placed before `#[test]`, or change the test to use width-aware values (e.g. cast to `u64` before summing, or assert via `u64` arithmetic) so it doesn't rely on `usize` widening.
- Around lines 149-153: The test fails on 32-bit targets because the per-lane accumulators use `usize` and can overflow. Update `merge_lane_counts` to widen its accumulator to `u64` (use `u64` temporaries when summing the per-lane `u32` counts before aggregating into the final counts), or alternatively gate the test with `#[cfg(target_pointer_width = "64")]`. Also ensure the SVE2 specialization is preserved by guaranteeing that the parallel implementation inlines into the `#[target_feature(enable = "sve2")]` function: either mark `count_bytes_parallel` with `#[inline(always)]`, or move the loop body of `count_bytes_parallel` directly into `unsafe fn count_bytes_sve2` so the SVE2-compiled code contains the specialized loop. Target symbols: `merge_lane_counts`, `count_bytes_sve2`, `count_bytes_parallel`.
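A sketch combining both suggestions, with names taken from the review but a simplified signature (the real functions in `zstd/src/histogram.rs` differ): the merge widens through `u64` so it is correct even where `usize` is 32 bits, and `#[inline(always)]` is what lets a `#[target_feature(enable = "sve2")]` wrapper inline the loop and compile it with the SVE2 feature enabled.

```rust
// Merge four per-lane u32 bucket arrays into u64 totals. Widening each term
// before summing means four lanes near u32::MAX cannot wrap, regardless of
// the target's pointer width.
//
// #[inline(always)] matters for the SVE2 point in the review: a gated caller
// such as
//     #[target_feature(enable = "sve2")]
//     unsafe fn count_bytes_sve2(...) { ... merge_lane_counts(...) ... }
// only gets a specialized body if this function is inlined into it.
#[inline(always)]
fn merge_lane_counts(lanes: &[[u32; 256]; 4]) -> [u64; 256] {
    let mut counts = [0u64; 256];
    for symbol in 0..256 {
        counts[symbol] = lanes.iter().map(|l| u64::from(l[symbol])).sum();
    }
    counts
}
```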
ℹ️ Review info — Run ID: 70bac832-713b-493a-8c8f-2872ec5b7acf | 📒 Files selected for processing (1): `zstd/src/histogram.rs`
Actionable comments posted: 1
Inline comments:
In `@zstd/src/histogram.rs`:
- Around lines 43-47: In `count_bytes_parallel`, replace the loop condition based on `data.len().saturating_sub(16)` (i.e. `index <= data.len().saturating_sub(16)`) with an explicit bounds check (`index + 16 <= data.len()`), so the unsafe `read_unaligned` at `ptr` is directly guarded by the loop condition. This makes the 16-byte read self-contained instead of relying on an external length invariant, and keeps the unsafe block sound if the surrounding logic changes.
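The guard pattern the comment asks for can be sketched in isolation. This toy function merely sums 16-byte chunks rather than building a histogram; the bounds-check shape is the point, not the loop body:

```rust
// Process data in 16-byte unaligned reads. The loop condition
// `index + 16 <= data.len()` directly guards the read, so the unsafe block
// is sound on its own, including for inputs shorter than 16 bytes.
fn sum_16_byte_chunks(data: &[u8]) -> (u128, usize) {
    let mut acc: u128 = 0;
    let mut index = 0;
    while index + 16 <= data.len() {
        // SAFETY: the loop condition guarantees data[index..index + 16]
        // is in bounds for this 16-byte read.
        let chunk = unsafe {
            (data.as_ptr().add(index) as *const u128).read_unaligned()
        };
        acc = acc.wrapping_add(chunk);
        index += 16;
    }
    (acc, index) // `index` marks where the scalar tail begins
}
```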
ℹ️ Review info — Run ID: 429b5a62-1291-4a8a-bab0-39dcf10c288d | 📒 Files selected for processing (2): `zstd/src/fse/fse_encoder.rs`, `zstd/src/histogram.rs`
Summary
This PR finalizes the remaining #71 work by adding the shared histogram-count path used by the Huffman/FSE/dictionary entropy builders.

What this PR changes
- Shared histogram module (`zstd/src/histogram.rs`) with scalar fallback
- AArch64 SVE2-gated variant (`#[target_feature(enable = "sve2")]`)

#71 Objective Coverage
- Prefetch (`prfm`) path: completed in PR "perf(decoding): branchless offset history, prefetch pipeline, and BMI2 triple extract" (#90)

Validation
Closes #71