perf(encoding): complete ARM histogram path for #71 #104

Merged
polaz merged 9 commits into main from perf/#71-histogram-count-path
Apr 11, 2026

@polaz commented Apr 11, 2026

Summary

This PR finalizes the remaining #71 work by adding the shared histogram-count path used by Huffman/FSE/dictionary entropy builders.

What this PR changes

  • add shared donor-style striped histogram counter (zstd/src/histogram.rs) with scalar fallback
  • wire histogram counting into Huffman/FSE/dictionary entropy-table paths
  • add AArch64 runtime dispatch for an SVE2-gated histogram variant (#[target_feature(enable = "sve2")])
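The striped counter with scalar fallback described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in `zstd/src/histogram.rs`: the threshold value, the four `u32` lanes fed one byte each per 4-byte chunk, and the helper name `summarize` are all assumptions made for the example. Only the `count_bytes` signature and the `(max_symbol, largest_count)` return pair come from the review summary on this page.

```rust
/// Hypothetical cut-over point; the real PARALLEL_COUNT_THRESHOLD value is not shown on this page.
const PARALLEL_COUNT_THRESHOLD: usize = 1500;

/// Returns (max_symbol, largest_count) for a finished histogram.
/// An all-zero histogram (empty input) yields (0, 0).
fn summarize(counts: &[usize; 256]) -> (usize, usize) {
    let mut max_symbol = 0;
    let mut largest_count = 0;
    for (symbol, &count) in counts.iter().enumerate() {
        if count > 0 {
            max_symbol = symbol;
        }
        if count > largest_count {
            largest_count = count;
        }
    }
    (max_symbol, largest_count)
}

/// Plain one-table counter, used for short inputs.
pub fn count_bytes_scalar(data: &[u8], counts: &mut [usize; 256]) -> (usize, usize) {
    counts.fill(0);
    for &byte in data {
        counts[byte as usize] += 1;
    }
    summarize(counts)
}

/// Striped counter: four independent lane tables avoid back-to-back
/// increments of the same slot when the input has long runs.
/// Assumes fewer than u32::MAX occurrences per lane and symbol.
pub fn count_bytes_striped(data: &[u8], counts: &mut [usize; 256]) -> (usize, usize) {
    let mut lanes = [[0u32; 256]; 4];
    let mut chunks = data.chunks_exact(4);
    for chunk in &mut chunks {
        lanes[0][chunk[0] as usize] += 1;
        lanes[1][chunk[1] as usize] += 1;
        lanes[2][chunk[2] as usize] += 1;
        lanes[3][chunk[3] as usize] += 1;
    }
    // Leftover bytes (fewer than 4) go into the first lane.
    for &byte in chunks.remainder() {
        lanes[0][byte as usize] += 1;
    }
    // Widen each lane value before summing so the merge cannot wrap in u32.
    for symbol in 0..256 {
        counts[symbol] = lanes.iter().map(|lane| lane[symbol] as usize).sum::<usize>();
    }
    summarize(counts)
}

/// Dispatcher: short inputs take the scalar path, longer ones the striped path.
pub fn count_bytes(data: &[u8], counts: &mut [usize; 256]) -> (usize, usize) {
    if data.len() < PARALLEL_COUNT_THRESHOLD {
        count_bytes_scalar(data, counts)
    } else {
        count_bytes_striped(data, counts)
    }
}
```

Both paths must produce identical histograms, which is why the scalar version doubles as the reference in tests.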

#71 Objective Coverage

Validation

  • cargo fmt --all
  • cargo clippy --all-targets -- -D warnings
  • cargo nextest run -p structured-zstd

Closes #71

Summary by CodeRabbit

  • Refactor
    • Streamlined entropy-table construction to use a direct byte-slice path and simplified internal imports.
  • Performance
    • Faster encoding on large inputs via an optimized counting path and runtime CPU-path selection where supported.
  • New Features
    • Added a shared, efficient byte-frequency histogram utility used across encoders.
  • Tests
    • Added unit tests covering counting correctness, dispatcher behavior, and edge cases.

Copilot AI review requested due to automatic review settings April 11, 2026 13:03

coderabbitai Bot commented Apr 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9de243d1-2f5e-412d-a3f3-396c4b6dba5b

📥 Commits

Reviewing files that changed from the base of the PR and between 8d7675c and 00e87e1.

📒 Files selected for processing (1)
  • zstd/src/histogram.rs

Walkthrough

Adds a private histogram module with scalar and striped-parallel byte counters and a dispatcher; introduces build_table_from_bytes(...) in the FSE encoder (gating the iterator-based builder); and updates Huffman and dictionary call sites to use the new histogram and byte-slice table builder.

Changes

| Cohort / File(s) | Summary |
|---|---|
| New histogram module: `zstd/src/histogram.rs` | Adds `pub(crate) fn count_bytes(data: &[u8], counts: &mut [usize;256]) -> (usize, usize)`, `count_bytes_scalar`, striped `count_bytes_parallel`, lane-merge logic, SVE2-targeting dispatch paths, and unit tests. |
| FSE encoder API: `zstd/src/fse/fse_encoder.rs` | Gates iterator-based `build_table_from_data(...)` behind `#[cfg(any(test, feature = "fuzz_exports"))]`; adds `pub(crate) fn build_table_from_bytes(data: &[u8], max_log: u8, avoid_0_numbit: bool) -> FSETable`, which counts bytes and then calls `build_table_from_counts`. |
| Huffman encoder updates: `zstd/src/huff0/huff0_encoder.rs` | Replaces manual counting with `histogram::count_bytes(...)`; calls `fse_encoder::build_table_from_bytes(weights, 6, true)` and uses the `max_symbol` returned by the histogram. |
| Dictionary module update: `zstd/src/dictionary/mod.rs` | Replaces iterator-based FSE table construction with `build_table_from_bytes(&symbols, max_log, false)` and adjusts imports to the new API. |
| Crate mod registration: `zstd/src/lib.rs` | Adds a private `mod histogram;` declaration. |
| Imports adjusted: `zstd/src/dictionary/mod.rs`, `zstd/src/huff0/huff0_encoder.rs` | Switches `use` statements to import `build_table_from_bytes` and histogram utilities instead of the iterator-based helpers. |

Sequence Diagram(s)

sequenceDiagram
    participant Huff as HuffmanEncoder
    participant FSE as FSE encoder
    participant Hist as histogram::count_bytes
    participant Counts as Counts[256]

    Huff->>Hist: count_bytes(weights, Counts)
    Hist-->>Huff: (max_symbol, largest_count)
    Huff->>FSE: build_table_from_bytes(weights, max_log, avoid_0_numbit)
    FSE->>Hist: count_bytes(weights, Counts)
    Hist-->>FSE: (max_symbol, largest_count)
    FSE->>FSE: build_table_from_counts(&counts[..=max_symbol], max_log, avoid_0_numbit)
    FSE-->>Huff: FSETable

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I tallied bytes in moonlit rows,
Four lanes of hops where histogram grows,
I built the table, byte by byte,
Quiet paws in coder's night,
Hops, counts, and a fse delight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'perf(encoding): complete ARM histogram path for #71' directly matches the main change: implementing the SVE2 histogram-count path for ARM platforms to close issue #71. |
| Linked Issues check | ✅ Passed | The PR implements the SVE2 histogram-count path for ARM with runtime detection and scalar fallbacks, completing the encoding optimization portion of #71 as stated in objectives. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to issue #71: adding histogram counting infrastructure and integrating it into FSE/Huffman/dictionary paths with ARM SVE2 support. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |



codecov Bot commented Apr 11, 2026

Codecov Report

❌ Patch coverage is 98.59155% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| zstd/src/histogram.rs | 98.47% | 2 Missing ⚠️ |


Copilot AI left a comment

Pull request overview

This PR introduces a shared byte-frequency histogram implementation and wires it into entropy-table construction paths (Huffman and FSE) to enable ARM/AArch64-optimized counting with scalar fallback, as part of the broader ARM optimization work for #71.

Changes:

  • Add a new histogram module providing count_bytes() with scalar + striped (“donor-style”) counting and an AArch64 SVE2-gated variant.
  • Use the shared histogram counter when building Huffman symbol counts and when building FSE tables from byte slices.
  • Replace iterator-based build_table_from_data(...) usage in non-test code with a new slice-based build_table_from_bytes(...) helper.
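The runtime selection of an SVE2-gated variant mentioned in the list above could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the counting loop is deliberately trivial, and the SVE2 variant simply re-compiles the portable loop with the feature enabled rather than using dedicated intrinsics. The `#[inline(always)]` on the shared loop is there so its body is compiled inside the `#[target_feature]` function.

```rust
/// Portable counting loop. `#[inline(always)]` lets the body be inlined
/// into the feature-gated wrapper below so it is compiled with SVE2 enabled.
#[inline(always)]
fn count_bytes_portable(data: &[u8], counts: &mut [usize; 256]) {
    for &byte in data {
        counts[byte as usize] += 1;
    }
}

/// SVE2-gated variant (hypothetical name). Only compiled on AArch64.
///
/// # Safety
/// The caller must have verified at runtime that the CPU supports SVE2.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "sve2")]
unsafe fn count_bytes_sve2(data: &[u8], counts: &mut [usize; 256]) {
    count_bytes_portable(data, counts);
}

/// Dispatcher: uses the SVE2 variant when the CPU reports the feature,
/// otherwise falls back to the portable loop. Adds into `counts`.
pub fn count_bytes_dispatch(data: &[u8], counts: &mut [usize; 256]) {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("sve2") {
            // SAFETY: the feature check above guarantees SVE2 is available.
            unsafe { count_bytes_sve2(data, counts) };
            return;
        }
    }
    count_bytes_portable(data, counts);
}
```

On non-AArch64 targets the gated function is not compiled at all, so the dispatcher reduces to the portable call.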

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| zstd/src/lib.rs | Registers the new histogram module in the crate. |
| zstd/src/histogram.rs | Implements shared histogram counting + tests; adds SVE2-gated variant. |
| zstd/src/huff0/huff0_encoder.rs | Switches Huffman counting to histogram::count_bytes and updates FSE weight-table build to slice-based API. |
| zstd/src/fse/fse_encoder.rs | Adds build_table_from_bytes() using the shared histogram and gates iterator-based builder to tests/fuzz. |
| zstd/src/dictionary/mod.rs | Updates dictionary FSE table serialization to use build_table_from_bytes(). |

Comment thread zstd/src/histogram.rs Outdated

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/fse/fse_encoder.rs`:
- Around line 316-320: build_table_from_bytes currently calls
histogram::count_bytes on empty input which yields (0,0) and later causes a deep
panic; add an explicit precondition check at the start of build_table_from_bytes
to reject empty slices (e.g. assert! or panic with a clear message) before
calling histogram::count_bytes, so callers get an immediate, descriptive
failure; keep the rest of the logic (the call to histogram::count_bytes and the
subsequent build_table_from_counts(&counts[..=max_symbol], max_log,
avoid_0_numbit)) unchanged.
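
A minimal sketch of the suggested precondition, assuming a stand-in function: `trimmed_counts_from_bytes` is hypothetical and returns the trimmed count slice instead of an `FSETable`, since the real `build_table_from_counts` is not shown on this page. The point is only the early, descriptive failure before any counting happens.

```rust
/// Hypothetical stand-in for the front half of `build_table_from_bytes`:
/// rejects empty input up front, then counts bytes and trims the histogram
/// to `..=max_symbol`, which is what the table builder would receive.
pub fn trimmed_counts_from_bytes(data: &[u8]) -> Vec<usize> {
    // Fail immediately with a clear message instead of panicking deep
    // inside table construction on an all-zero histogram.
    assert!(
        !data.is_empty(),
        "build_table_from_bytes requires non-empty input"
    );

    let mut counts = [0usize; 256];
    let mut max_symbol = 0usize;
    for &byte in data {
        counts[byte as usize] += 1;
        max_symbol = max_symbol.max(byte as usize);
    }
    counts[..=max_symbol].to_vec()
}
```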

In `@zstd/src/histogram.rs`:
- Around line 137-148: The test count_bytes_handles_small_input_with_tail never
exercises the parallel path; update the test so it triggers count_bytes()'s
parallel branch by using a data length > PARALLEL_COUNT_THRESHOLD and not a
multiple of 16, or alternatively call count_bytes_parallel(&data, &mut fast)
directly from the test; ensure you still compare results against
count_bytes_scalar(&data, &mut scalar) and assert both the histogram arrays and
returned metadata match.
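
One way to build such a test input is sketched below. The threshold value and both counters are stand-ins written for this example (the real ones live in `zstd/src/histogram.rs` and are not reproduced here), and the sketch compares only the histogram arrays, not the returned metadata. The part that carries over is the length construction: strictly above the threshold and not a multiple of 16, so the chunked loop and the tail loop both run.

```rust
/// Hypothetical threshold; the value used by the real dispatcher is not shown on this page.
pub const PARALLEL_COUNT_THRESHOLD: usize = 1024;

/// A length above the threshold that leaves a 5-byte tail after 16-byte chunks.
pub fn tail_exercising_len() -> usize {
    (PARALLEL_COUNT_THRESHOLD / 16 + 1) * 16 + 5
}

/// Reference counter.
pub fn count_bytes_scalar(data: &[u8], counts: &mut [usize; 256]) {
    counts.fill(0);
    for &byte in data {
        counts[byte as usize] += 1;
    }
}

/// Stand-in for the parallel path: four lanes fed from 16-byte chunks,
/// followed by a scalar pass over the tail.
pub fn count_bytes_parallel(data: &[u8], counts: &mut [usize; 256]) {
    let mut lanes = [[0usize; 256]; 4];
    let mut chunks = data.chunks_exact(16);
    for chunk in &mut chunks {
        for (i, &byte) in chunk.iter().enumerate() {
            lanes[i % 4][byte as usize] += 1;
        }
    }
    for &byte in chunks.remainder() {
        lanes[0][byte as usize] += 1;
    }
    for symbol in 0..256 {
        counts[symbol] =
            lanes[0][symbol] + lanes[1][symbol] + lanes[2][symbol] + lanes[3][symbol];
    }
}

/// Returns true when the input really exercises chunks plus tail and the
/// two counters agree on the full histogram.
pub fn parallel_matches_scalar_with_tail() -> bool {
    let len = tail_exercising_len();
    let data: Vec<u8> = (0..len).map(|i| (i * 31 % 256) as u8).collect();
    let mut scalar = [0usize; 256];
    let mut fast = [0usize; 256];
    count_bytes_scalar(&data, &mut scalar);
    count_bytes_parallel(&data, &mut fast);
    len > PARALLEL_COUNT_THRESHOLD && len % 16 != 0 && scalar == fast
}
```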

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7f80c42a-b0f5-47a6-8608-0dfdb190e956

📥 Commits

Reviewing files that changed from the base of the PR and between e8ad7c1 and 0c842af.

📒 Files selected for processing (5)
  • zstd/src/dictionary/mod.rs
  • zstd/src/fse/fse_encoder.rs
  • zstd/src/histogram.rs
  • zstd/src/huff0/huff0_encoder.rs
  • zstd/src/lib.rs

Comment thread zstd/src/fse/fse_encoder.rs Outdated
Comment thread zstd/src/histogram.rs

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/histogram.rs`:
- Around line 29-74: count_bytes_parallel can overflow when merging four u32
bucket arrays because the code sums them as u32 then casts; change the merge to
widen each term first (e.g. let value = counting1[symbol] as usize +
counting2[symbol] as usize + counting3[symbol] as usize + counting4[symbol] as
usize; counts[symbol] = value;) and add a large-input fallback: if data.len() >
u32::MAX as usize or any countingN[symbol] == u32::MAX detected
(counting1..counting4), switch to a safe accumulation path that uses a single
usize accumulator (or u64) per symbol and recomputes counts to avoid wraparound.
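
The difference between summing in `u32` and widening first can be shown in isolation. In this sketch the two helper names are hypothetical, and the widened result is a `u64` rather than the `usize` used in the comment above, purely so the example behaves the same on 32-bit and 64-bit targets.

```rust
/// Sums the four lane values for one symbol in u32, the way a
/// narrow merge would. Wraps silently when the true total exceeds u32::MAX.
pub fn merge_symbol_narrow(lanes: &[[u32; 256]; 4], symbol: usize) -> u32 {
    lanes
        .iter()
        .fold(0u32, |acc, lane| acc.wrapping_add(lane[symbol]))
}

/// Widens each lane value before adding, so the total is exact.
pub fn merge_symbol_widened(lanes: &[[u32; 256]; 4], symbol: usize) -> u64 {
    lanes.iter().map(|lane| u64::from(lane[symbol])).sum()
}
```

With every lane at `u32::MAX` for some symbol, the narrow merge wraps to `u32::MAX - 3`, while the widened merge returns the exact total `4 * u32::MAX`.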

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e266b924-9ee6-4cbc-8fbe-838873e6abb5

📥 Commits

Reviewing files that changed from the base of the PR and between 0c842af and 69d2fba.

📒 Files selected for processing (2)
  • zstd/src/fse/fse_encoder.rs
  • zstd/src/histogram.rs

Comment thread zstd/src/histogram.rs

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/histogram.rs Outdated
Comment thread zstd/src/histogram.rs Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread zstd/src/histogram.rs Outdated
Comment thread zstd/src/histogram.rs

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/histogram.rs`:
- Around line 205-209: The test merge_lane_counts_widens_before_sum assumes
usize is 64-bit and overflows on 32-bit targets; either gate the test with
#[cfg(target_pointer_width = "64")] placed before #[test] or change the test to
use width-aware values (e.g., cast to u64 before summing or assert via u64
arithmetic) so it doesn't rely on usize widening—update the test function
merge_lane_counts_widens_before_sum accordingly to use one of these approaches.
- Around line 149-153: The test fails on 32-bit because per-lane accumulators
use usize and can overflow; update merge_lane_counts to widen its accumulator to
u64 (use u64 temps when summing lane/u32 counts before aggregating into final
counts) or alternatively gate the test with #[cfg(target_pointer_width = "64")];
also ensure the SVE2 specialization is preserved by making the parallel
implementation guaranteed to inline into the #[target_feature(enable = "sve2")]
function: either mark count_bytes_parallel with #[inline(always)] or move the
loop/body of count_bytes_parallel directly into the unsafe fn count_bytes_sve2
so the SVE2-compiled code contains the specialized loop. Target symbols:
merge_lane_counts, count_bytes_sve2, count_bytes_parallel.
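
Both options for the 32-bit concern can be combined in a small sketch. The function name `merge_lane_counts_checked` and its `bool` return are assumptions for illustration, not the PR's `merge_lane_counts`. It sums in `u64` and converts with `usize::try_from`, so a total that does not fit is reported instead of wrapping, and the extreme-value test is gated to 64-bit targets with the `cfg` attribute placed before `#[test]`.

```rust
/// Merges four u32 lane tables into `counts`, summing in u64.
/// Returns false if any total does not fit in `usize` (possible on
/// 32-bit targets); `counts` may be partially written in that case.
pub fn merge_lane_counts_checked(lanes: &[[u32; 256]; 4], counts: &mut [usize; 256]) -> bool {
    for symbol in 0..256 {
        let total: u64 = lanes.iter().map(|lane| u64::from(lane[symbol])).sum();
        match usize::try_from(total) {
            Ok(value) => counts[symbol] = value,
            Err(_) => return false,
        }
    }
    true
}

// Only meaningful where usize can hold 4 * u32::MAX, hence the width gate.
#[cfg(target_pointer_width = "64")]
#[test]
fn merge_lane_counts_widens_before_sum() {
    let mut lanes = [[0u32; 256]; 4];
    for lane in lanes.iter_mut() {
        lane[0] = u32::MAX;
    }
    let mut counts = [0usize; 256];
    assert!(merge_lane_counts_checked(&lanes, &mut counts));
    assert_eq!(counts[0] as u64, 4 * u64::from(u32::MAX));
}
```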

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 70bac832-713b-493a-8c8f-2872ec5b7acf

📥 Commits

Reviewing files that changed from the base of the PR and between c729b7a and 977bdaa.

📒 Files selected for processing (1)
  • zstd/src/histogram.rs

Comment thread zstd/src/histogram.rs
Comment thread zstd/src/histogram.rs

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread zstd/src/fse/fse_encoder.rs

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/histogram.rs`:
- Around line 43-47: In count_bytes_parallel, replace the loop condition that
uses data.len().saturating_sub(16) (index <= data.len().saturating_sub(16)) with
an explicit bounds check (index + 16 <= data.len()) so the unsafe read_unaligned
at ptr is directly guarded by the loop condition; this makes the 16-byte read
self-contained and avoids relying on an external length invariant, ensuring the
unsafe block remains sound if the surrounding logic changes.
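
The self-guarding loop shape can be sketched as follows. The function name and the single-table counting are simplifications for this example; only the loop condition and the 16-byte unaligned read mirror the comment above. The problem with the old condition is visible for short inputs: when `data.len() < 16`, `data.len().saturating_sub(16)` is `0`, so `index <= 0` holds for `index == 0` and the read would go out of bounds unless some outer check prevents the call.

```rust
/// Counts bytes using 16-byte unaligned reads, then a scalar tail.
/// Adds into `counts`; the caller is expected to zero it first.
pub fn count_bytes_chunked(data: &[u8], counts: &mut [usize; 256]) {
    let mut index = 0usize;
    // The condition itself proves that bytes index..index + 16 are in bounds.
    while index + 16 <= data.len() {
        // SAFETY: `index + 16 <= data.len()` was checked by the loop
        // condition, so the 16 bytes starting at `index` lie inside `data`.
        let chunk: [u8; 16] =
            unsafe { data.as_ptr().add(index).cast::<[u8; 16]>().read_unaligned() };
        for &byte in &chunk {
            counts[byte as usize] += 1;
        }
        index += 16;
    }
    // Fewer than 16 bytes remain; handle them one at a time.
    for &byte in &data[index..] {
        counts[byte as usize] += 1;
    }
}
```

Because the bound is stated where the read happens, the function stays sound for any input length, including inputs shorter than one chunk.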

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 429b5a62-1291-4a8a-bab0-39dcf10c288d

📥 Commits

Reviewing files that changed from the base of the PR and between 977bdaa and 8d7675c.

📒 Files selected for processing (2)
  • zstd/src/fse/fse_encoder.rs
  • zstd/src/histogram.rs

Comment thread zstd/src/histogram.rs Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread zstd/src/histogram.rs Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@polaz polaz merged commit b850c04 into main Apr 11, 2026
17 checks passed
@polaz polaz deleted the perf/#71-histogram-count-path branch April 11, 2026 15:32
@sw-release-bot sw-release-bot Bot mentioned this pull request Apr 11, 2026
Development

Successfully merging this pull request may close these issues.

perf: ARM platform optimizations (CRC32 hash, NEON copy, SVE2 histcnt)

2 participants