perf(encoding): CRC-gated hash mix for ARM and x86_64#102
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughIntroduces a runtime-selectable 64-bit hash-mixing abstraction used by match generators: adds Changes
Sequence Diagram(s)sequenceDiagram
participant Generator as "MatchGenerator"
participant Selector as "HashMixKernel selector"
participant Impl as "hash_mix_u64 implementation"
rect rgba(30,144,255,0.5)
Generator->>Selector: request kernel (init / first use)
Selector->>Selector: detect CPU features / read cached value
Selector-->>Generator: return kernel
end
rect rgba(34,139,34,0.5)
Generator->>Impl: call hash_mix_u64_with_kernel(value, kernel)
alt Kernel == SSE4.2
Impl->>Impl: use `_mm_crc32_u64` + mix
else Kernel == AArch64 CRC
Impl->>Impl: use `__crc32d` + mix
else
Impl->>Impl: scalar multiplicative fallback
end
Impl-->>Generator: return mixed 64-bit hash
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
Adds a shared hash_mix_u64 helper used by encoding matchers, with runtime CPU-feature–gated CRC32-based mixing on AArch64 (crc) and x86_64 (SSE4.2), plus scalar fallback, to reduce hash-mix cost in hot paths while keeping portability.
Changes:
- Introduces
hash_mix_u64with CRC32-based mix paths for AArch64 and x86_64 (runtime-gated understd, compile-time-gated underno_std). - Switches dfast and row matchers to use the shared hash-mix contract instead of per-site multiplicative hashing.
- Adds CPU-gated tests intended to validate determinism/availability of the CRC hash paths.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 102-145: Remove the unnecessary `return` keywords in the
feature-probe helpers and drop redundant casts of the CRC intrinsics: in
sse42_crc_hash_available() remove the explicit `return` and use the expression
value from HAS_SSE42.get_or_init(...); in crc_hash_available() do the same; in
hash_mix_u64_sse42() remove the cast on the result of _mm_crc32_u64 (it already
yields the correct integer type) before shifting/xoring; likewise remove any
redundant cast on __crc32d in hash_mix_u64_crc() if present. Update the
functions sse42_crc_hash_available, crc_hash_available, hash_mix_u64_sse42, and
hash_mix_u64_crc accordingly to satisfy clippy.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5939ac12-c4fb-4eeb-986b-88ce23a92d03
📒 Files selected for processing (1)
zstd/src/encoding/match_generator.rs
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 4299-4323: The tests only check hash_mix_u64 is deterministic, not
that the accelerated path is selected; update each guarded test (the ones using
is_x86_feature_detected!("sse4.2") and is_aarch64_feature_detected!("crc")) to
invoke the architecture-specific implementation directly and compare it to
hash_mix_u64 so the accelerated path is exercised and verified (i.e., call the
SSE/CRC helper used by hash_mix_u64 under the same runtime guard and assert
equality between that helper and hash_mix_u64 for the test vector).
- Around line 79-98: hash_mix_u64 currently checks
sse42_crc_hash_available()/crc_hash_available() on every call, which hurts
hot-path performance in DfastMatchGenerator::hash_index and
RowMatchGenerator::hash_and_row; instead, determine the mixer once at startup or
first use and dispatch directly. Replace the per-call branching in hash_mix_u64
with a thin dispatcher that calls a preselected function (e.g., a static
OnceLock or lazy-initialized function pointer/enum) which is set to either
hash_mix_u64_sse42, hash_mix_u64_crc, or the scalar multiplier, and ensure the
unsafe SSE4.2/CRC variants are only invoked when the corresponding feature
detection has run and selected them (preserve the SAFETY guards). Update callers
(DfastMatchGenerator::hash_index, RowMatchGenerator::hash_and_row) to call the
new dispatcher (or keep calling hash_mix_u64 if it becomes the cheap indirect
call) so the hot path no longer performs per-hash feature checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3faf388a-2c9f-497d-a893-a19acfcc6a1f
📒 Files selected for processing (1)
zstd/src/encoding/match_generator.rs
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 193-199: with_forced_hash_mix_kernel currently unsafely mutates
global HASH_MIX_KERNEL without synchronization and fails to restore on panic;
change it to acquire the HASH_MIX_KERNEL_TEST_LOCK (serialize with other tests)
and set HASH_MIX_KERNEL inside a guard object that restores the previous value
in Drop so restoration happens even if f() panics (i.e., make
with_forced_hash_mix_kernel take the lock, swap kernel, run f(), and rely on a
RAII restore-on-drop guard to reset HASH_MIX_KERNEL). Also update
accelerated-path tests to acquire the same HASH_MIX_KERNEL_TEST_LOCK before
calling hash_mix_u64(...) so they cannot race with forced-kernel tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 69135b0f-650a-48b6-bdf9-33e48c065723
📒 Files selected for processing (1)
zstd/src/encoding/match_generator.rs
- add aarch64 crc-enabled hash mixing path with runtime/static feature detection - keep multiplicative fallback for non-crc and non-arm targets - apply shared hash mix in dfast and row hash indexing - align row hash test with shared hash-mix contract
- extend hash mix with runtime-gated SSE4.2 CRC32 on x86_64 - keep scalar fallback and existing aarch64 CRC path - add CPU-gated determinism tests for crc hash paths
- remove needless returns in feature-detection helpers - drop redundant u64 cast in sse4.2 crc mix path
7737c5f to
c4ea535
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
zstd/src/encoding/match_generator.rs (1)
4311-4340:⚠️ Potential issue | 🟡 MinorSerialize this row hash test against the forced-kernel coverage tests.
Line 4324 and Line 4331 both depend on
hash_mix_u64, but this test does not holdHASH_MIX_KERNEL_TEST_LOCK. If it runs in parallel with the forced-scalar coverage on Lines 4383-4411, the first call can see one kernel and the second another, making the row/tag assertion flaky.🛠️ Suggested fix
fn row_hash_and_row_extracts_high_bits() { + #[cfg(feature = "std")] + let _lock = HASH_MIX_KERNEL_TEST_LOCK + .lock() + .expect("hash mix test lock poisoned"); + let mut matcher = RowMatchGenerator::new(1 << 22); matcher.configure(ROW_CONFIG); matcher.add_data(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/encoding/match_generator.rs` around lines 4311 - 4340, The test row_hash_and_row_extracts_high_bits calls hash_mix_u64 twice without holding the global HASH_MIX_KERNEL_TEST_LOCK, causing flaky behavior when kernel selection is raced by other coverage tests; fix by acquiring HASH_MIX_KERNEL_TEST_LOCK at the start of row_hash_and_row_extracts_high_bits (e.g., call HASH_MIX_KERNEL_TEST_LOCK.lock() and hold the guard for the duration of the test) before any hash_mix_u64 calls and let the guard drop at the end so both hash_mix_u64 invocations see the same kernel selection.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 4311-4340: The test row_hash_and_row_extracts_high_bits calls
hash_mix_u64 twice without holding the global HASH_MIX_KERNEL_TEST_LOCK, causing
flaky behavior when kernel selection is raced by other coverage tests; fix by
acquiring HASH_MIX_KERNEL_TEST_LOCK at the start of
row_hash_and_row_extracts_high_bits (e.g., call HASH_MIX_KERNEL_TEST_LOCK.lock()
and hold the guard for the duration of the test) before any hash_mix_u64 calls
and let the guard drop at the end so both hash_mix_u64 invocations see the same
kernel selection.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 174b7518-49cc-4f15-897e-178709d312f3
📒 Files selected for processing (1)
zstd/src/encoding/match_generator.rs
|
@coderabbitai Duplicate body item about \ lock is fixed in commit 904d2b0: the test now acquires \ before any \ call to avoid cross-test kernel races. |
|
@coderabbitai Duplicate body item about row_hash_and_row_extracts_high_bits lock is fixed in commit 904d2b0: the test now acquires HASH_MIX_KERNEL_TEST_LOCK before any hash_mix_u64 call to avoid cross-test kernel races. |
|
Tip For best results, initiate chat on the files or code changes.
✏️ Learnings added
🧠 Learnings used |
|
Tip For best results, initiate chat on the files or code changes.
✏️ Learnings added
🧠 Learnings used |
|
@coderabbitai Copilot outdated thread PRRT_kwDORv-2jM56S4K5 is now stale after 46927db: hash kernel dispatch moved into matcher state ( |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain
🏁 Script executed: Length of output: 92
✏️ Learnings added
🧠 Learnings used |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 4308-4345: The tests hard-code accelerated enums
(HashMixKernel::X86Sse42 / HashMixKernel::Aarch64Crc) which bypass runtime
selection; change them to call detect_hash_mix_kernel() under the same
CPU-feature guard (is_x86_feature_detected! / is_aarch64_feature_detected!) and
assert that detect_hash_mix_kernel() returns the expected accelerated variant,
then use that detected kernel in hash_mix_u64_with_kernel for the comparison
(instead of the hard-coded enum); keep the existing accelerated unsafe call
(hash_mix_u64_sse42 / hash_mix_u64_crc) as the reference value but only run the
assert if the detected kernel matches the accelerated variant.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4deef961-5e86-49d4-86a2-ad2dc4642036
📒 Files selected for processing (1)
zstd/src/encoding/match_generator.rs
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI (x86_64-gnu)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.30.
| Benchmark suite | Current: 33b482f | Previous: ae02906 | Ratio |
|---|---|---|---|
compress/level4-row/small-4k-log-lines/matrix/pure_rust |
0.069 ms |
0.053 ms |
1.30 |
compress/level4-row/low-entropy-1m/matrix/pure_rust |
9.264 ms |
5.554 ms |
1.67 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
Summary
Validation
Part of #71
Summary by CodeRabbit
Performance
Tests