perf(encoding): interleave fastest hash fill insertion #41
- add interleaved suffix insertion path for fill-step=3 in matcher
- keep tail-anchor backfill behavior unchanged
- add benchmark-report.md and benchmark-results.json to gitignore
No actionable comments were generated in the recent review. 🎉
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Codecov Report: ❌ Patch coverage is
Pull request overview
This PR optimizes the Fastest encoder’s hash-fill path in MatchGenerator::add_suffixes_till() by unrolling/interleaving suffix insertions (using FAST_HASH_FILL_STEP = 3) to reduce overhead and improve compression throughput, while keeping the existing tail-anchor/backfill behavior unchanged.
Changes:
- Add a specialized “interleaved” insertion path for `FAST_HASH_FILL_STEP` plus a shared helper (`insert_suffix_if_absent`).
- Replace the prior `windows(...).enumerate().step_by(...)` insertion loop with explicit position-based insertion loops.
- Update `.gitignore` to ignore benchmark artifacts (though the new entries currently duplicate existing ones).
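The change list above contrasts an iterator-based fill with explicit position loops. What follows is a minimal sketch of the two shapes. It assumes a `HashMap` keyed by the first `MIN_MATCH_LEN` bytes at each position, with first-position-wins semantics. The map type, `MIN_MATCH_LEN = 5` and the helper name `insert_if_absent` are illustrative assumptions, not the crate's actual code. "Interleaved" is read here as unrolling three sampled positions per loop iteration.

```rust
use std::collections::HashMap;

// Hypothetical values for illustration; the real constants live in
// zstd/src/encoding/match_generator.rs and MIN_MATCH_LEN may differ.
const MIN_MATCH_LEN: usize = 5;
const FAST_HASH_FILL_STEP: usize = 3;

/// Old shape: sample every FAST_HASH_FILL_STEP-th window via iterator adapters.
/// The first position seen for a key wins (entry().or_insert keeps it).
fn fill_windows(data: &[u8]) -> HashMap<&[u8], usize> {
    let mut suffixes = HashMap::new();
    for (pos, key) in data
        .windows(MIN_MATCH_LEN)
        .enumerate()
        .step_by(FAST_HASH_FILL_STEP)
    {
        suffixes.entry(key).or_insert(pos);
    }
    suffixes
}

/// Insert the MIN_MATCH_LEN-byte key starting at `pos` unless one is already stored.
fn insert_if_absent<'a>(suffixes: &mut HashMap<&'a [u8], usize>, data: &'a [u8], pos: usize) {
    suffixes.entry(&data[pos..pos + MIN_MATCH_LEN]).or_insert(pos);
}

/// New shape: explicit positions, unrolled to three sampled positions per
/// iteration, then a remainder loop. Insertion order stays 0, 3, 6, ...,
/// so first-wins semantics match fill_windows.
fn fill_interleaved(data: &[u8]) -> HashMap<&[u8], usize> {
    let mut suffixes = HashMap::new();
    // Exclusive bound on start positions; equals data.windows(MIN_MATCH_LEN).count().
    let limit = (data.len() + 1).saturating_sub(MIN_MATCH_LEN);
    let mut pos = 0;
    // Unrolled body: runs only while all three sampled positions are in bounds.
    while pos + 2 * FAST_HASH_FILL_STEP < limit {
        insert_if_absent(&mut suffixes, data, pos);
        insert_if_absent(&mut suffixes, data, pos + FAST_HASH_FILL_STEP);
        insert_if_absent(&mut suffixes, data, pos + 2 * FAST_HASH_FILL_STEP);
        pos += 3 * FAST_HASH_FILL_STEP;
    }
    // Remainder: at most two more sampled positions.
    while pos < limit {
        insert_if_absent(&mut suffixes, data, pos);
        pos += FAST_HASH_FILL_STEP;
    }
    suffixes
}
```

The unrolled loop visits positions in the same increasing order as the `step_by` version, so it registers exactly the same keys and positions. Any speedup therefore comes from dropping iterator-adapter overhead, not from changing what gets inserted.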
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| zstd/src/encoding/match_generator.rs | Adds a fast-path, unrolled/interleaved suffix insertion loop for the fastest hash-fill step while preserving tail-anchor behavior. |
| .gitignore | Attempts to ignore benchmark artifacts, but introduces duplicate ignore patterns. |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
The benchmark result for this commit is worse than the previous result, exceeding the threshold of 1.15.
| Benchmark suite | Current: 5f64b7c | Previous: f3310cb | Ratio |
|---|---|---|---|
| compress/default/large-log-stream/matrix/c_ffi | 4.345 ms | 3.424 ms | 1.27 |
| decompress/fastest/high-entropy-1m/matrix/c_ffi | 0.032 ms | 0.027 ms | 1.19 |
| decompress/default/high-entropy-1m/matrix/c_ffi | 0.032 ms | 0.027 ms | 1.19 |
| compress-dict/fastest/small-10k-random/matrix/c_ffi_without_dict | 0.007 ms | 0.006 ms | 1.17 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
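The Ratio column in the alert above is the current time divided by the previous time, so values above 1.0 mean the commit measured slower. Below is a minimal sketch of the comparison. It assumes the alert fires when the ratio is strictly greater than 1.15; the exact comparison github-action-benchmark performs is an assumption here.

```rust
// Threshold taken from the alert ("exceeding threshold 1.15").
const ALERT_THRESHOLD: f64 = 1.15;

/// Ratio of current to previous timing. For time-based benchmarks smaller is
/// better, so a ratio above 1.0 means the current commit measured slower.
fn regression_ratio(current_ms: f64, previous_ms: f64) -> f64 {
    current_ms / previous_ms
}

/// Assumes the alert fires when the ratio is strictly greater than the threshold.
fn exceeds_threshold(current_ms: f64, previous_ms: f64) -> bool {
    regression_ratio(current_ms, previous_ms) > ALERT_THRESHOLD
}
```

Recomputing from the rounded millisecond values reproduces the reported ratios (4.345 / 3.424 ≈ 1.27, 0.007 / 0.006 ≈ 1.17). The sub-millisecond rows are very sensitive, though: a 0.001 ms change on a 0.006 ms baseline is already about a 17% swing.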
@coderabbitai re-review
✅ Actions performed: Review triggered.
@coderabbitai re-review
✅ Actions performed: Review triggered.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.gitignore:
- Around lines 13-14: The .gitignore has duplicate entries for
benchmark-results.json and benchmark-report.md; remove the redundant lines in
the .gitignore (the second occurrences shown in the diff) so each of these
filenames appears only once, keeping the original entries already declared
earlier and deleting the duplicate entries added in this change.
In `@zstd/src/encoding/match_generator.rs`:
- Around lines 529-535: Add a debug assertion in insert_suffix_if_absent to
ensure the slice access is safe by checking the caller invariant (pos +
MIN_MATCH_LEN <= data.len()); specifically, add a debug_assert referencing
MIN_MATCH_LEN and pos at the top of the function so during development it will
panic if the invariant is violated while leaving release performance unchanged;
keep the rest of the logic (checking suffixes.contains_key and suffixes.insert)
unchanged.
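The requested invariant check can be sketched as follows. This assumes the helper takes a `HashMap<&[u8], usize>` and the raw input slice, and uses an illustrative `MIN_MATCH_LEN = 5`. The real signature in `match_generator.rs` may differ.

```rust
use std::collections::HashMap;

// Hypothetical value; the real MIN_MATCH_LEN lives in match_generator.rs.
const MIN_MATCH_LEN: usize = 5;

/// Sketch of the helper with the requested invariant check. debug_assert!
/// is compiled out when debug assertions are disabled (the default for
/// release builds), so it adds no cost there.
fn insert_suffix_if_absent<'a>(
    suffixes: &mut HashMap<&'a [u8], usize>,
    data: &'a [u8],
    pos: usize,
) {
    // Caller invariant: a full MIN_MATCH_LEN-byte key must fit at `pos`.
    debug_assert!(
        pos + MIN_MATCH_LEN <= data.len(),
        "insert_suffix_if_absent: pos {} + MIN_MATCH_LEN {} exceeds data.len() {}",
        pos,
        MIN_MATCH_LEN,
        data.len()
    );
    let key = &data[pos..pos + MIN_MATCH_LEN];
    // Keep the earliest position for a key; later duplicates are ignored.
    if !suffixes.contains_key(key) {
        suffixes.insert(key, pos);
    }
}
```

In safe Rust the slice expression `&data[pos..pos + MIN_MATCH_LEN]` is bounds-checked in every build. The `debug_assert!` therefore mainly gives a clearer failure message during development. It would matter more if the real code used an unchecked access.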
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 67cd0651-cac4-4845-b34b-b51443030f05
📒 Files selected for processing (2)
- .gitignore
- zstd/src/encoding/match_generator.rs
- remove duplicate benchmark artifact entries in .gitignore
- add debug_assert for insert_suffix_if_absent slice bound invariant
- add regression test for idx < MIN_MATCH_LEN in add_suffixes_till
- add focused interleaved-position registration test
- fix insert_limit saturation to preserve original windows() behavior
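The `insert_limit` fix concerns how an explicit loop bound reproduces `windows()`, which yields no windows at all when the input is shorter than the window. The snippet below is a hypothetical illustration. Neither function is taken from the PR, and `MIN_MATCH_LEN = 5` is assumed. It shows how a saturating bound can be off by one in exactly the `idx < MIN_MATCH_LEN` case the new regression test covers.

```rust
// Hypothetical value for illustration; the real MIN_MATCH_LEN may differ.
const MIN_MATCH_LEN: usize = 5;

/// Buggy shape (hypothetical): saturating_sub clamps to 0, but the trailing
/// `+ 1` still admits start position 0 when len < MIN_MATCH_LEN, which
/// windows() never does.
fn insert_limit_naive(len: usize) -> usize {
    len.saturating_sub(MIN_MATCH_LEN) + 1
}

/// Exclusive bound on start positions that matches
/// data.windows(MIN_MATCH_LEN).count(): len - MIN_MATCH_LEN + 1 when
/// len >= MIN_MATCH_LEN, otherwise 0.
fn insert_limit(len: usize) -> usize {
    (len + 1).saturating_sub(MIN_MATCH_LEN)
}
```

Both shapes agree once `len >= MIN_MATCH_LEN`. The bug only shows up for short inputs, which is why a targeted regression test for that case is useful.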
Summary
- Interleave suffix insertion in `MatchGenerator::add_suffixes_till()` for fastest mode (`FAST_HASH_FILL_STEP = 3`)
- Ignore benchmark artifacts (`benchmark-report.md`, `benchmark-results.json`)
Benchmark (vs `origin/main`, `STRUCTURED_ZSTD_BENCH_LARGE_BYTES=16777216`)
Fastest pure-Rust compression improved on key scenarios:
- decodecorpus-z000033: 13.613 ms -> 12.090 ms (-11.19%)
- high-entropy-1m: 19.056 ms -> 17.542 ms (-7.95%)
- large-log-stream: 14.466 ms -> 12.480 ms (-13.73%)
- low-entropy-1m: 0.903 ms -> 0.759 ms (-15.95%)
Validation
- `cargo fmt -- --check`
- `cargo clippy --all-targets -- -D warnings`
- `cargo build`
- `cargo nextest run --workspace`
- `cargo bench --bench compare_ffi -p structured-zstd --no-run`
- `STRUCTURED_ZSTD_BENCH_LARGE_BYTES=16777216 bash .github/scripts/run-benchmarks.sh` (branch + baseline in temp worktree)
Refs #14
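The percentages in the benchmark summary are relative changes against the `origin/main` baseline. A small helper reproduces them from the listed timings. The function name is illustrative.

```rust
/// Relative change from `before_ms` to `after_ms`, in percent.
/// Negative values mean the new timing is faster.
fn percent_change(before_ms: f64, after_ms: f64) -> f64 {
    (after_ms - before_ms) / before_ms * 100.0
}
```

For example, `percent_change(13.613, 12.090)` is about -11.19, matching the decodecorpus-z000033 row.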