
perf(encoding): align fastest matcher with zstd fast path #39

Merged
polaz merged 6 commits into main from perf/#14-hash-interleaving-match-counting
Mar 27, 2026

Conversation

@polaz
Member

@polaz polaz commented Mar 26, 2026

Summary

  • align fastest matcher behavior with the donor (zstd) fast strategy
  • add 32-byte-style hash table fill stepping without skipping search positions
  • preserve boundary anchors during stepped insertion and force full seeding for skipped blocks
  • narrow repcode probing to immediate zero-literal follow-up path
  • implement bulk match-length counting with word-sized XOR + trailing zeros
  • keep matcher tests aligned with overlapping replay semantics while preserving roundtrip correctness
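The word-sized XOR plus trailing-zeros idea from the list above can be sketched in safe Rust as follows. This is a minimal illustration, not the PR's implementation: the names `WORD`, `load_le`, and `common_prefix_len` are assumptions chosen for the example, and the sketch uses fixed 8-byte little-endian loads so that it behaves the same on any target.

```rust
/// Hypothetical word size for this sketch: compare 8 bytes per step.
const WORD: usize = 8;

/// Loads 8 bytes starting at `off` as a little-endian u64.
/// `from_le_bytes` puts the lowest-address byte in the lowest bits on every
/// target, so no endian-specific handling is needed afterwards.
fn load_le(bytes: &[u8], off: usize) -> u64 {
    let mut buf = [0u8; WORD];
    buf.copy_from_slice(&bytes[off..off + WORD]);
    u64::from_le_bytes(buf)
}

/// Length of the common prefix of `a` and `b`.
fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    let max = a.len().min(b.len());
    let mut off = 0;

    while off + WORD <= max {
        // XOR is zero exactly where the two words agree.
        let diff = load_le(a, off) ^ load_le(b, off);
        if diff != 0 {
            // Each byte contributes 8 bits, so the number of trailing zero
            // bits divided by 8 is the index of the first differing byte.
            return off + (diff.trailing_zeros() as usize) / 8;
        }
        off += WORD;
    }

    // Fewer than WORD bytes remain: finish byte by byte.
    while off < max && a[off] == b[off] {
        off += 1;
    }
    off
}
```

For example, `common_prefix_len(b"abcXefgh12", b"abcdefgh12")` finds the mismatch inside the first word and returns 3 without comparing the remaining bytes individually.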

Validation

  • cargo fmt -- --check
  • cargo clippy --all-targets -- -D warnings
  • cargo nextest run --workspace

Benchmark

  • compared against system zstd on project corpus; output sizes match for covered cases

Closes #14

Summary by CodeRabbit

  • Performance & Optimization
    • Optimized hash-fill strategy for faster compression, particularly at highest speed settings.
    • Improved match selection algorithm for better compression ratio.
    • Enhanced offset-tracking mechanism for more efficient compression of repeated patterns.

Copilot AI review requested due to automatic review settings March 26, 2026 22:49
@coderabbitai

coderabbitai Bot commented Mar 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: da2d40b5-156f-4418-89b2-5d4311a1e917

📥 Commits

Reviewing files that changed from the base of the PR and between bd7b941 and 4771c92.

📒 Files selected for processing (1)
  • zstd/src/encoding/match_generator.rs

📝 Walkthrough

Walkthrough

Introduces configurable hash-fill interleaving, repcode pre-checks, offset-history tracking, and word-sized match-length scanning in the encoder match generator; threads hash_fill_step via reset() and updates tests to validate the new behaviors.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Match generator core: zstd/src/encoding/match_generator.rs | Added hash_fill_step field and threaded it from level config through reset(); add_suffixes_till and seeding use .step_by(fill_step); skip_matching updated to pass fill_step. |
| Repcode & offset history: zstd/src/encoding/match_generator.rs | Added offset_hist: [u32;3], repcode_candidate() and offset_match_len() helpers; perform repcode pre-checks and call encode_offset_with_history() after emitting triples. |
| Match-length scanning & mismatch logic: zstd/src/encoding/match_generator.rs | Replaced previous mismatch_chunks::<8> with unrolled word-wise mismatch scanning using usize reads and endian-aware byte mismatch helpers; simplified match-slice construction and candidate selection flow. |
| Suffix seeding behavior: zstd/src/encoding/match_generator.rs | add_suffixes_till now accepts fill_step and inserts anchors with .step_by(fill_step); backfills final tail anchor when applicable. |
| Tests: zstd/src/encoding/match_generator.rs (tests) | Tests rewritten to replay sequences for decode validation; added coverage for Fastest hash_fill_step, offset_hist updates, repcode zero-literal and window-boundary behavior, searchable-prefix rejection, and seeding behavior with non-1 hash_fill_step. |

Sequence Diagram(s)

sequenceDiagram
    participant Encoder
    participant MatchGenerator
    participant HashTable
    participant Output

    Encoder->>MatchGenerator: request next_sequence(pos)
    MatchGenerator->>MatchGenerator: compute literals_len
    MatchGenerator->>MatchGenerator: repcode_candidate(pos, literals_len)
    alt repcode hit
        MatchGenerator->>Output: emit Sequence::Triple (rep offset)
        MatchGenerator->>MatchGenerator: encode_offset_with_history(offset, literals_len, &mut offset_hist)
        MatchGenerator->>MatchGenerator: add_suffixes_till(end, fill_step)
    else no repcode
        MatchGenerator->>HashTable: lookup current & interleaved positions (uses hash_fill_step)
        HashTable-->>MatchGenerator: match entries
        MatchGenerator->>MatchGenerator: count match length (word-wise)
        MatchGenerator->>Output: emit Sequence::Triple (match)
        MatchGenerator->>MatchGenerator: encode_offset_with_history(offset, literals_len, &mut offset_hist)
        MatchGenerator->>MatchGenerator: add_suffixes_till(end, fill_step)
    end
    MatchGenerator-->>Encoder: sequence or None
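The offset-history step in the diagram can be illustrated with a small sketch. It follows the repeat-offset rules of the Zstandard format as I understand them (offset codes 1 to 3 refer to the history, shifted by one when the sequence has no literals, and any other offset is encoded as `offset + 3`). The function name `encode_offset_sketch` and its return convention are assumptions for illustration; the PR's `encode_offset_with_history` may differ in detail.

```rust
/// Returns the offset code for `actual` and updates `hist` in place.
/// `hist[0]` is the most recently used offset. Hypothetical helper.
fn encode_offset_sketch(actual: u32, literals_len: u32, hist: &mut [u32; 3]) -> u32 {
    let code = if literals_len > 0 {
        if actual == hist[0] {
            1
        } else if actual == hist[1] {
            2
        } else if actual == hist[2] {
            3
        } else {
            actual + 3 // not a repeat offset
        }
    } else if actual == hist[1] {
        // With zero literals the codes shift: 1 means the second entry,
        1
    } else if actual == hist[2] {
        // 2 means the third entry,
        2
    } else if hist[0] > 1 && actual == hist[0] - 1 {
        // and 3 means "most recent offset minus one".
        3
    } else {
        actual + 3
    };

    // Reusing the most recent offset (code 1 with literals) leaves the
    // history untouched; everything else moves `actual` to the front.
    let keeps_order = literals_len > 0 && code == 1;
    if !keeps_order {
        if actual == hist[1] {
            hist.swap(0, 1);
        } else {
            hist[2] = hist[1];
            hist[1] = hist[0];
            hist[0] = actual;
        }
    }
    code
}
```

Starting from a history of `[1, 4, 8]`, encoding offset 4 after some literals yields code 2 and leaves the history as `[4, 1, 8]`.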

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐇 I hop through hashes, skipping every few,
I test repcodes old and new,
I count by words and stitch the tail,
Offsets remembered on my trail,
A tiny rabbit cheers: encode anew!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: aligning the fastest matcher with zstd's fast path optimization strategy. |
| Linked Issues check | ✅ Passed | The PR fulfills key requirements from issue #14: implements bulk match-length counting with word-sized XOR, adds configurable hash fill-step, optimizes repcode handling, and fixes offset-history tracking. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to match_generator.rs and directly support the performance optimization objectives specified in issue #14. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |



Copilot AI left a comment


Pull request overview

This PR updates the Fastest/simple match generator to more closely follow zstd’s fast-path behavior, targeting higher compression throughput while maintaining roundtrip correctness.

Changes:

  • Add stepped hash fill for Fastest (pipeline-style fill rate) while still probing every position.
  • Add repcode-aware candidate selection (limited to the zero-literal follow-up path) and track repeat offset history.
  • Speed up match-length counting via word-sized XOR + trailing-zeros, and update tests to handle overlapping match replay semantics.
Comments suppressed due to low confidence (1)

zstd/src/encoding/match_generator.rs:399

  • is_last no longer changes behavior here (both branches slice &match_entry.data[match_index..]). This conditional can be removed to simplify the control flow and avoid suggesting there’s a special-case for the last entry when there isn’t.
            for match_entry in self.window.iter() {
                if let Some(match_index) = match_entry.suffixes.get(key) {
                    let match_slice = &match_entry.data[match_index..];

                    // Check how long the common prefix actually is
                    let match_len = Self::common_prefix_len(match_slice, data_slice);

                    // Collisions in the suffix store might make this check fail

@codecov

codecov Bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 94.37751% with 14 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| zstd/src/encoding/match_generator.rs | 94.37% | 14 Missing ⚠️ |


- fix repcode offset validation across full searchable window
- make word-wise prefix counting endian-safe on big-endian targets
- strengthen matches() invariants and add cross-slice repcode regression test
- split endian mismatch-byte logic with compile-time cfg helpers
- add regression for out-of-range repcode offset rejection
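The endian split mentioned in these commit notes can be sketched as a pair of compile-time helpers. This is an illustrative guess at the shape of such helpers, and `first_mismatch_byte` is a hypothetical name. The input is the XOR of two words that were read in native byte order: on little-endian targets the lowest-address byte sits in the low bits, on big-endian targets it sits in the high bits.

```rust
/// Index of the first differing byte, given the non-zero XOR of two
/// native-endian words. Little-endian: count zero bits from the low end.
#[cfg(target_endian = "little")]
fn first_mismatch_byte(diff: usize) -> usize {
    (diff.trailing_zeros() / 8) as usize
}

/// Big-endian: the lowest-address byte is the most significant one, so
/// count zero bits from the high end instead.
#[cfg(target_endian = "big")]
fn first_mismatch_byte(diff: usize) -> usize {
    (diff.leading_zeros() / 8) as usize
}
```

Because exactly one variant is compiled for a given target, the hot loop carries no runtime endianness branch.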

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

zstd/src/encoding/match_generator.rs:472

  • common_prefix_len introduces unsafe unaligned word reads but doesn’t document the safety invariants. Please add a brief // SAFETY: comment explaining that off + chunk <= max guarantees lhs.add(off)..lhs.add(off+chunk) and rhs... are in-bounds, which is all read_unaligned requires since it imposes no alignment requirement.
        }
    }
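A documented unaligned read of the kind requested here might look like the sketch below. The helper `load_word` and the constant `CHUNK` are hypothetical names for illustration, and the explicit `assert!` stands in for whatever bound the real loop already establishes.

```rust
/// Hypothetical chunk size: one machine word.
const CHUNK: usize = core::mem::size_of::<usize>();

/// Reads one native-endian word starting at `bytes[off]`.
fn load_word(bytes: &[u8], off: usize) -> usize {
    // Establish the invariant the unsafe block relies on.
    assert!(off <= bytes.len() && bytes.len() - off >= CHUNK);
    // SAFETY: the assert above guarantees that the CHUNK bytes starting at
    // `off` lie inside `bytes`, so the pointer is valid for a CHUNK-byte
    // read. `read_unaligned` has no alignment requirement, and every bit
    // pattern is a valid `usize`.
    unsafe { bytes.as_ptr().add(off).cast::<usize>().read_unaligned() }
}
```

Keeping the bound check and the `// SAFETY:` comment next to each other makes it easy for a reviewer to confirm that the stated invariant is the one actually enforced.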

zstd/src/encoding/match_generator.rs:1054

  • The updated matches() test now only replays sequences and asserts roundtrip equality, but it no longer checks that specific match decisions (e.g., offsets/match_len boundaries) are made. This can allow regressions that still roundtrip but significantly change compression ratio/perf. Consider restoring a small number of targeted assertions for at least one or two expected Sequence::Triple outputs (or sequence count/order) in these scenarios while keeping the overlap-safe replay logic.

#[test]
fn matches() {
    let mut matcher = MatchGenerator::new(1000);
    let mut original_data = Vec::new();
    let mut reconstructed = Vec::new();

    let replay_sequence = |seq: Sequence<'_>, reconstructed: &mut Vec<u8>| match seq {
        Sequence::Literals { literals } => {
            assert!(!literals.is_empty());
            reconstructed.extend_from_slice(literals);
        }
        Sequence::Triple {
            literals,
            offset,
            match_len,
        } => {
            assert!(offset > 0);
            assert!(match_len >= MIN_MATCH_LEN);
            reconstructed.extend_from_slice(literals);
            assert!(offset <= reconstructed.len());
            let start = reconstructed.len() - offset;
            for i in 0..match_len {
                let byte = reconstructed[start + i];
                reconstructed.push(byte);
            }
        }
    };

    matcher.add_data(
        alloc::vec![0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        SuffixStore::with_capacity(100),
        |_, _| {},
    );
    original_data.extend_from_slice(&[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);

    matcher.next_sequence(|seq| replay_sequence(seq, &mut reconstructed));

    assert!(!matcher.next_sequence(|_| {}));

    matcher.add_data(
        alloc::vec![
            1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0,
        ],
        SuffixStore::with_capacity(100),
        |_, _| {},
    );


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
zstd/src/encoding/match_generator.rs (1)

499-509: ⚠️ Potential issue | 🟠 Major

Don't reuse the stepped fill when seeding a skipped block.

skip_matching() relies on add_suffixes_till(len) to make the entire current block searchable by later blocks. With step_by(self.hash_fill_step), Fastest now inserts only every third start into the suffix table, and the omitted starts are never revisited because suffix_idx jumps straight to len. A skipped/raw block can therefore stop being a valid match source at those positions.

Possible fix
-    fn add_suffixes_till(&mut self, idx: usize) {
+    fn add_suffixes_till(&mut self, idx: usize, fill_step: usize) {
         let last_entry = self.window.last_mut().unwrap();
         if last_entry.data.len() < MIN_MATCH_LEN {
             return;
         }
         let slice = &last_entry.data[self.suffix_idx..idx];
         for (key_index, key) in slice
             .windows(MIN_MATCH_LEN)
             .enumerate()
-            .step_by(self.hash_fill_step)
+            .step_by(fill_step)
         {
             if !last_entry.suffixes.contains_key(key) {
                 last_entry.suffixes.insert(key, self.suffix_idx + key_index);
             }
         }
     }
...
-                self.add_suffixes_till(self.suffix_idx + match_len);
+                self.add_suffixes_till(self.suffix_idx + match_len, self.hash_fill_step);
...
-        self.add_suffixes_till(len);
+        self.add_suffixes_till(len, 1);

Also applies to: 561-564
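The effect described in this comment can be reproduced with a toy table. The sketch below is not the crate's `SuffixStore`: it uses a `HashMap`, a hypothetical `MIN_MATCH` of 4, and a free function `seed`, all chosen only to show which start positions become searchable for a given `fill_step`.

```rust
use std::collections::HashMap;

/// Hypothetical minimum match length for this sketch.
const MIN_MATCH: usize = 4;

/// Inserts the start of every `fill_step`-th window of `data` into `table`,
/// keeping the first position seen for each key. `fill_step` must be >= 1
/// because `step_by(0)` panics.
fn seed(table: &mut HashMap<Vec<u8>, usize>, data: &[u8], fill_step: usize) {
    for (pos, key) in data.windows(MIN_MATCH).enumerate().step_by(fill_step) {
        table.entry(key.to_vec()).or_insert(pos);
    }
}
```

With `data = b"abcdefghij"` there are seven window starts. Seeding with a step of 3 records only positions 0, 3, and 6, so a later block beginning with `bcde` would find no match source, while seeding with a step of 1 (as suggested for skipped blocks) records all seven.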

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zstd/src/encoding/match_generator.rs` around lines 499 - 509,
add_suffixes_till currently skips inserting some suffix starts because it uses
.step_by(self.hash_fill_step) on the window slice, which breaks
skip_matching()'s contract of making the entire block searchable; remove the
.step_by(...) so the loop over slice.windows(MIN_MATCH_LEN).enumerate() inserts
every start between suffix_idx and idx (i.e., iterate every position), and make
the same change for the analogous loop around the code referenced at 561-564;
ensure you still honor MIN_MATCH_LEN and update suffix_idx as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 516-523: repcode_candidate currently only probes offset_hist[1]
for zero-literal repeats but encode_offset_with_history can map repcodes to
offset_hist[1], offset_hist[2], and offset_hist[0] - 1; update repcode_candidate
to, when literals_len == 0, probe those three candidate offsets (offset_hist[1],
offset_hist[2], and offset_hist[0].saturating_sub(1)) in turn, call
offset_match_len for each, and return the first (offset, match_len) where
match_len >= MIN_MATCH_LEN; ensure you guard against duplicates and out-of-range
offsets when reading offset_hist.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fd787445-8328-413f-8d68-dd71d0bc58c0

📥 Commits

Reviewing files that changed from the base of the PR and between c14ade8 and 9ad8430.

📒 Files selected for processing (1)
  • zstd/src/encoding/match_generator.rs

- check rep1, rep2, and rep0-1 when literals_len is zero
- deduplicate repeat offsets before probing to avoid duplicate work
- add regression test covering rep2 fallback when rep1 misses
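The zero-literal probing described in these commit notes can be sketched as follows. The function names, the flat `data` buffer, and the `MIN_MATCH` value are assumptions made for a self-contained example; the real matcher works over a window of entries and may order or filter candidates differently.

```rust
/// Hypothetical minimum match length for this sketch.
const MIN_MATCH: usize = 4;

/// Number of bytes for which `data[pos..]` equals `data[pos - offset..]`.
/// Returns 0 for offsets that are zero or reach before the buffer start.
fn offset_match_len(data: &[u8], pos: usize, offset: usize) -> usize {
    if offset == 0 || offset > pos {
        return 0;
    }
    let src = pos - offset;
    (0..data.len() - pos)
        .take_while(|&i| data[src + i] == data[pos + i])
        .count()
}

/// With zero literals, probe the second entry, the third entry, and
/// "first entry minus one", skipping offsets already tried.
fn repcode_candidate_zero_literals(
    data: &[u8],
    pos: usize,
    hist: [u32; 3],
) -> Option<(usize, usize)> {
    let candidates = [hist[1], hist[2], hist[0].saturating_sub(1)];
    for (i, &offset) in candidates.iter().enumerate() {
        // Deduplicate: an offset that already failed would fail again.
        if candidates[..i].contains(&offset) {
            continue;
        }
        let len = offset_match_len(data, pos, offset as usize);
        if len >= MIN_MATCH {
            return Some((offset as usize, len));
        }
    }
    None
}
```

For `data = b"abcdefgh__abcdefgh"` at position 10, a history of `[99, 4, 10]` misses on offset 4 and hits on offset 10, and a history of `[11, 4, 99]` reaches the same match through the "first entry minus one" candidate.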

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@polaz
Member Author

polaz commented Mar 27, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 27, 2026

@polaz Sure! I'll re-review the PR now.

✅ Actions performed

Full review triggered.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 1309-1323: The test
simple_matcher_zero_literal_repcode_checks_rep2_and_rep0_minus1 is misnamed
because with matcher.offset_hist = [99, 4, 10] the rep0 - 1 branch never runs
(rep0-1 == 98); either rename the test to reflect it only checks rep2, or modify
the fixture so rep0-1 equals 10 (for example set offset_hist to [11, 4, 99] or
similar) to actually exercise the rep0 - 1 branch; update the assertion or name
accordingly and keep this change localized in MatchGenerator::repcode_candidate
test setup in match_generator.rs.
- Around line 505-513: The current loop that inserts suffix anchors using
.step_by(self.hash_fill_step) (iterating over slice.windows(MIN_MATCH_LEN) in
match_generator.rs) can skip the final tail start(s) when hash_fill_step > 1, so
backfill the tail anchors before advancing suffix_idx: after the stepped loop
that updates last_entry.suffixes with key -> self.suffix_idx + key_index,
explicitly iterate the last one or two window starts at the end of slice (the
final MIN_MATCH_LEN-aligned windows) and insert them into last_entry.suffixes if
missing (using the same key -> self.suffix_idx + index calculation), so that
boundary starts are preserved across blocks even when step_by skips interior
positions; reference symbols: last_entry.suffixes, self.suffix_idx,
self.hash_fill_step, MIN_MATCH_LEN.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 455bc5bc-edd3-4227-ab29-1057895c402a

📥 Commits

Reviewing files that changed from the base of the PR and between bd7b941 and 6a988db.

📒 Files selected for processing (1)
  • zstd/src/encoding/match_generator.rs

- backfill the last searchable suffix anchor in add_suffixes_till
- force full-position seeding for skip_matching blocks
- split zero-literal repcode tests into explicit rep2 and rep0-1 cases
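The tail backfill from the first note can be pictured with positions alone. The sketch below is a simplification under stated assumptions: `stepped_positions` is a hypothetical helper that lists which window starts would be inserted for a given number of windows and `fill_step`, and it ignores the actual table and keys.

```rust
/// Window starts inserted when stepping by `fill_step` (must be >= 1) and
/// then backfilling the final start if the stepping skipped it.
fn stepped_positions(window_count: usize, fill_step: usize) -> Vec<usize> {
    let mut out: Vec<usize> = (0..window_count).step_by(fill_step).collect();
    // `checked_sub` is None when there are no windows at all.
    if let Some(last) = window_count.checked_sub(1) {
        if out.last() != Some(&last) {
            out.push(last);
        }
    }
    out
}
```

With 8 windows and a step of 3 the stepped loop alone would stop at position 6; the backfill adds position 7 so the boundary start stays searchable for the next block. With 7 windows the last start is already covered and nothing extra is inserted.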
@polaz polaz requested a review from Copilot March 27, 2026 06:53
@polaz
Member Author

polaz commented Mar 27, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 27, 2026

@polaz Sure! I'll kick off a full re-review of PR #39 now.

✅ Actions performed

Full review triggered.


Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@polaz
Member Author

polaz commented Mar 27, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai Bot commented Mar 27, 2026

@polaz Sure! I'll kick off a full re-review of PR #39 now.

✅ Actions performed

Full review triggered.

@polaz polaz merged commit 5ca2e4e into main Mar 27, 2026
15 checks passed
@polaz polaz deleted the perf/#14-hash-interleaving-match-counting branch March 27, 2026 07:45
@sw-release-bot sw-release-bot Bot mentioned this pull request Mar 27, 2026

Development

Successfully merging this pull request may close these issues.

perf: encoder match finding — hash interleaving and match length counting

2 participants