perf(encoding): align fastest matcher with zstd fast path by polaz · Pull Request #39 · structured-world/structured-zstd

polaz · 2026-03-26T22:49:32Z

Summary

align fastest matcher behavior with donor fast strategy
add 32-byte-style hash table fill stepping without skipping search positions
preserve boundary anchors during stepped insertion and force full seeding for skipped blocks
narrow repcode probing to immediate zero-literal follow-up path
implement bulk match-length counting with word-sized XOR + trailing zeros
keep matcher tests aligned with overlapping replay semantics while preserving roundtrip correctness

Validation

cargo fmt -- --check
cargo clippy --all-targets -- -D warnings
cargo nextest run --workspace

Benchmark

compared against system zstd on project corpus; output sizes match for covered cases

Closes #14

Summary by CodeRabbit

Performance & Optimization
- Optimized hash-fill strategy for faster compression, particularly at highest speed settings.
- Improved match selection algorithm for better compression ratio.
- Enhanced offset-tracking mechanism for more efficient compression of repeated patterns.

coderabbitai · 2026-03-26T22:49:38Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: da2d40b5-156f-4418-89b2-5d4311a1e917

📥 Commits

Reviewing files that changed from the base of the PR and between bd7b941 and 4771c92.

📒 Files selected for processing (1)

zstd/src/encoding/match_generator.rs

📝 Walkthrough

Walkthrough

Introduces configurable hash-fill interleaving, repcode pre-checks, offset-history tracking, and word-sized match-length scanning in the encoder match generator; threads hash_fill_step via reset() and updates tests to validate the new behaviors.

Changes

Cohort / File(s)	Summary
Match generator core `zstd/src/encoding/match_generator.rs`	Added `hash_fill_step` field and threaded it from level config through `reset()`; `add_suffixes_till` and seeding use `.step_by(fill_step)`; `skip_matching` updated to pass `fill_step`.
Repcode & offset history `zstd/src/encoding/match_generator.rs`	Added `offset_hist: [u32;3]`, `repcode_candidate()` and `offset_match_len()` helpers; perform repcode pre-checks and call `encode_offset_with_history()` after emitting triples.
Match-length scanning & mismatch logic `zstd/src/encoding/match_generator.rs`	Replaced previous `mismatch_chunks::<8>` with unrolled word-wise mismatch scanning using `usize` reads and endian-aware byte mismatch helpers; simplified match-slice construction and candidate selection flow.
Suffix seeding behavior `zstd/src/encoding/match_generator.rs`	`add_suffixes_till` now accepts `fill_step` and inserts anchors with `.step_by(fill_step)`; backfills final tail anchor when applicable.
Tests `zstd/src/encoding/match_generator.rs` (tests)	Tests rewritten to replay sequences for decode validation; added coverage for `Fastest` `hash_fill_step`, `offset_hist` updates, repcode zero-literal and window-boundary behavior, searchable-prefix rejection, and seeding behavior with non-1 `hash_fill_step`.

Sequence Diagram(s)

sequenceDiagram
    participant Encoder
    participant MatchGenerator
    participant HashTable
    participant Output

    Encoder->>MatchGenerator: request next_sequence(pos)
    MatchGenerator->>MatchGenerator: compute literals_len
    MatchGenerator->>MatchGenerator: repcode_candidate(pos, literals_len)
    alt repcode hit
        MatchGenerator->>Output: emit Sequence::Triple (rep offset)
        MatchGenerator->>MatchGenerator: encode_offset_with_history(offset, literals_len, &mut offset_hist)
        MatchGenerator->>MatchGenerator: add_suffixes_till(end, fill_step)
    else no repcode
        MatchGenerator->>HashTable: lookup current & interleaved positions (uses hash_fill_step)
        HashTable-->>MatchGenerator: match entries
        MatchGenerator->>MatchGenerator: count match length (word-wise)
        MatchGenerator->>Output: emit Sequence::Triple (match)
        MatchGenerator->>MatchGenerator: encode_offset_with_history(offset, literals_len, &mut offset_hist)
        MatchGenerator->>MatchGenerator: add_suffixes_till(end, fill_step)
    end
    MatchGenerator-->>Encoder: sequence or None

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

fix(encoding): implement default compression level #34 — Overlaps in match-generator internals and encode_offset_with_history usage; relates to offset-history and repcode handling.

Poem

🐇 I hop through hashes, skipping every few,
I test repcodes old and new,
I count by words and stitch the tail,
Offsets remembered on my trail,
A tiny rabbit cheers: encode anew!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: aligning the fastest matcher with zstd's fast path optimization strategy.
Linked Issues check	✅ Passed	The PR fulfills key requirements from issue `#14`: implements bulk match-length counting with word-sized XOR, adds configurable hash fill-step, optimizes repcode handling, and fixes offset-history tracking.
Out of Scope Changes check	✅ Passed	All changes are scoped to match_generator.rs and directly support the performance optimization objectives specified in issue `#14`.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/#14-hash-interleaving-match-counting

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

This PR updates the Fastest/simple match generator to more closely follow zstd’s fast-path behavior, targeting higher compression throughput while maintaining roundtrip correctness.

Changes:

Add stepped hash fill for Fastest (pipeline-style fill rate) while still probing every position.
Add repcode-aware candidate selection (limited to the zero-literal follow-up path) and track repeat offset history.
Speed up match-length counting via word-sized XOR + trailing-zeros, and update tests to handle overlapping match replay semantics.

Comments suppressed due to low confidence (1)

zstd/src/encoding/match_generator.rs:399

is_last no longer changes behavior here (both branches slice &match_entry.data[match_index..]). This conditional can be removed to simplify the control flow and avoid suggesting there’s a special-case for the last entry when there isn’t.

            for match_entry in self.window.iter() {
                if let Some(match_index) = match_entry.suffixes.get(key) {
                    let match_slice = &match_entry.data[match_index..];

                    // Check how long the common prefix actually is
                    let match_len = Self::common_prefix_len(match_slice, data_slice);

                    // Collisions in the suffix store might make this check fail

codecov · 2026-03-26T22:55:02Z

Codecov Report

❌ Patch coverage is 94.37751% with 14 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
zstd/src/encoding/match_generator.rs	94.37%	14 Missing ⚠️

📢 Thoughts on this report? Let us know!

- fix repcode offset validation across full searchable window - make word-wise prefix counting endian-safe on big-endian targets - strengthen matches() invariants and add cross-slice repcode regression test

- split endian mismatch-byte logic with compile-time cfg helpers - add regression for out-of-range repcode offset rejection

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

zstd/src/encoding/match_generator.rs:472

common_prefix_len introduces unsafe unaligned word reads but doesn’t document the safety invariants. Please add a brief // SAFETY: comment explaining that off + chunk <= max guarantees lhs.add(off)..lhs.add(off+chunk) and rhs... are in-bounds and properly aligned for read_unaligned.

        }
    }

zstd/src/encoding/match_generator.rs:1054

The updated matches() test now only replays sequences and asserts roundtrip equality, but it no longer checks that specific match decisions (e.g., offsets/match_len boundaries) are made. This can allow regressions that still roundtrip but significantly change compression ratio/perf. Consider restoring a small number of targeted assertions for at least one or two expected Sequence::Triple outputs (or sequence count/order) in these scenarios while keeping the overlap-safe replay logic.


#[test]
fn matches() {
    let mut matcher = MatchGenerator::new(1000);
    let mut original_data = Vec::new();
    let mut reconstructed = Vec::new();

    let replay_sequence = |seq: Sequence<'_>, reconstructed: &mut Vec<u8>| match seq {
        Sequence::Literals { literals } => {
            assert!(!literals.is_empty());
            reconstructed.extend_from_slice(literals);
        }
        Sequence::Triple {
            literals,
            offset,
            match_len,
        } => {
            assert!(offset > 0);
            assert!(match_len >= MIN_MATCH_LEN);
            reconstructed.extend_from_slice(literals);
            assert!(offset <= reconstructed.len());
            let start = reconstructed.len() - offset;
            for i in 0..match_len {
                let byte = reconstructed[start + i];
                reconstructed.push(byte);
            }
        }
    };

    matcher.add_data(
        alloc::vec![0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        SuffixStore::with_capacity(100),
        |_, _| {},
    );
    original_data.extend_from_slice(&[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);

    matcher.next_sequence(|seq| replay_sequence(seq, &mut reconstructed));

    assert!(!matcher.next_sequence(|_| {}));

    matcher.add_data(
        alloc::vec![
            1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0,
        ],
        SuffixStore::with_capacity(100),
        |_, _| {},
    );

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

zstd/src/encoding/match_generator.rs (1)

499-509: ⚠️ Potential issue | 🟠 Major

Don't reuse the stepped fill when seeding a skipped block.

skip_matching() relies on add_suffixes_till(len) to make the entire current block searchable by later blocks. With step_by(self.hash_fill_step), Fastest now inserts only every third start into the suffix table, and the omitted starts are never revisited because suffix_idx jumps straight to len. A skipped/raw block can therefore stop being a valid match source at those positions.

Possible fix

-    fn add_suffixes_till(&mut self, idx: usize) {
+    fn add_suffixes_till(&mut self, idx: usize, fill_step: usize) {
         let last_entry = self.window.last_mut().unwrap();
         if last_entry.data.len() < MIN_MATCH_LEN {
             return;
         }
         let slice = &last_entry.data[self.suffix_idx..idx];
         for (key_index, key) in slice
             .windows(MIN_MATCH_LEN)
             .enumerate()
-            .step_by(self.hash_fill_step)
+            .step_by(fill_step)
         {
             if !last_entry.suffixes.contains_key(key) {
                 last_entry.suffixes.insert(key, self.suffix_idx + key_index);
             }
         }
     }
...
-                self.add_suffixes_till(self.suffix_idx + match_len);
+                self.add_suffixes_till(self.suffix_idx + match_len, self.hash_fill_step);
...
-        self.add_suffixes_till(len);
+        self.add_suffixes_till(len, 1);

Also applies to: 561-564

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@zstd/src/encoding/match_generator.rs` around lines 499 - 509,
add_suffixes_till currently skips inserting some suffix starts because it uses
.step_by(self.hash_fill_step) on the window slice, which breaks
skip_matching()'s contract of making the entire block searchable; remove the
.step_by(...) so the loop over slice.windows(MIN_MATCH_LEN).enumerate() inserts
every start between suffix_idx and idx (i.e., iterate every position), and make
the same change for the analogous loop around the code referenced at 561-564;
ensure you still honor MIN_MATCH_LEN and update suffix_idx as before.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 516-523: repcode_candidate currently only probes offset_hist[1]
for zero-literal repeats but encode_offset_with_history can map repcodes to
offset_hist[1], offset_hist[2], and offset_hist[0] - 1; update repcode_candidate
to, when literals_len == 0, probe those three candidate offsets (offset_hist[1],
offset_hist[2], and offset_hist[0].saturating_sub(1)) in turn, call
offset_match_len for each, and return the first (offset, match_len) where
match_len >= MIN_MATCH_LEN; ensure you guard against duplicates and out-of-range
offsets when reading offset_hist.

---

Outside diff comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 499-509: add_suffixes_till currently skips inserting some suffix
starts because it uses .step_by(self.hash_fill_step) on the window slice, which
breaks skip_matching()'s contract of making the entire block searchable; remove
the .step_by(...) so the loop over slice.windows(MIN_MATCH_LEN).enumerate()
inserts every start between suffix_idx and idx (i.e., iterate every position),
and make the same change for the analogous loop around the code referenced at
561-564; ensure you still honor MIN_MATCH_LEN and update suffix_idx as before.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fd787445-8328-413f-8d68-dd71d0bc58c0

📥 Commits

Reviewing files that changed from the base of the PR and between c14ade8 and 9ad8430.

📒 Files selected for processing (1)

zstd/src/encoding/match_generator.rs

- check rep1, rep2, and rep0-1 when literals_len is zero - deduplicate repeat offsets before probing to avoid duplicate work - add regression test covering rep2 fallback when rep1 misses

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

polaz · 2026-03-27T06:16:37Z

@coderabbitai re-review

coderabbitai · 2026-03-27T06:16:51Z

@polaz Sure! I'll re-review the PR now.

✅ Actions performed

Full review triggered.

coderabbitai · 2026-03-27T06:17:10Z

@polaz Sure! I'll re-review the PR now.

✅ Actions performed

Full review triggered.

coderabbitai · 2026-03-27T06:30:12Z

@polaz Sure! I'll re-review the PR now.

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 1309-1323: The test
simple_matcher_zero_literal_repcode_checks_rep2_and_rep0_minus1 is misnamed
because with matcher.offset_hist = [99, 4, 10] the rep0 - 1 branch never runs
(rep0-1 == 98); either rename the test to reflect it only checks rep2, or modify
the fixture so rep0-1 equals 10 (for example set offset_hist to [11, 4, 99] or
similar) to actually exercise the rep0 - 1 branch; update the assertion or name
accordingly and keep this change localized in MatchGenerator::repcode_candidate
test setup in match_generator.rs.
- Around line 505-513: The current loop that inserts suffix anchors using
.step_by(self.hash_fill_step) (iterating over slice.windows(MIN_MATCH_LEN) in
match_generator.rs) can skip the final tail start(s) when hash_fill_step > 1, so
backfill the tail anchors before advancing suffix_idx: after the stepped loop
that updates last_entry.suffixes with key -> self.suffix_idx + key_index,
explicitly iterate the last one or two window starts at the end of slice (the
final MIN_MATCH_LEN-aligned windows) and insert them into last_entry.suffixes if
missing (using the same key -> self.suffix_idx + index calculation), so that
boundary starts are preserved across blocks even when step_by skips interior
positions; reference symbols: last_entry.suffixes, self.suffix_idx,
self.hash_fill_step, MIN_MATCH_LEN.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 455bc5bc-edd3-4227-ab29-1057895c402a

📥 Commits

Reviewing files that changed from the base of the PR and between bd7b941 and 6a988db.

📒 Files selected for processing (1)

zstd/src/encoding/match_generator.rs

- backfill the last searchable suffix anchor in add_suffixes_till - force full-position seeding for skip_matching blocks - split zero-literal repcode tests into explicit rep2 and rep0-1 cases

polaz · 2026-03-27T06:53:28Z

@coderabbitai re-review

coderabbitai · 2026-03-27T06:58:39Z

@polaz Sure! I'll kick off a full re-review of PR #39 now.

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

polaz · 2026-03-27T07:25:10Z

@coderabbitai re-review

coderabbitai · 2026-03-27T07:25:24Z

@polaz Sure! I'll kick off a full re-review of PR #39 now.

✅ Actions performed

Full review triggered.

perf(encoding): align fastest matcher with zstd fast path

72fd67d

Copilot AI review requested due to automatic review settings March 26, 2026 22:49

Copilot started reviewing on behalf of polaz March 26, 2026 22:50 View session

style(encoding): satisfy fmt and clippy for matcher

f18eccc

Copilot AI reviewed Mar 26, 2026

View reviewed changes

Comment thread zstd/src/encoding/match_generator.rs

Comment thread zstd/src/encoding/match_generator.rs Outdated

Comment thread zstd/src/encoding/match_generator.rs Outdated

fix(encoding): resolve matcher review feedback

c14ade8

- fix repcode offset validation across full searchable window - make word-wise prefix counting endian-safe on big-endian targets - strengthen matches() invariants and add cross-slice repcode regression test

polaz requested a review from Copilot March 26, 2026 23:14

Copilot started reviewing on behalf of polaz March 26, 2026 23:15 View session

test(encoding): improve patch coverage for matcher fixes

9ad8430

- split endian mismatch-byte logic with compile-time cfg helpers - add regression for out-of-range repcode offset rejection

Copilot AI reviewed Mar 26, 2026

View reviewed changes

coderabbitai Bot reviewed Mar 26, 2026

View reviewed changes

Comment thread zstd/src/encoding/match_generator.rs Outdated

fix(encoding): probe all zero-literal repcode candidates

6a988db

- check rep1, rep2, and rep0-1 when literals_len is zero - deduplicate repeat offsets before probing to avoid duplicate work - add regression test covering rep2 fallback when rep1 misses

polaz requested a review from Copilot March 26, 2026 23:32

Copilot started reviewing on behalf of polaz March 26, 2026 23:33 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

coderabbitai Bot reviewed Mar 27, 2026

View reviewed changes

Comment thread zstd/src/encoding/match_generator.rs Outdated

Comment thread zstd/src/encoding/match_generator.rs

fix(encoding): preserve boundary anchors with stepped hash fill

4771c92

- backfill the last searchable suffix anchor in add_suffixes_till - force full-position seeding for skip_matching blocks - split zero-literal repcode tests into explicit rep2 and rep0-1 cases

polaz requested a review from Copilot March 27, 2026 06:53

Copilot started reviewing on behalf of polaz March 27, 2026 06:53 View session

Copilot AI reviewed Mar 27, 2026

View reviewed changes

polaz merged commit 5ca2e4e into main Mar 27, 2026
15 checks passed

polaz deleted the perf/#14-hash-interleaving-match-counting branch March 27, 2026 07:45

sw-release-bot Bot mentioned this pull request Mar 27, 2026

chore: release v0.0.4 #40

Merged

Conversation

polaz commented Mar 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Benchmark

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

polaz commented Mar 27, 2026

Uh oh!

coderabbitai Bot commented Mar 27, 2026

Uh oh!

coderabbitai Bot commented Mar 27, 2026

Uh oh!

coderabbitai Bot commented Mar 27, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

polaz commented Mar 27, 2026

Uh oh!

coderabbitai Bot commented Mar 27, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

polaz commented Mar 27, 2026

Uh oh!

coderabbitai Bot commented Mar 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

polaz commented Mar 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 26, 2026 •

edited

Loading

codecov Bot commented Mar 26, 2026 •

edited

Loading