perf(encoding): align fastest matcher with zstd fast path #39
Conversation
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough

Introduces configurable hash-fill interleaving, repcode pre-checks, offset-history tracking, and word-sized match-length scanning in the encoder match generator.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Encoder
    participant MatchGenerator
    participant HashTable
    participant Output
    Encoder->>MatchGenerator: request next_sequence(pos)
    MatchGenerator->>MatchGenerator: compute literals_len
    MatchGenerator->>MatchGenerator: repcode_candidate(pos, literals_len)
    alt repcode hit
        MatchGenerator->>Output: emit Sequence::Triple (rep offset)
        MatchGenerator->>MatchGenerator: encode_offset_with_history(offset, literals_len, &mut offset_hist)
        MatchGenerator->>MatchGenerator: add_suffixes_till(end, fill_step)
    else no repcode
        MatchGenerator->>HashTable: lookup current & interleaved positions (uses hash_fill_step)
        HashTable-->>MatchGenerator: match entries
        MatchGenerator->>MatchGenerator: count match length (word-wise)
        MatchGenerator->>Output: emit Sequence::Triple (match)
        MatchGenerator->>MatchGenerator: encode_offset_with_history(offset, literals_len, &mut offset_hist)
        MatchGenerator->>MatchGenerator: add_suffixes_till(end, fill_step)
    end
    MatchGenerator-->>Encoder: sequence or None
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
Pull request overview
This PR updates the Fastest/simple match generator to more closely follow zstd’s fast-path behavior, targeting higher compression throughput while maintaining roundtrip correctness.
Changes:
- Add stepped hash fill for `Fastest` (pipeline-style fill rate) while still probing every position.
- Add repcode-aware candidate selection (limited to the zero-literal follow-up path) and track repeat-offset history.
- Speed up match-length counting via word-sized XOR + trailing-zeros, and update tests to handle overlapping match replay semantics.
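The word-sized XOR + trailing-zeros counting mentioned above can be sketched without `unsafe` by loading words via `from_le_bytes`; the helper name `common_prefix_len` mirrors the one discussed in the review, but the body here is an assumed reconstruction, not the PR's code:

```rust
// Hypothetical sketch of word-wise prefix counting (not the PR's exact code).
// Loading via from_le_bytes keeps the trailing_zeros trick endian-safe:
// byte i of each window always occupies bits 8*i..8*i+8 of the integer.
fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    const WORD: usize = core::mem::size_of::<usize>();
    let max = a.len().min(b.len());
    let mut off = 0;
    // Compare one machine word at a time while a full word still fits.
    while off + WORD <= max {
        let x = usize::from_le_bytes(a[off..off + WORD].try_into().unwrap());
        let y = usize::from_le_bytes(b[off..off + WORD].try_into().unwrap());
        let diff = x ^ y;
        if diff != 0 {
            // The index of the first mismatching byte equals the number of
            // equal low-order bytes, i.e. trailing zero bits divided by 8.
            return off + (diff.trailing_zeros() / 8) as usize;
        }
        off += WORD;
    }
    // Finish the tail byte by byte.
    while off < max && a[off] == b[off] {
        off += 1;
    }
    off
}
```

An `unsafe` variant using `ptr::read_unaligned` additionally avoids the slice bounds checks, which is presumably why the PR documents safety invariants instead of using this safe form.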
Comments suppressed due to low confidence (1)
zstd/src/encoding/match_generator.rs:399
`is_last` no longer changes behavior here (both branches slice `&match_entry.data[match_index..]`). This conditional can be removed to simplify the control flow and avoid suggesting there is a special case for the last entry when there isn't.
```rust
for match_entry in self.window.iter() {
    if let Some(match_index) = match_entry.suffixes.get(key) {
        let match_slice = &match_entry.data[match_index..];
        // Check how long the common prefix actually is
        let match_len = Self::common_prefix_len(match_slice, data_slice);
        // Collisions in the suffix store might make this check fail
```
- fix repcode offset validation across the full searchable window
- make word-wise prefix counting endian-safe on big-endian targets
- strengthen matches() invariants and add cross-slice repcode regression test
- split endian mismatch-byte logic with compile-time cfg helpers
- add regression for out-of-range repcode offset rejection
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (2)
zstd/src/encoding/match_generator.rs:472
`common_prefix_len` introduces `unsafe` unaligned word reads but doesn't document the safety invariants. Please add a brief `// SAFETY:` comment explaining that `off + chunk <= max` guarantees `lhs.add(off)..lhs.add(off + chunk)` and the corresponding `rhs` range are in bounds, and that `read_unaligned` imposes no alignment requirement.
zstd/src/encoding/match_generator.rs:1054
- The updated `matches()` test now only replays sequences and asserts roundtrip equality, but it no longer checks that specific match decisions (e.g., offsets/match_len boundaries) are made. This can allow regressions that still roundtrip but significantly change compression ratio/perf. Consider restoring a small number of targeted assertions for at least one or two expected `Sequence::Triple` outputs (or sequence count/order) in these scenarios while keeping the overlap-safe replay logic.
```rust
#[test]
fn matches() {
    let mut matcher = MatchGenerator::new(1000);
    let mut original_data = Vec::new();
    let mut reconstructed = Vec::new();
    let replay_sequence = |seq: Sequence<'_>, reconstructed: &mut Vec<u8>| match seq {
        Sequence::Literals { literals } => {
            assert!(!literals.is_empty());
            reconstructed.extend_from_slice(literals);
        }
        Sequence::Triple {
            literals,
            offset,
            match_len,
        } => {
            assert!(offset > 0);
            assert!(match_len >= MIN_MATCH_LEN);
            reconstructed.extend_from_slice(literals);
            assert!(offset <= reconstructed.len());
            let start = reconstructed.len() - offset;
            for i in 0..match_len {
                let byte = reconstructed[start + i];
                reconstructed.push(byte);
            }
        }
    };
    matcher.add_data(
        alloc::vec![0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        SuffixStore::with_capacity(100),
        |_, _| {},
    );
    original_data.extend_from_slice(&[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);
    matcher.next_sequence(|seq| replay_sequence(seq, &mut reconstructed));
    assert!(!matcher.next_sequence(|_| {}));
    matcher.add_data(
        alloc::vec![
            1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0,
        ],
        SuffixStore::with_capacity(100),
        |_, _| {},
    );
```
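The byte-by-byte replay loop in the test above matters because a match may overlap its own output (offset smaller than match_len, as in RLE-style runs). A minimal standalone illustration, where `replay_match` is a hypothetical helper rather than anything from the PR:

```rust
// Replays a match against already-reconstructed output. Copying one byte
// at a time is deliberate: when offset < match_len, later iterations read
// bytes that earlier iterations of this same loop just pushed.
fn replay_match(out: &mut Vec<u8>, offset: usize, match_len: usize) {
    let start = out.len() - offset;
    for i in 0..match_len {
        let byte = out[start + i];
        out.push(byte);
    }
}
```

With `out = [7]`, `replay_match(&mut out, 1, 4)` yields `[7, 7, 7, 7, 7]`; a bulk copy that snapshots the source range up front would run out of source bytes.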
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
zstd/src/encoding/match_generator.rs (1)
499-509: ⚠️ Potential issue | 🟠 Major: Don't reuse the stepped fill when seeding a skipped block.
`skip_matching()` relies on `add_suffixes_till(len)` to make the entire current block searchable by later blocks. With `step_by(self.hash_fill_step)`, Fastest now inserts only every third start into the suffix table, and the omitted starts are never revisited because `suffix_idx` jumps straight to `len`. A skipped/raw block can therefore stop being a valid match source at those positions.

Possible fix:
```diff
-    fn add_suffixes_till(&mut self, idx: usize) {
+    fn add_suffixes_till(&mut self, idx: usize, fill_step: usize) {
         let last_entry = self.window.last_mut().unwrap();
         if last_entry.data.len() < MIN_MATCH_LEN {
             return;
         }
         let slice = &last_entry.data[self.suffix_idx..idx];
         for (key_index, key) in slice
             .windows(MIN_MATCH_LEN)
             .enumerate()
-            .step_by(self.hash_fill_step)
+            .step_by(fill_step)
         {
             if !last_entry.suffixes.contains_key(key) {
                 last_entry.suffixes.insert(key, self.suffix_idx + key_index);
             }
         }
     }
     ...
-        self.add_suffixes_till(self.suffix_idx + match_len);
+        self.add_suffixes_till(self.suffix_idx + match_len, self.hash_fill_step);
     ...
-        self.add_suffixes_till(len);
+        self.add_suffixes_till(len, 1);
```

Also applies to: 561-564
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/encoding/match_generator.rs` around lines 499 - 509, add_suffixes_till currently skips inserting some suffix starts because it uses .step_by(self.hash_fill_step) on the window slice, which breaks skip_matching()'s contract of making the entire block searchable; remove the .step_by(...) so the loop over slice.windows(MIN_MATCH_LEN).enumerate() inserts every start between suffix_idx and idx (i.e., iterate every position), and make the same change for the analogous loop around the code referenced at 561-564; ensure you still honor MIN_MATCH_LEN and update suffix_idx as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 516-523: repcode_candidate currently only probes offset_hist[1]
for zero-literal repeats but encode_offset_with_history can map repcodes to
offset_hist[1], offset_hist[2], and offset_hist[0] - 1; update repcode_candidate
to, when literals_len == 0, probe those three candidate offsets (offset_hist[1],
offset_hist[2], and offset_hist[0].saturating_sub(1)) in turn, call
offset_match_len for each, and return the first (offset, match_len) where
match_len >= MIN_MATCH_LEN; ensure you guard against duplicates and out-of-range
offsets when reading offset_hist.
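The probing order this prompt asks for might look like the following sketch; `repcode_candidate`, `offset_hist`, `offset_match_len`, and `MIN_MATCH_LEN` echo names from the review, while the free-function signature and closure-based length lookup are assumptions for illustration:

```rust
const MIN_MATCH_LEN: usize = 3;

// Probe rep1, rep2, then rep0 - 1 (the repcode mapping the review describes
// for the zero-literal case), skipping zero and duplicate offsets, and
// return the first candidate whose match reaches MIN_MATCH_LEN.
fn repcode_candidate(
    offset_hist: [usize; 3],
    literals_len: usize,
    offset_match_len: impl Fn(usize) -> usize,
) -> Option<(usize, usize)> {
    if literals_len != 0 {
        return None;
    }
    let candidates = [
        offset_hist[1],
        offset_hist[2],
        offset_hist[0].saturating_sub(1),
    ];
    let mut probed: [usize; 3] = [0; 3];
    let mut n = 0;
    for &off in &candidates {
        if off == 0 || probed[..n].contains(&off) {
            continue; // zero (out of range) or already probed
        }
        probed[n] = off;
        n += 1;
        let len = offset_match_len(off);
        if len >= MIN_MATCH_LEN {
            return Some((off, len));
        }
    }
    None
}
```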
---
Outside diff comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 499-509: add_suffixes_till currently skips inserting some suffix
starts because it uses .step_by(self.hash_fill_step) on the window slice, which
breaks skip_matching()'s contract of making the entire block searchable; remove
the .step_by(...) so the loop over slice.windows(MIN_MATCH_LEN).enumerate()
inserts every start between suffix_idx and idx (i.e., iterate every position),
and make the same change for the analogous loop around the code referenced at
561-564; ensure you still honor MIN_MATCH_LEN and update suffix_idx as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: fd787445-8328-413f-8d68-dd71d0bc58c0
📒 Files selected for processing (1)
zstd/src/encoding/match_generator.rs
- check rep1, rep2, and rep0-1 when literals_len is zero
- deduplicate repeat offsets before probing to avoid duplicate work
- add regression test covering rep2 fallback when rep1 misses
@coderabbitai re-review

✅ Actions performed: Full review triggered.

(2 similar comments)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 1309-1323: The test
simple_matcher_zero_literal_repcode_checks_rep2_and_rep0_minus1 is misnamed
because with matcher.offset_hist = [99, 4, 10] the rep0 - 1 branch never runs
(rep0-1 == 98); either rename the test to reflect it only checks rep2, or modify
the fixture so rep0-1 equals 10 (for example set offset_hist to [11, 4, 99] or
similar) to actually exercise the rep0 - 1 branch; update the assertion or name
accordingly and keep this change localized in MatchGenerator::repcode_candidate
test setup in match_generator.rs.
- Around line 505-513: The current loop that inserts suffix anchors using
.step_by(self.hash_fill_step) (iterating over slice.windows(MIN_MATCH_LEN) in
match_generator.rs) can skip the final tail start(s) when hash_fill_step > 1, so
backfill the tail anchors before advancing suffix_idx: after the stepped loop
that updates last_entry.suffixes with key -> self.suffix_idx + key_index,
explicitly iterate the last one or two window starts at the end of slice (the
final MIN_MATCH_LEN-aligned windows) and insert them into last_entry.suffixes if
missing (using the same key -> self.suffix_idx + index calculation), so that
boundary starts are preserved across blocks even when step_by skips interior
positions; reference symbols: last_entry.suffixes, self.suffix_idx,
self.hash_fill_step, MIN_MATCH_LEN.
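Under the stated assumptions (a map-like stand-in for the real SuffixStore, keyed by the first MIN_MATCH_LEN bytes; names borrowed from the comment, the rest hypothetical), the stepped fill plus tail backfill could look like:

```rust
use std::collections::HashMap;

const MIN_MATCH_LEN: usize = 3;

// Simplified stand-in for the real SuffixStore: maps the first
// MIN_MATCH_LEN bytes of a suffix to its earliest start index.
fn add_suffixes_till(
    data: &[u8],
    suffix_idx: usize,
    idx: usize,
    fill_step: usize,
    suffixes: &mut HashMap<[u8; MIN_MATCH_LEN], usize>,
) {
    if data.len() < MIN_MATCH_LEN {
        return;
    }
    let slice = &data[suffix_idx..idx];
    // Stepped fill: only every fill_step-th window start is inserted.
    for (key_index, key) in slice.windows(MIN_MATCH_LEN).enumerate().step_by(fill_step) {
        let key: [u8; MIN_MATCH_LEN] = key.try_into().unwrap();
        suffixes.entry(key).or_insert(suffix_idx + key_index);
    }
    // Backfill the final window start so the block-boundary anchor
    // survives even when the stepped loop skips it.
    if slice.len() >= MIN_MATCH_LEN {
        let last = slice.len() - MIN_MATCH_LEN;
        let key: [u8; MIN_MATCH_LEN] = slice[last..].try_into().unwrap();
        suffixes.entry(key).or_insert(suffix_idx + last);
    }
}
```

For example, over `b"abcdefgh"` with `fill_step = 3`, the stepped loop inserts starts 0 and 3, and the backfill adds the final start 5 that `step_by` would otherwise skip.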
- backfill the last searchable suffix anchor in add_suffixes_till
- force full-position seeding for skip_matching blocks
- split zero-literal repcode tests into explicit rep2 and rep0-1 cases
@coderabbitai re-review

✅ Actions performed: Full review triggered.

@coderabbitai re-review

✅ Actions performed: Full review triggered.
Summary

- `fastest` matcher behavior aligned with the donor `fast` strategy

Validation

- `cargo fmt -- --check`
- `cargo clippy --all-targets -- -D warnings`
- `cargo nextest run --workspace`

Benchmark

- compared against `zstd` on project corpus; output sizes match for covered cases

Closes #14