feat(encoding): numeric compression levels (1-22) API #63
Conversation
- add CompressionLevel::Level(i32) and canonical from_level mapping
- port level table, source-size hinting, and matcher sizing behavior
- align CLI parsing/range validation and update docs/tests
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration: .coderabbit.yaml | Review profile: ASSERTIVE | Plan: Pro
📒 Files selected for processing (1)
📝 Walkthrough: Adds C zstd-compatible numeric compression levels (via `CompressionLevel::Level(i32)` and `from_level`), source-size hinting, and matcher sizing behavior.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI/User
    participant Encoder as StreamingEncoder
    participant Matcher as MatchGenerator
    participant Compressor as FrameCompressor
    CLI->>Encoder: set_source_size_hint(size)
    activate Encoder
    Encoder->>Matcher: set_source_size_hint(size)
    activate Matcher
    Matcher->>Matcher: store hint for next reset
    deactivate Matcher
    deactivate Encoder
    CLI->>Encoder: compress(data, level, store)
    activate Encoder
    Encoder->>Matcher: reset(level)
    activate Matcher
    Matcher->>Matcher: resolve LevelParams<br/>apply source_size_hint<br/>clamp window_log
    deactivate Matcher
    Encoder->>Compressor: compress_block_encoded(...)
    activate Compressor
    Compressor->>Matcher: get_next_space()
    Matcher-->>Compressor: buffer sized by adjusted window
    deactivate Compressor
    deactivate Encoder
    Encoder-->>CLI: emit compressed frame (header window adjusted)
```
```mermaid
sequenceDiagram
    participant Caller as Caller
    participant API as CompressionLevel::from_level
    participant Table as Level Parameter Table
    Caller->>API: from_level(7)
    activate API
    API->>Table: lookup level 7
    activate Table
    Table-->>API: LevelParams{matcher: Lazy2, window_log: X, ...}
    deactivate Table
    API-->>Caller: CompressionLevel::Better
    deactivate API
    Caller->>API: from_level(15)
    activate API
    API->>Table: lookup level 15
    activate Table
    Table-->>API: LevelParams{matcher: BtLazy2, window_log: Y, ...}
    deactivate Table
    API-->>Caller: CompressionLevel::Level(15)
    deactivate API
```
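The mappings exercised in this diagram (7 canonicalizes to `Better`, 15 stays `Level(15)`) can be approximated with a simplified stand-in; this sketch is illustrative only, not the crate's actual level table:

```rust
// Simplified stand-in for the crate's CompressionLevel enum; illustrative only.
#[derive(Debug, PartialEq)]
enum CompressionLevel {
    Fastest, // numeric level 1
    Default, // numeric level 3 (and 0, per C zstd semantics)
    Better,  // numeric level 7
    Best,    // numeric level 11
    Level(i32),
}

// Canonicalize a numeric level to a named preset where one exists.
fn from_level(level: i32) -> CompressionLevel {
    match level {
        0 | 3 => CompressionLevel::Default,
        1 => CompressionLevel::Fastest,
        7 => CompressionLevel::Better,
        11 => CompressionLevel::Best,
        n => CompressionLevel::Level(n), // everything else, incl. negatives, stays numeric
    }
}

fn main() {
    assert_eq!(from_level(7), CompressionLevel::Better);
    assert_eq!(from_level(15), CompressionLevel::Level(15));
    assert_eq!(from_level(0), CompressionLevel::Default);
    println!("from_level mapping ok");
}
```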
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Codecov Report: ❌ Patch coverage is
@coderabbitai re-review
✅ Actions performed: full review triggered.
Pull request overview
Adds a C zstd-compatible numeric compression level API (including negative “ultra-fast” levels) and introduces source-size-aware tuning so small inputs can avoid oversized windows/tables, with corresponding CLI and test updates.
Changes:
- Introduces `CompressionLevel::Level(i32)` + `CompressionLevel::from_level(i32)` and a 1–22 level parameter table (plus negative levels).
- Adds `set_source_size_hint()` plumbing through `Matcher`, `FrameCompressor`, and `StreamingEncoder`, and forwards pledged content size as a matcher hint.
- Updates CLI to accept numeric levels (including negative) and adds `--store`; expands roundtrip tests for new behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| zstd/src/encoding/mod.rs | Extends CompressionLevel and Matcher with numeric levels + source-size hint hook. |
| zstd/src/encoding/match_generator.rs | Implements level parameter resolution, source-size-based parameter capping, and negative-level handling. |
| zstd/src/encoding/frame_compressor.rs | Exposes set_source_size_hint() and supports Level(_) in encoded compression path. |
| zstd/src/encoding/streaming_encoder.rs | Forwards pledged size/hints to matcher; supports Level(_) across streaming paths; adds tests. |
| zstd/src/tests/roundtrip_integrity.rs | Adds comprehensive roundtrip coverage for numeric levels, clamping, negative levels, and size-hint behavior. |
| cli/src/main.rs | Switches --level to i32 numeric levels with bounds checking; adds --store; forwards source size hint. |
| README.md | Documents numeric levels and usage examples. |
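The numeric `--level` bounds checking described for `cli/src/main.rs` might look roughly like this; the function name and the accepted lower bound are assumptions for illustration (the real CLI may wire this through an argument parser):

```rust
// Hypothetical CLI-side validation for a numeric --level argument.
// The real implementation and its exact bounds may differ.
fn parse_level(raw: &str) -> Result<i32, String> {
    let level: i32 = raw
        .parse()
        .map_err(|_| format!("`{raw}` is not an integer compression level"))?;
    // 1..=22 follows the C zstd table; 0 means default; negatives are ultra-fast.
    if level > 22 {
        return Err(format!("level {level} out of range (maximum is 22)"));
    }
    Ok(level)
}

fn main() {
    assert_eq!(parse_level("7"), Ok(7));
    assert_eq!(parse_level("-5"), Ok(-5)); // negative ultra-fast levels are accepted
    assert!(parse_level("23").is_err());
    assert!(parse_level("fast").is_err());
    println!("level validation ok");
}
```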
@coderabbitai re-review
✅ Actions performed: full review triggered.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
Benchmark result of this commit is worse than the previous benchmark result, exceeding the 1.15 threshold.
| Benchmark suite | Current: 03b33e2 | Previous: bbd826e | Ratio |
|---|---|---|---|
| compress/default/small-1k-random/matrix/pure_rust | 5.546 ms | 4.409 ms | 1.26 |
| compress/best/small-1k-random/matrix/c_ffi | 0.508 ms | 0.401 ms | 1.27 |
| compress/default/small-10k-random/matrix/pure_rust | 7.069 ms | 5.932 ms | 1.19 |
| compress/best/small-10k-random/matrix/c_ffi | 0.472 ms | 0.391 ms | 1.21 |
| compress/default/small-4k-log-lines/matrix/pure_rust | 5.156 ms | 4.328 ms | 1.19 |
| compress/best/small-4k-log-lines/matrix/c_ffi | 0.426 ms | 0.359 ms | 1.19 |
| compress/best/decodecorpus-z000033/matrix/pure_rust | 80.399 ms | 64.198 ms | 1.25 |
| compress/best/high-entropy-1m/matrix/pure_rust | 79.769 ms | 64.446 ms | 1.24 |
| compress/best/high-entropy-1m/matrix/c_ffi | 1.111 ms | 0.906 ms | 1.23 |
| compress/default/low-entropy-1m/matrix/pure_rust | 10.725 ms | 8.754 ms | 1.23 |
| compress/best/low-entropy-1m/matrix/c_ffi | 1.382 ms | 1.13 ms | 1.22 |
| compress/default/large-log-stream/matrix/c_ffi | 3.982 ms | 3.324 ms | 1.20 |
| compress/best/large-log-stream/matrix/c_ffi | 12.748 ms | 10.602 ms | 1.20 |
| decompress/fastest/high-entropy-1m/c_stream/matrix/c_ffi | 0.033 ms | 0.026 ms | 1.27 |
| decompress/default/high-entropy-1m/c_stream/matrix/c_ffi | 0.033 ms | 0.027 ms | 1.22 |
| decompress/better/high-entropy-1m/c_stream/matrix/c_ffi | 0.033 ms | 0.026 ms | 1.27 |
| compress-dict/default/small-4k-log-lines/matrix/c_ffi_without_dict | 0.005 ms | 0.004 ms | 1.25 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
@coderabbitai re-review
✅ Actions performed: full review triggered.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 49-50: Remove the duplicated checklist entry by consolidating the
two bullets into one: keep the existing "Numeric levels `0` (default), `1–22`,
and negative ultra-fast levels via `CompressionLevel::from_level(n)` (C zstd
compatible numbering)" bullet and delete the separate "Negative levels for
ultra-fast compression" bullet so the README checklist only mentions negative
levels once.
In `@zstd/src/encoding/match_generator.rs`:
- Around line 402-403: The Simple backend's reuse of pooled SuffixStore buffers
doesn't validate that a reused store's slot count matches the current block
size: when self.slice_size is reduced (set by
self.base_slice_size.min(max_window_size) and self.reported_window_size =
max_window_size) a previously pooled SuffixStore may be too small for the new
space.len(), causing high collision rates; update the reuse logic that pulls
from suffix_pool (and the analogous code at the other occurrence) to check the
chosen pool entry's capacity/slot count against space.len() and recreate a fresh
SuffixStore when the pooled store's slot count is smaller than space.len(),
ensuring we only reuse pools that are at least as large as the current space.
- Around line 76-79: The reset() path currently derives each level's hash_bits
via dfast_hash_bits_for_window(max_window_size), which changes defaults when no
source-size hint is provided; instead, only shrink the Dfast table when a hint
exists. In reset(), gate the call to dfast_hash_bits_for_window behind
hint.is_some() (or otherwise only apply the smaller value when hint.is_some()),
and otherwise set hash_bits to the documented default DFAST_HASH_BITS
(respecting MIN_WINDOW_LOG where applicable). Apply the same guard to the other
occurrences noted (around the lines referenced) so unhinted resets preserve the
original DFAST defaults.
Actionable comments posted: 1
♻️ Duplicate comments (2)
zstd/src/encoding/match_generator.rs (2)
563-566: ⚠️ Potential issue | 🟠 Major: Reject undersized pooled `SuffixStore`s when the slice grows again.
`get_next_space()` now resizes pooled `Vec<u8>` buffers correctly, but Lines 563-566 still reuse whatever `suffix_pool.pop()` returns. After a 1 KiB hinted frame, the next full-size Fastest/negative-level frame can index a much larger block with that 1 KiB suffix table, which stays correct but tanks match quality. Recreate the store when `store.slots.len() < space.len()`.

🔧 Suggested capacity check for `suffix_pool` reuse

```diff
-        let suffixes = self
-            .suffix_pool
-            .pop()
-            .unwrap_or_else(|| SuffixStore::with_capacity(space.len()));
+        let suffixes = match self.suffix_pool.pop() {
+            Some(store) if store.slots.len() >= space.len() => store,
+            _ => SuffixStore::with_capacity(space.len()),
+        };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/encoding/match_generator.rs` around lines 563 - 566, The pooled SuffixStore popped from self.suffix_pool may be undersized for the new space; update the reuse logic around the suffixes assignment (the pop/unwrap_or_else block that currently constructs suffixes) to check the popped store's capacity/slot count (e.g., store.slots.len()) against space.len() and if store.slots.len() < space.len() create a new SuffixStore::with_capacity(space.len()) instead of reusing the undersized store; keep using self.suffix_pool.pop() when the capacity is sufficient so we only recreate when necessary.
359-360: ⚠️ Potential issue | 🟠 Major: Keep unhinted Dfast levels on the documented default table size.

Line 424 still derives `hash_bits` from `max_window_size` even when Line 359 consumed no hint, so plain `CompressionLevel::Level(2)` now gets a 19-bit Dfast table while levels 3/4 get 20 bits. That changes the default Dfast profile on the non-hinted path; if levels 2–4 are still supposed to share sizing until explicit per-level tuning exists, keep `DFAST_HASH_BITS` unless a source-size hint was supplied, or make `hash_bits` an explicit `LevelParams` field.

🔧 One safe way to keep shrinking on the hint path only

```diff
 fn reset(&mut self, level: CompressionLevel) {
     let hint = self.source_size_hint.take();
+    let hinted = hint.is_some();
     let params = Self::level_params(level, hint);
     let max_window_size = 1usize << params.window_log;
@@
     let dfast = self
         .dfast_match_generator
         .get_or_insert_with(|| DfastMatchGenerator::new(max_window_size));
     dfast.max_window_size = max_window_size;
     dfast.lazy_depth = params.lazy_depth;
-    dfast.set_hash_bits(dfast_hash_bits_for_window(max_window_size));
+    dfast.set_hash_bits(if hinted {
+        dfast_hash_bits_for_window(max_window_size)
+    } else {
+        DFAST_HASH_BITS
+    });
     let vec_pool = &mut self.vec_pool;
     dfast.reset(|mut data| {
```

Also applies to: 423-424
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/encoding/match_generator.rs` around lines 359 - 360, The code currently derives hash_bits from max_window_size in Self::level_params even when no source_size_hint was provided (because self.source_size_hint was consumed earlier), causing unhinted CompressionLevel::Level(2) to get an unexpectedly larger Dfast table; change level parameter logic so that when no hint was supplied (i.e., source_size_hint is None) level_params uses the documented default DFAST_HASH_BITS instead of computing from max_window_size—either by adding an explicit flag/field to LevelParams indicating whether a hint was present or by making hash_bits an explicit LevelParams field and only compute/override it when source_size_hint.is_some(); update functions referencing hash_bits (e.g., level_params and any callers that use LevelParams.hash_bits) to follow this contract so unhinted levels retain the default DFAST_HASH_BITS.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/mod.rs`:
- Around line 91-110: Update the CompressionLevel docs to explain that
Level(i32) values above 11 (e.g., Level(12..=22)) exceed the named preset
hierarchy so “Best” is only the best named preset, not the absolute
highest-ratio mode, and add the same HC/hash-chain caveat already present on
Better/Best: large numeric levels rely on the lazy2 hash-chain backend and are
subject to the ~4 GiB + window limitation caused by the early-return guard in
HcMatchGenerator::insert_position (abs_pos >= u32::MAX) until table rebasing is
implemented; mirror the wording and placement used for Better/Best so readers
understand Level(n) may be stronger but not safer for very large streams.
---
Duplicate comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 563-566: The pooled SuffixStore popped from self.suffix_pool may
be undersized for the new space; update the reuse logic around the suffixes
assignment (the pop/unwrap_or_else block that currently constructs suffixes) to
check the popped store's capacity/slot count (e.g., store.slots.len()) against
space.len() and if store.slots.len() < space.len() create a new
SuffixStore::with_capacity(space.len()) instead of reusing the undersized store;
keep using self.suffix_pool.pop() when the capacity is sufficient so we only
recreate when necessary.
- Around line 359-360: The code currently derives hash_bits from max_window_size
in Self::level_params even when no source_size_hint was provided (because
self.source_size_hint was consumed earlier), causing unhinted
CompressionLevel::Level(2) to get an unexpectedly larger Dfast table; change
level parameter logic so that when no hint was supplied (i.e., source_size_hint
is None) level_params uses the documented default DFAST_HASH_BITS instead of
computing from max_window_size—either by adding an explicit flag/field to
LevelParams indicating whether a hint was present or by making hash_bits an
explicit LevelParams field and only compute/override it when
source_size_hint.is_some(); update functions referencing hash_bits (e.g.,
level_params and any callers that use LevelParams.hash_bits) to follow this
contract so unhinted levels retain the default DFAST_HASH_BITS.
@coderabbitai full review
- use source_size_hint in CLI instead of hard pledged size from metadata
- rename dfast hash helper to hash_index for clearer field/method separation
- pin late set_source_size_hint InvalidInput behavior after write
@coderabbitai Fixed outside-diff note in
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 2196-2202: Replace the loose inequality check with a deterministic
assertion pinning the hinted Dfast table size: instead of asserting
hinted_tables < full_tables, assert that driver.dfast_matcher().short_hash.len()
equals the expected size computed by calling
dfast_hash_bits_for_window(MIN_WINDOW_LOG) (or the equivalent constant value for
MIN_WINDOW_LOG) so regressions in dfast_hash_bits_for_window() are caught.
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 104-109: Update the doc comment for set_source_size_hint to
describe its visible effects: state that besides tuning matcher parameters it
can also reduce the emitted frame window and influence block sizing (thus
changing the layout of emitted blocks/frames), and note it must be called before
the first write and that it is different from set_pledged_content_size;
reference set_source_size_hint, set_pledged_content_size, and write so reviewers
can locate the docs and the related behavior.
- keep dictionary source-size hint scoped to payload bytes in FrameCompressor
- relax dictionary window assertions to roundtrip + nonincreasing-window guarantees
- pin dfast hinted table size and document visible set_source_size_hint effects
@coderabbitai re-review
✅ Actions performed: review triggered.
♻️ Duplicate comments (1)
zstd/src/encoding/frame_compressor.rs (1)
163-170: ⚠️ Potential issue | 🟡 Minor: Update the doc comment to match the new payload-only hint semantics.
Lines 168-170 still describe the old dict + payload behavior, but Lines 190-193 now forward only `size_hint`, and the new dictionary tests expect a non-increasing advertised window. As written, this public contract promises a larger window than callers will actually get.

📝 Suggested doc fix

```diff
-    /// When dictionary priming is active, the effective matcher hint also
-    /// includes primed dictionary history bytes in addition to frame payload
-    /// bytes so the advertised window can cover both.
+    /// When dictionary priming is active, this hint still refers to frame
+    /// payload bytes only. Dictionary history is applied separately during
+    /// priming and does not expand the advertised frame window.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/encoding/frame_compressor.rs` around lines 163 - 170, The doc comment for the frame size hint still claims the hint includes primed dictionary bytes, but the implementation now forwards only size_hint; update the comment above the Provide a hint... block to state that the hint is payload-only (referring to size_hint) and does NOT include any primed dictionary history, and that the advertised window is computed from that payload-only size_hint (so callers may observe a non-increasing advertised window when a dictionary is present). Mention the symbol size_hint and the frame compressor behavior (advertised window selection) so readers know the contract matches the new forwarding logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@zstd/src/encoding/frame_compressor.rs`:
- Around line 163-170: The doc comment for the frame size hint still claims the
hint includes primed dictionary bytes, but the implementation now forwards only
size_hint; update the comment above the Provide a hint... block to state that
the hint is payload-only (referring to size_hint) and does NOT include any
primed dictionary history, and that the advertised window is computed from that
payload-only size_hint (so callers may observe a non-increasing advertised
window when a dictionary is present). Mention the symbol size_hint and the frame
compressor behavior (advertised window selection) so readers know the contract
matches the new forwarding logic.
- update FrameCompressor::set_source_size_hint docs to payload-only semantics
- document that dictionary priming does not inflate advertised window
@coderabbitai Fixed duplicate note in
Summary
- `CompressionLevel::Level(i32)` variant and `from_level()` constructor for a C zstd-compatible numeric level API
- `set_source_size_hint()` on the `Matcher` trait, `FrameCompressor`, and `StreamingEncoder`: caps matcher/window parameters from `ceil(log2(src_size))` with a `MIN_WINDOW_LOG` floor
- `--store` flag for uncompressed frames

Technical Details
Each numeric level resolves to a `LevelParams` struct containing the matcher backend, window log, hash fill step, lazy depth, and hash-chain config. Named variants map exactly to their numeric equivalents: `Fastest` = 1, `Default` = 3, `Better` = 7, `Best` = 11. Level 0 = default (level 3), matching C zstd semantics.

For small inputs,
`set_source_size_hint()` adjusts `window_log` and HC table sizes proportionally to source size, including `src_size == 0` (clamped to the minimum window), avoiding multi-MiB allocations for tiny payloads. `StreamingEncoder::set_pledged_content_size()` also forwards the pledged size as a matcher hint.

Window sizes are capped at 64 MiB (window_log = 26) to stay within the decoder's 100 MiB limit.
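The `ceil(log2(src_size))` cap with a `MIN_WINDOW_LOG` floor and the 64 MiB ceiling can be sketched as follows; the floor value of 10 is an assumption here, and the crate's actual `MIN_WINDOW_LOG` constant may differ:

```rust
const MIN_WINDOW_LOG: u32 = 10; // assumed floor; the crate's constant may differ
const MAX_WINDOW_LOG: u32 = 26; // 64 MiB cap from the PR description

// Cap a level's window_log using the source-size hint, if one was supplied.
fn capped_window_log(level_window_log: u32, src_size_hint: Option<u64>) -> u32 {
    let capped = match src_size_hint {
        None => level_window_log,
        Some(0) => MIN_WINDOW_LOG, // empty source clamps to the minimum window
        Some(n) => {
            let needed = 64 - (n - 1).leading_zeros(); // ceil(log2(n))
            level_window_log.min(needed.max(MIN_WINDOW_LOG))
        }
    };
    capped.min(MAX_WINDOW_LOG)
}

fn main() {
    assert_eq!(capped_window_log(21, Some(1024)), 10); // tiny input shrinks the window
    assert_eq!(capped_window_log(21, None), 21); // no hint: keep the level default
    assert_eq!(capped_window_log(30, None), 26); // never exceed the 64 MiB cap
    println!("window clamping ok");
}
```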
Known Limitations
Test Plan
- `from_level(n)` matches named variants for n = 1, 3, 7, 11
- … (`MatchGeneratorDriver`)
- `cargo nextest run --workspace`, `cargo build --workspace`, `cargo fmt --all`

Closes #21
Summary by CodeRabbit
New Features
Documentation
Tests
Compatibility Note
`CompressionLevel::Level(i32)` expands the public enum shape and is semver-breaking for downstream crates that exhaustively `match` on `CompressionLevel` without a wildcard arm.
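For downstream crates, the fix is to keep a wildcard arm in any `match` on the level enum. The enum below is a simplified stand-in for illustration, not the crate's actual definition:

```rust
// Simplified stand-in for the expanded enum; illustrative only.
#[allow(dead_code)]
#[derive(Debug)]
enum CompressionLevel {
    Fastest,
    Default,
    Better,
    Best,
    Level(i32),
}

// A wildcard arm keeps this match compiling if further variants appear.
fn describe(level: &CompressionLevel) -> &'static str {
    match level {
        CompressionLevel::Level(n) if *n < 0 => "ultra-fast numeric level",
        CompressionLevel::Level(_) => "numeric level",
        _ => "named preset", // wildcard: tolerates future enum growth
    }
}

fn main() {
    assert_eq!(describe(&CompressionLevel::Best), "named preset");
    assert_eq!(describe(&CompressionLevel::Level(-3)), "ultra-fast numeric level");
    assert_eq!(describe(&CompressionLevel::Level(15)), "numeric level");
    println!("wildcard match ok");
}
```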