perf(encoding): enable one-shot size hint across levels safely#94
perf(encoding): enable one-shot size hint across levels safely#94
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughcompress_to_vec now fully buffers input and sets a source-size hint on FrameCompressor; match_generator adds MIN_HINTED_WINDOW_LOG (14) to raise the hinted minimum window log; multiple tests were added/updated to validate hinted behavior and FFI-compatible decoding across levels and inputs. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant compress_to_vec as compress_to_vec (one-shot)
participant FrameCompressor
participant RustDecoder as zstd::stream::StreamingDecoder
participant CFFIDecoder as zstd::decode_all (C FFI)
Caller->>compress_to_vec: provide Read source
compress_to_vec->>compress_to_vec: read_to_end -> input Vec
compress_to_vec->>FrameCompressor: new(level)
compress_to_vec->>FrameCompressor: set_source_size_hint(input.len())
compress_to_vec->>FrameCompressor: set_source(&input)
compress_to_vec->>FrameCompressor: compress() -> compressed_frame
compress_to_vec->>Caller: return compressed_frame
Note over Caller,FrameCompressor: Validation flows in tests
Caller->>RustDecoder: pass compressed_frame
RustDecoder->>Caller: decoded bytes -> compare with original
Caller->>CFFIDecoder: pass compressed_frame
CFFIDecoder->>Caller: decoded bytes -> compare with original
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
This PR aims to safely enable source-size hint propagation for one-shot encoding helpers across compression levels, while preserving correctness and C zstd (FFI) interoperability for small inputs.
Changes:
- Updated
compress_to_vec()to always apply a source-size hint when encoding a one-shot buffer. - Clamped hinted window tuning to a conservative minimum to avoid small-input regressions and FFI incompatibilities.
- Added targeted regression tests (including level/size matrices) to validate hinted streams against both Rust and C zstd decoders.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| zstd/tests/cross_validation.rs | Adds a targeted regression test ensuring hinted Rust output is accepted by both Rust and FFI decoders. |
| zstd/src/tests/roundtrip_integrity.rs | Adds a test asserting compress_to_vec(Default) benefits from auto-hinting on tiny inputs (smaller advertised window). |
| zstd/src/encoding/mod.rs | Changes compress_to_vec() implementation to use FrameCompressor with an explicit source-size hint. |
| zstd/src/encoding/match_generator.rs | Introduces a conservative hinted window floor (MIN_HINTED_WINDOW_LOG) and updates related unit tests. |
| zstd/src/encoding/frame_compressor.rs | Adds FFI-compatibility tests for hinted compression across levels and small input sizes. |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
zstd/src/encoding/match_generator.rs (1)
163-172:⚠️ Potential issue | 🟡 MinorUpdate the source-size adjustment docs/comments to match the new 16 KiB floor.
The implementation now clamps hinted windows to
MIN_HINTED_WINDOW_LOG, but the surrounding docs still say zero/small hints fall back toMIN_WINDOW_LOG/ 1 KiB. That mismatch will mislead future maintenance and test updates.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/encoding/match_generator.rs` around lines 163 - 172, The comment above adjust_params_for_source_size is out of date: update its doc to reflect that small or zero src_size values now clamp to MIN_HINTED_WINDOW_LOG (the new 16 KiB floor) rather than falling back to MIN_WINDOW_LOG (1 KiB); mention both constants (MIN_WINDOW_LOG and MIN_HINTED_WINDOW_LOG) and explain that src_log is computed as ceil_log2(src_size) then clamped to at least MIN_HINTED_WINDOW_LOG so readers know the effective minimum window is 16 KiB for hinted sizes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 4218-4269: The test
fastest_hint_iteration_23_sequences_reconstruct_source may never exercise the
Sequence::Triple branch; add a boolean flag (e.g., saw_triple) initialized
before calling driver.start_matching, set it to true inside the Sequence::Triple
arm of the closure, and after start_matching assert!(saw_triple, "fixture must
emit at least one match") to guarantee the match reconstruction path is
exercised; reference the test function
fastest_hint_iteration_23_sequences_reconstruct_source, the
driver.start_matching call, and the Sequence::Triple match arm when making this
change.
In `@zstd/src/encoding/mod.rs`:
- Around line 40-43: The compress_to_vec helper now eagerly reads the entire
Read into memory before compressing (see the source that reads into input with
read_to_end), so add a short rustdoc on compress_to_vec documenting this
eager-buffering behavior and the memory implications: explicitly state that peak
memory usage can be roughly input_size + output_size and recommend callers use
the streaming APIs (e.g., the incremental compressor functions) when memory
usage or very large inputs are a concern; keep the note concise and place it on
the compress_to_vec function.
---
Outside diff comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 163-172: The comment above adjust_params_for_source_size is out of
date: update its doc to reflect that small or zero src_size values now clamp to
MIN_HINTED_WINDOW_LOG (the new 16 KiB floor) rather than falling back to
MIN_WINDOW_LOG (1 KiB); mention both constants (MIN_WINDOW_LOG and
MIN_HINTED_WINDOW_LOG) and explain that src_log is computed as
ceil_log2(src_size) then clamped to at least MIN_HINTED_WINDOW_LOG so readers
know the effective minimum window is 16 KiB for hinted sizes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c9c3d4b9-ec3d-4d17-a0a5-588d19de0291
📒 Files selected for processing (5)
zstd/src/encoding/frame_compressor.rszstd/src/encoding/match_generator.rszstd/src/encoding/mod.rszstd/src/tests/roundtrip_integrity.rszstd/tests/cross_validation.rs
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
zstd/src/encoding/match_generator.rs:160
- The doc/comment for
adjust_params_for_source_sizesays the window cap is clamped toMIN_WINDOW_LOGand mentions a 1 KiB minimum window, but the implementation now additionally clamps toMIN_HINTED_WINDOW_LOG(16 KiB). Please update the documentation to reflect the actual hinted floor so future changes/tests don’t rely on outdated behavior.
/// Adjust level parameters for a known source size.
///
/// This derives a cap from `ceil(log2(src_size))`, then clamps it to
/// [`MIN_WINDOW_LOG`]. A zero-byte size hint is treated as
/// [`MIN_WINDOW_LOG`]. This keeps tables bounded for
/// small inputs while preserving the encoder's minimum supported window.
|
@coderabbitai Re outside-diff note on |
|
Tip For best results, initiate chat on the files or code changes.
🧠 Learnings used |
7d32be9 to
a4d9c51
Compare
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI (x86_64-gnu)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.30.
| Benchmark suite | Current: a4d9c51 | Previous: ceb9e27 | Ratio |
|---|---|---|---|
compress/better/small-4k-log-lines/matrix/c_ffi |
0.099 ms |
0.076 ms |
1.30 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
Summary
compress_to_vec()for all levelsValidation
Summary by CodeRabbit
Improvements
Tests