feat(encoding): add streaming write encoder #45
Conversation
- add StreamingEncoder with write/flush/finish frame lifecycle
- switch CLI compression path to incremental io::copy -> encoder flow
- add roundtrip, flush, interop, and finished-state tests

Closes #9
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (4)
📝 Walkthrough

Adds a public StreamingEncoder implementing io::Write (re-exported), a new streaming encoder implementation with tests, tighter CLI compress preflight and atomic temp-file replace logic, and README updates including a streaming example and checklist marks.
Sequence Diagram

sequenceDiagram
rect rgba(63,81,181,0.5)
participant CLI as CLI Process
end
rect rgba(76,175,80,0.5)
participant Temp as Temp Output File
end
rect rgba(255,193,7,0.5)
participant SE as StreamingEncoder
participant Matcher as Matcher/Compressor
end
rect rgba(233,30,99,0.5)
participant Out as Final Destination
end
CLI->>Temp: create_temporary_output_file()
CLI->>SE: StreamingEncoder::new(Temp, level)
activate SE
CLI->>SE: write(chunk_1)
Note over SE: emit FrameHeader lazily on first write
SE->>Temp: write(FrameHeader)
SE->>SE: buffer(chunk_1)
SE-->>CLI: Ok(bytes_written)
CLI->>SE: write(chunk_2)
SE->>SE: append(chunk_2)
Note over SE,Matcher: when pending >= block_capacity -> compress block
SE->>Matcher: compress_block(pending)
Matcher->>Temp: write(compressed_block)
SE-->>CLI: Ok(bytes_written)
CLI->>SE: flush()
Note over SE: encode partial non-final block and flush drain
SE->>Matcher: compress_block(pending, is_last=false)
Matcher->>Temp: write(block)
SE->>Temp: flush()
SE-->>CLI: Ok(())
CLI->>SE: finish()
Note over SE: finalize frame (last block + optional checksum)
SE->>Matcher: compress_block(remaining, is_last=true)
Matcher->>Temp: write(last_block)
SE->>Temp: write(checksum) if enabled
SE->>Temp: flush()
SE-->>CLI: Ok(Temp)
CLI->>CLI: replace_output_file(temp -> destination)
CLI-->>Out: rename/replace (preserve permissions, Windows backup on failure)
deactivate SE
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Codecov Report
❌ Patch coverage is
Pull request overview
Adds a new encoding::StreamingEncoder API to enable incremental Zstd frame encoding via io::Write, and migrates the CLI compression path to use it (enabling true pipe/stream-friendly compression).
Changes:
- Introduce `encoding::StreamingEncoder<W, M>` implementing `Write` with `flush()` and `finish()` semantics.
- Export `StreamingEncoder` from `encoding` and add unit + interop tests (incl. C zstd decode under `std`).
- Switch CLI compression to `std::io::copy` into `StreamingEncoder`, then `finish()` the frame; update README status + usage snippet.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| zstd/src/encoding/streaming_encoder.rs | New streaming Write-based encoder implementation + tests |
| zstd/src/encoding/mod.rs | Wire up module + public re-export |
| cli/src/main.rs | Use streaming encoder in CLI compression pipeline |
| README.md | Document streaming encoder and update feature checklist |
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cli/src/main.rs (1)
105-114: ⚠️ Potential issue | 🟡 Minor

CLI accepts compression levels that StreamingEncoder will reject at runtime.

The CLI accepts levels 3 (`Better`) and 4 (`Best`) via the `--level` argument, but `StreamingEncoder::encode_block()` returns an error for these levels with "streaming encoder currently supports Uncompressed/Fastest/Default only". Users selecting these levels will get a runtime error during compression. Consider either:

- Rejecting unsupported levels earlier with a clear CLI error message, or
- Documenting the limitation in the help text (line 37-44)

Option 1: Early validation with clear message

```diff
         _ => {
-            unimplemented!("unsupported compression level: {}", level);
+            return Err(color_eyre::eyre::eyre!(
+                "compression level {} is not yet implemented (supported: 0-2)",
+                level
+            ));
         }
     };
+
+    // StreamingEncoder currently only supports Uncompressed/Fastest/Default
+    if matches!(compression_level, CompressionLevel::Better | CompressionLevel::Best) {
+        return Err(color_eyre::eyre::eyre!(
+            "compression levels Better (3) and Best (4) are not yet implemented"
+        ));
+    }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli/src/main.rs` around lines 105 - 114, The CLI currently maps `level` to `structured_zstd::encoding::CompressionLevel` including 3/4 (CompressionLevel::Better/Best) which `StreamingEncoder::encode_block()` doesn't support; update the code that parses/handles `level` so only 0..=2 are accepted and produce a clear CLI error for unsupported values (e.g., return an Err or exit with a user-friendly message) instead of constructing CompressionLevel::Better/Best, and mention `StreamingEncoder::encode_block` in the error text; alternatively, if you prefer the other route, update the CLI help text near the level parsing to explicitly document that only Uncompressed/Fastest/Default are supported by `StreamingEncoder::encode_block`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 260-271: The test flush_emits_nonempty_partial_output currently
only asserts that flush() produced bytes and then drops the result of
StreamingEncoder::finish(); update the test to capture the finished buffer from
StreamingEncoder::finish() (or get_ref() after finish), then attempt to decode
the full composed stream (header + partial block + finalization) using the
appropriate decoder (e.g., StreamingDecoder or zstd::Decoder) and assert the
decoded bytes equal the original input ("partial-block"); ensure you still keep
the existing flush assertion and add the round-trip decode/assert to validate
the stream is actually decodable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e945d161-8f35-49f7-83e0-6eeebacb2df1
📒 Files selected for processing (4)
README.md, cli/src/main.rs, zstd/src/encoding/mod.rs, zstd/src/encoding/streaming_encoder.rs
- cover new_with_matcher/get_mut and empty-write path
- cover finish-without-writes and uncompressed roundtrip
- cover unsupported Better level and flush-after-finish behavior
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 34-71: Extract the duplicated initialization into a private helper
so both constructors reuse it: create a fn like init_state_and_fields(matcher:
M, drain: W, compression_level: CompressionLevel) -> StreamingEncoder<W, M> or a
helper that returns CompressState plus pending/flags/hasher, then have
StreamingEncoder::new create a MatchGeneratorDriver and call new_with_matcher
(or call the helper) to build the instance; reference StreamingEncoder::new,
StreamingEncoder::new_with_matcher, CompressState, MatchGeneratorDriver,
pending, frame_started, frame_finished, and hasher when making the change.
- Around line 168-186: The code delays validation of unsupported
CompressionLevel variants (Better/Best) until encoding a block, causing late
errors; add a fail-fast check in the StreamingEncoder constructor (e.g., new) or
at the start of ensure_frame_started() to validate self.compression_level and
return an Err(other_error(...)) immediately for any variant other than
CompressionLevel::Uncompressed, CompressionLevel::Fastest, or
CompressionLevel::Default; update any call sites/tests expecting late errors
accordingly and reference the CompressionLevel enum, StreamingEncoder::new (or
the constructor used), and ensure_frame_started() when locating where to add the
match/if that returns the early error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ba49dff0-73de-4dfb-b65a-849e2d7719fe
📒 Files selected for processing (1)
zstd/src/encoding/streaming_encoder.rs
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.
| Benchmark suite | Current: 899b10f | Previous: 13b0866 | Ratio |
|---|---|---|---|
| compress/default/decodecorpus-z000033/matrix/pure_rust | 140.063 ms | 119.974 ms | 1.17 |
| compress/fastest/large-log-stream/matrix/c_ffi | 3.056 ms | 2.631 ms | 1.16 |
| compress/default/large-log-stream/matrix/c_ffi | 3.757 ms | 3.246 ms | 1.16 |
| decompress/fastest/high-entropy-1m/c_stream/matrix/c_ffi | 0.032 ms | 0.027 ms | 1.19 |
| decompress/default/high-entropy-1m/rust_stream/matrix/c_ffi | 0.215 ms | 0.109 ms | 1.97 |
| decompress/default/high-entropy-1m/c_stream/matrix/c_ffi | 0.032 ms | 0.026 ms | 1.23 |
| compress-dict/fastest/small-10k-random/matrix/c_ffi_without_dict | 0.007 ms | 0.006 ms | 1.17 |
| compress-dict/fastest/decodecorpus-z000033/matrix/c_ffi_without_dict | 3.084 ms | 2.492 ms | 1.24 |
| compress-dict/fastest/decodecorpus-z000033/matrix/c_ffi_with_dict | 3.544 ms | 2.785 ms | 1.27 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
- fail fast on unsupported levels before writing frame header
- cap streaming block capacity to zstd MAX_BLOCK_SIZE
- optimize write path to avoid drain(...).collect() copies
- make CLI reject unsupported streaming levels early
@coderabbitai Re: outside-diff CLI level mismatch — fixed. CLI now rejects levels 3/4 (and any unsupported value) early with a clear error before constructing
Tip: For best results, initiate chat on the files or code changes.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cli/src/main.rs`:
- Around line 109-114: The help text and parser still advertise compression
levels 3 and 4 but main.rs now rejects them at runtime; update the CLI parser
and docs so allowed levels match the streaming encoder (0..=2) — specifically,
change the Compress::level argument/validator and its help text/possible_values
(where Compress::level is defined/registered) to only accept 0,1,2 or explicitly
mark 3/4 as unavailable until streaming mode supports them, ensuring the parser
(Clap Arg/validator) and the Compress::level docs are consistent with the match
arms that return Err for 3 and 4.
- Around line 119-123: Before opening/truncating the final output, detect if
input and output refer to the same file (e.g., canonicalize or compare
device/inode via metadata) and return an error if they alias; then create the
encoder against a temporary file (next to the desired path, e.g., output +
".tmp" or use a tempfile) instead of the final path, stream into that temporary
via ProgressMonitor/encoder_input and StreamingEncoder::new with std::io::copy,
call encoder.finish() on the temp, and only after successful finish
rename/atomically replace the temp to the final output path; ensure any created
temp is cleaned up on error and propagate wrapped errors from creation,
streaming, finish, and rename.
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 72-94: The encoder currently consumes pending and advances hasher
before a fallible write, losing data on encode_block() errors; fix by making
errors sticky and by deferring consumption/hash mutation until after a
successful write: add an error flag (e.g., self.errored: Option<Error>) checked
in ensure_open() alongside frame_finished, set self.errored = Some(err) whenever
encode_block()/drain.write_all() fails, and return that error on subsequent
calls; change encode_block()/finish()/write()/flush() so they do not
mem::take()/mem::swap() pending or mutate self.hasher before the write—either
pass a borrowed slice to encode_block and only remove bytes from self.pending
and update self.hasher after a successful drain.write_all(), or snapshot/restore
hasher state on failure; ensure finish() follows this pattern and does not drop
pending on failure but sets errored and returns the original error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2e4df0b8-7944-469d-a15a-5888a3efe393
📒 Files selected for processing (2)
cli/src/main.rs, zstd/src/encoding/streaming_encoder.rs
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cli/src/main.rs`:
- Around line 193-197: Do not remove the existing output file before the final
rename; instead either (A) try fs::rename(temporary_output_path, output) first
and if it fails return the error while leaving both files untouched so the
temporary file can be inspected/retried, or (B) perform a safe swap: rename the
original output to a backup name (e.g., derive backup from output), then rename
temporary_output_path to output, and only after that remove the backup; if any
step fails restore the backup to output and return the error. Reference the
variables output and temporary_output_path and the fs::rename/fs::remove_file
operations when making the change.
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 246-262: The write() loop in streaming_encoder.rs can write one
block via encode_block and then fail on a subsequent block, violating Write
semantics; modify write (the impl of Write::write) to preserve progress by
recording how many bytes were successfully emitted before any fallible
encode_block call and ensuring that if encode_block fails after at least one
successful write you restore pending (swap back) and return Ok(bytes_consumed)
rather than Err. Concretely, before each encode_block(&full_block, false)
invocation use a local bytes_consumed counter (or a pre-call snapshot of
pending/remaining) so that on Err(err) you mem::swap back the full_block into
self.pending and return Ok(bytes_consumed) (or alternatively ensure you only
attempt one fallible encode_block per write call), updating calls to
hash_block(&full_block) only after a successful encode_block and adjusting
fail(err) usage accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1c023ca3-3aab-41bd-8fa2-0991cc638cbb
📒 Files selected for processing (2)
cli/src/main.rs, zstd/src/encoding/streaming_encoder.rs
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/streaming_encoder.rs`:
- Around line 371-384: The test helper FailingWriteOnce uses std-only
Error::other in its write method, so guard the helper and any tests that depend
on it behind the std feature: add #[cfg(feature = "std")] to the
FailingWriteOnce type/impl (the impl containing fn write and fn flush) and add
#[cfg(feature = "std")] or #[cfg_attr(test, ...)] to the tests that reference it
(write_failure_poisoning_is_sticky and
write_preserves_progress_when_later_block_write_fails) so the no_std build does
not compile code that calls Error::other.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 321138bb-1d6c-4909-bbac-7707e43e4f44
📒 Files selected for processing (2)
cli/src/main.rs, zstd/src/encoding/streaming_encoder.rs
Actionable comments posted: 2
♻️ Duplicate comments (1)
zstd/src/encoding/streaming_encoder.rs (1)
247-257: ⚠️ Potential issue | 🔴 Critical

Poison the encoder even when returning partial `write()` progress.

At line 251, `encode_block()` ultimately uses `write_all()`, which may fail after a prefix of the encoded block has already reached the sink. This `consumed > 0` branch restores `pending` and returns `Ok(consumed)` without setting `errored`, so later `write`/`flush`/`finish` calls can append to a sink that may already contain a truncated block prefix.

🔧 Minimal fix

```diff
 if let Err(err) = self.encode_block(&full_block, false) {
     mem::swap(&mut self.pending, &mut full_block);
     if consumed > 0 {
+        let _ = self.fail(err);
         return Ok(consumed);
     }
     return Err(self.fail(err));
 }
```

Please also add a regression test with a writer that writes a prefix and then errors, not just an all-or-nothing failing writer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zstd/src/encoding/streaming_encoder.rs` around lines 247 - 257, The encoder can be left unpoisoned when encode_block() fails after having already written a prefix to the sink, because the code currently restores pending and returns Ok(consumed) without setting errored; change the error branch in streaming_encoder::write loop (the block handling encode_block(&full_block, false) failure) to set self.errored = true before restoring pending and returning (so the encoder is poisoned even when returning partial progress), and ensure you still restore pending and return Ok(consumed) when consumed > 0; also add a regression test that uses a writer which successfully writes a prefix of the encoded block then returns an error (not an all-or-nothing writer) and assert that subsequent write/flush/finish calls fail due to the poisoned encoder.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cli/src/main.rs`:
- Around line 119-124: The temporary output file is created with default
permissions which overwrites any existing destination mode when
replace_output_file() renames it; modify the code that creates the temporary
file (around create_temporary_output_path and the OpenOptions call creating
temporary_output) to: if the destination path exists, read its current
permissions/mode via metadata().permissions() (or metadata().mode() on Unix),
and apply those permissions to the temporary file before rename (using
std::fs::set_permissions or libc::chmod for Unix-specific mode bits); ensure
this runs after creating the temp file but before calling replace_output_file();
also add a regression test that creates a mode-restricted destination file, runs
the replace operation, and asserts the destination file retains the original
mode.
- Around line 185-200: In replace_output_file, reject non-regular destinations
before performing the backup rename: when output.exists() is true, check that
output.is_file() (or equivalent metadata.is_file()) and if not return an error
(color_eyre::eyre::eyre!("output path is not a regular file") or wrap_err)
instead of proceeding to create_temporary_output_path and fs::rename; this
prevents moving directories or special files into a backup location and restores
the original File::create semantics. Ensure the check is placed before
create_temporary_output_path and before any fs::rename calls involving output
(references: replace_output_file, create_temporary_output_path).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d7dfb888-f79e-41be-a652-77509a91d3e1
📒 Files selected for processing (2)
cli/src/main.rs, zstd/src/encoding/streaming_encoder.rs
- retry temporary output creation on AlreadyExists to avoid TOCTOU path races
- reject non-regular destinations and preserve destination permissions on swap
- consume StreamingEncoder on finish and remove per-block clone in encode path
- poison encoder after partial-progress encode failures and add regressions
♻️ Duplicate comments (1)
cli/src/main.rs (1)
208-216: ⚠️ Potential issue | 🟠 Major

`Path::exists()` lets dangling symlinks bypass this guard.

Because the `exists()` fast path runs before `symlink_metadata()`, a broken symlink is treated as "missing" and `rename()` replaces the link itself with a regular file instead of rejecting it as a non-regular destination. Only `ErrorKind::NotFound` should take the no-existing-output branch.

🔧 Tighten the destination check

```diff
 fn replace_output_file(temporary_output_path: &Path, output: &Path) -> color_eyre::Result<()> {
-    if !output.exists() {
-        return fs::rename(temporary_output_path, output)
-            .wrap_err("failed to move temporary output file into final location");
-    }
-
-    let output_kind = fs::symlink_metadata(output)
-        .wrap_err("failed to inspect existing output path")?
-        .file_type();
-    if !output_kind.is_file() {
-        let _ = fs::remove_file(temporary_output_path);
-        return Err(eyre!(
-            "output path exists and is not a regular file: {output:?}"
-        ));
-    }
+    let output_type = match fs::symlink_metadata(output) {
+        Ok(metadata) => Some(metadata.file_type()),
+        Err(err) if err.kind() == ErrorKind::NotFound => None,
+        Err(err) => return Err(err).wrap_err("failed to inspect existing output path"),
+    };
+    if let Some(file_type) = output_type {
+        if !file_type.is_file() {
+            let _ = fs::remove_file(temporary_output_path);
+            return Err(eyre!(
+                "output path exists and is not a regular file: {output:?}"
+            ));
+        }
+    } else {
+        return fs::rename(temporary_output_path, output)
+            .wrap_err("failed to move temporary output file into final location");
+    }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli/src/main.rs` around lines 208 - 216, The exists() check can treat dangling symlinks as missing; instead call fs::symlink_metadata(output) first and pattern-match its Result so only ErrorKind::NotFound triggers the "rename temporary_output_path into output" path; for any Ok(metadata) continue to inspect metadata.file_type() (as in output_kind) and for other Err variants return the wrapped error from symlink_metadata rather than falling back to exists(); update the logic around temporary_output_path, output, and output_kind accordingly so broken symlinks are rejected and only a genuine NotFound allows replacing the destination.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: defeacde-47e9-4ef4-95f6-10453ed0ce0a
📒 Files selected for processing (2)
cli/src/main.rs, zstd/src/encoding/streaming_encoder.rs
- clean up temporary output when backup rename fails
- avoid matcher hash-path allocation by hashing matcher slice directly
- preallocate per-block encoded buffer with block-capacity
- add invalid_input_error() helper that uses InvalidInput on std and falls back to Other on no_std where the variant does not exist
- replace all direct Error::new(InvalidInput, ...) calls with the helper
- add debug_assert for non-empty block invariant before compress_fastest
- remove dead hash_block no-op call in cfg(not(hash)) branch
- document non-zero window_size contract on Matcher trait
- use try_into for source_size in CLI to avoid truncation on 32-bit
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cli/src/main.rs`:
- Around line 199-207: The Windows-specific comparison is using non-existent
methods file_index_high()/file_index_low(); replace those with the
MetadataExt::file_index() calls on input_metadata and output_metadata and
compare the resulting Option<u64> values (i.e., check
input_metadata.file_index() == output_metadata.file_index()) alongside
volume_serial_number(); ensure you keep this inside the #[cfg(windows)] block
and import std::os::windows::fs::MetadataExt so the code compiles.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c7ccb78a-0e6b-4c6a-9586-acd84413564a
📒 Files selected for processing (3)
cli/src/main.rs, zstd/src/encoding/mod.rs, zstd/src/encoding/streaming_encoder.rs
MetadataExt on Windows provides file_index() -> Option<u64>, not separate file_index_high()/file_index_low() methods. The old code would fail to compile on Windows targets.
- error_with_kind_message, invalid_input_error, other_error_owned, and other_error now construct Error::new with a boxed message in no_std instead of silently dropping it (alloc is already available)
- Matcher::window_size doc reflects no_std ErrorKind::Other fallback
- sticky_error cold-path note corrected to "still allocate"
@coderabbitai re-review |
🧠 Learnings used
✅ Actions performed
Full review triggered.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
README.md (1)
44-52: ⚠️ Potential issue | 🟡 Minor

Clarify the `Default` row in the compression checklist.

The crate now accepts `CompressionLevel::Default` through `StreamingEncoder` and CLI level `2`, so the unchecked entry currently reads as if that level is unavailable. Either mark it supported or qualify that the implementation is present but still evolving toward zstd level 3 behavior.

📝 Possible wording

```diff
-- [ ] Default (roughly level 3)
+- [x] Default (roughly level 3, implementation still evolving)
```
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 44 - 52, The README's compression checklist incorrectly implies CompressionLevel::Default is unsupported; update the "Default (roughly level 3)" row to reflect current behavior: either mark it as supported (checked) since StreamingEncoder accepts CompressionLevel::Default and the CLI maps level 2 to it, or change the text to "Supported (CompressionLevel::Default via StreamingEncoder / CLI level 2; implementation evolving toward zstd level 3 behavior)" to clarify the implementation is present but not fully equivalent to zstd level 3; reference CompressionLevel::Default, StreamingEncoder, and the CLI level mapping in the wording so readers can locate the relevant code paths.cli/src/main.rs (1)
152-161: ⚠️ Potential issue | 🟡 Minor

Don't print `0.00%` for empty inputs.

An empty source still produces a non-zero frame, so this currently logs `0.00%` even when the output grew. Prefer `n/a` or a dedicated empty-input message here.

💡 One simple fix

```diff
-    let compression_ratio = if source_size == 0 {
-        0.0
-    } else {
-        compressed_size as f64 / source_size as f64 * 100.0
-    };
-    info!(
-        "{} ——> {} ({compression_ratio:.2}%)",
-        fmt_size(source_size as f64),
-        fmt_size(compressed_size as f64)
-    );
+    if source_size == 0 {
+        info!(
+            "{} ——> {} (n/a)",
+            fmt_size(source_size as f64),
+            fmt_size(compressed_size as f64)
+        );
+    } else {
+        let compression_ratio = compressed_size as f64 / source_size as f64 * 100.0;
+        info!(
+            "{} ——> {} ({compression_ratio:.2}%)",
+            fmt_size(source_size as f64),
+            fmt_size(compressed_size as f64)
+        );
+    }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli/src/main.rs` around lines 152 - 161, The current log computes compression_ratio and prints "0.00%" for empty inputs; change the logic so when source_size == 0 you do not compute or print a numeric percentage but a sentinel string like "n/a" (or an empty-input message). Concretely, keep the calculation of compressed_size/source_size for the non-empty case (compression_ratio), but introduce a compression_str (or similar) set to "n/a" when source_size == 0 and to format!("{:.2}%", compression_ratio) otherwise, then pass that compression_str to the info! call instead of formatting compression_ratio inline; update references to compression_ratio, source_size, compressed_size, fmt_size, and the info! invocation accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zstd/src/encoding/mod.rs`:
- Around line 108-115: The documentation for fn window_size(&self) -> u64
currently doesn't promise stability mid-frame; update the docstring to state
explicitly that window_size() must return a positive non-zero value and remain
stable for the duration of a frame (i.e., until StreamingEncoder.reset() is
called), because StreamingEncoder.serialize the window into the frame header in
ensure_frame_started() and later sizing still consults the matcher; mention that
custom matchers must not increase the reported window_size() after a frame has
started to avoid under-sized headers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 01e1c7d3-bb66-4723-96be-a71cc1d020c8
📒 Files selected for processing (4)
README.md, cli/src/main.rs, zstd/src/encoding/mod.rs, zstd/src/encoding/streaming_encoder.rs
- …size contract
- README checklist: Default compression is implemented (dfast matcher, within 10% of C zstd level 3 on corpus proxy)
- Matcher::window_size() doc: must remain stable for frame lifetime, may change only after reset()
@coderabbitai re-review |
🧠 Learnings used
✅ Actions performed
Full review triggered.
Summary
- `encoding::StreamingEncoder` implementing `io::Write` with incremental frame encoding: `write()` buffering, `flush()` for partial non-last blocks, and `finish()` for frame finalization
- CLI compression switched to streaming (`std::io::copy` -> `StreamingEncoder` -> `finish()`)
- CLI rejects unsupported compression levels early (streaming supports `0..=2`)
- `StreamingEncoder` fail-safe on write errors: preserve pending data on failed block emission and poison encoder after first write failure
- `InvalidInput` errors through `invalid_input_error()` helper for `no_std` compatibility (falls back to `ErrorKind::Other` where the `InvalidInput` variant is absent; all error helpers preserve messages in `no_std` via `alloc::boxed::Box`)
- Windows same-file check via `MetadataExt::file_index()` (the combined `Option<u64>` method, not the nonexistent `file_index_high()`/`file_index_low()`)

Related

- `decompress()` (out of scope)

Testing
Closes #9
Summary by CodeRabbit
New Features
Improvements
Documentation
Tests