[WIP] Add span-based Deflate, ZLib and GZip encoder/decoder APIs #123145
iremyux wants to merge 30 commits into dotnet:main
    /// <returns>One of the enumeration values that describes the status with which the operation finished.</returns>
    public OperationStatus Flush(Span<byte> destination, out int bytesWritten)
    {
        return Compress(ReadOnlySpan<byte>.Empty, destination, out _, out bytesWritten, isFinalBlock: false);

Does this force pending output to be written (if any is available)? I think this should pass FlushCode.SyncFlush to the native API.
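To make the question concrete: zlib's deflate() takes a flush argument, and a sync flush is the mode that pushes all pending output to the destination. Below is a minimal sketch in C of how a Flush() call could select that mode. The enum and function names are hypothetical, and the numeric values are assumed to mirror zlib.h (Z_NO_FLUSH = 0, Z_SYNC_FLUSH = 2, Z_FINISH = 4).

```c
/* Hypothetical flush-mode selection; values assumed to match zlib.h. */
typedef enum {
    FLUSH_NONE = 0,   /* let the compressor buffer as much as it likes */
    FLUSH_SYNC = 2,   /* emit all pending output, aligned to a byte boundary */
    FLUSH_FINISH = 4  /* terminate the stream and write the trailer */
} flush_mode;

/* is_flush_call: nonzero when the caller invoked Flush() rather than Compress().
   is_final_block: nonzero when no more input will follow. */
flush_mode choose_flush_mode(int is_flush_call, int is_final_block)
{
    if (is_final_block) {
        return FLUSH_FINISH;
    }
    return is_flush_call ? FLUSH_SYNC : FLUSH_NONE;
}
```

With the current code, Flush() forwards isFinalBlock: false to Compress(), which in this sketch corresponds to FLUSH_NONE unless the flush request is passed along explicitly.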
    /// <param name="source">A read-only span of bytes containing the source data to compress.</param>
    /// <param name="destination">When this method returns, a span of bytes where the compressed data is stored.</param>
    /// <param name="bytesWritten">When this method returns, the total number of bytes that were written to <paramref name="destination"/>.</param>
    /// <param name="compressionLevel">A number representing compression level. -1 is default, 0 is no compression, 1 is best speed, 9 is best compression.</param>

We should be clearer about which default we mean.
Suggested change:
    - /// <param name="compressionLevel">A number representing compression level. -1 is default, 0 is no compression, 1 is best speed, 9 is best compression.</param>
    + /// <param name="compressionLevel">A number representing compression level. -1 means implementation default, 0 is no compression, 1 is best speed, 9 is best compression.</param>

        CompressionLevel.Fastest => ZLibNative.CompressionLevel.BestSpeed,
        CompressionLevel.NoCompression => ZLibNative.CompressionLevel.NoCompression,
        CompressionLevel.SmallestSize => ZLibNative.CompressionLevel.BestCompression,
        _ => throw new ArgumentOutOfRangeException(nameof(compressionLevel)),

This would fail on valid native compression levels not covered by the CompressionLevel enum. Instead, I think it should check whether the value is < -1 or > 9 and throw an out-of-range exception only in that case.
To add to the above: callers who want a native compression level that happens to equal a value in the CompressionLevel enum won't be able to use that level either. One solution might be to expose one ctor overload that takes CompressionLevel and another that takes an int, which is cast to ZLibNative.CompressionLevel after a range check.
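The range check discussed in these two comments could look roughly like the following, sketched in C to match the native layer. The macro and function names are hypothetical; the bounds are the ones quoted above (-1 for the library default, 0 through 9 for explicit levels).

```c
#include <stdbool.h>

/* Bounds taken from the discussion above: -1 (default) through 9 (best compression). */
#define LEVEL_DEFAULT (-1)
#define LEVEL_MAX 9

/* Returns true when the integer level can be passed to the native compressor as-is. */
bool is_valid_native_level(int level)
{
    return level >= LEVEL_DEFAULT && level <= LEVEL_MAX;
}
```

An int-taking ctor would run this check and throw on failure, while a separate CompressionLevel overload would keep the enum-to-native mapping shown in the diff.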
    byte[] compressed = new byte[GetMaxCompressedLength(input.Length)];
    using var encoder = new GZipEncoder(CompressionLevel.Optimal);
    encoder.Compress(input, compressed, out _, out int compressedSize, isFinalBlock: true);

new GZipEncoder(CompressionLevel.Optimal) won’t compile because GZipEncoder has no CompressionLevel overload. Use the int-quality ctor (e.g., DefaultQuality) or add a CompressionLevel overload and mapping logic.
    byte[] input = CreateTestData();
    using var encoder = new ZLibEncoder(CompressionLevel.Optimal, strategy);
    byte[] destination = new byte[GetMaxCompressedLength(input.Length)];

The tests use ZLibEncoder constructors that take CompressionLevel (and CompressionLevel + ZLibCompressionStrategy), but ZLibEncoder only exposes (int quality), (int quality, int windowLog), and (ZLibCompressionOptions). This won’t compile as-is. Either update the tests to use the available int-quality/options-based constructors, or add the missing public constructors.
    OperationStatus status = errorCode switch
    {
        ZLibNative.ErrorCode.Ok => _state.AvailIn == 0
            ? OperationStatus.Done
            : _state.AvailOut == 0
                ? OperationStatus.DestinationTooSmall
                : OperationStatus.Done,
        ZLibNative.ErrorCode.StreamEnd => OperationStatus.Done,

When isFinalBlock is true, zlib may return Z_OK (not Z_STREAM_END) indicating the caller must provide more output space and call deflate() again with Z_FINISH until Z_STREAM_END. This implementation can return OperationStatus.Done for ErrorCode.Ok as long as all input was consumed, which can cause callers (especially TryCompress) to treat compression as complete even though the stream trailer hasn’t been fully emitted. Consider returning DestinationTooSmall (or otherwise indicating "need more output") until ErrorCode.StreamEnd is observed.
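One way to express the suggested rule is a pure mapping from the native return code to a status, sketched here in C. All names are hypothetical, the return-code values are assumed to mirror zlib.h (Z_OK = 0, Z_STREAM_END = 1, Z_BUF_ERROR = -5), and treating Z_BUF_ERROR as "need more output" is an assumption of this sketch, not something the PR states.

```c
/* Return codes assumed to mirror zlib.h. */
enum { RC_OK = 0, RC_STREAM_END = 1, RC_BUF_ERROR = -5 };

typedef enum {
    STATUS_DONE,
    STATUS_DESTINATION_TOO_SMALL,
    STATUS_INVALID_DATA
} op_status;

/* rc: value returned by the native deflate call.
   is_final_block: nonzero when the call was made with the finish flush mode.
   avail_in: input bytes the compressor has not consumed yet. */
op_status map_deflate_result(int rc, int is_final_block, unsigned avail_in)
{
    switch (rc) {
    case RC_STREAM_END:
        /* Only the end-of-stream code proves the trailer was fully written. */
        return STATUS_DONE;
    case RC_OK:
        if (is_final_block) {
            /* Finishing but not at end of stream: more output space is needed. */
            return STATUS_DESTINATION_TOO_SMALL;
        }
        return avail_in == 0 ? STATUS_DONE : STATUS_DESTINATION_TOO_SMALL;
    case RC_BUF_ERROR:
        return STATUS_DESTINATION_TOO_SMALL;
    default:
        return STATUS_INVALID_DATA;
    }
}
```

The key difference from the snippet under review is the is_final_block branch: an OK result during finish never maps to Done, so a TryCompress-style caller cannot mistake a truncated stream for a complete one.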
    uint32_t CompressionNative_CompressBound(uint32_t sourceLen)
    {
        return (uint32_t)compressBound(sourceLen);

CompressionNative_CompressBound returns uint32_t, but compressBound() returns uLong and can exceed 32-bit for larger inputs; the cast here can silently truncate. Either validate that the result fits in 32-bit (and fail predictably) or widen the exported API to return a 64-bit size (uint64_t/size_t) and update the managed interop accordingly.
Suggested change:
    - return (uint32_t)compressBound(sourceLen);
    + uLong bound = compressBound(sourceLen);
    + if (bound > UINT32_MAX)
    + {
    +     return 0;
    + }
    + return (uint32_t)bound;

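The other option mentioned in the comment, widening the export, might look like the sketch below. Both function names are hypothetical, and bound_model is only a stand-in: its formula is modeled on classic zlib's compressBound and is an assumption here, since the bound actually returned depends on the zlib build that is linked.

```c
#include <stdint.h>

/* Stand-in for the library's bound, modeled on classic zlib's formula.
   This is an assumption; other zlib builds may compute a different value. */
uint64_t bound_model(uint64_t source_len)
{
    return source_len + (source_len >> 12) + (source_len >> 14)
         + (source_len >> 25) + 13;
}

/* Hypothetical widened export: the result is never narrowed to 32 bits,
   so large inputs cannot be silently truncated. */
uint64_t compress_bound_u64(uint32_t source_len)
{
    return bound_model((uint64_t)source_len);
}
```

Under this model, an input of UINT32_MAX bytes already yields a bound that does not fit in 32 bits, which is the truncation case the comment describes. The managed interop signature would need to change to a 64-bit return type to match.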
    // GZip has a larger header than raw deflate, so add extra overhead
    long baseLength = DeflateEncoder.GetMaxCompressedLength(inputLength);

    // GZip adds 18 bytes: 10-byte header + 8-byte trailer (CRC32 + original size)
    return baseLength + 18;

The GetMaxCompressedLength calculation for GZip may be incorrect. The native compressBound function calculates the worst-case compressed size including overhead for zlib/deflate format, but it does NOT account for GZip format overhead. According to zlib documentation, compressBound accounts for zlib wrapper (which is 6 bytes) but not GZip wrapper (10-byte header + 8-byte trailer = 18 bytes).
However, when using windowBits with +16 (for GZip), the deflate operation itself will add the GZip header and trailer, so compressBound should be called with the understanding that it returns the bound for the format being used. The issue is that compressBound always calculates based on the zlib wrapper overhead, not taking windowBits into account.
For GZip: The header is 10 bytes, trailer is 8 bytes (total 18). For ZLib: header is 2 bytes, trailer is 4 bytes (total 6). For raw deflate: no wrapper (0 bytes). The compressBound function assumes zlib format (6 bytes overhead).
Therefore:
- DeflateEncoder (raw format): needs to subtract 6 bytes from compressBound or calculate differently
- ZLibEncoder: compressBound is correct as-is
- GZipEncoder: needs to add (18 - 6) = 12 bytes, not 18 bytes
This needs verification against actual zlib behavior, but the current implementation likely overestimates for GZip and may be incorrect for raw Deflate.
Suggested change:
    - // GZip has a larger header than raw deflate, so add extra overhead
    - long baseLength = DeflateEncoder.GetMaxCompressedLength(inputLength);
    - // GZip adds 18 bytes: 10-byte header + 8-byte trailer (CRC32 + original size)
    - return baseLength + 18;
    + // Base deflate bound (includes zlib wrapper overhead when using compressBound semantics).
    + long baseLength = DeflateEncoder.GetMaxCompressedLength(inputLength);
    + // GZip wrapper is 18 bytes (10-byte header + 8-byte trailer), while zlib wrapper is 6 bytes.
    + // Since the base length already accounts for the 6-byte zlib wrapper, add only the extra 12 bytes.
    + return baseLength + 12;

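The arithmetic behind the +12 can be written out as a small C sketch. The names are hypothetical, the wrapper sizes are the ones quoted in the comment (fixed-size fields only; optional gzip header fields such as a file name are not counted), and the whole calculation rests on the comment's own unverified premise that the base bound already includes exactly the 6-byte zlib wrapper.

```c
/* Wrapper sizes as stated in the review comment above. */
enum {
    GZIP_HEADER = 10, GZIP_TRAILER = 8,
    ZLIB_HEADER = 2,  ZLIB_TRAILER = 4
};

/* Converts a bound that already includes the zlib wrapper into a gzip bound
   by adding only the difference between the two wrappers. */
long gzip_bound_from_zlib_bound(long zlib_bound)
{
    int gzip_wrapper = GZIP_HEADER + GZIP_TRAILER; /* 18 bytes */
    int zlib_wrapper = ZLIB_HEADER + ZLIB_TRAILER; /* 6 bytes */
    return zlib_bound + (gzip_wrapper - zlib_wrapper);
}
```

If that premise turns out to be wrong, adding the full 18 bytes remains the conservative choice, since the result is an upper bound.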
    {
        ArgumentOutOfRangeException.ThrowIfNegative(inputLength);
        ArgumentOutOfRangeException.ThrowIfGreaterThan(inputLength, uint.MaxValue);

The GetMaxCompressedLength for DeflateEncoder may be incorrect for raw deflate format. The native compressBound function calculates worst-case size assuming zlib format (with 6 bytes of wrapper overhead: 2-byte header + 4-byte trailer). However, raw deflate format (used when windowBits is negative) has no wrapper overhead.
This means compressBound overestimates the required buffer size for raw deflate by 6 bytes, which is safe but not exact. Consider either:
- Subtracting 6 from the result when calculating for raw deflate
- Adding a comment explaining that this is a conservative overestimate
The overestimate is small and safe (better than underestimating), so this is a low-priority issue, but it should be documented or corrected for API accuracy.
Suggested change:
    + // Interop.ZLib.compressBound returns a conservative upper bound for zlib-wrapped streams.
    + // DeflateEncoder uses the raw deflate format, which omits the 2-byte header and 4-byte trailer,
    + // so this value may be up to 6 bytes larger than strictly necessary, but it is always safe.

    ArgumentOutOfRangeException.ThrowIfNegative(inputLength);
    ArgumentOutOfRangeException.ThrowIfGreaterThan(inputLength, uint.MaxValue);

    return (long)Interop.ZLib.compressBound((uint)inputLength);

The GetMaxCompressedLength calculation is incorrect. The compressBound function from zlib returns the maximum size including the zlib wrapper (6 bytes overhead). For raw Deflate format (no wrapper), the correct calculation should subtract 6 bytes from the compressBound result.
The calculation should be: return (long)Interop.ZLib.compressBound((uint)inputLength) - 6;
Alternatively, a new native function that returns the raw deflate bound could be added.
Suggested change:
    - return (long)Interop.ZLib.compressBound((uint)inputLength);
    + return (long)Interop.ZLib.compressBound((uint)inputLength) - 6;

@copilot, these are upper bounds, wouldn't overestimating be safer?
This PR introduces new span-based, streamless compression and decompression APIs for Deflate, ZLib, and GZip formats, matching the existing BrotliEncoder/BrotliDecoder pattern.
New APIs
- DeflateEncoder/DeflateDecoder
- ZLibEncoder/ZLibDecoder
- GZipEncoder/GZipDecoder
These classes provide:
- Compress(), Decompress(), and Flush()
- TryCompress() and TryDecompress() for simple scenarios
- GetMaxCompressedLength() to calculate buffer sizes
Closes #62113
Closes #39327