[WIP] Add span-based Deflate, ZLib and GZip encoder/decoder APIs #123145

Draft
iremyux wants to merge 30 commits into dotnet:main from iremyux:62113-zlib-encoder-decoder

@iremyux iremyux commented Jan 13, 2026

This PR introduces new span-based, streamless compression and decompression APIs for Deflate, ZLib, and GZip formats, matching the existing BrotliEncoder/BrotliDecoder pattern.

New APIs

  • DeflateEncoder / DeflateDecoder
  • ZLibEncoder / ZLibDecoder
  • GZipEncoder / GZipDecoder

These classes provide:

  • Instance-based API for streaming/chunked compression with Compress(), Decompress(), and Flush()
  • Static one-shot API via TryCompress() and TryDecompress() for simple scenarios
  • GetMaxCompressedLength() to calculate buffer sizes

Closes #62113
Closes #39327

/// <returns>One of the enumeration values that describes the status with which the operation finished.</returns>
public OperationStatus Flush(Span<byte> destination, out int bytesWritten)
{
return Compress(ReadOnlySpan<byte>.Empty, destination, out _, out bytesWritten, isFinalBlock: false);

Member

Does this force writing output (if available)? I think this should pass FlushCode.SyncFlush to the native API.
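
The concern above can be sketched in C, the language of the native shim in this PR. This is only an illustration, not the PR's code: `select_flush_mode` and its parameters are hypothetical names, the constants copy the values of zlib's `Z_NO_FLUSH`, `Z_SYNC_FLUSH` and `Z_FINISH`, and the sketch assumes that `Compress(..., isFinalBlock: false)` currently reaches the native layer without requesting a flush.

```c
#include <stdbool.h>

/* Local copies of zlib's flush constants so the sketch compiles without zlib.h. */
enum flush_mode {
    FLUSH_NONE   = 0, /* Z_NO_FLUSH: deflate may keep data buffered internally */
    FLUSH_SYNC   = 2, /* Z_SYNC_FLUSH: emit all pending output, byte-aligned */
    FLUSH_FINISH = 4  /* Z_FINISH: emit pending output and end the stream */
};

/* Hypothetical helper: choose the flush mode for one native deflate call. */
enum flush_mode select_flush_mode(bool is_final_block, bool explicit_flush)
{
    if (is_final_block) {
        return FLUSH_FINISH;   /* the final block always ends the stream */
    }
    if (explicit_flush) {
        return FLUSH_SYNC;     /* Flush(): force buffered output to the caller */
    }
    return FLUSH_NONE;         /* ordinary Compress(): let deflate buffer freely */
}
```

Under this sketch, a `Flush()` implementation would pass `explicit_flush = true` rather than routing through the ordinary non-final compress path.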

/// <param name="source">A read-only span of bytes containing the source data to compress.</param>
/// <param name="destination">When this method returns, a span of bytes where the compressed data is stored.</param>
/// <param name="bytesWritten">When this method returns, the total number of bytes that were written to <paramref name="destination"/>.</param>
/// <param name="compressionLevel">A number representing compression level. -1 is default, 0 is no compression, 1 is best speed, 9 is best compression.</param>

Member

We should be more clear which default we mean.

Suggested change
- /// <param name="compressionLevel">A number representing compression level. -1 is default, 0 is no compression, 1 is best speed, 9 is best compression.</param>
+ /// <param name="compressionLevel">A number representing compression level. -1 means implementation default, 0 is no compression, 1 is best speed, 9 is best compression.</param>

@iremyux iremyux changed the title from "[WIP] Add span-based ZlibEncoder and ZlibDecoder APIs" to "[WIP] Add span-based Deflate, ZLib and GZip encoder/decoder APIs" Jan 19, 2026

CompressionLevel.Fastest => ZLibNative.CompressionLevel.BestSpeed,
CompressionLevel.NoCompression => ZLibNative.CompressionLevel.NoCompression,
CompressionLevel.SmallestSize => ZLibNative.CompressionLevel.BestCompression,
_ => throw new ArgumentOutOfRangeException(nameof(compressionLevel)),

Member

This would fail on valid native compression levels not covered by the CompressionLevel enum. Instead, I think it should check whether the value is < -1 or > 9 and throw out of range only in that case.

Member

@AraHaan AraHaan Feb 4, 2026

To add to the above: callers who want a native compression level whose numeric value happens to equal a member of the CompressionLevel enum can no longer use it either. One solution is to expose one ctor overload that takes CompressionLevel and another that takes an int, which is cast to ZLibNative.CompressionLevel after a range check.
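
The two-overload idea in the comments above can be sketched in C as two separate entry points. All names here are hypothetical. The enum values mirror the managed CompressionLevel members, and the mapping of the "optimal" member to -1 (the native default) is an assumption, since the snippet under review only shows the other three mappings.

```c
#include <stdbool.h>

/* Hypothetical mirror of the managed CompressionLevel enum values. */
enum managed_level {
    LEVEL_OPTIMAL        = 0,
    LEVEL_FASTEST        = 1,
    LEVEL_NO_COMPRESSION = 2,
    LEVEL_SMALLEST_SIZE  = 3
};

/* Integer path: any native level in [-1, 9] is accepted as-is. */
bool try_level_from_int(int quality, int *native_level)
{
    if (quality < -1 || quality > 9) {
        return false;              /* caller reports out-of-range */
    }
    *native_level = quality;
    return true;
}

/* Enum path: each named member maps to one native level. */
bool try_level_from_enum(enum managed_level level, int *native_level)
{
    switch (level) {
        case LEVEL_OPTIMAL:        *native_level = -1; return true; /* assumed: native default */
        case LEVEL_FASTEST:        *native_level = 1;  return true; /* best speed */
        case LEVEL_NO_COMPRESSION: *native_level = 0;  return true; /* no compression */
        case LEVEL_SMALLEST_SIZE:  *native_level = 9;  return true; /* best compression */
        default:                   return false;
    }
}
```

Keeping the paths separate means that the integer 3 stays native level 3, while the enum member with value 3 maps to native level 9, which avoids the collision described above.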

Copilot AI review requested due to automatic review settings February 11, 2026 10:37
Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 12 comments.

Comment on lines 167 to 169
byte[] compressed = new byte[GetMaxCompressedLength(input.Length)];
using var encoder = new GZipEncoder(CompressionLevel.Optimal);
encoder.Compress(input, compressed, out _, out int compressedSize, isFinalBlock: true);

Copilot AI Feb 11, 2026

new GZipEncoder(CompressionLevel.Optimal) won’t compile because GZipEncoder has no CompressionLevel overload. Use the int-quality ctor (e.g., DefaultQuality) or add a CompressionLevel overload and mapping logic.

Copilot uses AI. Check for mistakes.
Comment on lines 113 to 116
byte[] input = CreateTestData();
using var encoder = new ZLibEncoder(CompressionLevel.Optimal, strategy);
byte[] destination = new byte[GetMaxCompressedLength(input.Length)];

Copilot AI Feb 11, 2026

The tests use ZLibEncoder constructors that take CompressionLevel (and CompressionLevel + ZLibCompressionStrategy), but ZLibEncoder only exposes (int quality), (int quality, int windowLog), and (ZLibCompressionOptions). This won’t compile as-is. Either update the tests to use the available int-quality/options-based constructors, or add the missing public constructors.

Copilot AI review requested due to automatic review settings February 12, 2026 13:57
Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Comment on lines +262 to +269
OperationStatus status = errorCode switch
{
    ZLibNative.ErrorCode.Ok => _state.AvailIn == 0
        ? OperationStatus.Done
        : _state.AvailOut == 0
            ? OperationStatus.DestinationTooSmall
            : OperationStatus.Done,
    ZLibNative.ErrorCode.StreamEnd => OperationStatus.Done,

Copilot AI Feb 12, 2026

When isFinalBlock is true, zlib may return Z_OK (not Z_STREAM_END) indicating the caller must provide more output space and call deflate() again with Z_FINISH until Z_STREAM_END. This implementation can return OperationStatus.Done for ErrorCode.Ok as long as all input was consumed, which can cause callers (especially TryCompress) to treat compression as complete even though the stream trailer hasn’t been fully emitted. Consider returning DestinationTooSmall (or otherwise indicating "need more output") until ErrorCode.StreamEnd is observed.
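
The mapping this comment asks for can be sketched in C as follows. It is a rough illustration under stated assumptions, not the PR's implementation: the function and enum names are hypothetical, the two native codes copy the values of zlib's `Z_OK` and `Z_STREAM_END`, and every other return code is collapsed into a single error status for brevity.

```c
#include <stdbool.h>

/* Local copies of the two zlib return codes the comment discusses. */
enum native_code { NATIVE_OK = 0, NATIVE_STREAM_END = 1 };

/* Hypothetical stand-in for the managed OperationStatus values used here. */
enum op_status { STATUS_DONE, STATUS_DESTINATION_TOO_SMALL, STATUS_INVALID_DATA };

enum op_status map_deflate_result(int code, unsigned avail_in,
                                  unsigned avail_out, bool is_final_block)
{
    if (code == NATIVE_STREAM_END) {
        return STATUS_DONE;                  /* trailer fully emitted */
    }
    if (code != NATIVE_OK) {
        return STATUS_INVALID_DATA;          /* simplification: all other codes */
    }
    if (is_final_block) {
        /* OK while finishing means deflate still has output to write,
           so the caller must supply more space and call again. */
        return STATUS_DESTINATION_TOO_SMALL;
    }
    if (avail_in == 0) {
        return STATUS_DONE;                  /* non-final block fully consumed */
    }
    return avail_out == 0 ? STATUS_DESTINATION_TOO_SMALL : STATUS_DONE;
}
```

The only change from the reviewed snippet is the `is_final_block` branch: the non-final path keeps the original logic.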


uint32_t CompressionNative_CompressBound(uint32_t sourceLen)
{
return (uint32_t)compressBound(sourceLen);

Copilot AI Feb 12, 2026

CompressionNative_CompressBound returns uint32_t, but compressBound() returns uLong and can exceed 32-bit for larger inputs; the cast here can silently truncate. Either validate that the result fits in 32-bit (and fail predictably) or widen the exported API to return a 64-bit size (uint64_t/size_t) and update the managed interop accordingly.

Suggested change
- return (uint32_t)compressBound(sourceLen);
+ uLong bound = compressBound(sourceLen);
+ if (bound > UINT32_MAX)
+ {
+     return 0;
+ }
+ return (uint32_t)bound;
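
The other option named in the comment, widening the export to 64 bits, might look like the sketch below. The names are hypothetical. Because the block avoids linking zlib, `compress_bound_stub` stands in for `compressBound()` using the formula from classic zlib's compress.c; the library this PR actually links against may compute a different bound, so treat the numbers as illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: stand-in for compressBound(), using classic zlib's formula. */
uint64_t compress_bound_stub(uint64_t source_len)
{
    return source_len + (source_len >> 12) + (source_len >> 14)
         + (source_len >> 25) + 13;
}

/* Hypothetical widened export: the bound is returned without truncation. */
uint64_t compress_bound64_sketch(uint32_t source_len)
{
    return compress_bound_stub(source_len);
}

/* Helper for callers that still need to know whether a 32-bit value suffices. */
bool bound_fits_u32(uint64_t bound)
{
    return bound <= UINT32_MAX;
}
```

With a 64-bit return type the managed side can decide how to report an oversized bound instead of receiving a sentinel value of 0.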

Copilot AI review requested due to automatic review settings February 13, 2026 16:54
Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Comment on lines +71 to +75
// GZip has a larger header than raw deflate, so add extra overhead
long baseLength = DeflateEncoder.GetMaxCompressedLength(inputLength);

// GZip adds 18 bytes: 10-byte header + 8-byte trailer (CRC32 + original size)
return baseLength + 18;

Copilot AI Feb 13, 2026

The GetMaxCompressedLength calculation for GZip may be incorrect. The native compressBound function calculates the worst-case compressed size including overhead for zlib/deflate format, but it does NOT account for GZip format overhead. According to zlib documentation, compressBound accounts for zlib wrapper (which is 6 bytes) but not GZip wrapper (10-byte header + 8-byte trailer = 18 bytes).

However, when using windowBits with +16 (for GZip), the deflate operation itself will add the GZip header and trailer, so compressBound should be called with the understanding that it returns the bound for the format being used. The issue is that compressBound always calculates based on the zlib wrapper overhead, not taking windowBits into account.

For GZip: The header is 10 bytes, trailer is 8 bytes (total 18). For ZLib: header is 2 bytes, trailer is 4 bytes (total 6). For raw deflate: no wrapper (0 bytes). The compressBound function assumes zlib format (6 bytes overhead).

Therefore:

  • DeflateEncoder (raw format): needs to subtract 6 bytes from compressBound or calculate differently
  • ZLibEncoder: compressBound is correct as-is
  • GZipEncoder: needs to add (18 - 6) = 12 bytes, not 18 bytes

This needs verification against actual zlib behavior, but the current implementation likely overestimates for GZip and may be incorrect for raw Deflate.

Suggested change
- // GZip has a larger header than raw deflate, so add extra overhead
- long baseLength = DeflateEncoder.GetMaxCompressedLength(inputLength);
- // GZip adds 18 bytes: 10-byte header + 8-byte trailer (CRC32 + original size)
- return baseLength + 18;
+ // Base deflate bound (includes zlib wrapper overhead when using compressBound semantics).
+ long baseLength = DeflateEncoder.GetMaxCompressedLength(inputLength);
+ // GZip wrapper is 18 bytes (10-byte header + 8-byte trailer), while zlib wrapper is 6 bytes.
+ // Since the base length already accounts for the 6-byte zlib wrapper, add only the extra 12 bytes.
+ return baseLength + 12;
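
The arithmetic in this comment can be written out as a small C sketch. It takes the comment's premise at face value, namely that the base bound already includes the 6-byte zlib wrapper, which the comment itself says still needs verification. The names are hypothetical, and the gzip figure assumes the minimal 10-byte header with no optional fields.

```c
#include <assert.h>
#include <stdint.h>

enum wrapper_format { FORMAT_RAW_DEFLATE, FORMAT_ZLIB, FORMAT_GZIP };

/* Wrapper bytes added around the deflate payload for each format. */
int64_t wrapper_overhead(enum wrapper_format format)
{
    switch (format) {
        case FORMAT_ZLIB: return 2 + 4;   /* 2-byte header + Adler-32 trailer */
        case FORMAT_GZIP: return 10 + 8;  /* minimal header + CRC32 and size trailer */
        default:          return 0;       /* raw deflate has no wrapper */
    }
}

/* ASSUMPTION: zlib_bound already includes the zlib wrapper overhead. */
int64_t adjust_bound(int64_t zlib_bound, enum wrapper_format format)
{
    assert(zlib_bound >= wrapper_overhead(FORMAT_ZLIB));
    return zlib_bound - wrapper_overhead(FORMAT_ZLIB) + wrapper_overhead(format);
}
```

For a base bound of 100 bytes this gives 94 for raw deflate, 100 for zlib and 112 for gzip. Returning the unadjusted or larger value is still a valid upper bound, only a looser one.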

{
ArgumentOutOfRangeException.ThrowIfNegative(inputLength);
ArgumentOutOfRangeException.ThrowIfGreaterThan(inputLength, uint.MaxValue);

Copilot AI Feb 13, 2026

The GetMaxCompressedLength for DeflateEncoder may be incorrect for raw deflate format. The native compressBound function calculates worst-case size assuming zlib format (with 6 bytes of wrapper overhead: 2-byte header + 4-byte trailer). However, raw deflate format (used when windowBits is negative) has no wrapper overhead.

This means compressBound overestimates the required buffer size for raw deflate by 6 bytes, which while safe, is not accurate. Consider either:

  1. Subtracting 6 from the result when calculating for raw deflate
  2. Adding a comment explaining that this is a conservative overestimate

The overestimate is small and safe (better than underestimating), so this is a low-priority issue, but it should be documented or corrected for API accuracy.

Suggested change
+ // Interop.ZLib.compressBound returns a conservative upper bound for zlib-wrapped streams.
+ // DeflateEncoder uses the raw deflate format, which omits the 2-byte header and 4-byte trailer,
+ // so this value may be up to 6 bytes larger than strictly necessary, but it is always safe.

Copilot AI review requested due to automatic review settings February 16, 2026 10:59
Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.

ArgumentOutOfRangeException.ThrowIfNegative(inputLength);
ArgumentOutOfRangeException.ThrowIfGreaterThan(inputLength, uint.MaxValue);

return (long)Interop.ZLib.compressBound((uint)inputLength);

Copilot AI Feb 16, 2026

The GetMaxCompressedLength calculation is incorrect. The compressBound function from zlib returns the maximum size including the zlib wrapper (6 bytes overhead). For raw Deflate format (no wrapper), the correct calculation should subtract 6 bytes from the compressBound result.

The calculation should be: return (long)Interop.ZLib.compressBound((uint)inputLength) - 6;

Alternatively, a new native function that returns the raw deflate bound could be added.

Suggested change
- return (long)Interop.ZLib.compressBound((uint)inputLength);
+ return (long)Interop.ZLib.compressBound((uint)inputLength) - 6;


Contributor Author

@copilot, these are upper bounds; wouldn't overestimating be safer?

Development

Successfully merging this pull request may close these issues.

[API Proposal]: Add Deflate, ZLib and GZip encoder/decoder APIs
Span-based (non-stream) compression APIs

3 participants