fix: Ensure buffer space before reading in Decompress method#127105

Merged
rzikm merged 9 commits into dotnet:main from
pumpkin-bit:fix/zstandard-stream-buffer-truncation
Apr 20, 2026

Conversation

@pumpkin-bit
Contributor

The essence of the bug:
During decompression, a portion of compressed data can end exactly at the boundary of the 64 KB buffer, while the remaining bytes, not yet decompressed by the decoder, wait for the next portion, resulting in an exception. The buffer is physically full, so AvailableSpan returns an empty Span<byte>, and reading from the file into an empty span returns 0 bytes read.

Solution:
Checks have been added to the main Read and ReadAsync loops. If AvailableLength is zero, we force a call to _buffer.EnsureAvailableSpace(1): the unread bytes are shifted to the beginning of the array, space is freed at the end of the buffer, and the stream can then read new data.
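A minimal sketch of the shape of this check, assuming the buffer API described above (the surrounding loop and the `_innerStream` field name are simplifications for illustration, not the actual ZstandardStream source):

```csharp
// Before refilling the compressed-data buffer from the underlying stream:
if (_buffer.AvailableLength == 0)
{
    // Buffer is physically full but still holds undecoded bytes at the front.
    // Compact: shift the unread bytes to the start of the array so at least
    // 1 byte of space is free at the end for the next read.
    _buffer.EnsureAvailableSpace(1);
}

int bytesRead = _innerStream.Read(_buffer.AvailableSpan);
// bytesRead == 0 now genuinely means end-of-stream, not an empty span.
```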

@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Apr 18, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @karelz, @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Member

@MihaZupan MihaZupan left a comment


Thank you. Can you please also add a test that covers the described case?

pumpkin-bit and others added 3 commits April 18, 2026 19:21
…Zstandard/ZstandardStream.Decompress.cs

Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
…Zstandard/ZstandardStream.Decompress.cs

Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
@pumpkin-bit
Contributor Author

@dotnet-policy-service agree

@pumpkin-bit
Contributor Author

Thank you. Can you please also add a test that covers the described case?

Done, I added a test case for ZstandardStream.

@MihaZupan
Member

Did you check that the test fails before the fix in this PR?

@pumpkin-bit
Contributor Author

Did you check that the test fails before the fix in this PR?

Yes, I checked by using the test to simulate the situation with and without my fix. Without the fix, the decompressor threw an exception. With the fix, the buffer shifted correctly without errors.

@MihaZupan
Member

Testing locally, the test doesn't seem to be triggering the changed case at all.

Can you share the inputs & code you've used to hit this issue in the first place?

@pumpkin-bit
Contributor Author

Testing locally, the test doesn't seem to be triggering the changed case at all.

Can you share the inputs & code you've used to hit this issue in the first place?

In my test (.NET), I use ZstandardTestUtils.CreateTestData(150000) and 2000-byte chunks. The problem may be that the CreateTestData method generates data that compresses with varying efficiency depending on the platform (Linux, Windows, etc.), so your compressed stream may have a different length and never cross the dangerous 64 KB boundary. This bug is dormant because it depends on the chunk size and array boundaries matching perfectly, which is extremely rare, but not impossible.

For the .NET test, you can try precisely adjusted sizes or hardcoded data. I also reproduced the issue with a simulation that uses hardcoded logic, where the chunk size and array boundaries line up exactly:


using System;

public class Program
{
    public static void Main()
    {
        Console.WriteLine("running without fix...\n");
        try {
            RunSimulation(applyFix: false);
        } catch (Exception ex) {
            Console.WriteLine($"\nerror: {ex.Message}");
        }

        Console.WriteLine("\n\nrunning with fix applied...\n");
        try {
            RunSimulation(applyFix: true);
            Console.WriteLine("\ndone all data successfully read without truncation");
        } catch (Exception ex) {
            Console.WriteLine($"\nerror {ex.Message}");
        }
    }

    static void RunSimulation(bool applyFix)
    {
        var buffer = new MockBuffer(10);
        int totalToRead = 15; 
        int streamRemaining = 15;
        int iteration = 1;

        while (streamRemaining > 0 || buffer.ActiveLength > 0)
        {
            Console.WriteLine($"\niteration {iteration++}:");
            int consumed = Math.Min(3, buffer.ActiveLength);
            buffer.Discard(consumed);
            Console.WriteLine($"decompressor consumed {consumed} bytes; the buffer start has moved");

            if (streamRemaining <= 0 && buffer.ActiveLength == 0) break;

            if (applyFix)
            {
                buffer.EnsureAvailableSpace(1);
            }

            int spaceAvailable = buffer.AvailableLength;
            int bytesToRead = Math.Min(4, Math.Min(spaceAvailable, streamRemaining));
            Console.WriteLine($"available at the end of the array: {spaceAvailable} bytes. reading from stream...");
            
            if (bytesToRead <= 0 && streamRemaining > 0)
            {
                throw new Exception("ThrowTruncatedInvalidData() stream has more data but no space to read into");
            }
            
            buffer.Commit(bytesToRead);
            streamRemaining -= bytesToRead;
            Console.WriteLine($"read {bytesToRead} bytes; remaining in stream: {streamRemaining}.");
        }
    }
}

class MockBuffer
{
    private byte[] _array;
    private int _start;
    private int _end;
    public MockBuffer(int size) { _array = new byte[size]; }
    public int ActiveLength => _end - _start;
    public int AvailableLength => _array.Length - _end; 
    public void Discard(int count) { _start += count; } 
    public void Commit(int count) { _end += count; }    
    public void EnsureAvailableSpace(int minimumBytes)
    {
        if (AvailableLength < minimumBytes)
        {
            Console.WriteLine($"EnsureAvailableSpace: not enough space; moving {ActiveLength} unread bytes to the beginning of the array");
            int active = ActiveLength;
            Array.Copy(_array, _start, _array, 0, active);
            _start = 0;
            _end = active;
        }
    }
}

@MihaZupan
Member

The real logic has two important differences compared to your simulation:

  1. Discard will reset the _start and _end when empty
  2. Decompress calls will pass the remaining buffer to the native side, which internally buffers partial compressed data and marks it as consumed. We thus hit the logic in Discard that resets the buffer back to totally empty.

Hence my question of whether you've hit this in practice or if it's just based on code observations.
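For illustration, the reset behavior in point 1 could look like this (a hypothetical Discard matching the description above, not the actual buffer source):

```csharp
public void Discard(int count)
{
    _start += count;
    if (_start == _end)
    {
        // Fully drained: snap back to the start of the array so
        // AvailableLength covers the whole buffer again. Because zstd
        // consumes everything passed to it, this path runs on every
        // iteration, and the "buffer full with no space" state described
        // in the PR is never actually reached.
        _start = 0;
        _end = 0;
    }
}
```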

@rzikm rzikm self-requested a review April 20, 2026 08:14
Member

@rzikm rzikm left a comment


zstd is likely buffering all the data we pass to it internally, so the buffer is always drained completely. Still, the call to EnsureAvailableSpace does not hurt in case this changes (or the implementation gets copied for other compression algorithms in the future).

I would be okay with taking the changes without the unit tests, since they don't reproduce the claimed issue (I ran code coverage and only the fast path of EnsureAvailableSpace ever gets executed).

@pumpkin-bit
Contributor Author

zstd is likely buffering all the data we pass to it internally, so the buffer is always drained completely, still, the call to EnsureAvailableSpace does not hurt in case this changes (or the implementation gets copied for other compression algs in the future)

I would be okay with taking the changes without the unit tests, since they don't reproduce the claimed issue (I ran the code coverage and only the fast-path of EnsureAvailableSpace ever gets executed)

I agree. Even if the current native implementation typically drains the buffer completely, this safety check keeps the logic robust against future changes or edge cases involving partial consumption, since the issue is extremely rare and depends on very specific state alignment. I don't mind removing the unit tests if they can't reliably reproduce the bug in the current environment. In any case, it's better to have the protection in the code.

@MihaZupan
Member

Yeah, we shouldn't commit the current test.

@MihaZupan
Member

The new test, we should keep the existing file :)

@pumpkin-bit
Contributor Author

The new test, we should keep the existing file :)

Sorry, it happened by accident.

@pumpkin-bit
Contributor Author

I restored the file itself and deleted my test from it. I hope this time everything will work out without any accidental errors on my part. :)

Member

@rzikm rzikm left a comment


LGTM, Thanks!

@rzikm rzikm enabled auto-merge (squash) April 20, 2026 16:24
@rzikm rzikm disabled auto-merge April 20, 2026 16:25
@rzikm rzikm enabled auto-merge (squash) April 20, 2026 16:25
@rzikm rzikm self-assigned this Apr 20, 2026
@rzikm rzikm merged commit b85a52a into dotnet:main Apr 20, 2026
84 of 91 checks passed

Labels

area-System.IO.Compression community-contribution Indicates that the PR has been added by a community member

3 participants