seekable_format: Prevent rereading frame when seeking forward #3069

Closed
YoniGilad wants to merge 2 commits into facebook:dev from YoniGilad:seekable-no-reread

@YoniGilad
Contributor

When decompressing a seekable file, if seeking forward within
a frame (by issuing multiple ZSTD_seekable_decompress calls
with a small gap between them), the frame will be unnecessarily
reread from the beginning. This patch makes it continue using
the current frame data and simply skip over the unneeded bytes.

@Cyan4973
Contributor

While I understand the intention (reducing the number of fseek() calls), it's unclear to me whether this change is safe.

What happens when offset > zs->decompressedOffset?
Why is it guaranteed to always be fine?

Could there be a test that shows the benefit and correctness of the change?

@YoniGilad
Contributor Author

@Cyan4973 When offset > zs->decompressedOffset (but still within the same frame), we will enter the if (zs->decompressedOffset < offset) condition below, which will cause it to decompress and skip the unneeded data. This is the same mechanism that skips from the start of a frame to the requested offset.
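
To make the argument concrete, here is a minimal toy model of the behaviour described above. It is a sketch under stated assumptions, not the zstd implementation: `ToyReader`, `toy_read`, `toy_run`, the two policy names and the fixed `FRAME_SIZE` are all hypothetical, every read is assumed to stay inside one frame, and "decoding" a byte simply costs one unit of work. The first policy restarts the frame whenever the requested offset differs from the current position; the second restarts only for a different frame or a backward seek, and otherwise decodes and discards the gap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not the zstd API: every frame holds FRAME_SIZE
 * decompressed bytes and decoding one byte costs one unit of work. */
#define FRAME_SIZE 100u

typedef struct {
    size_t curFrame;           /* frame currently being decoded */
    size_t decompressedOffset; /* next decompressed offset the decoder would emit */
    size_t bytesDecoded;       /* total work, including bytes decoded only to be skipped */
    size_t frameRestarts;      /* number of times a frame was opened from its start */
    int    started;            /* 0 until the first read */
} ToyReader;

typedef enum { RESTART_ON_ANY_GAP, RESTART_ONLY_WHEN_NEEDED } Policy;

/* Read `len` bytes at `offset`; the range must lie within a single frame. */
static void toy_read(ToyReader* r, size_t offset, size_t len, Policy p)
{
    size_t const targetFrame = offset / FRAME_SIZE;
    int restart;
    assert(len > 0 && (offset + len - 1) / FRAME_SIZE == targetFrame);

    if (!r->started || targetFrame != r->curFrame) {
        restart = 1;                                   /* different frame */
    } else if (p == RESTART_ON_ANY_GAP) {
        restart = (offset != r->decompressedOffset);   /* any gap restarts */
    } else {
        restart = (offset < r->decompressedOffset);    /* only backward seeks restart */
    }

    if (restart) {
        r->curFrame = targetFrame;
        r->decompressedOffset = targetFrame * FRAME_SIZE;
        r->frameRestarts++;
        r->started = 1;
    }

    /* Decode and discard everything between the current position and the
     * requested offset, then decode the requested bytes themselves. */
    r->bytesDecoded += offset - r->decompressedOffset;
    r->bytesDecoded += len;
    r->decompressedOffset = offset + len;
}

/* Three forward reads with small gaps, then one backward read. */
static ToyReader toy_run(Policy p)
{
    ToyReader r = {0};
    toy_read(&r, 0, 10, p);
    toy_read(&r, 20, 10, p);
    toy_read(&r, 40, 10, p);
    toy_read(&r, 5, 10, p);
    return r;
}
```

In this model, `toy_run(RESTART_ON_ANY_GAP)` opens the frame 4 times and decodes 105 bytes, while `toy_run(RESTART_ONLY_WHEN_NEEDED)` opens it twice and decodes 65. The backward read still forces a restart in both cases, so correctness for backward seeks does not depend on the shortcut.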

I will work on adding a test into seekable_tests.c that performs multiple reads at different offsets. To measure the benefit I think I can use the ZSTD_seekable_customFile mechanism with functions that keep track of the number of bytes read and then check that it's not too high.

@YoniGilad
Contributor Author

I added a test that does the following:

  1. Compress test data into multiple frames
  2. Perform a series of small decompressions and seeks forward, checking that the total data read is less than the compressed size, i.e. that compressed data wasn't reread. (This check fails without this PR's fix.)
  3. Perform some seeks forward and backward to ensure correctness.
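
The byte-counting idea behind step 2 can be sketched as follows. This is an illustrative, self-contained example rather than the test added in this PR: `CountingFile`, `counting_read`, `counting_seek` and `demo_total_read` are hypothetical names, and the callback shapes (an opaque pointer plus buffer and size, or offset and origin, returning 0 on success and a negative value on failure) are an assumption about what a ZSTD_seekable_customFile expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>

/* Hypothetical in-memory "file" that records how many bytes were read. */
typedef struct {
    const unsigned char* data;
    size_t size;
    size_t pos;
    size_t totalRead;   /* running total of bytes handed out by counting_read */
} CountingFile;

/* Read callback: copy n bytes at the current position and count them.
 * Returns 0 on success, -1 if fewer than n bytes remain. */
static int counting_read(void* opaque, void* buffer, size_t n)
{
    CountingFile* f = (CountingFile*)opaque;
    if (n > f->size - f->pos) return -1;
    memcpy(buffer, f->data + f->pos, n);
    f->pos += n;
    f->totalRead += n;
    return 0;
}

/* Seek callback: move the position; seeking itself reads nothing.
 * Returns 0 on success, -1 on a bad origin or an out-of-range target. */
static int counting_seek(void* opaque, long long offset, int origin)
{
    CountingFile* f = (CountingFile*)opaque;
    long long base;
    switch (origin) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = (long long)f->pos; break;
    case SEEK_END: base = (long long)f->size; break;
    default: return -1;
    }
    if (offset < -base || offset > (long long)f->size - base) return -1;
    f->pos = (size_t)(base + offset);
    return 0;
}

/* Usage sketch: two 16-byte reads separated by an 8-byte forward seek. */
static size_t demo_total_read(void)
{
    static const unsigned char blob[64] = {0};
    unsigned char scratch[16];
    CountingFile f = { blob, sizeof(blob), 0, 0 };
    counting_seek(&f, 0, SEEK_SET);
    counting_read(&f, scratch, sizeof(scratch));
    counting_seek(&f, 8, SEEK_CUR);
    counting_read(&f, scratch, sizeof(scratch));
    return f.totalRead;
}
```

A test in this style would hand the two callbacks and the `CountingFile` to the decompressor, run the forward reads, and then assert that `totalRead` stays below the compressed size. If a frame were reread from its start on each forward seek, the counter would exceed that bound.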

@Cyan4973
Contributor

Thanks @YoniGilad,
your patch has been rebased and re-proposed for merge in #3581.

@Cyan4973 Cyan4973 closed this Mar 30, 2023