Skip to content

Conversation

@Viq111
Copy link
Collaborator

@Viq111 Viq111 commented Jan 31, 2020

Based on comments from #22 & #39

This PR moves the stream compression C calls from the deprecated ZSTD_compressBegin / ZSTD_compressContinue to the current stable ZSTD_compressStream2 API

It allows piggy-back on #31 to do the same for the streaming interface

See https://github.com/facebook/zstd/blob/5120883a9c29a7966bbd2144df04f6976f4c7acc/lib/zstd.h#L576 for streaming best practices

(I'll do the Decompression API change in a separate PR to reduce the size of the PR and also technically the 2 changes can be made independently)

- Move Compression API from the (now deprecated) ZSTD_compressBegin / ZSTD_compressContinue API to ZSTD_compressStream2
- Add some C passthrough methods to not have to allocated pointers
- Add a source buffer if all the source data cannot be consumed in one zstd call

resolve #22
resolve #39
If we can avoid an extra copy of the source data, don't copy. That will save the copy but also the allocation (which can be big on input)
@Viq111
Copy link
Collaborator Author

Viq111 commented Jan 31, 2020

To add content to that PR, I tried to reproduce the issue in #39 by writing a tool to stresstest CPU + memory (many workers, working close to OS memory limit) at a separate branch: https://github.com/DataDog/zstd/compare/viq111/mem_stresstest?expand=1

I wasn't able to reproduce it. My best guess is as this comment: #22 (comment) points out, there could be cases where the full input wouldn't be consumed, then in between calls that part of the memory is reused and edited.
This happen very rarely because the dstBuffer is sized to be compressBound(input) so it should always have enough space to write to. I'm guessing there are now cases where the full input is not consumed

Copy link

@gmmeyer gmmeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks a lot for putting this together! :) I have a question about one of the lines.

@seiflotfy
Copy link

Great job on this! can we have it merged?

@Viq111
Copy link
Collaborator Author

Viq111 commented Feb 3, 2020

Hi @seiflotfy

Thanks for your interest!
I finished this PR Friday and I still have to do the decompression side of it (similar code change but pretty much a repeat of that PR so shouldn't be too much time needed).
After that, I also want to test it on a bunch of payloads as it changes quite a bit the internals.

My current goal for merging this is end of this week / next week and do a pre-release from there.

@Viq111
Copy link
Collaborator Author

Viq111 commented Mar 19, 2020

Sorry for the delay, I wanted to thoroughly check that we covered all cases.
I created a memory stress tooling at https://github.com/DataDog/zstd/compare/viq111/mem_stresstest and found a bug which I was able to to uncover
@gmmeyer if you want to give a look at the last commit I think we are ready to merge then

@Viq111 Viq111 merged commit 98d11f8 into 1.x Mar 19, 2020
@Viq111 Viq111 deleted the viq111/stream2 branch March 19, 2020 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants