Reduce external sequence producer API overhead by 25%#3471
Merged
embg merged 2 commits intofacebook:devfrom Feb 2, 2023
Merged
Reduce external sequence producer API overhead by 25%#3471embg merged 2 commits intofacebook:devfrom
embg merged 2 commits intofacebook:devfrom
Conversation
terrelln
reviewed
Feb 1, 2023
Cyan4973
reviewed
Feb 1, 2023
Cyan4973
approved these changes
Feb 1, 2023
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds a cctxParam to disable repcode search during external sequence parsing (only in explicit block delim mode for now, since that's where we currently care about parsing speed). For external matchfinders which don't explicitly search for repcode matches, this sacrifices less than 1% compression ratio on silesia.tar.
In general, the compression ratio trade-off is matchfinder- and data-dependent. Users should benchmark against their own data to determine if the trade-off is worth it. I have enabled by default below compression level 10, because the speed improvement is so great that I imagine few practical use-cases would gain enough ratio from disabling to justify disabling.
In the future, we might be able to use SIMD to run repcode search much faster, and change the trade-off such that enabling at low levels makes sense.
Overall external matchfinder API overhead (non-external-matchfinder compression CPU) is currently about 50% inside
ZSTD_copySequencesToSeqStoreExplicitBlockDelim(), so this PR reduces overall overhead by about 25% (see perf numbers below).Before:

After:

cc @daweiq @GarenJian-Intel