Merged
Memory-constrained use cases that manage multiple archives benefit from retaining multiple archive seek tables without retaining a ZSTD_seekable instance for each.

* New opaque type for seek table: ZSTD_seekTable.
* ZSTD_seekable_copySeekTable() supports copying the seek table out of a ZSTD_seekable.
* ZSTD_seekTable_[eachSeekTableOp]() defines a seek table API that mirrors existing seek table operations.
* Existing ZSTD_seekable_[eachSeekTableOp]() retained; they delegate to the ZSTD_seekTable variant.

These changes allow the above-mentioned use cases to initialize a ZSTD_seekable, extract its ZSTD_seekTable, then throw the ZSTD_seekable away to save memory. Standard ZSTD operations can then be used to decompress frames based on seek table offsets.

The copy and delegate patterns are intended to minimize impact on existing code and clients. Using copy instead of move for the infrequent operation of extracting a seek table ensures that the extraction does not render the ZSTD_seekable useless. Delegating to *new* seek table-oriented APIs ensures that this is not a breaking change for existing clients while supporting all meaningful operations that depend only on seek table data.
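The copy and delegate patterns described in this commit message can be sketched in isolation. The block below is a minimal illustration only: every `demo_*` type and function is hypothetical and does not reproduce the real signatures in `contrib/seekable_format`, which are not shown in this conversation. It assumes a seek table is just an array of per-frame offsets.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real contrib/seekable_format types. */
typedef struct {
    unsigned long long cOffset;  /* frame offset in the compressed archive */
    unsigned long long dOffset;  /* frame offset in the decompressed data */
} demo_entry;

typedef struct {
    demo_entry* entries;         /* numFrames + 1 entries; the last marks the end */
    size_t numFrames;
} demo_seekTable;

typedef struct {
    demo_seekTable table;            /* seek table owned by the seekable */
    unsigned char scratch[1 << 16];  /* stands in for heavier decoder state */
} demo_seekable;

/* New seek-table-only operation: needs nothing but the table. */
static unsigned long long
demo_seekTable_getFrameCompressedOffset(const demo_seekTable* st, size_t frameIndex)
{
    if (frameIndex >= st->numFrames) return (unsigned long long)-1;
    return st->entries[frameIndex].cOffset;
}

/* Existing entry point kept for old clients: delegates to the new one. */
static unsigned long long
demo_seekable_getFrameCompressedOffset(const demo_seekable* zs, size_t frameIndex)
{
    return demo_seekTable_getFrameCompressedOffset(&zs->table, frameIndex);
}

/* Deep copy, so the seekable remains usable afterwards.
 * Returns NULL on allocation failure. */
static demo_seekTable* demo_seekable_copySeekTable(const demo_seekable* zs)
{
    size_t const n = zs->table.numFrames + 1;
    demo_seekTable* const st = (demo_seekTable*)malloc(sizeof(*st));
    if (st == NULL) return NULL;
    st->entries = (demo_entry*)malloc(n * sizeof(demo_entry));
    if (st->entries == NULL) { free(st); return NULL; }
    memcpy(st->entries, zs->table.entries, n * sizeof(demo_entry));
    st->numFrames = zs->table.numFrames;
    return st;
}

/* Accepts NULL, like free(). */
static void demo_seekTable_free(demo_seekTable* st)
{
    if (st == NULL) return;
    free(st->entries);
    free(st);
}
```

In this sketch a caller would build a `demo_seekable`, call `demo_seekable_copySeekTable()`, release the seekable, and keep only the small copied table for offset lookups.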
[contrib] Support seek table-only API
read-only objects are properly const-ified in parameters
Seekable hang fix
and simple roundtrip test
New direct seekTable access methods
It is a stack high-point for some compression strategies and has an easy fix. This moves the normalized count into the entropy workspace.
Reduce stack usage of ZSTD_buildCTable()
This saves ~700 bytes of stack space in HUF_writeCTable.
Add HUF_writeCTable_wksp() function
* Use `HUF_readStats_wksp()`
* Use workspace in `HUF_fillDTableX2*()`
* Clean up workspace usage to use a workspace struct
* Move `counting` into the workspace
* Increase `HUF_DECOMPRESS_WORKSPACE_SIZE` by 512 bytes
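The workspace commits above share one idea: a scratch array that used to live on the callee's stack moves into memory supplied by the caller, and a `_wksp` variant is added next to the original function. The following is a rough sketch of that pattern, not the actual `HUF_*` code. The `demo_*` names and the histogram task are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_SYMBOL 255

/* Scratch memory grouped in a struct, provided by the caller. */
typedef struct {
    unsigned count[DEMO_MAX_SYMBOL + 1];   /* 1 KB with 4-byte unsigned */
} demo_histo_wksp;

/* _wksp variant: uses no large stack buffer of its own.
 * `workspace` must be suitably aligned for demo_histo_wksp.
 * Returns 0 on success, -1 if the workspace is too small. */
static int demo_maxCount_wksp(unsigned* maxOut,
                              const unsigned char* src, size_t srcSize,
                              void* workspace, size_t wkspSize)
{
    demo_histo_wksp* const wksp = (demo_histo_wksp*)workspace;
    unsigned max = 0;
    size_t i;
    if (wkspSize < sizeof(*wksp)) return -1;
    memset(wksp->count, 0, sizeof(wksp->count));
    for (i = 0; i < srcSize; i++) wksp->count[src[i]]++;
    for (i = 0; i <= DEMO_MAX_SYMBOL; i++) {
        if (wksp->count[i] > max) max = wksp->count[i];
    }
    *maxOut = max;
    return 0;
}

/* Original entry point keeps its signature and pays the stack cost itself,
 * so existing callers are unaffected. */
static unsigned demo_maxCount(const unsigned char* src, size_t srcSize)
{
    demo_histo_wksp wksp;
    unsigned max = 0;
    (void)demo_maxCount_wksp(&max, src, srcSize, &wksp, sizeof(wksp));
    return max;
}
```

Callers deep in a call chain that already own a workspace would call the `_wksp` variant directly, which is where the stack saving comes from.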
doc: ZSTD_free*() functions accept NULL pointer
Make the number of physical CPU cores detection more robust
This commit introduces a GitHub action that is triggered on release creation, which creates the release tarball, compresses it, hashes it, signs it, and attaches all of those files to the release.
Changed strategy: now unconditionally prefetch the first 2 cache lines, instead of the cache lines corresponding to the first and last bytes of the match. This better matches CPU expectations, since the CPU should auto-prefetch the following cache lines once it detects the sequential nature of the read. This is globally positive, by +5%, though exact gains depend on the compiler (from -2% to +15%). The only negative counter-example is gcc-9.
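To make the two strategies concrete, here is a small sketch that contrasts them. It is not the code from this pull request. The `demo_*` names are hypothetical, the 64-byte cache line size is an assumption, and `__builtin_prefetch` is the GCC/Clang builtin, with a no-op fallback elsewhere.

```c
#include <stddef.h>

#define DEMO_CACHELINE_SIZE 64   /* assumption: typical cache line size */

#if defined(__GNUC__) || defined(__clang__)
   /* second argument 0 = prefetch for read, third 3 = high temporal locality */
#  define DEMO_PREFETCH(ptr) __builtin_prefetch((ptr), 0, 3)
#else
#  define DEMO_PREFETCH(ptr) ((void)(ptr))   /* no-op fallback */
#endif

/* Previous strategy as described: prefetch the cache lines that hold
 * the first and the last byte of the match. */
static void demo_prefetch_first_last(const unsigned char* match, size_t matchLength)
{
    DEMO_PREFETCH(match);
    if (matchLength > 0) DEMO_PREFETCH(match + matchLength - 1);
}

/* New strategy as described: always prefetch the first two cache lines and
 * leave the rest of a long match to the hardware prefetcher.
 * The caller is assumed to pass a pointer with at least
 * DEMO_CACHELINE_SIZE addressable bytes after it. */
static void demo_prefetch_two_lines(const unsigned char* match)
{
    DEMO_PREFETCH(match);
    DEMO_PREFETCH(match + DEMO_CACHELINE_SIZE);
}
```

The second variant does not need the match length at all, which also removes a data dependency from the prefetch.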
…_prefetch_refactor
This seems to bring an additional ~+1.2% decompression speed on average across 10 compilers x 6 scenarios.
Refactor prefetching for the decoding loop
The new alignment setting is better for gcc-9 and gcc-10 by about +5%. Unfortunately, it's worse for essentially all other compilers. Make the new alignment setting conditional on gcc-9+.
Apply flags to libzstd-nomt in libzstd style
improved gcc-9 and gcc-10 decoding speed
When running armv6 userspace on armv8 hardware with a 64-bit Linux kernel, mode 2 caused SIGBUS (unaligned memory access). Running all our arm builds in the build farm only on armv8 simplifies administration a lot. Depending on compiler and environment, this change might slow down memory accesses (did not benchmark it). The original analysis is 6 years old. Fixes #2632
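As background for the SIGBUS report, the sketch below contrasts a portable unaligned read with a direct pointer dereference. It assumes that "mode 2" in the commit message refers to a direct-dereference access mode. That reading is an assumption, and the `demo_*` functions are hypothetical, not the library's own helpers.

```c
#include <stdint.h>
#include <string.h>

/* Portable read: memcpy lets the compiler emit whatever load sequence is
 * legal on the target, so a misaligned `ptr` is fine. */
static uint32_t demo_read32(const void* ptr)
{
    uint32_t value;
    memcpy(&value, ptr, sizeof(value));
    return value;
}

/* Direct access, shown for contrast only: undefined behavior when `ptr`
 * is not 4-byte aligned, and on some ARM configurations it can raise SIGBUS.
 * Only call this with an aligned pointer. */
static uint32_t demo_read32_direct(const void* ptr)
{
    return *(const uint32_t*)ptr;
}
```

On targets with cheap unaligned loads, compilers commonly reduce the `memcpy` form to a single load, although, as the commit message notes, the speed impact here was not benchmarked.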
Avoid SIGBUS on armv6
Cyan4973
approved these changes
May 11, 2021
As expected, extended fuzzer tests started during the weekend have not found anything so far. This seems good to go.
On Windows 10, maybe this release has a performance regression. Just replace the
This change is missing from changelog: [1.5.0] Enable multithreading in lib build by default (#2584)
and restored limit to 256 when in 64-bit mode (it was reduced to 200 to give more room for 32-bit). This should fix test instability issues when using a lot of threads in 32-bit environments.
With small enough input files, the inferred value of fileWindowLog could be smaller than ZSTD_WINDOWLOG_MIN. This can be reproduced like so:

    $ echo abc > small
    $ echo abcdef > small2
    $ zstd --patch-from small small2 -o patch

Previously, this would fail with the error "zstd: error 11 : Parameter is out of bound".
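The failure mode above comes down to a window log inferred from a tiny file size and then passed on without a lower bound. A minimal sketch of the clamping idea follows. The `demo_*` helpers are hypothetical, and the constants are assumptions standing in for `ZSTD_WINDOWLOG_MIN` and a 64-bit `ZSTD_WINDOWLOG_MAX`. The actual fix in the CLI may be structured differently.

```c
#include <stddef.h>

#define DEMO_WINDOWLOG_MIN 10   /* assumption: stands in for ZSTD_WINDOWLOG_MIN */
#define DEMO_WINDOWLOG_MAX 31   /* assumption: stands in for the 64-bit ZSTD_WINDOWLOG_MAX */

/* Index of the highest set bit; returns 0 for inputs 0 and 1. */
static unsigned demo_highbit64(unsigned long long v)
{
    unsigned n = 0;
    while (v >>= 1) n++;
    return n;
}

/* Infer a window log large enough to cover the file, then clamp it into
 * the accepted range so that tiny files no longer yield an invalid value. */
static unsigned demo_windowLog_for(unsigned long long maxSrcFileSize)
{
    unsigned const inferred = demo_highbit64(maxSrcFileSize) + 1;
    if (inferred < DEMO_WINDOWLOG_MIN) return DEMO_WINDOWLOG_MIN;
    if (inferred > DEMO_WINDOWLOG_MAX) return DEMO_WINDOWLOG_MAX;
    return inferred;
}
```

For the 7-byte `small2` file in the reproduction, the inferred value in this sketch is 3, which the clamp raises to the minimum.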
reduce ZSTDMT_NBWORKERS_MAX in 32-bit mode
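A pointer-size-dependent worker limit of this kind can be expressed as a compile-time constant expression. The sketch below is illustrative only: the 64-bit limit of 256 comes from the commit message, the 32-bit value of 64 is an assumption, and `DEMO_NBWORKERS_MAX` and `demo_clampNbWorkers()` are hypothetical names.

```c
#include <stddef.h>

/* 256 in 64-bit mode per the commit message; 64 in 32-bit mode is an
 * assumption made for this sketch. */
#define DEMO_NBWORKERS_MAX ((sizeof(void*) == 4) ? 64 : 256)

/* Clamp a requested worker count to the platform-dependent limit. */
static unsigned demo_clampNbWorkers(unsigned requested)
{
    unsigned const limit = (unsigned)DEMO_NBWORKERS_MAX;
    return (requested > limit) ? limit : requested;
}
```

Each worker needs its own buffers and thread stack, so a lower cap leaves more of a 32-bit address space available, which matches the stated goal of fixing test instability with many threads.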
hopefully, bionic will have a more recent version of python required to install meson.
Fixed meson test on travisCI
Changelog
* ZSTD_defaultCLevel()
* ZSTD_getDictID_fromCDict()
* ZSTD_compress_advanced()
* ZSTD_compress_usingCDict_advanced()
* ZSTD_compressBegin_advanced()
* ZSTD_compressBegin_usingCDict_advanced()
* ZSTD_initCStream_srcSize()
* ZSTD_initCStream_usingDict()
* ZSTD_initCStream_usingCDict()
* ZSTD_initCStream_advanced()
* ZSTD_initCStream_usingCDict_advanced()
* ZSTD_resetCStream()
* … clang and for --long modes (faster speed for decompressSequencesLong #2614, improved gcc-9 and gcc-10 decoding speed #2630, @Cyan4973)
* … ZSTD_entropyCost(), fix superblocks no sequences case #2592, @senhuang42)
* ZSTD_estimateCCtxSize*() monotonically increases with compression level (Add memory monotonicity test over srcSize #2538, @senhuang42)
* zdict.h dictionary training API documentation ([zdict] Add a FAQ to the top of zdict.h #2622, @terrelln)
* ZSTD_free*() functions accept NULL pointers (doc: ZSTD_free*() functions accept NULL pointer #2521, @animalize)
* … zstd_errors.h and zdict.h to lib/ root ([1.5.0] Move zstd_errors.h and zdict.h to lib/ root #2597, @terrelln)
* … build/ directory (Move Single-File Build Script from contrib/ to build/ #2618, @felixhandte)
* … ZSTDMT_JOBSIZE_MIN to be configured at compile-time, reduce default to 512KB (allow jobSize to be as low as 512 KB #2611, @Cyan4973)
* ZBUFF_*() is no longer built by default ([1.5.0] Remove ZBUFF #2583, @senhuang42)
* … md5 on Darwin (Detect Presence of md5 on Darwin #2609, @felixhandte)
* --progress flag added to always display progress bar (Add --progress flag #2595, @senhuang42)
* … --force (Allow Reading from Block Devices with --force #2613, @felixhandte)
* --filelist end-of-line bug (fix --filelist compatibility with Windows cr+lf line ending #2620, @Cyan4973)