Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions programs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -157,8 +157,8 @@ Advanced arguments :

Dictionary builder :
--train ## : create a dictionary from a training set of files
--train-cover[=k=#,d=#,steps=#,split=#] : use the cover algorithm with optional args
--train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#] : use the fastcover algorithm with optional args
--train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]] : use the cover algorithm with optional args
--train-fastcover[=k=#,d=#,f=#,steps=#,split=#,shrink[=#],accel=#] : use the fastcover algorithm with optional args
--train-legacy[=s=#] : use the legacy algorithm with selectivity (default: 9)
-o file : `file` is dictionary name (default: dictionary)
--maxdict=# : limit dictionary to specified size (default: 112640)
Expand Down
11 changes: 10 additions & 1 deletion programs/zstd.1.md
Original file line number Diff line number Diff line change
Expand Up @@ -244,13 +244,15 @@ Compression of small files similar to the sample set will be greatly improved.
This compares favorably to 4 bytes default.
However, it's up to the dictionary manager to not assign twice the same ID to
2 different dictionaries.
* `--train-cover[=k#,d=#,steps=#,split=#]`:
* `--train-cover[=k#,d=#,steps=#,split=#,shrink[=#]]`:
Select parameters for the default dictionary builder algorithm named cover.
If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8.
If _k_ is not specified, then it tries _steps_ values in the range [50, 2000].
If _steps_ is not specified, then the default value of 40 is used.
If _split_ is not specified or split <= 0, then the default value of 100 is used.
Requires that _d_ <= _k_.
If _shrink_ flag is not used, then the default value for _shrinkDict_ of 0 is used.
If _shrink_ is not specified, then the default value for _shrinkDictMaxRegression_ of 1 is used.

Selects segments of size _k_ with highest score to put in the dictionary.
The score of a segment is computed by the sum of the frequencies of all the
Expand All @@ -262,6 +264,9 @@ Compression of small files similar to the sample set will be greatly improved.
If _split_ is 100, all input samples are used for both training and testing
to find optimal _d_ and _k_ to build dictionary.
Supports multithreading if `zstd` is compiled with threading support.
Having _shrink_ enabled takes a truncated dictionary of minimum size and doubles
in size until compression ratio of the truncated dictionary is at most
_shrinkDictMaxRegression%_ worse than the compression ratio of the largest dictionary.

Examples:

Expand All @@ -275,6 +280,10 @@ Compression of small files similar to the sample set will be greatly improved.

`zstd --train-cover=k=50,split=60 FILEs`

`zstd --train-cover=shrink FILEs`

`zstd --train-cover=shrink=2 FILEs`

* `--train-fastcover[=k#,d=#,f=#,steps=#,split=#,accel=#]`:
Same as cover but with extra parameters _f_ and _accel_ and different default value of split
If _split_ is not specified, then it tries _split_ = 75.
Expand Down