perf: reuse zstd compressors in encoding #5598

Merged
wjones127 merged 2 commits into lance-format:main from wkalt:task/reuse-compressors
Dec 30, 2025

wkalt commented Dec 30, 2025

Prior to this commit, the ZstdBufferCompressor would construct a new
zstd stream encoder on every call to compress. With this change we
create one compression context for the ZstdBufferCompressor, and reuse
it across calls to compress.

Reuse is not implemented for lz4 compression or for decompression. These
were both explored but did not bring meaningful benefits over the
existing code.
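
A minimal sketch of the pattern described above, not the code from this patch. To stay free of external crates, `Context` is a hypothetical stand-in for a zstd compression context and its `compress` is a toy run-length encoder. `PerCallCompressor`, `ReusingCompressor`, and the `Mutex` (used here so `compress` can take `&self`) are likewise assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for a zstd compression context: setup allocates,
/// so building one per call is wasteful.
struct Context {
    scratch: Vec<u8>,
}

impl Context {
    fn new(built: &AtomicUsize) -> Self {
        built.fetch_add(1, Ordering::Relaxed); // record each construction
        Context { scratch: Vec::with_capacity(64 * 1024) }
    }

    /// Toy run-length encoding into (run length, byte) pairs. This only
    /// stands in for real compression so the sketch needs no external crate.
    fn compress(&mut self, input: &[u8]) -> Vec<u8> {
        self.scratch.clear(); // reuse the allocation from earlier calls
        let mut i = 0;
        while i < input.len() {
            let byte = input[i];
            let mut run = 1;
            while run < 255 && i + run < input.len() && input[i + run] == byte {
                run += 1;
            }
            self.scratch.push(run as u8);
            self.scratch.push(byte);
            i += run;
        }
        self.scratch.clone()
    }
}

/// "Before" shape: a fresh context is built on every call to `compress`.
struct PerCallCompressor {
    built: AtomicUsize,
}

impl PerCallCompressor {
    fn new() -> Self {
        PerCallCompressor { built: AtomicUsize::new(0) }
    }

    fn compress(&self, input: &[u8]) -> Vec<u8> {
        Context::new(&self.built).compress(input)
    }

    fn contexts_built(&self) -> usize {
        self.built.load(Ordering::Relaxed)
    }
}

/// "After" shape: one context is built up front and reused across calls.
struct ReusingCompressor {
    built: AtomicUsize,
    ctx: Mutex<Context>,
}

impl ReusingCompressor {
    fn new() -> Self {
        let built = AtomicUsize::new(0);
        let ctx = Mutex::new(Context::new(&built));
        ReusingCompressor { built, ctx }
    }

    fn compress(&self, input: &[u8]) -> Vec<u8> {
        // Lock the shared context for the duration of this call only.
        self.ctx.lock().unwrap().compress(input)
    }

    fn contexts_built(&self) -> usize {
        self.built.load(Ordering::Relaxed)
    }
}
```

Both shapes return the same bytes for the same input. The difference is that the first pays the context setup cost on every call, while the second pays it once per compressor.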

@wkalt
Copy link
Copy Markdown
Contributor Author

wkalt commented Dec 30, 2025

Here is the improvement from the encoding benchmark the patch introduces:

```
[/mnt/work/home/wyatt/work/lance] $ cargo bench --bench encoder -- encode_compressed --baseline encode-before
   Compiling lance-encoding v2.0.0-beta.5 (/mnt/work/home/wyatt/work/lance/rust/lance-encoding)
    Finished `bench` profile [optimized + debuginfo] target(s) in 19.80s
     Running benches/encoder.rs (target/release/deps/encoder-a8754c30e2dc3932)
Benchmarking encode_compressed/zstd_strings_10cols: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 58.3s.
encode_compressed/zstd_strings_10cols
                        time:   [5.6452 s 5.6886 s 5.7446 s]
                        thrpt:  [8.7038 Melem/s 8.7895 Melem/s 8.8571 Melem/s]
                 change:
                        time:   [-30.007% -29.406% -28.676%] (p = 0.00 < 0.10)
                        thrpt:  [+40.205% +41.655% +42.872%]
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
```
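
The time and throughput changes reported above describe the same measurement. For a fixed amount of work, throughput is inversely proportional to time, so a relative time change of t implies a throughput change of 1/(1 + t) - 1. A small check of that arithmetic, where `throughput_change` is a hypothetical helper name and not part of the benchmark:

```rust
/// Converts a relative change in time (e.g. -0.29406 for -29.406%) into the
/// implied relative change in throughput, assuming the amount of work is fixed.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}
```

Applied to the middle estimate, `throughput_change(-0.29406)` comes out at roughly 0.4166, which matches the reported +41.655%.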

This adds some simple benchmarks for encoding/decoding of a
string-valued column under lz4 and zstd compression. A benchmark for
parallel decoding is also included.
wkalt force-pushed the task/reuse-compressors branch from 5df31ed to c7320f7 on December 30, 2025 20:46

codecov Bot commented Dec 30, 2025

Codecov Report

❌ Patch coverage is 77.77778% with 8 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ust/lance-encoding/src/encodings/physical/block.rs | 77.77% | 6 Missing and 2 partials ⚠️ |

@wjones127 wjones127 merged commit b07572c into lance-format:main Dec 30, 2025
29 checks passed
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026