geotiff: parallelize strip writer, adaptive tile threshold, optional libdeflate (closes #1800) #1801
Merged
…#1800)

The deflate strip-write path was 3.7x slower than rioxarray/GDAL because `_write_stripped` ran `zlib.compress` serially while the tile writer already parallelized via a thread pool. Three changes:

1. Mirror `_write_tiled`'s ThreadPoolExecutor pattern in `_write_stripped`. Strip preparation is hoisted into a new `_prepare_strip` helper so the same code drives both the serial and parallel paths. A 2048x2048 deflate strip write drops from 405 ms to 70 ms (5.8x speedup, beats rioxarray's 102 ms).
2. Replace the tile writer's `n_tiles <= 4` sequential cutoff with a bytes-based threshold (`_PARALLEL_MIN_BYTES = 4 MiB`). Pre-fix, `tile_size=1024` on a 2048x2048 image produced n_tiles=4 and forced the slow path; now those writes parallelize too.
3. Route `deflate_compress` through the optional `libdeflate` package when installed (1.5-2x faster than stdlib zlib at the same level; GDAL >= 3.7 already uses it). Output is wire-compatible -- decoded streams round-trip through `zlib.decompress` unchanged. Compressors are cached per thread via `threading.local`.
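The first change can be illustrated with a minimal sketch. It is not the code from `_writer.py`: `prepare_strip` and `compress_strips` are hypothetical stand-ins for `_prepare_strip` and the strip loop, and the `parallel` flag replaces whatever decision logic the real writer uses. The point is only that one helper serves both paths and that `Executor.map` returns results in input order.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor


def prepare_strip(raw: bytes, level: int = 6) -> bytes:
    """Hypothetical stand-in for `_prepare_strip`: compress one strip."""
    return zlib.compress(raw, level)


def compress_strips(strips, level=6, parallel=True):
    """Compress every strip, serially or on a thread pool (illustrative)."""
    if not parallel or len(strips) < 2:
        # Serial path: same helper, plain loop.
        return [prepare_strip(s, level) for s in strips]
    workers = min(len(strips), os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so strip order is preserved.
        return list(pool.map(lambda s: prepare_strip(s, level), strips))


if __name__ == "__main__":
    strips = [bytes([i]) * (1 << 20) for i in range(8)]  # eight 1 MiB strips
    serial = compress_strips(strips, parallel=False)
    threaded = compress_strips(strips, parallel=True)
    print(serial == threaded)
```

Because `zlib.compress` releases the GIL while compressing, the threaded path can use several cores even though it runs in a single process.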
Contributor
Pull request overview
This PR improves GeoTIFF write performance by extending the existing parallel tile-compression approach to strip writes, adding a payload-size-based heuristic for when to parallelize, and introducing an optional libdeflate backend for faster deflate compression while maintaining zlib wire compatibility.
Changes:
- Parallelize `_write_stripped` using `ThreadPoolExecutor` for sufficiently large compressed payloads.
- Replace the tile writer’s `n_tiles <= 4` sequential cutoff with a bytes-based threshold (`_PARALLEL_MIN_BYTES = 4 MiB`).
- Add an optional `libdeflate` path in `deflate_compress()` with per-thread compressor caching and `zlib.compress` fallback.
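The difference between a tile-count cutoff and a bytes-based threshold can be worked through with a small sketch. The function names below are hypothetical, and two details are assumptions rather than facts from the PR: that the threshold is compared against the uncompressed payload size, and that the comparison is `>=`.

```python
PARALLEL_MIN_BYTES = 4 * 1024 * 1024  # 4 MiB, the value named in the PR


def tile_count(width, height, tile_size):
    """Number of tiles needed to cover the image (ceiling division)."""
    across = -(-width // tile_size)
    down = -(-height // tile_size)
    return across * down


def payload_bytes(width, height, bytes_per_sample, samples_per_pixel=1):
    """Uncompressed size of the pixel data in bytes."""
    return width * height * bytes_per_sample * samples_per_pixel


def use_parallel_path(width, height, bytes_per_sample, samples_per_pixel=1):
    """Assumed rule: parallelize once the payload reaches the threshold."""
    size = payload_bytes(width, height, bytes_per_sample, samples_per_pixel)
    return size >= PARALLEL_MIN_BYTES


if __name__ == "__main__":
    # 2048x2048 with 4-byte samples and tile_size=1024: only 4 tiles, 16 MiB.
    print(tile_count(2048, 2048, 1024))
    print(payload_bytes(2048, 2048, 4) // (1024 * 1024))
    print(use_parallel_path(2048, 2048, 4))
```

Under the old `n_tiles <= 4` rule this image would have taken the sequential path despite carrying 16 MiB of data; a size-based rule classifies it by the amount of work instead.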
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `xrspatial/geotiff/_writer.py` | Adds `_PARALLEL_MIN_BYTES`, introduces parallel strip compression, and updates tile sequential/parallel decision logic to be payload-size-based. |
| `xrspatial/geotiff/_compression.py` | Adds optional libdeflate backend for deflate compression with thread-local compressor caching and zlib fallback. |
| `xrspatial/geotiff/tests/test_parallel_writer_1800.py` | Adds round-trip and path-selection tests for strip/tile parallelization and libdeflate fallback behavior. |
Comment on lines +216 to +234
    def test_libdeflate_compressor_cache_is_thread_local():
        """The cache lives in threading.local, so two threads see distinct dicts."""
        import xrspatial.geotiff._compression as comp_mod

        if not comp_mod._HAVE_LIBDEFLATE:
            pytest.skip('libdeflate not installed')

        seen = {}

        def grab(tag):
            # First call populates the cache; we grab its id().
            comp_mod._libdeflate_compressor(6)
            seen[tag] = id(comp_mod._libdeflate_thread_local.cache)

        t1 = ThreadPoolExecutor(max_workers=2)
        list(t1.map(grab, ['a', 'b']))
        t1.shutdown(wait=True)
        # Two workers populated two distinct local caches.
        assert len(set(seen.values())) == 2
PR #1801's review flagged that `test_libdeflate_compressor_cache_is_thread_local` could pass with a single observed cache id: `ThreadPoolExecutor(max_workers=2).map(...)` is free to run both submissions on the same worker if the first returns quickly. Force both tasks to occupy a worker at the same time with a `threading.Barrier`, record `threading.get_ident()` so the assertion fails loudly if only one thread actually ran, and use the executor as a context manager so the pool is shut down on assertion failure.
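The barrier technique described in this commit message can be sketched in isolation. This is not the test from the repository: `_local`, `get_cache`, and `observe_caches` are hypothetical names, and a plain dict stands in for the libdeflate compressor cache. The sketch also keeps references to the cache objects instead of comparing `id()` values, which is a choice made here to avoid any id reuse after garbage collection.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # hypothetical stand-in for the module's thread-local


def get_cache():
    """Return the calling thread's cache dict, creating it on first use."""
    cache = getattr(_local, "cache", None)
    if cache is None:
        cache = _local.cache = {}
    return cache


def observe_caches(n_workers=2):
    """Run n_workers tasks at the same time; map tag -> (thread id, cache)."""
    barrier = threading.Barrier(n_workers)
    seen = {}

    def grab(tag):
        cache = get_cache()
        # Wait until every task occupies a worker, so no single worker
        # can run two tasks back to back.
        barrier.wait(timeout=10)
        seen[tag] = (threading.get_ident(), cache)

    # The context manager shuts the pool down even if a task raises.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(grab, range(n_workers)))
    return seen


if __name__ == "__main__":
    seen = observe_caches(2)
    thread_ids = {ident for ident, _ in seen.values()}
    print(len(thread_ids))
```

If only one worker ever ran, the barrier would time out and raise `threading.BrokenBarrierError` rather than letting the assertion pass by accident.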
brendancol added a commit that referenced this pull request on May 13, 2026
Resolves conflict in xrspatial/geotiff/__init__.py: keeps the `_read_vrt_dask` dispatch hook from the PR branch. All other geotiff changes from main (#1791, #1793, #1801, #1802, #1803, #1804, #1805, #1806) were already integrated into the working tree by the prior 7329dd9 commit; this merge just records the parent so git recognises the reconciliation.
Closes #1800.
Summary
- Parallelize `_write_stripped` with the same `ThreadPoolExecutor` pattern that `_write_tiled` has used since the start. zlib, zstd, lz4, and the Numba LZW kernel all release the GIL, so this is a direct speedup with no new code paths to maintain.
- Replace the tile writer's `n_tiles <= 4` sequential cutoff with a bytes-based threshold (`_PARALLEL_MIN_BYTES = 4 MiB`). The old cutoff turned `tile_size=1024` on a 2048x2048 image (`n_tiles=4`, 16 MiB) into the slow path, ~8x slower than the parallel path.
- Add an optional `libdeflate` backend in `_compression.deflate_compress` with a stdlib `zlib.compress` fallback. Compressors are cached per thread via `threading.local`; output is wire-compatible (zlib format) either way.

Numbers (20-core box)
After the change, deflate strip writes are 1.4-2.9x faster than rioxarray/GDAL. The tile writer's fast path is unchanged.
Compatibility
- Round-trip data is bit-identical to the current serial path; verified across `float32`, `float64`, `uint8`, `uint16`, `int16`, `int32` and across `compression={none, deflate, lzw, zstd}` with and without predictor.
- Files produced by the new path open in rioxarray/GDAL and decode through stdlib `zlib`.
- No public-API change.
- `libdeflate` is an optional install; absent it, the code calls `zlib.compress` directly with no extra overhead.

Test plan

- `xrspatial/geotiff/tests/test_parallel_writer_1800.py` (21 new tests): round-trip parity across dtypes and compression codecs, threshold-based path-selection probes (`ThreadPoolExecutor` is/isn't constructed), libdeflate fallback equivalence to `zlib.compress`, end-to-end writes through `write()`.
- `xrspatial/geotiff/tests/` suite passes. 8 pre-existing GPU test failures (`test_predictor2_big_endian_gpu_1517`, `test_size_param_validation_gpu_vrt_1776`) and 1 pre-existing matplotlib deepcopy failure (`test_features.py::TestPalette`) are reproducible on `main` and unrelated to this change.
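A path-selection probe of the kind listed in the test plan might look like the sketch below. It is an illustration under assumptions, not the repository's test: `compress_all` and `CountingExecutor` are hypothetical, and the executor class is injected as a parameter here, whereas the real tests may patch the module instead.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

MIN_BYTES = 4 * 1024 * 1024  # 4 MiB, matching the threshold named in the PR


class CountingExecutor(ThreadPoolExecutor):
    """ThreadPoolExecutor subclass that counts how many pools were built."""

    constructed = 0

    def __init__(self, *args, **kwargs):
        type(self).constructed += 1
        super().__init__(*args, **kwargs)


def compress_all(chunks, executor_cls=ThreadPoolExecutor):
    """Compress chunks; build a pool only for payloads of MIN_BYTES or more."""
    if sum(len(c) for c in chunks) < MIN_BYTES:
        return [zlib.compress(c) for c in chunks]
    with executor_cls(max_workers=4) as pool:
        return list(pool.map(zlib.compress, chunks))


if __name__ == "__main__":
    small = [b"x" * 1024] * 4               # 4 KiB total: serial path
    large = [bytes(2 * 1024 * 1024)] * 4    # 8 MiB total: parallel path
    compress_all(small, CountingExecutor)
    print(CountingExecutor.constructed)
    compress_all(large, CountingExecutor)
    print(CountingExecutor.constructed)
```

Counting constructions checks which branch ran without depending on timing, which keeps the probe deterministic on machines with different core counts.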