geotiff: parallelize strip writer, adaptive tile threshold, optional libdeflate (closes #1800) by brendancol · Pull Request #1801 · xarray-contrib/xarray-spatial

brendancol · 2026-05-13T14:35:36Z

Closes #1800.

Summary

Parallelizes _write_stripped with the same ThreadPoolExecutor pattern that _write_tiled has used since the start. zlib, zstd, lz4, and the Numba LZW kernel all release the GIL, so this is a direct speedup with no new code paths to maintain.
Replaces the tile writer's n_tiles <= 4 sequential cutoff with a bytes-based threshold (_PARALLEL_MIN_BYTES = 4 MiB). The old cutoff turned tile_size=1024 on a 2048x2048 image (n_tiles=4, 16 MiB) into the slow path, ~8x slower than the parallel path.
Adds an optional libdeflate backend in _compression.deflate_compress with a stdlib zlib.compress fallback. Compressors are cached per thread via threading.local; output is wire-compatible (zlib format) either way.
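
As a rough illustration of the first two bullets, the sketch below gates a thread pool on total payload bytes rather than on chunk count. `compress_chunks` and `PARALLEL_MIN_BYTES` are hypothetical stand-ins, not the PR's `_write_stripped` / `_write_tiled` code, and it assumes the threshold is compared against the uncompressed payload size.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the PR's _PARALLEL_MIN_BYTES (4 MiB).
PARALLEL_MIN_BYTES = 4 * 1024 * 1024


def compress_chunks(chunks, level=6, max_workers=None):
    """Compress each chunk with zlib, using a pool only for large payloads."""
    total_bytes = sum(len(c) for c in chunks)
    if len(chunks) < 2 or total_bytes < PARALLEL_MIN_BYTES:
        # Small payload: pool start-up would likely cost more than it saves.
        return [zlib.compress(c, level) for c in chunks]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, so chunk order is preserved.
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))


# Four 4 MiB chunks (16 MiB total): a count-based "n <= 4" rule would run
# these serially, while the bytes-based rule sends them to the pool.
tiles = [bytes([i]) * (4 * 1024 * 1024) for i in range(4)]
parallel = compress_chunks(tiles)
serial = [zlib.compress(t, 6) for t in tiles]
assert parallel == serial
```

Because `zlib.compress` releases the GIL while it works, the pooled calls can overlap on separate cores, and the output is byte-for-byte the same as the serial loop.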

Numbers (20-core box)

| Workload | Before | After | rioxarray/GDAL |
|---|---|---|---|
| 2048x2048 float32 random, deflate strip | 405 ms | 70 ms | 102 ms |
| 3600x3600 float32 terrain, deflate+predictor=3, strip | 948 ms | 121 ms | 356 ms |
| 2048x2048 deflate tile_size=1024 | 395 ms | 109 ms | n/a |

After the change, deflate strip writes are 1.4-2.9x faster than rioxarray/GDAL. The tile writer's fast path is unchanged.
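
The ratios behind these claims can be rechecked directly from the table. The snippet below is only arithmetic on the reported timings; `timings_ms` and `speedups` are illustrative names.

```python
# Timings in milliseconds, copied from the table above: (before, after, GDAL).
timings_ms = {
    "2048x2048 float32 random, deflate strip": (405, 70, 102),
    "3600x3600 float32 terrain, deflate+predictor=3, strip": (948, 121, 356),
    "2048x2048 deflate tile_size=1024": (395, 109, None),
}


def speedups(before, after, gdal):
    """Return (speedup over the old writer, speedup over rioxarray/GDAL or None)."""
    vs_before = before / after
    vs_gdal = gdal / after if gdal is not None else None
    return vs_before, vs_gdal


for name, row in timings_ms.items():
    vs_before, vs_gdal = speedups(*row)
    gdal_text = f"{vs_gdal:.2f}x" if vs_gdal is not None else "n/a"
    print(f"{name}: {vs_before:.2f}x vs. before, {gdal_text} vs. rioxarray/GDAL")
```

The two strip rows work out to about 1.46x and 2.94x relative to rioxarray/GDAL, consistent with the 1.4-2.9x range quoted above.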

Compatibility

Round-trip data is bit-identical to the current serial path; verified across float32, float64, uint8, uint16, int16, int32 and across compression={none, deflate, lzw, zstd} with and without predictor. Files produced by the new path open in rioxarray/GDAL and decode through stdlib zlib.

No public-API change. libdeflate is an optional install; absent it, the code calls zlib.compress directly with no extra overhead.
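
A minimal sketch of the per-thread compressor cache with a stdlib fallback, under stated assumptions: `get_compressor`, `_zlib_factory`, and the `factory` hook are hypothetical names, and the sketch deliberately does not call into the `libdeflate` package, since its API is not shown in this PR description. A faster backend would be plugged in through `factory`.

```python
import threading
import zlib

# Each thread gets its own attribute namespace on this object.
_thread_local = threading.local()


def _zlib_factory(level):
    """Fallback backend: a thin wrapper around stdlib zlib.compress."""
    return lambda data: zlib.compress(data, level)


def get_compressor(level, factory=_zlib_factory):
    """Return this thread's cached compressor for `level`, building it once."""
    cache = getattr(_thread_local, "cache", None)
    if cache is None:
        cache = _thread_local.cache = {}
    if level not in cache:
        cache[level] = factory(level)
    return cache[level]


def deflate_compress(data, level=6):
    """Compress to a zlib-format stream using the calling thread's compressor."""
    return get_compressor(level)(data)


payload = b"elevation " * 1000
blob = deflate_compress(payload)
# Whatever backend produced it, the stream must decode through stdlib zlib.
assert zlib.decompress(blob) == payload
```

Keeping the cache on `threading.local` means worker threads never share a compressor object, which matters if a backend's compressor is not safe to use from several threads at once.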

Test plan

xrspatial/geotiff/tests/test_parallel_writer_1800.py (21 new tests): round-trip parity across dtypes and compression codecs, threshold-based path-selection probes (ThreadPoolExecutor is/isn't constructed), libdeflate fallback equivalence to zlib.compress, end-to-end writes through write().
Full xrspatial/geotiff/tests/ suite passes. 8 pre-existing GPU test failures (test_predictor2_big_endian_gpu_1517, test_size_param_validation_gpu_vrt_1776) and 1 pre-existing matplotlib deepcopy failure (test_features.py::TestPalette) are reproducible on main and unrelated to this change.
Cross-library round-trip with rioxarray/GDAL on synthetic and real (Copernicus DSM, USGS) GeoTIFFs.
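
One way a path-selection probe of this kind might look, as a sketch only: `compress_all`, `pool_was_constructed`, and `MIN_BYTES` are hypothetical, and the real tests in `test_parallel_writer_1800.py` may be structured differently. The idea is to wrap `ThreadPoolExecutor` in a spy and check whether it was constructed.

```python
import concurrent.futures as cf
import zlib
from unittest import mock

MIN_BYTES = 4 * 1024 * 1024  # hypothetical stand-in for _PARALLEL_MIN_BYTES


def compress_all(chunks):
    """Toy writer: serial below the byte threshold, pooled at or above it."""
    if sum(map(len, chunks)) < MIN_BYTES:
        return [zlib.compress(c) for c in chunks]
    # Looked up through the module at call time so the spy can intercept it.
    with cf.ThreadPoolExecutor() as pool:
        return list(pool.map(zlib.compress, chunks))


def pool_was_constructed(chunks):
    """Run compress_all and report whether a ThreadPoolExecutor was created."""
    with mock.patch.object(
        cf, "ThreadPoolExecutor", wraps=cf.ThreadPoolExecutor
    ) as spy:
        compress_all(chunks)
    return spy.called


assert pool_was_constructed([b"x" * 1024]) is False
assert pool_was_constructed([bytes(MIN_BYTES), bytes(MIN_BYTES)]) is True
```

Using `wraps=` keeps the real executor behind the spy, so the parallel path still runs end to end while the test observes which branch was taken.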

…#1800)

The deflate strip-write path was 3.7x slower than rioxarray/GDAL because `_write_stripped` ran zlib.compress serially while the tile writer already parallelized via a thread pool. Three changes:

1. Mirror `_write_tiled`'s ThreadPoolExecutor pattern in `_write_stripped`. Strip preparation is hoisted into a new `_prepare_strip` helper so the same code drives both the serial and parallel paths. A 2048x2048 deflate strip write drops from 405 ms to 70 ms (5.8x speedup, beats rioxarray's 102 ms).
2. Replace the tile writer's `n_tiles <= 4` sequential cutoff with a bytes-based threshold (`_PARALLEL_MIN_BYTES = 4 MiB`). Pre-fix, `tile_size=1024` on a 2048x2048 image produced n_tiles=4 and forced the slow path; now those writes parallelize too.
3. Route `deflate_compress` through the optional `libdeflate` package when installed (1.5-2x faster than stdlib zlib at the same level; GDAL >= 3.7 already uses it). Output is wire-compatible -- decoded streams round-trip through `zlib.decompress` unchanged. Compressors are cached per thread via `threading.local`.

Copilot

Pull request overview

This PR improves GeoTIFF write performance by extending the existing parallel tile-compression approach to strip writes, adding a payload-size-based heuristic for when to parallelize, and introducing an optional libdeflate backend for faster deflate compression while maintaining zlib wire compatibility.

Changes:

Parallelize _write_stripped using ThreadPoolExecutor for sufficiently large compressed payloads.
Replace the tile writer’s n_tiles <= 4 sequential cutoff with a bytes-based threshold (_PARALLEL_MIN_BYTES = 4 MiB).
Add an optional libdeflate path in deflate_compress() with per-thread compressor caching and zlib.compress fallback.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| `xrspatial/geotiff/_writer.py` | Adds `_PARALLEL_MIN_BYTES`, introduces parallel strip compression, and updates tile sequential/parallel decision logic to be payload-size-based. |
| `xrspatial/geotiff/_compression.py` | Adds optional `libdeflate` backend for deflate compression with thread-local compressor caching and zlib fallback. |
| `xrspatial/geotiff/tests/test_parallel_writer_1800.py` | Adds round-trip and path-selection tests for strip/tile parallelization and `libdeflate` fallback behavior. |


+def test_libdeflate_compressor_cache_is_thread_local():
+    """The cache lives in threading.local, so two threads see distinct dicts."""
+    import xrspatial.geotiff._compression as comp_mod
+
+    if not comp_mod._HAVE_LIBDEFLATE:
+        pytest.skip('libdeflate not installed')
+
+    seen = {}
+
+    def grab(tag):
+        # First call populates the cache; we grab its id().
+        comp_mod._libdeflate_compressor(6)
+        seen[tag] = id(comp_mod._libdeflate_thread_local.cache)
+
+    t1 = ThreadPoolExecutor(max_workers=2)
+    list(t1.map(grab, ['a', 'b']))
+    t1.shutdown(wait=True)
+    # Two workers populated two distinct local caches.
+    assert len(set(seen.values())) == 2


PR #1801's review flagged that `test_libdeflate_compressor_cache_is_thread_local` could pass with a single observed cache id: `ThreadPoolExecutor(max_workers=2).map(...)` is free to run both submissions on the same worker if the first returns quickly. Force both tasks to occupy a worker at the same time with a `threading.Barrier`, record `threading.get_ident()` so the assertion fails loudly if only one thread actually ran, and use the executor as a context manager so the pool is shut down on assertion failure.
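
The fix described in this commit message could look roughly like the following. This is a self-contained sketch, not the committed test: `_local` and `_get_cache` are stand-ins for `comp_mod._libdeflate_thread_local` and `comp_mod._libdeflate_compressor`, and `check_cache_is_thread_local` is a hypothetical name.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # stand-in for the module's thread-local holder


def _get_cache():
    """Create this thread's cache on first use and return it."""
    if not hasattr(_local, "cache"):
        _local.cache = {}
    return _local.cache


def check_cache_is_thread_local():
    barrier = threading.Barrier(2, timeout=10)
    seen = {}
    caches = []  # keeps both dicts alive so their id() values cannot be reused

    def grab(tag):
        # Neither task passes this point until both are running, so the
        # executor cannot serve both submissions from a single worker.
        barrier.wait()
        cache = _get_cache()
        caches.append(cache)
        seen[tag] = (threading.get_ident(), id(cache))

    # The context manager shuts the pool down even if an assertion fails.
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(grab, ["a", "b"]))

    thread_ids = {tid for tid, _ in seen.values()}
    cache_ids = {cid for _, cid in seen.values()}
    assert len(thread_ids) == 2, "expected two distinct worker threads"
    assert len(cache_ids) == 2, "expected one cache per thread"
    return seen


seen = check_cache_is_thread_local()
```

The barrier timeout is a safety net chosen for the sketch: if only one worker ever starts, `barrier.wait()` raises instead of hanging the test run.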

Resolves conflict in xrspatial/geotiff/__init__.py: keeps the `_read_vrt_dask` dispatch hook from the PR branch. All other geotiff changes from main (#1791, #1793, #1801, #1802, #1803, #1804, #1805, #1806) were already integrated into the working tree by the prior 7329dd9 commit; this merge just records the parent so git recognises the reconciliation.

brendancol added the enhancement (New feature or request) and performance (PR touches performance-sensitive code) labels May 13, 2026

brendancol requested a review from Copilot May 13, 2026 14:36

Copilot started reviewing on behalf of brendancol May 13, 2026 14:37

Copilot AI reviewed May 13, 2026


brendancol merged commit 56cc261 into main May 13, 2026
2 of 11 checks passed

brendancol merged 2 commits into main from issue-1800

2 participants
