geotiff: fix libdeflate import, hybrid codec for parallel writer#1826
Open
brendancol wants to merge 4 commits into
Open
geotiff: fix libdeflate import, hybrid codec for parallel writer#1826brendancol wants to merge 4 commits into
brendancol wants to merge 4 commits into
Conversation
The deflate write path had two bugs that together explain a 3.97x slowdown vs GDAL on 512x512 float32 arrays: 1. The optional libdeflate backend imported `libdeflate` (no such package on PyPI) with a `Compressor(level).compress(..., Format.ZLIB)` API that belongs to a different binding. `_HAVE_LIBDEFLATE` was therefore always False in every install and deflate always fell back to stdlib `zlib.compress`, which is ~3x slower than what GDAL gets from its bundled libdeflate. Switched the import to `deflate` (Christian Heimes's PyPI binding) and the call to `deflate.zlib_compress`. 2. `deflate.zlib_compress` does not release the GIL: measured 1.2x speedup across 8 threads vs `zlib.compress`'s 5.1x. Routing the parallel strip/tile path through libdeflate would have regressed large-array writes that rely on thread-pool scaling. Added a `gil_friendly` flag to `compress`/`deflate_compress`, plumbed through `_prepare_strip`, `_prepare_tile`, and `_compress_block`. Sequential callers pick up libdeflate's per-call speedup; parallel callers stay on stdlib zlib for GIL release. Bench (xrspatial.geotiff vs GDAL via rasterio, float32, median of 5): 512x512 deflate: 25.5 ms -> 8.8 ms (3.97x -> 1.48x of GDAL) 2048x2048 deflate: 71.2 ms -> 65.9 ms (0.82x -> 0.75x of GDAL) Also: - `setup.cfg`: new `geotiff = deflate` extras group. - One-shot UserWarning when libdeflate is absent, pointing at `pip install xarray-spatial[geotiff]`. - `deflate.zlib_compress` returns `bytearray`; cast to `bytes` to keep the `-> bytes` contract callers and tests rely on. - Test `test_deflate_higher_level_not_larger` widened from 128x128 noisy gradient to 512x512 smooth gradient: small noisy inputs are not reliably monotonic in level vs output size across codec backends. - Dropped the obsolete thread-local Compressor cache and its test (the `deflate` binding's compress functions are stateless).
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates GeoTIFF deflate compression to use the deflate package when beneficial, while preserving zlib for parallel writer paths where GIL behavior matters.
Changes:
- Switches optional deflate backend import from
libdeflatetodeflate. - Adds a
gil_friendlycompression flag and threads it through parallel strip/tile/streaming compression paths. - Adds a
geotiffextras group and adjusts deflate-related tests.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
xrspatial/geotiff/_compression.py |
Uses the deflate package, adds fallback warning, and exposes gil_friendly compression dispatch. |
xrspatial/geotiff/_writer.py |
Passes gil_friendly=True through parallel compression paths. |
xrspatial/geotiff/tests/test_parallel_writer_1800.py |
Updates deflate backend tests and removes obsolete thread-local cache coverage. |
xrspatial/geotiff/tests/test_compression_level.py |
Makes compression-level size comparison input larger and smoother. |
setup.cfg |
Adds geotiff optional dependency extra for deflate. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -188,12 +188,14 @@ def test_deflate_compress_zlib_wire_compatible(): | |||
|
|
|||
|
|
|||
| def test_deflate_compress_fallback_when_libdeflate_missing(monkeypatch): | |||
The thread-local Compressor cache test was the last direct user of the bare `from concurrent.futures import ThreadPoolExecutor` import. Remaining tests reach the executor through `cf.ThreadPoolExecutor` (so they can monkeypatch it), so the top-level import is now unused. Removing it silences flake8 F401.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
xrspatial.geotiffimported a package name (libdeflate) that does not exist on PyPI, with an API that belongs to a different binding._HAVE_LIBDEFLATEwas therefore False in every install and deflate writes always fell back to stdlibzlib.compress(~3x slower than the libdeflate that GDAL ships bundled). Fixed by importing the actualdeflatePyPI binding.deflate.zlib_compressdoes not release the GIL (measured: 1.2x speedup across 8 threads vszlib.compress's 5.1x), so a flat switch to libdeflate would have regressed the parallel strip/tile writer that needs thread-pool scaling on large arrays. Added agil_friendlyflag so sequential callers get libdeflate's per-call speedup and parallel callers stay on stdlib zlib.geotiffextras group (pip install xarray-spatial[geotiff]) that pulls indeflate, and a one-shot fallback warning when it is absent.Bench
Compared with GDAL (via
rasterio.open(..., 'w', compress='DEFLATE')) on float32 arrays:Small-array deflate now lands at ~1.5x of GDAL (the residual gap is per-write Python overhead in the header assembly path). Large-array deflate stays faster than GDAL via the parallel zlib path.
Notes
deflate.zlib_compressreturnsbytearray; cast tobytesto preserve the documented-> bytescontract.test_deflate_higher_level_not_largerwidened from a 128x128 noisy gradient to a 512x512 smooth gradient. Tiny noisy inputs are not reliably monotonic in level vs output size across codec backends.deflatebinding's compress functions are stateless.Test plan
pytest xrspatial/geotiff/tests/ -k 'deflate or compression or parallel or write'(586 passed, 2 skipped; one pre-existing unrelated failure onmain)