geotiff: in-place nodata mask on GPU (#1934) #1937
Merged
`_apply_nodata_mask_gpu` ran `cupy.where(arr == sentinel, nan, arr)` for both the float and the post-`astype(float64)` integer paths, which allocates a fresh output buffer the same shape as the input. Every call site passes a freshly decoded GPU buffer that no caller-visible state aliases, so writing NaN into the existing buffer with `cupy.putmask` drops one chunk-sized device allocation per call. Adds `test_apply_nodata_mask_gpu_inplace_1934.py` covering the float correctness path, the in-place pointer guarantee, a pool `used_bytes` ceiling for both the float and integer paths, the NaN-sentinel no-op, and the `nodata=None` passthrough.
Note that #1934 (`_apply_nodata_mask_gpu` in-place mutation) was filed and fixed during the 2026-05-15 rockout pass on geotiff.
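A minimal sketch of the `where` versus `putmask` difference described above. It uses NumPy on the CPU as a stand-in so that it runs without a GPU; `numpy.where` and `numpy.putmask` take the same arguments as their CuPy counterparts. The function names `mask_with_where` and `mask_in_place` are hypothetical and are not the helper's actual code.

```python
import numpy as np


def mask_with_where(arr, sentinel):
    # Pre-fix shape: where() builds and returns a brand-new array.
    return np.where(arr == sentinel, np.nan, arr)


def mask_in_place(arr, sentinel):
    # Post-fix shape: putmask() writes NaN into the existing float buffer.
    np.putmask(arr, arr == sentinel, np.nan)
    return arr


a = np.array([1.0, -9999.0, 3.0], dtype=np.float32)
b = a.copy()

out_where = mask_with_where(a, -9999.0)
out_inplace = mask_in_place(b, -9999.0)

print(out_where is a)    # False: a separate output buffer was allocated
print(out_inplace is b)  # True: the caller's buffer was mutated
```

The in-place form is only safe when nothing else holds a reference to the input, which is the property the description asserts for the decoded GPU buffers.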
Pull request overview
This PR optimizes GeoTIFF GPU nodata masking by avoiding an extra device-sized allocation: `_apply_nodata_mask_gpu` now mutates the already-owned CuPy buffer in place (using `cupy.putmask`) instead of producing a new array via `cupy.where`. This reduces allocator pressure, especially on dask+cupy per-chunk execution.
Changes:
- Replace `cupy.where(...)` with in-place `cupy.putmask(...)` in `_apply_nodata_mask_gpu` for both float and integer→float64 paths.
- Add new GPU regression tests covering correctness plus in-place/pool-allocation expectations.
- Update internal performance sweep tracking state.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `xrspatial/geotiff/_backends/_gpu_helpers.py` | Switch nodata sentinel replacement to `cupy.putmask` to avoid allocating a fresh output array. |
| `xrspatial/geotiff/tests/test_apply_nodata_mask_gpu_inplace_1934.py` | New regression tests for correctness and allocation/in-place behavior around `_apply_nodata_mask_gpu`. |
| `.claude/sweep-performance-state.csv` | Record the performance sweep note for the change. |
Comment on lines +29 to +43

    def _gpu_available() -> bool:
        if importlib.util.find_spec("cupy") is None:
            return False
        try:
            import cupy
            return bool(cupy.cuda.is_available())
        except Exception:
            return False


    _HAS_GPU = _gpu_available()
    _gpu_only = pytest.mark.skipif(
        not _HAS_GPU,
        reason="cupy + CUDA required",
    )

Comment on lines +156 to +179

    arr_gpu = cupy.full((512, 512), 3, dtype=cupy.uint16)
    arr_gpu[0, 0] = 1  # ensure non-sentinel pixel exists

    pool = cupy.get_default_memory_pool()
    cupy.cuda.Stream.null.synchronize()
    used_before = pool.used_bytes()

    out = _apply_nodata_mask_gpu(arr_gpu, 3)
    cupy.cuda.Stream.null.synchronize()
    used_after = pool.used_bytes()

    # Required: one float64 buffer (512*512*8 = 2 MiB) from astype.
    # Pre-fix would have allocated a second float64 buffer for cupy.where
    # (another 2 MiB) on top of that.
    float64_bytes = out.nbytes
    growth = used_after - used_before
    # Allow some slack for the bool mask + .any() scalar (well under
    # one float64 buffer of slack).
    assert growth < 2 * float64_bytes, (
        f"unexpected allocation growth {growth} bytes >= "
        f"2 * float64_bytes {2 * float64_bytes}; pre-fix double-alloc"
    )

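The byte budget behind the `2 * float64_bytes` threshold in the test above can be worked through explicitly. This is a sketch under stated assumptions: a single boolean mask the size of the input, every temporary alive at the same moment (a peak figure), and no allocator rounding. What `used_bytes` reports after the call returns depends on which of these buffers have already been freed.

```python
import numpy as np

shape = (512, 512)
n = shape[0] * shape[1]

float64_bytes = n * np.dtype(np.float64).itemsize  # astype(float64) result
bool_bytes = n * np.dtype(np.bool_).itemsize       # arr == sentinel mask

MiB = 1024 * 1024
post_fix_peak = float64_bytes + bool_bytes      # astype + mask
pre_fix_peak = 2 * float64_bytes + bool_bytes   # astype + where output + mask
threshold = 2 * float64_bytes                   # bound used by the assertion

print(float64_bytes / MiB)  # 2.0
print(post_fix_peak / MiB)  # 2.25
print(pre_fix_peak / MiB)   # 4.25
print(post_fix_peak < threshold <= pre_fix_peak)  # True
```

Under these assumptions the threshold sits between the two cases, so the mask's quarter of a MiB fits in the slack while a second float64 buffer does not.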
- Reuse the shared ``requires_gpu`` marker from ``xrspatial/geotiff/tests/conftest.py`` instead of redefining a local ``_HAS_GPU`` import-and-runtime check. The conftest helper already validates both ``import cupy`` and ``cupy.cuda.is_available()``.
- Run the two memory-pool allocation tests under an isolated ``MemoryPool`` allocator and switch the measurement from ``used_bytes`` to ``total_bytes`` (called after ``free_all_blocks``) so the assertion cannot be masked by the input buffer being refcount-freed back to the pool before ``used_after`` is sampled.
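One way the isolated-pool measurement from the second item could look, as a rough sketch rather than the actual test code. The helper name `pool_growth_bytes` and its two callback parameters are hypothetical, and the ordering (reset with `free_all_blocks`, take a `total_bytes` baseline, run, sample again) is an assumption about how the measurement is arranged. The helper returns `None` when CuPy or CUDA is unavailable.

```python
import importlib.util


def pool_growth_bytes(func, make_input):
    """Growth of an isolated pool's total_bytes while func runs (None without a GPU)."""
    if importlib.util.find_spec("cupy") is None:
        return None
    try:
        import cupy
        if not cupy.cuda.is_available():
            return None
    except Exception:
        return None

    pool = cupy.cuda.MemoryPool()  # private pool, unaffected by other tests
    with cupy.cuda.using_allocator(pool.malloc):
        arr = make_input(cupy)
        cupy.cuda.Stream.null.synchronize()
        pool.free_all_blocks()       # drop cached free blocks before the baseline
        before = pool.total_bytes()  # counts used and cached blocks
        out = func(arr)              # keep a reference so the result stays alive
        cupy.cuda.Stream.null.synchronize()
        after = pool.total_bytes()
    return after - before


# An identity function allocates nothing, so growth is 0 on a GPU machine.
growth = pool_growth_bytes(lambda a: a, lambda xp: xp.zeros((8, 8)))
print(growth)
```

Because `total_bytes` includes blocks that have been freed back to the pool, a temporary released inside `func` still shows up in the growth figure, which `used_bytes` would not guarantee.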
Summary
- `_apply_nodata_mask_gpu` ran `cupy.where(arr == sentinel, nan, arr)` for both the float path and the post-`astype(float64)` integer path. `cupy.where` allocates a fresh output buffer the same shape as the input, so each call paid one chunk-sized device allocation plus the corresponding device-to-device copy.
- The call sites in `_backends/gpu.py` pass a freshly decoded GPU buffer that no caller-visible state aliases: `gpu.py:387` (stripped read), `gpu.py:684` (tiled GPU decode), `gpu.py:1072` (per-chunk delayed task).
- Replace the `cupy.where` step with `cupy.putmask` so the existing buffer is mutated in place. Drops one chunk-sized allocation per call; matters most on the dask+cupy path where the chunk task runs once per chunk.

Test plan
- `pytest xrspatial/geotiff/tests/test_apply_nodata_mask_gpu_inplace_1934.py -x -q` -- 7 new tests pass (correctness on float + int paths, in-place pointer guarantee, pool `used_bytes` ceiling on both paths, NaN-sentinel no-op, `nodata=None` passthrough).
- `pytest xrspatial/geotiff/tests/test_nodata_nan_int_1774.py xrspatial/geotiff/tests/test_nodata_out_of_range_1581.py xrspatial/geotiff/tests/test_gpu_nodata_1542.py xrspatial/geotiff/tests/test_miniswhite_nodata_1809.py xrspatial/geotiff/tests/test_gds_chunked_gpu_parity_1896.py -q` -- 52 existing nodata-related tests pass.

Closes #1934.