geotiff: in-place nodata mask on GPU (#1934) by brendancol · Pull Request #1937 · xarray-contrib/xarray-spatial

brendancol · 2026-05-15T14:34:22Z

Summary

_apply_nodata_mask_gpu ran cupy.where(arr == sentinel, nan, arr) for both the float path and the post-astype(float64) integer path. cupy.where allocates a fresh output buffer the same shape as the input, so each call paid one chunk-sized device allocation plus the corresponding device-to-device copy.
All three call sites in _backends/gpu.py pass a freshly decoded GPU buffer that no caller-visible state aliases:
- gpu.py:387 (stripped read), gpu.py:684 (tiled GPU decode), gpu.py:1072 (per-chunk delayed task).
Replaced the cupy.where step with cupy.putmask so the existing buffer is mutated in place. Drops one chunk-sized allocation per call; matters most on the dask+cupy path where the chunk task runs once per chunk.

Test plan

pytest xrspatial/geotiff/tests/test_apply_nodata_mask_gpu_inplace_1934.py -x -q -- 7 new tests pass (correctness on float + int paths, in-place pointer guarantee, pool used_bytes ceiling on both paths, NaN-sentinel no-op, nodata=None passthrough).
pytest xrspatial/geotiff/tests/test_nodata_nan_int_1774.py xrspatial/geotiff/tests/test_nodata_out_of_range_1581.py xrspatial/geotiff/tests/test_gpu_nodata_1542.py xrspatial/geotiff/tests/test_miniswhite_nodata_1809.py xrspatial/geotiff/tests/test_gds_chunked_gpu_parity_1896.py -q -- 52 existing nodata-related tests pass.

Closes #1934.

`_apply_nodata_mask_gpu` ran `cupy.where(arr == sentinel, nan, arr)` for both the float and the post-`astype(float64)` integer paths, which allocates a fresh output buffer the same shape as the input. Every call site passes a freshly decoded GPU buffer that no caller-visible state aliases, so writing NaN into the existing buffer with `cupy.putmask` drops one chunk-sized device allocation per call. Adds `test_apply_nodata_mask_gpu_inplace_1934.py` covering the float correctness path, the in-place pointer guarantee, a pool `used_bytes` ceiling for both the float and integer paths, the NaN-sentinel no-op, and the `nodata=None` passthrough.

Note that #1934 (`_apply_nodata_mask_gpu` in-place mutation) was filed and fixed during the 2026-05-15 rockout pass on geotiff.

Copilot

Pull request overview

This PR optimizes GeoTIFF GPU nodata masking by avoiding an extra device-sized allocation: _apply_nodata_mask_gpu now mutates the already-owned CuPy buffer in place (using cupy.putmask) instead of producing a new array via cupy.where. This reduces allocator pressure especially on dask+cupy per-chunk execution.

Changes:

Replace cupy.where(...) with in-place cupy.putmask(...) in _apply_nodata_mask_gpu for both float and integer→float64 paths.
Add new GPU regression tests covering correctness plus in-place/pool-allocation expectations.
Update internal performance sweep tracking state.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
`xrspatial/geotiff/_backends/_gpu_helpers.py`	Switch nodata sentinel replacement to `cupy.putmask` to avoid allocating a fresh output array.
`xrspatial/geotiff/tests/test_apply_nodata_mask_gpu_inplace_1934.py`	New regression tests for correctness and allocation/in-place behavior around `_apply_nodata_mask_gpu`.
`.claude/sweep-performance-state.csv`	Record the performance sweep note for the change.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+def _gpu_available() -> bool:
+    if importlib.util.find_spec("cupy") is None:
+        return False
+    try:
+        import cupy
+        return bool(cupy.cuda.is_available())
+    except Exception:
+        return False
+
+
+_HAS_GPU = _gpu_available()
+_gpu_only = pytest.mark.skipif(
+    not _HAS_GPU,
+    reason="cupy + CUDA required",
+)


+    arr_gpu = cupy.full((512, 512), 3, dtype=cupy.uint16)
+    arr_gpu[0, 0] = 1  # ensure non-sentinel pixel exists
+
+    pool = cupy.get_default_memory_pool()
+    cupy.cuda.Stream.null.synchronize()
+    used_before = pool.used_bytes()
+
+    out = _apply_nodata_mask_gpu(arr_gpu, 3)
+    cupy.cuda.Stream.null.synchronize()
+    used_after = pool.used_bytes()
+
+    # Required: one float64 buffer (512*512*8 = 2 MiB) from astype.
+    # Pre-fix would have allocated a second float64 buffer for cupy.where
+    # (another 2 MiB) on top of that.
+    float64_bytes = out.nbytes
+    growth = used_after - used_before
+    # Allow some slack for the bool mask + .any() scalar (well under
+    # one float64 buffer of slack).
+    assert growth < 2 * float64_bytes, (
+        f"unexpected allocation growth {growth} bytes >= "
+        f"2 * float64_bytes {2 * float64_bytes}; pre-fix double-alloc"
+    )
+
+


- Reuse the shared ``requires_gpu`` marker from ``xrspatial/geotiff/tests/conftest.py`` instead of redefining a local ``_HAS_GPU`` import-and-runtime check. The conftest helper already validates both ``import cupy`` and ``cupy.cuda.is_available()``. - Run the two memory-pool allocation tests under an isolated ``MemoryPool`` allocator and switch the measurement from ``used_bytes`` to ``total_bytes`` (called after ``free_all_blocks``) so the assertion cannot be masked by the input buffer being refcount-freed back to the pool before ``used_after`` is sampled.

brendancol added 2 commits May 15, 2026 07:34

geotiff: record rockout 2026-05-15 LOW fix in sweep-performance state

511f085

Note that #1934 (`_apply_nodata_mask_gpu` in-place mutation) was filed and fixed during the 2026-05-15 rockout pass on geotiff.

Copilot AI review requested due to automatic review settings May 15, 2026 14:34

github-actions Bot added the performance PR touches performance-sensitive code label May 15, 2026

Copilot started reviewing on behalf of brendancol May 15, 2026 14:34 View session

Copilot AI reviewed May 15, 2026

View reviewed changes

brendancol merged commit 6bff87e into main May 15, 2026
10 of 11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

geotiff: in-place nodata mask on GPU (#1934)#1937

geotiff: in-place nodata mask on GPU (#1934)#1937
brendancol merged 3 commits into
mainfrom
rockout-performance-low-geotiff-2026-05-15

brendancol commented May 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

brendancol commented May 15, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants