Summary
_apply_nodata_mask_gpu in xrspatial/geotiff/_backends/_gpu_helpers.py uses cupy.where(mask, nan, arr_gpu) to replace nodata sentinels with NaN. cupy.where allocates a fresh output buffer the same shape as the input array, so every call costs one chunk-sized device allocation plus the corresponding device-to-device copy. Writing NaN into the existing buffer with cupy.putmask (or boolean-mask assignment) avoids both.
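To make the cost concrete, here is a minimal sketch (not from the codebase) that uses the CuPy memory pool to show the difference; the chunk shape and sentinel value are illustrative:

```python
import cupy

pool = cupy.get_default_memory_pool()

# Illustrative chunk: 4096 x 4096 float64 (~128 MiB), sentinel -9999.0.
arr = cupy.full((4096, 4096), -9999.0, dtype=cupy.float64)
mask = arr == -9999.0

before = pool.used_bytes()
out = cupy.where(mask, cupy.nan, arr)  # allocates a fresh chunk-sized output
print(pool.used_bytes() - before)      # ~arr.nbytes

before = pool.used_bytes()
cupy.putmask(arr, mask, cupy.nan)      # writes NaN into the existing buffer
print(pool.used_bytes() - before)      # ~0
```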
Locations
_apply_nodata_mask_gpu (line ~45 of _gpu_helpers.py):
- Float path: arr_gpu = cupy.where(arr_gpu == sentinel, nan, arr_gpu)
- Integer path: arr_gpu = arr_gpu.astype(cupy.float64) followed by arr_gpu = cupy.where(mask, nan, arr_gpu)
The integer path already allocates a new float64 buffer via astype; the trailing cupy.where then allocates a second buffer of the same size. Both steps can be collapsed into a single allocation followed by an in-place write.
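For concreteness, a reconstruction of the integer path as described above (the function name and exact surrounding code are assumptions, not the verbatim source):

```python
import cupy

def _mask_int_nodata_current(arr_gpu, sentinel):
    # Hypothetical stand-in for the current integer path.
    mask = arr_gpu == sentinel                     # bool mask on the int buffer
    arr_gpu = arr_gpu.astype(cupy.float64)         # allocation 1: float64 copy
    arr_gpu = cupy.where(mask, cupy.nan, arr_gpu)  # allocation 2: where() output
    return arr_gpu
```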
Call sites
All call sites pass a freshly decoded GPU buffer that is not shared with caller-visible state:
_backends/gpu.py:387 - arr_gpu returned from read_to_array's GPU strip-decoding path
_backends/gpu.py:684 - arr_gpu returned from the GPU tile decoder
_backends/gpu.py:1072 - per-chunk arr from _decode_window_gpu_direct
In-place mutation is therefore safe at each call site.
Proposed fix
Use cupy.putmask(arr_gpu, mask, nan) (or arr_gpu[mask] = nan) so the existing buffer is reused. This saves one chunk-sized device buffer per call, which is most noticeable on GPU-bound workloads with large chunks.
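A sketch of the rewritten helper under these assumptions (the signature and dtype dispatch are inferred from this report, not copied from the source):

```python
import cupy

def _apply_nodata_mask_gpu(arr_gpu, sentinel):
    # Sketch only; the real helper's signature may differ.
    if arr_gpu.dtype.kind == 'f':
        # Float path: mask in place; only a transient bool mask is allocated.
        cupy.putmask(arr_gpu, arr_gpu == sentinel, cupy.nan)
        return arr_gpu
    # Integer path: keep the one unavoidable astype allocation,
    # then write NaN in place instead of allocating a second buffer.
    mask = arr_gpu == sentinel
    out = arr_gpu.astype(cupy.float64)
    cupy.putmask(out, mask, cupy.nan)
    return out
```

arr_gpu[mask] = cupy.nan is an equivalent in-place alternative to cupy.putmask.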
Severity
LOW. Allocator pressure reduction rather than a correctness issue. Filed during the 2026-05-15 deep sweep on geotiff despite earlier classification as marginal.