
geotiff: _apply_nodata_mask_gpu allocates instead of mutating in-place #1934

@brendancol

Description


Summary

_apply_nodata_mask_gpu in xrspatial/geotiff/_backends/_gpu_helpers.py uses cupy.where(mask, nan, arr_gpu) to replace nodata sentinels with NaN. cupy.where allocates a fresh output buffer the same shape as the input array. Writing NaN into the existing buffer with cupy.putmask (or boolean mask assignment) avoids the chunk-sized allocation and the corresponding device-to-device copy.

Locations

  • _apply_nodata_mask_gpu (line ~45 of _gpu_helpers.py):
    • Float path: arr_gpu = cupy.where(arr_gpu == sentinel, nan, arr_gpu)
    • Integer path: arr_gpu = arr_gpu.astype(cupy.float64) followed by arr_gpu = cupy.where(mask, nan, arr_gpu)

The integer path already allocates a new float64 buffer via astype; the trailing cupy.where then allocates a second buffer of the same size. Both steps can be collapsed into a single allocation followed by an in-place write.
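A minimal sketch of the collapse for the integer path, using NumPy as a stand-in for CuPy (the `astype`/`where`/`putmask` signatures match; `sentinel` and the array values here are illustrative, not taken from the codebase):

```python
import numpy as np

sentinel = -9999
arr = np.array([1, sentinel, 3], dtype=np.int32)

# Current integer path: two chunk-sized allocations.
out = arr.astype(np.float64)                     # allocation 1 (astype)
out = np.where(out == sentinel, np.nan, out)     # allocation 2 (where output)

# Proposed: one allocation, then an in-place write.
arr_f = arr.astype(np.float64)                   # allocation 1 (astype)
np.putmask(arr_f, arr_f == sentinel, np.nan)     # writes into arr_f's own buffer

# Both forms produce the same result.
assert np.array_equal(np.isnan(out), np.isnan(arr_f))
```

The boolean mask (`arr_f == sentinel`) is still a temporary, but it is one byte per element rather than a full float64 copy of the chunk.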

Call sites

All call sites pass a freshly decoded GPU buffer that is not shared with caller-visible state:

  • _backends/gpu.py:387 - arr_gpu returned from read_to_array's GPU stripped path
  • _backends/gpu.py:684 - arr_gpu returned from the GPU tile decoder
  • _backends/gpu.py:1072 - per-chunk arr from _decode_window_gpu_direct

In-place mutation is therefore safe at each call site.

Proposed fix

Use cupy.putmask(arr_gpu, mask, nan) (or arr_gpu[mask] = nan) so the existing buffer is reused. This saves one chunk-sized device allocation per call, which matters on GPU-bound workloads with large chunks.
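A sketch of the float-path fix, again with NumPy standing in for CuPy (on a cupy.ndarray the buffer address would be checked via `arr.data.ptr` instead of `__array_interface__`; the sentinel value here is illustrative):

```python
import numpy as np

arr_gpu = np.array([0.5, -9999.0, 2.0])  # stand-in for a freshly decoded GPU chunk
buf_before = arr_gpu.__array_interface__['data'][0]

# In-place replacement: no fresh output buffer, unlike np.where/cupy.where.
arr_gpu[arr_gpu == -9999.0] = np.nan

# The data still lives in the same buffer.
assert arr_gpu.__array_interface__['data'][0] == buf_before
```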

Severity

LOW. Allocator pressure reduction rather than a correctness issue. Filed during the 2026-05-15 deep sweep on geotiff despite earlier classification as marginal.
