
Guard sieve() numpy and cupy backends against oversized rasters (#1296) #1298

Merged: brendancol merged 1 commit into main from issue-1296 on Apr 28, 2026


Conversation

@brendancol
Contributor

Summary

Closes #1296.

Implementation

  • _check_memory(rows, cols) at a 28 bytes/pixel host budget (matches the dask guard already in _sieve_dask); a sketch of both helpers follows this list.
  • _check_gpu_memory(rows, cols) at a 16 bytes/pixel GPU round-trip budget; it calls _check_memory first so the host budget still applies for the eventual .get() transfer.
  • Both raise at 50% of available memory.
  • Wired into _sieve_numpy and _sieve_cupy before any allocation.
  • _available_memory_bytes was duplicated across the file; consolidated into one definition.
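
A minimal sketch of what these helpers could look like, assuming psutil supplies the host figure and CuPy's Device().mem_info supplies the device figure; the exact bodies merged in xrspatial/sieve.py may differ:

```python
# Hypothetical sketch of the guards described above, not the exact merged code.
import psutil

HOST_BYTES_PER_PIXEL = 28  # matches the existing dask guard budget
GPU_BYTES_PER_PIXEL = 16   # device-side round-trip budget


def _available_memory_bytes():
    # Single consolidated definition: host RAM currently available.
    return psutil.virtual_memory().available


def _check_memory(rows, cols):
    required = rows * cols * HOST_BYTES_PER_PIXEL
    budget = _available_memory_bytes() * 0.5  # raise at 50% of available RAM
    if required > budget:
        raise MemoryError(
            f"sieve() needs ~{required / 1e9:.1f} GB of host memory for a "
            f"{rows}x{cols} raster, more than 50% of what is available."
        )


def _check_gpu_memory(rows, cols):
    import cupy
    # The result still comes back to the host via .get(), so the host budget applies too.
    _check_memory(rows, cols)
    free_bytes, _total = cupy.cuda.Device().mem_info
    required = rows * cols * GPU_BYTES_PER_PIXEL
    if required > free_bytes * 0.5:
        raise MemoryError(
            f"sieve() needs ~{required / 1e9:.1f} GB of GPU memory for a "
            f"{rows}x{cols} raster, more than 50% of what is free."
        )
```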

Test plan

  • pytest xrspatial/tests/test_sieve.py, 47/47 pass.
  • New: test_sieve_numpy_memory_guard (direct _sieve_numpy call with mocked low memory).
  • New: test_sieve_numpy_memory_guard_via_public_api (full sieve() entry point).
  • Existing test_sieve_dask_memory_guard still passes.
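
A rough shape of the mocked-low-memory test, assuming the guard reads availability through a patchable _available_memory_bytes helper and that sieve() is importable and callable as shown; the import path and call signature are assumptions, not the merged test:

```python
# Illustrative only: the real tests live in xrspatial/tests/test_sieve.py.
from unittest import mock

import numpy as np
import pytest
import xarray as xr

from xrspatial.sieve import sieve


def test_sieve_numpy_memory_guard_via_public_api():
    agg = xr.DataArray(np.zeros((100, 100), dtype=np.float64))
    # Pretend only 1 KB of host RAM is available so the 28 B/pixel budget trips
    # even for a tiny raster.
    with mock.patch("xrspatial.sieve._available_memory_bytes", return_value=1024):
        with pytest.raises(MemoryError):
            sieve(agg)  # threshold argument omitted; exact signature is assumed
```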

Followup (separate)

The int32 indices in _label_connected silently truncate when n > 2^31 (rasters above ~46340x46340). The docstring already calls this out, and the new memory guard rejects rasters that large before the int32 issue can trigger, so this is a documentation/clarity item rather than an exploitable bug; it is left for a separate change.
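
For reference, the ~46340x46340 threshold is just the square root of the int32 limit:

```python
import math

INT32_MAX = 2**31 - 1             # largest value an int32 index can hold
side = math.isqrt(INT32_MAX)      # 46340
assert side ** 2 <= INT32_MAX < (side + 1) ** 2
# A square raster larger than ~46340x46340 has more pixels than int32 can index.
```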

The dask paths in xrspatial/sieve.py already raised MemoryError before
calling .compute() on huge inputs, but the numpy and cupy paths went
straight into _sieve_numpy with no check. _label_connected allocates
parent, rank, and region_map_flat as int32 arrays plus a float64 result
copy (about 20 bytes/pixel), so a 50000x50000 raster silently asked for
~50 GB of host RAM.
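
The ~50 GB figure follows directly from that per-pixel accounting:

```python
rows = cols = 50_000
pixels = rows * cols                   # 2.5 billion pixels
bytes_per_pixel = 3 * 4 + 8            # parent, rank, region_map_flat (int32) + float64 copy
print(pixels * bytes_per_pixel / 1e9)  # 50.0 -> ~50 GB of host RAM
```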

Add _check_memory(rows, cols) and _check_gpu_memory(rows, cols) at 28
bytes/pixel host budget (matches the existing dask guard) and 16
bytes/pixel GPU round-trip budget, both at 50% of available memory.
Wire _check_memory into _sieve_numpy at the top and _check_gpu_memory
into _sieve_cupy before data.get(). Consolidate the
_available_memory_bytes definition that was duplicated across blocks.
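
Schematically, the wiring looks like the sketch below; the signatures and elided bodies are illustrative rather than the merged code:

```python
# Placement sketch only; the real implementations do the labelling work.
def _sieve_numpy(data, threshold):
    _check_memory(*data.shape)      # reject oversized rasters before any allocation
    ...                             # existing _label_connected-based path


def _sieve_cupy(data, threshold):
    _check_gpu_memory(*data.shape)  # also applies the host budget for the .get() transfer
    host = data.get()               # GPU -> host round trip guarded above
    ...
```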

The same asymmetric-guard pattern was fixed in cost_distance (#1262),
mahalanobis (#1288), multispectral (#1291), and kde (#1287).

Two new memory-guard tests cover the numpy backend (direct _sieve_numpy
call and via public sieve()). All 47 tests pass.
@github-actions bot added the performance (PR touches performance-sensitive code) label on Apr 28, 2026
@brendancol merged commit f325ece into main on Apr 28, 2026
11 checks passed
brendancol added a commit that referenced this pull request on Apr 29, 2026:
…#1319)

Fixes #1318.

flow_accumulation() on the numpy and cupy backends had no memory check.
_flow_accum_cpu allocated accum (8 B/px) + in_degree (4 B/px) + valid
(1 B/px) + queue_r/queue_c (8 B/px each) ~ 29 B/pixel of working memory
plus the caller's input array. _flow_accum_cupy followed the same
pattern on the device at ~16 B/pixel. A 50000x50000 numpy raster asked
for ~72 GB of host memory before anything errored out.

Adds _available_memory_bytes / _available_gpu_memory_bytes helpers and
_check_memory / _check_gpu_memory budget checks at 50% of available
RAM/VRAM. Wires them into the public flow_accumulation_d8() dispatch
before the eager numpy and cupy paths run. Dask paths skip the guard
because per-tile allocations are bounded by chunk size.

Mirrors the pattern from sieve (#1298), kde (#1289), resample (#1297),
sky_view_factor (#1300), surface_distance (#1305).
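
For the ~72 GB figure quoted in that commit message, the same back-of-the-envelope arithmetic applies:

```python
rows = cols = 50_000
bytes_per_pixel = 8 + 4 + 1 + 8 + 8         # accum, in_degree, valid, queue_r, queue_c
print(rows * cols * bytes_per_pixel / 1e9)  # 72.5 -> ~72 GB of working memory
```
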
@brendancol deleted the issue-1296 branch on May 4, 2026 19:48

Labels

performance (PR touches performance-sensitive code)


Development

Successfully merging this pull request may close these issues.

sieve(): numpy and cupy backends have no memory guard

1 participant