
Guard flow_accumulation() against unbounded eager allocations (#1318) #1319

Merged

brendancol merged 1 commit into main from fix/1318-flow-accumulation-memory-guard on Apr 29, 2026


Conversation

@brendancol (Contributor)

Summary

  • Adds _check_memory / _check_gpu_memory budget checks (29 B/px CPU, 16 B/px GPU, 50% threshold) to the eager numpy and cupy backends in flow_accumulation_d8.py.
  • Dask backends skip the guard because per-tile allocations are already bounded by chunk size.
  • Adds 4 memory-guard tests (numpy raise, normal-input pass, dask bypass, error message).

Fixes #1318.

Background

_flow_accum_cpu allocated accum (float64), in_degree (int32), valid (int8), and two H*W int64 BFS queues, ~29 B/pixel of working memory on top of the caller's input. _flow_accum_cupy allocated accum (float64), in_degree (int32), and state (int32), ~16 B/pixel of GPU memory. Neither backend checked against available memory, so a 50000x50000 numpy raster asked for ~72 GB of host memory before anything errored out.
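For illustration, a minimal sketch of the CPU-side guard described above, assuming psutil supplies the budget; the helper name and message wording mirror the summary but are not the merged code:

```python
import psutil

_CPU_BYTES_PER_PIXEL = 29  # accum f64 (8) + in_degree i32 (4) + valid i8 (1) + two i64 queues (8 + 8)
_THRESHOLD = 0.5           # refuse eager runs above 50% of available RAM

def _check_memory(height, width):
    required = height * width * _CPU_BYTES_PER_PIXEL
    available = psutil.virtual_memory().available  # assumed budget source
    if required > available * _THRESHOLD:
        raise MemoryError(
            f"flow_accumulation on a {height}x{width} grid needs "
            f"~{required / 1e9:.0f} GB of working memory; "
            f"use the dask backend for out-of-core processing"
        )
```

The 50000x50000 figure checks out: 50000 * 50000 * 29 B is roughly 72.5 GB. The cupy check would be the same shape with a 16 B/px budget against free device memory.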

Hydro is safety-critical, so the same asymmetric-guard pattern applies here as in sieve (#1298), kde (#1289), resample (#1297), sky_view_factor (#1300), and surface_distance (#1305).

The dinf and mfd flow_accumulation variants share the same allocation pattern and will be handled in separate follow-up PRs per the one-fix-per-security-PR policy.

Test plan

  • pytest xrspatial/hydro/tests/test_flow_accumulation_d8.py: 33 passed (29 existing + 4 new memory-guard cases)
  • A mocked tiny memory budget on the numpy path raises MemoryError with a "working memory" / "dask" message (see the sketch after this list)
  • Normal-size input still succeeds
  • The dask backend bypasses the guard even with a mocked 1-byte budget
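For illustration, a hedged sketch of what the mocked-budget test could look like; the module path follows the test plan above and the monkeypatched helper name comes from the commit message below, but the fixture shape and the flow_accumulation_d8() signature are assumptions:

```python
import numpy as np
import pytest
import xarray as xr

import xrspatial.hydro.flow_accumulation_d8 as mod  # assumed module path

def test_memory_guard_raises(monkeypatch):
    # Pretend almost no RAM is available so even a tiny raster trips the guard.
    monkeypatch.setattr(mod, "_available_memory_bytes", lambda: 1)
    flow_dir = xr.DataArray(np.zeros((8, 8)))  # hypothetical D8 flow-direction input
    with pytest.raises(MemoryError, match="working memory"):
        mod.flow_accumulation_d8(flow_dir)
```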


flow_accumulation() on the numpy and cupy backends had no memory check.
_flow_accum_cpu allocated accum (8 B/px) + in_degree (4 B/px) + valid
(1 B/px) + queue_r/queue_c (8 B/px each), ~29 B/pixel of working memory
on top of the caller's input array. _flow_accum_cupy allocated the
equivalent working set on the device at ~16 B/pixel. A 50000x50000
numpy raster asked for ~72 GB of host memory before anything errored out.

Adds _available_memory_bytes / _available_gpu_memory_bytes helpers and
_check_memory / _check_gpu_memory budget checks at 50% of available
RAM/VRAM. Wires them into the public flow_accumulation_d8() dispatch
before the eager numpy and cupy paths run. Dask paths skip the guard
because per-tile allocations are bounded by chunk size.

Mirrors the pattern from sieve (#1298), kde (#1289), resample (#1297),
sky_view_factor (#1300), surface_distance (#1305).
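For reference, one plausible shape for the availability helpers named above; the psutil and cupy calls are assumptions about how they might be implemented:

```python
import psutil

def _available_memory_bytes():
    # Host budget: currently available (not total) RAM.
    return psutil.virtual_memory().available

def _available_gpu_memory_bytes():
    # Device budget: free bytes reported by the CUDA runtime.
    import cupy
    free_bytes, _total_bytes = cupy.cuda.runtime.memGetInfo()
    return free_bytes
```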
github-actions bot added the performance (PR touches performance-sensitive code) label on Apr 29, 2026
brendancol merged commit 23c5aae into main on Apr 29, 2026
11 checks passed
brendancol added a commit that referenced this pull request Apr 29, 2026
…1321) (#1324)

Port the _check_memory / _check_gpu_memory helpers from #1319 into the
MFD variant. The numpy and cupy backends now reject grids whose
working set would exceed 50% of available host or device memory, with a
message that points the caller at the dask backends for out-of-core work.

CPU working memory: ~29 B/px (accum + in_degree + valid + queue_r +
queue_c). GPU working memory: ~16 B/px (accum + in_degree + state).
Dask backends are unaffected -- per-tile allocations are bounded by
chunk size.

Adds 4 memory-guard tests: oversize-rejection, valid-pass-through,
dimension-in-message, dask-suggestion-in-message.

Fixes #1321.
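As a usage note for the whole guard series, here is how a caller might route an oversize grid through the unguarded dask path; the chunk size and the flow_accumulation_d8 call are illustrative assumptions:

```python
import dask.array as da
import xarray as xr

from xrspatial.hydro.flow_accumulation_d8 import flow_accumulation_d8  # assumed import path

# Chunking bounds each tile's working set, so the eager-path guard never fires.
flow_dir = xr.DataArray(da.zeros((50_000, 50_000), chunks=(2_048, 2_048)))
accum = flow_accumulation_d8(flow_dir)  # dispatches to the dask backend
```

The MFD variant guarded in this commit would be called the same way.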
brendancol added a commit that referenced this pull request Apr 29, 2026
…1322) (#1325)

Adds _check_memory (29 B/px) and _check_gpu_memory (16 B/px) budget checks
at the 50% threshold to the eager numpy and cupy backends. Dask paths
already use bounded per-tile allocations so they skip the guard.

Same root cause and fix shape as #1318 / #1319 (flow_accumulation_d8).
brendancol added a commit that referenced this pull request Apr 29, 2026
…1331)

Mirror the asymmetric guard pattern from PR #1319: numpy and cupy
backends check projected working set against 50% of available
memory before allocating H*W arrays; dask backends skip the check
since per-tile allocations are already bounded.

CPU peak working set is ~40 B/px (Strahler kernel: order, in_degree,
max_in, cnt_max, queue_r, queue_c). GPU peak is ~37 B/px, budgeted
at 40 B/px conservatively.

Adds 5 tests covering oversize numpy rejection, normal pass, dask
bypass, dimensions in error message, and cupy oversize gating.
brendancol added a commit that referenced this pull request Apr 29, 2026
)

Add memory guards to the numpy and cupy dispatch branches.

CPU peak working set is ~33 B/pixel: float64 cast (8) + labels (8)
+ state (1) + path_r (8) + path_c (8). A 50000x50000 raster needs
~83 GB before the dispatch even runs.

GPU peak is ~28 B/pixel on the device: flow_dir_f64 (8) + pp_f64 (8)
+ labels (8) + state int32 (4).

Helpers _check_memory and _check_gpu_memory raise MemoryError with
the grid dimensions and a pointer to the dask backend when the
projected working set exceeds 50% of available memory. Dask paths
skip the guard since per-tile allocations are bounded by the user's
chunk size.

Same pattern as #1319, #1325, #1324, #1326.
brendancol added a commit that referenced this pull request Apr 29, 2026
Adds _check_memory and _check_gpu_memory budget checks to the eager
numpy and cupy backends in hand_d8.py. The kernel allocates ~38 B/px
of working memory (in_degree int32, valid int8, is_stream int8,
drain_elev float64, hand_out float64, plus two int64 BFS queues), so a
50000x50000 raster requested ~95 GB before any sanity check.

Dask backends skip the guard since per-tile allocations are bounded by
chunk size. Mirrors the pattern from #1319 (flow_accumulation_d8).

Adds 4 memory-guard tests (numpy raise, normal-input pass, dask bypass,
error message + dimension content) plus a cupy raise test that's
skipped without CUDA. 636 hydro tests pass.
brendancol added a commit that referenced this pull request Apr 29, 2026
…1332)

Adds a memory guard to flow_length_d8() matching the pattern from
#1319, #1324, #1325, and #1326. The eager numpy and cupy backends now
raise MemoryError before allocating an HxW working set that would
exceed 50% of available host or GPU memory. Dask paths skip the check
since per-tile allocations are bounded by chunk size.

CPU budget is 29 B/px (in_degree int32 + valid int8 + flow_len float64
+ order_r/order_c int64). GPU budget is 32 B/px covering the device
input + output copies in _flow_length_cupy.
brendancol added a commit that referenced this pull request Apr 29, 2026
The numpy and cupy paths allocate H*W working buffers (labels and BFS
queues on CPU, labels grid on GPU) before any sanity check on the input
size. Passing a sufficiently large in-memory raster can OOM the host
or device.

Add per-module _BYTES_PER_PIXEL (24) and _GPU_BYTES_PER_PIXEL (8)
constants and _check_memory / _check_gpu_memory helpers that raise
MemoryError when the projected working set exceeds 50% of available
RAM / free GPU memory. Wire the guards into the eager numpy and cupy
branches of sink_d8(); dask paths skip the guard since per-tile
allocations are bounded.

Mirrors the pattern from #1318/#1319 and the rest of the hydro guard
series.
brendancol added a commit that referenced this pull request Apr 29, 2026
… (#1366)

The numpy and cupy dispatches each allocate three full H*W float64
buffers (flow_accum cast, pour_points cast, output) -- ~24 B/px with no
memory check. Add per-module _check_memory and _check_gpu_memory helpers
modeled on flow_accumulation_d8 (#1318/#1319), wired into the public
dispatch before the eager allocations. Dask paths use windowed slicing
and skip the guard.
brendancol deleted the fix/1318-flow-accumulation-memory-guard branch on May 4, 2026

Labels

performance (PR touches performance-sensitive code)

Development

Successfully merging this pull request may close these issues.

flow_accumulation(): numpy and cupy backends have no memory guard

1 participant