flow_direction_mfd(): numpy and cupy backends have no memory guard #1423

@brendancol

Description


flow_direction_mfd on the numpy and cupy backends allocates an (8, rows, cols) float64 array with no memory guard. That is 64 bytes per input cell — an 8x amplifier on top of the input raster.

  • _cpu at xrspatial/hydro/flow_direction_mfd.py:51 does np.full((8, rows, cols), np.nan, dtype=np.float64).
  • _run_cupy at xrspatial/hydro/flow_direction_mfd.py:285 does cupy.full((8,) + data.shape, cupy.nan, dtype='f8').

A (50000, 50000) DEM asks for ~160 GB of host or GPU memory before anything errors out.
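
For reference, the projected working set can be computed directly; this is a back-of-the-envelope check, not library code:

    rows, cols = 50_000, 50_000

    # (8, rows, cols) float64: 8 directions x 8 bytes per input cell
    working_set_bytes = 8 * rows * cols * 8
    print(f"{working_set_bytes / 1e9:.0f} GB")   # -> 160 GB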

The sister module flow_accumulation_mfd already uses _available_memory_bytes() / _check_memory(rows, cols) and _check_gpu_memory(rows, cols) (flow_accumulation_mfd.py:68-129). flow_direction_mfd is the only MFD module without it. The flow_direction_d8 and flow_direction_dinf variants allocate same-shape outputs (1x amplifiers) and do not need the guard.

Expected behavior

flow_direction_mfd() raises MemoryError with a clear message on numpy and cupy backends when the projected (8, H, W) working set exceeds available memory, matching the flow_accumulation_mfd behavior. Dask paths process per-chunk through map_overlap and inherit the per-chunk guard via _cpu.
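
Once the guard is in place, the expectation could be exercised with a test along these lines. The import path, the call signature, and the _available_memory_bytes helper being patched are assumptions drawn from the file paths and proposed fix in this issue, not confirmed API:

    import numpy as np
    import pytest
    import xarray as xr

    # Hypothetical module path, taken from the file references above.
    import xrspatial.hydro.flow_direction_mfd as fdm


    def test_flow_direction_mfd_memory_guard(monkeypatch):
        # Pretend only 1 MB of host memory is free so a small raster trips the guard.
        monkeypatch.setattr(fdm, "_available_memory_bytes", lambda: 1_000_000)
        dem = xr.DataArray(np.random.rand(512, 512))
        with pytest.raises(MemoryError):
            fdm.flow_direction_mfd(dem)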

Proposed fix

Port _available_memory_bytes / _available_gpu_memory_bytes / _check_memory / _check_gpu_memory from flow_accumulation_mfd.py (64 bytes/pixel budget, 50% threshold) and call them from _run_numpy and _run_cupy before the (8, H, W) allocation runs.
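
A minimal sketch of what that port could look like, assuming psutil for host memory and cupy.cuda.runtime.memGetInfo() for device memory; the constants, messages, and function bodies are placeholders, and the real helpers in flow_accumulation_mfd.py may differ in detail:

    import numpy as np
    import psutil

    BYTES_PER_PIXEL = 64    # (8 directions) x float64 per input cell
    THRESHOLD = 0.5         # refuse if the working set would exceed 50% of free memory


    def _available_memory_bytes():
        # Free host memory as reported by the OS.
        return psutil.virtual_memory().available


    def _check_memory(rows, cols):
        required = rows * cols * BYTES_PER_PIXEL
        available = _available_memory_bytes()
        if required > available * THRESHOLD:
            raise MemoryError(
                f"flow_direction_mfd: a ({rows}, {cols}) raster needs "
                f"~{required / 1e9:.1f} GB for its (8, {rows}, {cols}) float64 "
                f"working set, but only {available / 1e9:.1f} GB of host memory "
                f"is available."
            )


    def _check_gpu_memory(rows, cols):
        import cupy
        required = rows * cols * BYTES_PER_PIXEL
        free_bytes, _total = cupy.cuda.runtime.memGetInfo()
        if required > free_bytes * THRESHOLD:
            raise MemoryError(
                f"flow_direction_mfd: a ({rows}, {cols}) raster needs "
                f"~{required / 1e9:.1f} GB of GPU memory, but only "
                f"{free_bytes / 1e9:.1f} GB is free."
            )


    def _run_numpy(data):
        rows, cols = data.shape
        _check_memory(rows, cols)   # guard before the 8x allocation
        weights = np.full((8, rows, cols), np.nan, dtype=np.float64)
        ...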


    Labels

    bug (Something isn't working), high-priority, oom (Out-of-memory risk with large datasets)
