Fix CuPy Bellman-Ford iteration limit in cost_distance #1192

Merged
brendancol merged 4 commits into master from issue-1191
Apr 14, 2026

Conversation

@brendancol
Contributor

Closes #1191.

Summary

  • The CuPy parallel Bellman-Ford loop in cost_distance used max_iterations = height + width. On maze-like friction surfaces where the only passable route zigzags across the grid, shortest paths can have up to height * width - 1 edges. The old limit caused early termination -- reachable pixels were incorrectly reported as NaN.
  • Changed to height * width, which covers the standard Bellman-Ford V-1 bound (a shortest path on an h x w grid has at most height * width - 1 edges). The early-exit changed flag still short-circuits on open grids, so the higher limit only costs extra iterations when they're actually needed.
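The fix can be illustrated with a plain-NumPy sketch of the parallel relaxation loop. The function names and the cost model (destination-pixel friction, 4-connectivity) are illustrative assumptions, not the library's actual CuPy kernel:

```python
import numpy as np

def _relax(dist, friction):
    # One parallel relaxation step: every pixel takes the cheapest
    # 4-neighbor distance plus its own friction cost.
    inf = np.inf
    up = np.full_like(dist, inf);    up[1:, :] = dist[:-1, :]
    down = np.full_like(dist, inf);  down[:-1, :] = dist[1:, :]
    left = np.full_like(dist, inf);  left[:, 1:] = dist[:, :-1]
    right = np.full_like(dist, inf); right[:, :-1] = dist[:, 1:]
    cand = np.minimum(np.minimum(up, down), np.minimum(left, right)) + friction
    return np.minimum(dist, cand)

def cost_distance_sketch(friction, source):
    h, w = friction.shape
    dist = np.full((h, w), np.inf)
    dist[source] = 0.0
    max_iterations = h * w  # was h + w, which truncated zigzag paths
    for _ in range(max_iterations):
        new = _relax(dist, friction)
        if np.array_equal(new, dist):  # early exit: nothing changed
            break
        dist = new
    return dist
```

On an open grid the early exit fires after roughly h + w iterations anyway; the larger bound only ever matters on maze-like friction surfaces.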

Test plan

  • New test_snake_maze_long_path -- a 5x5 snake maze whose shortest path has 16 edges (exceeding the old h+w=10 limit), verified across all four backends
  • Full test_cost_distance.py suite passes (48 tests)

Three performance fixes from the Phase 2 sweep targeting WILL OOM
verdicts under 30TB workloads:

geotiff: read_geotiff_dask() was reading the entire file into RAM just
to extract metadata before building the lazy dask graph. Now uses
_read_geo_info() which parses only the IFD via mmap -- O(1) memory
regardless of file size. Peak memory during dask setup dropped from
4.41 MB to 0.21 MB at 512x512 (21x reduction).
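As a rough illustration of IFD-only metadata reading: the sketch below mmaps a classic TIFF and walks just the first IFD to pull width and height. The real _read_geo_info is not shown in this PR and extracts more (geo tags, dtype, etc.); this is a minimal stand-in following the TIFF 6.0 header layout:

```python
import mmap
import struct

def read_tiff_size(path):
    # mmap the file and parse only the header + first IFD:
    # O(1) memory regardless of how large the raster payload is.
    with open(path, 'rb') as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        endian = '<' if m[:2] == b'II' else '>'
        assert struct.unpack(endian + 'H', m[2:4])[0] == 42  # classic TIFF magic
        (ifd_off,) = struct.unpack(endian + 'I', m[4:8])
        (n_entries,) = struct.unpack(endian + 'H', m[ifd_off:ifd_off + 2])
        width = height = None
        for i in range(n_entries):
            e = ifd_off + 2 + 12 * i  # each IFD entry is 12 bytes
            tag, typ, count = struct.unpack(endian + 'HHI', m[e:e + 8])
            if typ == 3:  # SHORT: value inlined in first 2 bytes of value field
                (val,) = struct.unpack(endian + 'H', m[e + 8:e + 10])
            else:         # treat everything else as LONG for this sketch
                (val,) = struct.unpack(endian + 'I', m[e + 8:e + 12])
            if tag == 256:    # ImageWidth
                width = val
            elif tag == 257:  # ImageLength
                height = val
        return width, height
```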

sieve: region_val_buf was allocated at rows*cols (16 GB for a 46K x 46K
raster) when the actual region count is typically orders of magnitude
smaller. Now counts regions first, allocates at actual size. Also reuses
the dead rank array as root_to_id, saving another 4 bytes/pixel. Memory
guard fixed from a misleading 5x multiplier to an accurate 28
bytes/pixel estimate.
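The count-first-then-allocate pattern can be sketched as follows (the flood fill here is a simple illustrative region counter, not the library's sieve labeling code):

```python
import numpy as np

def count_regions(img):
    # First pass: count 4-connected regions of equal value, so the
    # per-region buffer can be sized to the actual region count instead
    # of the worst-case rows*cols.
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    n = 0
    for r in range(h):
        for c in range(w):
            if seen[r, c]:
                continue
            n += 1
            stack = [(r, c)]
            seen[r, c] = True
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx] \
                            and img[ny, nx] == img[y, x]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
    return n

# Second pass allocates at the real size:
#   region_val_buf = np.empty(count_regions(img), dtype=img.dtype)
# rather than np.empty(rows * cols, ...), which is 16 GB at 46K x 46K.
```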

reproject: _reproject_dask_cupy pre-allocated the full output on GPU via
cp.full(out_shape), which OOMs for large outputs. Now checks available
GPU memory and falls back to the existing map_blocks path (with
is_cupy=True) when the output exceeds VRAM. Fast path preserved for
outputs that fit.
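The fallback decision reduces to a size check. In the CuPy path the free-byte figure would come from cp.cuda.Device().mem_info[0]; this sketch keeps the logic backend-agnostic (function name and safety factor are assumptions) so it runs without a GPU:

```python
import numpy as np

def choose_reproject_path(out_shape, dtype, free_bytes, safety=0.9):
    # Estimate the full output allocation and compare against available
    # memory; fall back to the chunked map_blocks path instead of OOMing.
    needed = int(np.prod(out_shape)) * np.dtype(dtype).itemsize
    if needed <= safety * free_bytes:
        return "preallocate"   # fast path: cp.full(out_shape, ...) fits
    return "map_blocks"        # stream chunk-by-chunk with is_cupy=True
```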

Four more performance fixes from the Phase 2 sweep:

polygonize: _polygonize_dask called dask.compute(*delayed_results) which
held all chunk polygon data in memory at once. Now processes chunks
incrementally -- interior polygons go straight to the output list and
only boundary polygons accumulate for the merge step.
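The incremental merge can be sketched like this (data shapes and the boundary-merge helper are illustrative assumptions; in the dask path each element of chunk_results would be computed one at a time instead of all at once via dask.compute):

```python
def merge_boundary(pieces):
    # Placeholder for the real boundary merge: pieces sharing a polygon
    # id are unioned (polygons represented here as sets of cells).
    merged = {}
    for pid, cells in pieces:
        merged.setdefault(pid, set()).update(cells)
    return list(merged.items())

def collect_polygons(chunk_results):
    # Each chunk yields (interior_polys, boundary_polys). Interiors are
    # final and go straight to the output; only boundary pieces are held
    # in memory for the merge step.
    out, pending = [], []
    for interior, boundary in chunk_results:
        out.extend(interior)      # finished polygons: emit immediately
        pending.extend(boundary)  # crosses a chunk edge: merge later
    out.extend(merge_boundary(pending))
    return out
```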

polygon_clip: clip_polygon called mask.compute() to materialize the
entire rasterized mask before applying it. For a polygon covering most
of a 30TB raster, the uint8 mask alone would be multi-TB. Now keeps the
mask lazy for dask paths and applies it via xarray.where (dask+numpy)
or da.map_blocks (dask+cupy).
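The masking step itself is a where-style select. With dask-backed xarray data, xarray.where(mask, data, fill) builds a lazy graph instead of materializing the multi-TB mask; plain NumPy stands in below so the sketch runs anywhere:

```python
import numpy as np

def apply_mask_sketch(data, mask, fill=np.nan):
    # Keep pixels where the rasterized mask is set; elsewhere write fill.
    # In the dask path this is xarray.where / da.map_blocks, evaluated
    # chunk-by-chunk rather than via mask.compute().
    return np.where(mask.astype(bool), data, fill)
```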

kde: Both dask paths captured the full point arrays (xs, ys, ws) in every
tile task's closure, serializing O(n_tiles * n_points) data. Now
pre-filters points per tile using a bounding-box + cutoff-radius check,
so each task receives only nearby points.
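The per-tile pre-filter is a bounding-box test expanded by the kernel's cutoff radius (helper name and bounds convention are illustrative; the PR does not show the actual signature):

```python
import numpy as np

def points_for_tile(xs, ys, ws, tile_bounds, cutoff):
    # Only points within the tile's bounding box, grown by the cutoff
    # radius, can contribute to that tile's density, so each task is
    # sent O(nearby points) instead of all n_points.
    x0, x1, y0, y1 = tile_bounds
    keep = ((xs >= x0 - cutoff) & (xs <= x1 + cutoff) &
            (ys >= y0 - cutoff) & (ys <= y1 + cutoff))
    return xs[keep], ys[keep], ws[keep]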

pathfinding: When friction=None, the A* kernel allocated a dummy
np.ones((h, w)) array that was never read (use_friction=False skips all
friction lookups). For a 100K x 100K grid that's 80 GB of wasted
allocation. Now passes a 1x1 dummy instead.
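A minimal sketch of the argument-preparation change (function name is an assumption; the kernel signature simply requires *some* array when use_friction is False):

```python
import numpy as np

def make_friction_arg(friction):
    # When friction is None the kernel never reads the array, so a 1x1
    # placeholder satisfies the signature without allocating h*w floats
    # (80 GB at 100K x 100K).
    if friction is None:
        return np.ones((1, 1)), False  # dummy array, use_friction=False
    return friction, True
```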

The parallel Bellman-Ford loop used max_iterations = height + width,
which is too low for maze-like friction surfaces where shortest paths
can snake across the entire grid (up to height * width - 1 edges).
Changed to height * width, the standard Bellman-Ford V-1 bound.

Tests a maze where the shortest path has 16 edges on a 5x5 grid,
which requires more than height + width Bellman-Ford iterations.
Covers all four backends (numpy, cupy, dask+numpy, dask+cupy).
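One way such a fixture can be built (the test's exact maze layout is an assumption; the BFS here only confirms the 16-edge claim, it is not the code under test):

```python
import numpy as np
from collections import deque

WALL = np.inf  # impassable friction
maze = np.ones((5, 5))
maze[1, :4] = WALL  # wall rows alternate which end is open,
maze[3, 1:] = WALL  # forcing the corridor to snake across the grid

def shortest_edges(grid, start, goal):
    # Plain BFS over finite-friction cells, counting edges.
    h, w = grid.shape
    dist = {start: 0}
    q = deque([start])
    while q:
        y, x = q.popleft()
        if (y, x) == goal:
            return dist[(y, x)]
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and np.isfinite(grid[ny, nx]) \
                    and (ny, nx) not in dist:
                dist[(ny, nx)] = dist[(y, x)] + 1
                q.append((ny, nx))
    return dist.get(goal)
```

From (0,0) to (4,4) the only route is along row 0, down through the gap at column 4, back along row 2, down through the gap at column 0, then along row 4: 16 edges, well past the old h + w = 10 iteration limit.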
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Apr 13, 2026
@brendancol brendancol merged commit f3e8603 into master Apr 14, 2026
11 checks passed
@brendancol brendancol deleted the issue-1191 branch May 5, 2026 03:44

Labels

performance PR touches performance-sensitive code


Development

Successfully merging this pull request may close these issues.

CuPy cost_distance Bellman-Ford terminates early on maze-like friction surfaces
