Fix CuPy Bellman-Ford iteration limit in cost_distance #1192

Merged
brendancol merged 4 commits into master from issue-1191
Apr 14, 2026

Conversation

@brendancol
Contributor

Closes #1191.

Summary

  • The CuPy parallel Bellman-Ford loop in cost_distance used max_iterations = height + width. On maze-like friction surfaces where the only passable route zigzags across the grid, shortest paths can have up to height * width - 1 edges. The old limit caused early termination -- reachable pixels were incorrectly reported as NaN.
  • Changed to height * width, which covers the standard Bellman-Ford V-1 bound (a shortest path on an h x w grid has at most height * width - 1 edges). The early-exit changed flag still short-circuits on open grids, so the higher limit only costs extra iterations when they're actually needed.
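The fix can be illustrated with a plain-NumPy sketch of the parallel relaxation loop. The function names and the cost model (destination-pixel friction, 4-connectivity) are illustrative assumptions, not the library's actual CuPy kernel:

```python
import numpy as np

def _relax(dist, friction):
    # One parallel relaxation step: every pixel takes the cheapest
    # 4-neighbor distance plus its own friction cost.
    inf = np.inf
    up = np.full_like(dist, inf);    up[1:, :] = dist[:-1, :]
    down = np.full_like(dist, inf);  down[:-1, :] = dist[1:, :]
    left = np.full_like(dist, inf);  left[:, 1:] = dist[:, :-1]
    right = np.full_like(dist, inf); right[:, :-1] = dist[:, 1:]
    cand = np.minimum(np.minimum(up, down), np.minimum(left, right)) + friction
    return np.minimum(dist, cand)

def cost_distance_sketch(friction, source):
    h, w = friction.shape
    dist = np.full((h, w), np.inf)
    dist[source] = 0.0
    max_iterations = h * w  # was h + w, which truncated zigzag paths
    for _ in range(max_iterations):
        new = _relax(dist, friction)
        if np.array_equal(new, dist):  # early exit: nothing changed
            break
        dist = new
    return dist
```

On an open grid the early exit fires after roughly h + w iterations anyway; the larger bound only ever matters on maze-like friction surfaces.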

Test plan

  • New test_snake_maze_long_path -- a 5x5 snake maze whose shortest path has 16 edges (exceeding the old h+w=10 limit), verified across all four backends
  • Full test_cost_distance.py suite passes (48 tests)

Three performance fixes from the Phase 2 sweep targeting WILL OOM
verdicts under 30TB workloads:

geotiff: read_geotiff_dask() was reading the entire file into RAM just
to extract metadata before building the lazy dask graph. Now uses
_read_geo_info() which parses only the IFD via mmap -- O(1) memory
regardless of file size. Peak memory during dask setup dropped from
4.41 MB to 0.21 MB at 512x512 (21x reduction).
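As a rough illustration of IFD-only metadata reading: the sketch below mmaps a classic TIFF and walks just the first IFD to pull width and height. The real _read_geo_info is not shown in this PR and extracts more (geo tags, dtype, etc.); this is a minimal stand-in following the TIFF 6.0 header layout:

```python
import mmap
import struct

def read_tiff_size(path):
    # mmap the file and parse only the header + first IFD:
    # O(1) memory regardless of how large the raster payload is.
    with open(path, 'rb') as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        endian = '<' if m[:2] == b'II' else '>'
        assert struct.unpack(endian + 'H', m[2:4])[0] == 42  # classic TIFF magic
        (ifd_off,) = struct.unpack(endian + 'I', m[4:8])
        (n_entries,) = struct.unpack(endian + 'H', m[ifd_off:ifd_off + 2])
        width = height = None
        for i in range(n_entries):
            e = ifd_off + 2 + 12 * i  # each IFD entry is 12 bytes
            tag, typ, count = struct.unpack(endian + 'HHI', m[e:e + 8])
            if typ == 3:  # SHORT: value inlined in first 2 bytes of value field
                (val,) = struct.unpack(endian + 'H', m[e + 8:e + 10])
            else:         # treat everything else as LONG for this sketch
                (val,) = struct.unpack(endian + 'I', m[e + 8:e + 12])
            if tag == 256:    # ImageWidth
                width = val
            elif tag == 257:  # ImageLength
                height = val
        return width, height
```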

sieve: region_val_buf was allocated at rows*cols (16 GB for a 46K x 46K
raster) when the actual region count is typically orders of magnitude
smaller. Now counts regions first, allocates at actual size. Also reuses
the dead rank array as root_to_id, saving another 4 bytes/pixel. Memory
guard fixed from a misleading 5x multiplier to an accurate 28
bytes/pixel estimate.
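The count-first-then-allocate pattern can be sketched as follows (the flood fill here is a simple illustrative region counter, not the library's sieve labeling code):

```python
import numpy as np

def count_regions(img):
    # First pass: count 4-connected regions of equal value, so the
    # per-region buffer can be sized to the actual region count instead
    # of the worst-case rows*cols.
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    n = 0
    for r in range(h):
        for c in range(w):
            if seen[r, c]:
                continue
            n += 1
            stack = [(r, c)]
            seen[r, c] = True
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx] \
                            and img[ny, nx] == img[y, x]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
    return n

# Second pass allocates at the real size:
#   region_val_buf = np.empty(count_regions(img), dtype=img.dtype)
# rather than np.empty(rows * cols, ...), which is 16 GB at 46K x 46K.
```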

reproject: _reproject_dask_cupy pre-allocated the full output on GPU via
cp.full(out_shape), which OOMs for large outputs. Now checks available
GPU memory and falls back to the existing map_blocks path (with
is_cupy=True) when the output exceeds VRAM. Fast path preserved for
outputs that fit.
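The fallback decision reduces to a size check. In the CuPy path the free-byte figure would come from cp.cuda.Device().mem_info[0]; this sketch keeps the logic backend-agnostic (function name and safety factor are assumptions) so it runs without a GPU:

```python
import numpy as np

def choose_reproject_path(out_shape, dtype, free_bytes, safety=0.9):
    # Estimate the full output allocation and compare against available
    # memory; fall back to the chunked map_blocks path instead of OOMing.
    needed = int(np.prod(out_shape)) * np.dtype(dtype).itemsize
    if needed <= safety * free_bytes:
        return "preallocate"   # fast path: cp.full(out_shape, ...) fits
    return "map_blocks"        # stream chunk-by-chunk with is_cupy=True
```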

Four more performance fixes from the Phase 2 sweep:

polygonize: _polygonize_dask called dask.compute(*delayed_results) which
held all chunk polygon data in memory at once. Now processes chunks
incrementally -- interior polygons go straight to the output list and
only boundary polygons accumulate for the merge step.
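The incremental merge can be sketched like this (data shapes and the boundary-merge helper are illustrative assumptions; in the dask path each element of chunk_results would be computed one at a time instead of all at once via dask.compute):

```python
def merge_boundary(pieces):
    # Placeholder for the real boundary merge: pieces sharing a polygon
    # id are unioned (polygons represented here as sets of cells).
    merged = {}
    for pid, cells in pieces:
        merged.setdefault(pid, set()).update(cells)
    return list(merged.items())

def collect_polygons(chunk_results):
    # Each chunk yields (interior_polys, boundary_polys). Interiors are
    # final and go straight to the output; only boundary pieces are held
    # in memory for the merge step.
    out, pending = [], []
    for interior, boundary in chunk_results:
        out.extend(interior)      # finished polygons: emit immediately
        pending.extend(boundary)  # crosses a chunk edge: merge later
    out.extend(merge_boundary(pending))
    return out
```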

polygon_clip: clip_polygon called mask.compute() to materialize the
entire rasterized mask before applying it. For a polygon covering most
of a 30TB raster, the uint8 mask alone would be multi-TB. Now keeps the
mask lazy for dask paths and applies it via xarray.where (dask+numpy)
or da.map_blocks (dask+cupy).
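The masking step itself is a where-style select. With dask-backed xarray data, xarray.where(mask, data, fill) builds a lazy graph instead of materializing the multi-TB mask; plain NumPy stands in below so the sketch runs anywhere:

```python
import numpy as np

def apply_mask_sketch(data, mask, fill=np.nan):
    # Keep pixels where the rasterized mask is set; elsewhere write fill.
    # In the dask path this is xarray.where / da.map_blocks, evaluated
    # chunk-by-chunk rather than via mask.compute().
    return np.where(mask.astype(bool), data, fill)
```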

kde: Both dask paths captured the full point arrays (xs, ys, ws) in every
tile task's closure, serializing O(n_tiles * n_points) data. Now
pre-filters points per tile using a bounding-box + cutoff-radius check,
so each task receives only nearby points.
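The per-tile pre-filter is a bounding-box test expanded by the kernel's cutoff radius (helper name and bounds convention are illustrative; the PR does not show the actual signature):

```python
import numpy as np

def points_for_tile(xs, ys, ws, tile_bounds, cutoff):
    # Only points within the tile's bounding box, grown by the cutoff
    # radius, can contribute to that tile's density, so each task is
    # sent O(nearby points) instead of all n_points.
    x0, x1, y0, y1 = tile_bounds
    keep = ((xs >= x0 - cutoff) & (xs <= x1 + cutoff) &
            (ys >= y0 - cutoff) & (ys <= y1 + cutoff))
    return xs[keep], ys[keep], ws[keep]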

pathfinding: When friction=None, the A* kernel allocated a dummy
np.ones((h, w)) array that was never read (use_friction=False skips all
friction lookups). For a 100K x 100K grid that's 80 GB of wasted
allocation. Now passes a 1x1 dummy instead.
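A minimal sketch of the argument-preparation change (function name is an assumption; the kernel signature simply requires *some* array when use_friction is False):

```python
import numpy as np

def make_friction_arg(friction):
    # When friction is None the kernel never reads the array, so a 1x1
    # placeholder satisfies the signature without allocating h*w floats
    # (80 GB at 100K x 100K).
    if friction is None:
        return np.ones((1, 1)), False  # dummy array, use_friction=False
    return friction, True
```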

The parallel Bellman-Ford loop used max_iterations = height + width,
which is too low for maze-like friction surfaces where shortest paths
can snake across the entire grid (up to height * width - 1 edges).
Changed to height * width, the standard Bellman-Ford V-1 bound.

Tests a maze where the shortest path has 16 edges on a 5x5 grid,
which requires more than height + width Bellman-Ford iterations.
Covers all four backends (numpy, cupy, dask+numpy, dask+cupy).
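One way such a fixture can be built (the test's exact maze layout is an assumption; the BFS here only confirms the 16-edge claim, it is not the code under test):

```python
import numpy as np
from collections import deque

WALL = np.inf  # impassable friction
maze = np.ones((5, 5))
maze[1, :4] = WALL  # wall rows alternate which end is open,
maze[3, 1:] = WALL  # forcing the corridor to snake across the grid

def shortest_edges(grid, start, goal):
    # Plain BFS over finite-friction cells, counting edges.
    h, w = grid.shape
    dist = {start: 0}
    q = deque([start])
    while q:
        y, x = q.popleft()
        if (y, x) == goal:
            return dist[(y, x)]
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and np.isfinite(grid[ny, nx]) \
                    and (ny, nx) not in dist:
                dist[(ny, nx)] = dist[(y, x)] + 1
                q.append((ny, nx))
    return dist.get(goal)
```

From (0,0) to (4,4) the only route is along row 0, down through the gap at column 4, back along row 2, down through the gap at column 0, then along row 4: 16 edges, well past the old h + w = 10 iteration limit.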
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Apr 13, 2026
@brendancol brendancol merged commit f3e8603 into master Apr 14, 2026
11 checks passed
@brendancol brendancol deleted the issue-1191 branch May 5, 2026 03:44

Labels

performance PR touches performance-sensitive code


Development

Successfully merging this pull request may close these issues.

CuPy cost_distance Bellman-Ford terminates early on maze-like friction surfaces
