
Speed up cost_distance iterative tile Dijkstra 2-4x #1023

Merged
brendancol merged 1 commit into master from refactor/cost-distance-iterative-perf on Mar 18, 2026

brendancol (Contributor) commented Mar 18, 2026

Summary

  • Batch-compute all dask tiles in a single dask.compute() call and cache them, replacing per-tile .compute() calls that re-executed the graph on every iteration
  • Assemble the final result eagerly from cached tiles instead of a second pass through da.map_blocks
  • Store friction boundary strips as float64 to skip repeated dtype conversion in _compute_seeds
  • Pass precomputed f_min from the dask+cupy fallback path to _cost_distance_dask, avoiding a redundant da.nanmin().compute()
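The first three points can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `compute_and_cache_tiles`, `assemble`, and the toy `friction` array are hypothetical names, and the sketch assumes a 2-D dask array whose chunks form a regular grid. It materialises every tile with one `dask.compute()` call, caches the tiles as float64 keyed by block index, and stitches the result together eagerly with NumPy.

```python
import dask
import dask.array as da
import numpy as np


def compute_and_cache_tiles(arr):
    """Materialise every tile of a 2-D dask array in one dask.compute() call.

    Hypothetical helper for illustration; not part of xarray-spatial.
    """
    # to_delayed() returns a 2-D object array with one Delayed per chunk.
    delayed_tiles = arr.to_delayed()
    # One scheduler pass for all tiles, instead of one .compute() per tile.
    computed = dask.compute(*delayed_tiles.ravel())
    # Cache as float64 once so later iterations need no dtype conversion.
    cache = {
        idx: np.asarray(tile, dtype=np.float64)
        for idx, tile in zip(np.ndindex(*delayed_tiles.shape), computed)
    }
    return cache, delayed_tiles.shape


def assemble(cache, grid_shape):
    """Stitch cached tiles back into one NumPy array, with no second graph pass."""
    n_rows, n_cols = grid_shape
    return np.block(
        [[cache[(i, j)] for j in range(n_cols)] for i in range(n_rows)]
    )


# Toy input: a 200x100 float32 surface split into a 2x2 grid of tiles.
source = np.arange(200 * 100, dtype=np.float32).reshape(200, 100)
friction = da.from_array(source, chunks=(100, 50))

tile_cache, grid_shape = compute_and_cache_tiles(friction)
result = assemble(tile_cache, grid_shape)
```

The trade-off in this sketch is that all tiles are held in memory at once, which is reasonable when an iterative pass revisits each tile several times anyway.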

Benchmarked improvement on the iterative (unbounded max_cost) dask path:

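| Config  | Before (s) | After (s) | Speedup | Before (MB) | After (MB) | Mem saved |
|---------|------------|-----------|---------|-------------|------------|-----------|
| 200x100 | 0.206      | 0.050     | 4.1x    | 1.90        | 0.80       | 58%       |
| 300x150 | 0.229      | 0.075     | 3.0x    | 2.91        | 1.38       | 53%       |
| 400x200 | 0.263      | 0.114     | 2.3x    | 4.02        | 2.19       | 46%       |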

numpy and dask-bounded (map_overlap) paths are unchanged.

Test plan

  • All 44 existing test_cost_distance.py tests pass
  • Verify dask iterative results still match numpy reference on larger grids
  • Spot-check that dask+cupy fallback path passes _f_min correctly (requires GPU)
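For the second item, a comparison helper along these lines might be used. It is a sketch only: `assert_matches_reference` is a hypothetical name, the arrays below are stand-ins rather than real `cost_distance` output, and it assumes unreachable cells are reported as NaN in both results. A dask-backed result can be passed directly, since `np.asarray` computes it.

```python
import numpy as np


def assert_matches_reference(actual, expected, rtol=1e-6):
    """Check that a candidate result agrees with the NumPy reference.

    Hypothetical helper; np.asarray() triggers computation for dask arrays.
    """
    actual = np.asarray(actual)
    expected = np.asarray(expected)
    assert actual.shape == expected.shape
    # NaN cells (assumed to mark unreachable pixels) must coincide exactly.
    np.testing.assert_array_equal(np.isnan(actual), np.isnan(expected))
    # Finite cells must agree within a relative tolerance.
    np.testing.assert_allclose(actual, expected, rtol=rtol, equal_nan=True)


# Stand-in data: the candidate differs from the reference only by rounding noise.
reference = np.array([[0.0, 1.0, np.nan], [1.0, 1.4142135, 2.0]])
candidate = reference.copy()
candidate[1, 1] += 1e-9

assert_matches_reference(candidate, reference)
```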

Batch-compute all dask tiles in a single scheduler pass and cache them
for reuse across iterations, replacing per-tile .compute() calls that
re-executed the dask graph each time. Store friction boundaries as
float64 to skip repeated dtype conversion. Assemble the final result
eagerly from cached tiles instead of through da.map_blocks. Pass
precomputed f_min from dask+cupy fallback to avoid a redundant
da.nanmin().compute().

Benchmarked improvement on the iterative (unbounded max_cost) path:
  200x100: 0.206s -> 0.050s (4.1x), 1.90MB -> 0.80MB (-58%)
  300x150: 0.229s -> 0.075s (3.0x), 2.91MB -> 1.38MB (-53%)
  400x200: 0.263s -> 0.114s (2.3x), 4.02MB -> 2.19MB (-46%)

numpy and dask-bounded (map_overlap) paths are unchanged.
github-actions bot added the "performance" label (PR touches performance-sensitive code) on Mar 18, 2026
brendancol merged commit 1670b01 into master on Mar 18, 2026
11 checks passed
brendancol deleted the refactor/cost-distance-iterative-perf branch on May 4, 2026 13:06