
Add native Dask+CuPy backends for hydrology core functions (#952)#966

Merged
brendancol merged 1 commit into master from issue-952 on Mar 4, 2026
Conversation

@brendancol
Contributor

Summary

  • Replaces CPU fallback with native GPU tile kernels for six hydrology functions (flow_accumulation, watershed, basin, stream_order, stream_link, snap_pour_point) when running on Dask+CuPy arrays
  • Each tile now runs existing CUDA kernels directly with seed injection at boundaries, keeping data GPU-resident through the iterative tile sweep
  • Adds a native CUDA kernel for snap_pour_point's single-GPU CuPy path (previously fell back to CPU)
  • Updates README feature matrix: all six functions now show native support across all four backends

What changed

Per-tile GPU kernels with seed injection: The Dask+CuPy path previously converted CuPy chunks to NumPy, ran the CPU tile kernel, then converted back. Now each tile runs the same GPU frontier-peeling kernels used by the single-GPU path, with external boundary values injected before the peeling loop starts. Seeds are transferred CPU-side only at tile boundaries (small O(edge_length) strips), while all tile-interior computation stays on GPU.
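The tile-sweep-with-seed-injection idea can be sketched in plain NumPy (a CPU analog for illustration; the real path runs the CUDA frontier-peeling kernels on CuPy chunks, and `tile_kernel`/`tiled_flow_accumulation` are hypothetical names, not the PR's actual functions). Here a toy "flow accumulation" where flow moves strictly left-to-right shows how seeds injected at tile boundaries let per-tile kernels converge to the full-array result:

```python
import numpy as np

def tile_kernel(tile, left_seed):
    # Stand-in for the per-tile GPU kernel: accumulate contributions
    # flowing left-to-right within one tile, starting from the seeds
    # injected on the tile's external (left) boundary.
    acc = np.empty_like(tile, dtype=np.float64)
    carry = left_seed.astype(np.float64)
    for j in range(tile.shape[1]):
        carry = carry + tile[:, j]
        acc[:, j] = carry
    return acc

def tiled_flow_accumulation(grid, tile_width):
    # Split columns into tiles, then sweep tiles iteratively until the
    # boundary seeds stop changing. Only the O(edge_length) boundary
    # strips move between tiles; tile interiors never leave the "GPU".
    starts = range(0, grid.shape[1], tile_width)
    tiles = [grid[:, s:s + tile_width] for s in starts]
    seeds = [np.zeros(grid.shape[0]) for _ in tiles]
    results = [None] * len(tiles)
    changed = True
    while changed:
        changed = False
        for i, t in enumerate(tiles):
            results[i] = tile_kernel(t, seeds[i])
            if i + 1 < len(tiles):
                new_seed = results[i][:, -1]  # boundary strip only
                if not np.array_equal(new_seed, seeds[i + 1]):
                    seeds[i + 1] = new_seed
                    changed = True
    return np.concatenate(results, axis=1)
```

For this left-to-right toy case the sweep converges in one forward pass plus a confirming pass; real flow grids with arbitrary directions may need several sweeps, which is why the PR describes an iterative tile sweep.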

snap_pour_point native CuPy: Added _snap_pour_point_gpu CUDA kernel where each thread handles one pour point's windowed max search. The flow accumulation array stays on GPU instead of being pulled to CPU.
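The per-thread logic of the windowed max search can be sketched as a CPU analog (hypothetical helper `snap_pour_points`; in the actual `_snap_pour_point_gpu` kernel each CUDA thread performs one loop iteration of this, with `flow_acc` resident on the GPU):

```python
import numpy as np

def snap_pour_points(flow_acc, points, radius):
    # Each "thread" (loop iteration here) handles one pour point:
    # search a (2*radius+1)-wide window, clipped at array edges,
    # and snap the point to the cell with maximum flow accumulation.
    nrows, ncols = flow_acc.shape
    snapped = []
    for r, c in points:
        r0, r1 = max(0, r - radius), min(nrows, r + radius + 1)
        c0, c1 = max(0, c - radius), min(ncols, c + radius + 1)
        window = flow_acc[r0:r1, c0:c1]
        dr, dc = np.unravel_index(np.argmax(window), window.shape)
        snapped.append((r0 + dr, c0 + dc))
    return snapped
```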

stream_link tile-aware kernel: Added _stream_link_find_ready_tile CUDA kernel that uses global coordinate offsets for position-based link IDs, so tile-local results are consistent with full-array results.
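The global-offset trick can be shown in a few lines (hypothetical helper `tile_link_ids`; the actual `_stream_link_find_ready_tile` kernel does this per-thread in CUDA). Deriving the link ID from global rather than tile-local coordinates makes the IDs independent of how the array was chunked:

```python
import numpy as np

def tile_link_ids(local_rows, local_cols, row_off, col_off, ncols_global):
    # Position-based link IDs computed from GLOBAL coordinates:
    # shift tile-local indices by the tile's offset in the full array,
    # then flatten with the full array's column count. A tile at any
    # chunk boundary therefore produces the same IDs as the
    # single-array path would for the same cells.
    global_rows = local_rows + row_off
    global_cols = local_cols + col_off
    return global_rows * ncols_global + global_cols
```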

Test plan

  • All 158 tests (existing plus new) pass across the six modules
  • New dask+cupy tests with multiple chunk sizes and random acyclic grids for each function
  • Verified dask+cupy output matches dask+numpy output exactly for basin (pre-existing tile-sweep convergence issue affects both backends identically)

@github-actions bot added the performance (PR touches performance-sensitive code) label on Mar 4, 2026
Replace CPU fallback with native GPU tile kernels for flow_accumulation,
watershed, basin, stream_order, stream_link, and snap_pour_point when
running on Dask+CuPy arrays. Each function now runs its existing CUDA
kernels per-tile with seed injection at tile boundaries, keeping data
GPU-resident throughout the iterative tile sweep. Also adds a native
CUDA kernel for snap_pour_point's single-GPU CuPy path.
@brendancol brendancol merged commit 7cad73b into master Mar 4, 2026
11 checks passed
@brendancol brendancol deleted the issue-952 branch May 4, 2026 13:06