
Add native Dask+CuPy backends for hydrology core functions (#952)#966

Merged
brendancol merged 1 commit into master from issue-952 on Mar 4, 2026
Conversation

@brendancol
Contributor

Summary

  • Replaces CPU fallback with native GPU tile kernels for six hydrology functions (flow_accumulation, watershed, basin, stream_order, stream_link, snap_pour_point) when running on Dask+CuPy arrays
  • Each tile now runs existing CUDA kernels directly with seed injection at boundaries, keeping data GPU-resident through the iterative tile sweep
  • Adds a native CUDA kernel for snap_pour_point's single-GPU CuPy path (previously fell back to CPU)
  • Updates README feature matrix: all six functions now show native support across all four backends

What changed

Per-tile GPU kernels with seed injection: The Dask+CuPy path previously converted CuPy chunks to NumPy, ran the CPU tile kernel, then converted back. Now each tile runs the same GPU frontier-peeling kernels used by the single-GPU path, with external boundary values injected before the peeling loop starts. Seeds are transferred CPU-side only at tile boundaries (small O(edge_length) strips), while all tile-interior computation stays on GPU.
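The tile-sweep-with-seed-injection idea can be sketched in plain NumPy (a CPU analog for illustration; the real path runs the CUDA frontier-peeling kernels on CuPy chunks, and `tile_kernel`/`tiled_flow_accumulation` are hypothetical names, not the PR's actual functions). Here a toy "flow accumulation" where flow moves strictly left-to-right shows how seeds injected at tile boundaries let per-tile kernels converge to the full-array result:

```python
import numpy as np

def tile_kernel(tile, left_seed):
    # Stand-in for the per-tile GPU kernel: accumulate contributions
    # flowing left-to-right within one tile, starting from the seeds
    # injected on the tile's external (left) boundary.
    acc = np.empty_like(tile, dtype=np.float64)
    carry = left_seed.astype(np.float64)
    for j in range(tile.shape[1]):
        carry = carry + tile[:, j]
        acc[:, j] = carry
    return acc

def tiled_flow_accumulation(grid, tile_width):
    # Split columns into tiles, then sweep tiles iteratively until the
    # boundary seeds stop changing. Only the O(edge_length) boundary
    # strips move between tiles; tile interiors never leave the "GPU".
    starts = range(0, grid.shape[1], tile_width)
    tiles = [grid[:, s:s + tile_width] for s in starts]
    seeds = [np.zeros(grid.shape[0]) for _ in tiles]
    results = [None] * len(tiles)
    changed = True
    while changed:
        changed = False
        for i, t in enumerate(tiles):
            results[i] = tile_kernel(t, seeds[i])
            if i + 1 < len(tiles):
                new_seed = results[i][:, -1]  # boundary strip only
                if not np.array_equal(new_seed, seeds[i + 1]):
                    seeds[i + 1] = new_seed
                    changed = True
    return np.concatenate(results, axis=1)
```

For this left-to-right toy case the sweep converges in one forward pass plus a confirming pass; real flow grids with arbitrary directions may need several sweeps, which is why the PR describes an iterative tile sweep.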

snap_pour_point native CuPy: Added _snap_pour_point_gpu CUDA kernel where each thread handles one pour point's windowed max search. The flow accumulation array stays on GPU instead of being pulled to CPU.
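The per-thread logic of the windowed max search can be sketched as a CPU analog (hypothetical helper `snap_pour_points`; in the actual `_snap_pour_point_gpu` kernel each CUDA thread performs one loop iteration of this, with `flow_acc` resident on the GPU):

```python
import numpy as np

def snap_pour_points(flow_acc, points, radius):
    # Each "thread" (loop iteration here) handles one pour point:
    # search a (2*radius+1)-wide window, clipped at array edges,
    # and snap the point to the cell with maximum flow accumulation.
    nrows, ncols = flow_acc.shape
    snapped = []
    for r, c in points:
        r0, r1 = max(0, r - radius), min(nrows, r + radius + 1)
        c0, c1 = max(0, c - radius), min(ncols, c + radius + 1)
        window = flow_acc[r0:r1, c0:c1]
        dr, dc = np.unravel_index(np.argmax(window), window.shape)
        snapped.append((r0 + dr, c0 + dc))
    return snapped
```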

stream_link tile-aware kernel: Added _stream_link_find_ready_tile CUDA kernel that uses global coordinate offsets for position-based link IDs, so tile-local results are consistent with full-array results.
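The global-offset trick can be shown in a few lines (hypothetical helper `tile_link_ids`; the actual `_stream_link_find_ready_tile` kernel does this per-thread in CUDA). Deriving the link ID from global rather than tile-local coordinates makes the IDs independent of how the array was chunked:

```python
import numpy as np

def tile_link_ids(local_rows, local_cols, row_off, col_off, ncols_global):
    # Position-based link IDs computed from GLOBAL coordinates:
    # shift tile-local indices by the tile's offset in the full array,
    # then flatten with the full array's column count. A tile at any
    # chunk boundary therefore produces the same IDs as the
    # single-array path would for the same cells.
    global_rows = local_rows + row_off
    global_cols = local_cols + col_off
    return global_rows * ncols_global + global_cols
```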

Test plan

  • All 158 tests (existing plus new) pass across the six modules
  • New dask+cupy tests with multiple chunk sizes and random acyclic grids for each function
  • Verified dask+cupy output matches dask+numpy output exactly for basin (pre-existing tile-sweep convergence issue affects both backends identically)

@github-actions bot added the performance (PR touches performance-sensitive code) label on Mar 4, 2026
Replace CPU fallback with native GPU tile kernels for flow_accumulation,
watershed, basin, stream_order, stream_link, and snap_pour_point when
running on Dask+CuPy arrays. Each function now runs its existing CUDA
kernels per-tile with seed injection at tile boundaries, keeping data
GPU-resident throughout the iterative tile sweep. Also adds a native
CUDA kernel for snap_pour_point's single-GPU CuPy path.
@brendancol brendancol merged commit 7cad73b into master Mar 4, 2026
11 checks passed
@brendancol brendancol deleted the issue-952 branch May 4, 2026 13:06