Fuse hypsometric_integral dask path to a single graph evaluation#1212

Merged
brendancol merged 1 commit into master from perf/zonal-hypsometric-single-compute on Apr 16, 2026
Conversation

@brendancol
Contributor

Summary

  • Each block in _hi_block_stats now discovers its own local unique zones and returns a dict of zone_id -> (min, max, sum, count), so there is no up-front _unique_finite_zones pass.
  • _hi_reduce stream-merges the per-block dicts into a single hi_lookup (see the sketch after this list). Scheduler peak memory now scales with the number of distinct zones instead of n_blocks * n_zones.
  • The dask path collapses from two blocking dask.compute() calls to one.
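
A minimal sketch of that shape, assuming a hypothetical per-block signature of (zones_block, values_block); the real _hi_block_stats / _hi_reduce in this PR differ in details but follow the same dict-of-partials pattern:

```python
import numpy as np

def _hi_block_stats(zones_block, values_block):
    # Per-block pass: discover the zones present in THIS block only and
    # accumulate (min, max, sum, count) of the values for each of them.
    stats = {}
    finite = np.isfinite(zones_block) & np.isfinite(values_block)
    for zone in np.unique(zones_block[finite]):
        vals = values_block[finite & (zones_block == zone)]
        stats[zone] = (vals.min(), vals.max(), vals.sum(), vals.size)
    return stats

def _hi_reduce(partials):
    # Stream-merge the per-block dicts; memory stays proportional to the
    # number of distinct zones, never n_blocks * n_zones.
    merged = {}
    for partial in partials:
        for zone, (mn, mx, sm, cnt) in partial.items():
            if zone in merged:
                omn, omx, osm, ocnt = merged[zone]
                merged[zone] = (min(omn, mn), max(omx, mx), osm + sm, ocnt + cnt)
            else:
                merged[zone] = (mn, mx, sm, cnt)
    return merged
```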

Motivation

In the previous implementation, the dask+numpy path did two full graph evaluations before the user ever called .compute() on the result:

  1. _unique_finite_zones(zones_data) to discover zones globally.
  2. _hi_reduce on all per-block partials to build the hi_lookup dict.

On top of that, _hi_reduce built np.stack(partials_list) producing an (n_blocks, n_zones, 4) float64 array in a single scheduler task — at 240,000 blocks * 1000 zones that is ~7.7 GB held on the scheduler.
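
For reference, the arithmetic behind that figure (4 float64 statistics per zone, 8 bytes each):

```python
n_blocks, n_zones = 240_000, 1_000
stats_per_zone, bytes_per_float64 = 4, 8
total_bytes = n_blocks * n_zones * stats_per_zone * bytes_per_float64
print(total_bytes / 1e9)  # ~7.68 GB held in a single scheduler task
```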

The streaming dict-merge keeps scheduler memory proportional to the number of distinct zones (tens of KB for typical inputs) and halves the wall time for the dask backend.

Benchmark

Synthetic input: 512×512 float64, chunks=128 (16 blocks), 20 integer zones.

| Metric                 | Value  |
| ---------------------- | ------ |
| Wall time (median)     | 45 ms  |
| Tracemalloc peak       | 4.5 MB |
| Unique zones recovered | 20     |
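
A rough harness for that setup is sketched below; the hypsometric_integral call is left commented out because its exact import path and signature are not shown here, so that part is an assumption:

```python
import time
import tracemalloc

import dask.array as da
import numpy as np
import xarray as xr

# Synthetic input from the benchmark: 512x512 float64 values,
# 128x128 chunks (16 blocks), 20 integer zones.
rng = np.random.default_rng(0)
values = xr.DataArray(da.from_array(rng.random((512, 512)), chunks=128))
zones = xr.DataArray(
    da.from_array(rng.integers(0, 20, size=(512, 512)).astype("float64"), chunks=128)
)

tracemalloc.start()
t0 = time.perf_counter()
# result = hypsometric_integral(zones, values)  # hypothetical call signature
# result.compute()
elapsed_ms = (time.perf_counter() - t0) * 1e3
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"wall time: {elapsed_ms:.1f} ms, tracemalloc peak: {peak / 1e6:.1f} MB")
```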

Test plan

  • pytest xrspatial/tests/test_hypsometric_integral.py — 29 tests pass
  • pytest xrspatial/tests/test_zonal.py — full zonal suite passes

_hi_dask_numpy did two blocking dask.compute() calls (_unique_finite_zones
at one step, _hi_reduce at the next), so the caller paid for two full
input scans before the lazy map_blocks output was even returned.
_hi_reduce also np.stacked the per-block partials into an
(n_blocks, n_zones, 4) array on the scheduler; at 240k blocks * 1000
zones that is ~7.7 GB resident in a single scheduler task.

Have each block discover its own local unique zones and return a dict
mapping zone id -> (min, max, sum, count). _hi_reduce stream-merges the
partial dicts into a global hi_lookup so scheduler peak memory scales
with the number of distinct zones, not n_blocks * n_zones. The
up-front _unique_finite_zones pass is gone and the whole dask path
collapses to a single graph evaluation.
github-actions bot added the performance (PR touches performance-sensitive code) label on Apr 16, 2026
brendancol merged commit d05d9b7 into master on Apr 16, 2026
11 checks passed
brendancol deleted the perf/zonal-hypsometric-single-compute branch on May 4, 2026, 13:05