Fuse hypsometric_integral dask path to a single graph evaluation#1212

Merged
brendancol merged 1 commit into master from perf/zonal-hypsometric-single-compute on Apr 16, 2026
Conversation

@brendancol
Contributor

Summary

  • Each block in _hi_block_stats now discovers its own local unique zones and returns a dict of zone_id -> (min, max, sum, count), so there is no up-front _unique_finite_zones pass.
  • _hi_reduce stream-merges the per-block dicts into a single hi_lookup (see the sketch after this list). Scheduler peak memory now scales with the number of distinct zones instead of n_blocks * n_zones.
  • The dask path collapses from two blocking dask.compute() calls to one.
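
A minimal sketch of that shape, assuming a hypothetical per-block signature of (zones_block, values_block); the real _hi_block_stats / _hi_reduce in this PR differ in details but follow the same dict-of-partials pattern:

```python
import numpy as np

def _hi_block_stats(zones_block, values_block):
    # Per-block pass: discover the zones present in THIS block only and
    # accumulate (min, max, sum, count) of the values for each of them.
    stats = {}
    finite = np.isfinite(zones_block) & np.isfinite(values_block)
    for zone in np.unique(zones_block[finite]):
        vals = values_block[finite & (zones_block == zone)]
        stats[zone] = (vals.min(), vals.max(), vals.sum(), vals.size)
    return stats

def _hi_reduce(partials):
    # Stream-merge the per-block dicts; memory stays proportional to the
    # number of distinct zones, never n_blocks * n_zones.
    merged = {}
    for partial in partials:
        for zone, (mn, mx, sm, cnt) in partial.items():
            if zone in merged:
                omn, omx, osm, ocnt = merged[zone]
                merged[zone] = (min(omn, mn), max(omx, mx), osm + sm, ocnt + cnt)
            else:
                merged[zone] = (mn, mx, sm, cnt)
    return merged
```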

Motivation

In the previous implementation, the dask+numpy path did two full graph evaluations before the user ever called .compute() on the result:

  1. _unique_finite_zones(zones_data) to discover zones globally.
  2. _hi_reduce on all per-block partials to build the hi_lookup dict.

On top of that, _hi_reduce built np.stack(partials_list) producing an (n_blocks, n_zones, 4) float64 array in a single scheduler task — at 240,000 blocks * 1000 zones that is ~7.7 GB held on the scheduler.
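
For reference, the arithmetic behind that figure (4 float64 statistics per zone, 8 bytes each):

```python
n_blocks, n_zones = 240_000, 1_000
stats_per_zone, bytes_per_float64 = 4, 8
total_bytes = n_blocks * n_zones * stats_per_zone * bytes_per_float64
print(total_bytes / 1e9)  # ~7.68 GB held in a single scheduler task
```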

The streaming dict-merge keeps scheduler memory proportional to the number of distinct zones (tens of KB for typical inputs) and halves the wall time for the dask backend.

Benchmark

Synthetic input: 512×512 float64, chunks=128 (16 blocks), 20 integer zones.

| Metric                 | Value  |
| ---------------------- | ------ |
| Wall time (median)     | 45 ms  |
| Tracemalloc peak       | 4.5 MB |
| Unique zones recovered | 20     |
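
A rough harness for that setup is sketched below; the hypsometric_integral call is left commented out because its exact import path and signature are not shown here, so that part is an assumption:

```python
import time
import tracemalloc

import dask.array as da
import numpy as np
import xarray as xr

# Synthetic input from the benchmark: 512x512 float64 values,
# 128x128 chunks (16 blocks), 20 integer zones.
rng = np.random.default_rng(0)
values = xr.DataArray(da.from_array(rng.random((512, 512)), chunks=128))
zones = xr.DataArray(
    da.from_array(rng.integers(0, 20, size=(512, 512)).astype("float64"), chunks=128)
)

tracemalloc.start()
t0 = time.perf_counter()
# result = hypsometric_integral(zones, values)  # hypothetical call signature
# result.compute()
elapsed_ms = (time.perf_counter() - t0) * 1e3
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"wall time: {elapsed_ms:.1f} ms, tracemalloc peak: {peak / 1e6:.1f} MB")
```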

Test plan

  • pytest xrspatial/tests/test_hypsometric_integral.py — 29 tests pass
  • pytest xrspatial/tests/test_zonal.py — full zonal suite passes

_hi_dask_numpy did two blocking dask.compute() calls (_unique_finite_zones
at one step, _hi_reduce at the next), so the caller paid for two full
input scans before the lazy map_blocks output was even returned.
_hi_reduce also np.stacked the per-block partials into an
(n_blocks, n_zones, 4) array on the scheduler; at 240k blocks * 1000
zones that is ~7.7 GB resident in a single scheduler task.

Have each block discover its own local unique zones and return a dict
mapping zone id -> (min, max, sum, count). _hi_reduce stream-merges the
partial dicts into a global hi_lookup so scheduler peak memory scales
with the number of distinct zones, not n_blocks * n_zones. The
up-front _unique_finite_zones pass is gone and the whole dask path
collapses to a single graph evaluation.
github-actions bot added the performance (PR touches performance-sensitive code) label on Apr 16, 2026
brendancol merged commit d05d9b7 into master on Apr 16, 2026
11 checks passed
brendancol deleted the perf/zonal-hypsometric-single-compute branch on May 4, 2026, 13:05