
Zonal dask paths materialize full arrays, OOM on large inputs #1110

@brendancol

Description

Describe the bug

The zonal module's dask code paths call .compute() on full arrays, so they can't process anything larger than RAM.

Four locations:

  1. _unique_finite_zones (line 134): da.unique(arr).compute() pulls the entire zones array into memory. Used by _stats_dask_numpy and _crosstab_dask_numpy.

  2. _unique_finite_cats (line 145): same thing on the values array. Used by _find_cats for crosstab.

  3. _regions_dask (line 1823): data.compute() on the full dask array. There's a size guard (nbytes * 5 > 0.5 * available_memory), but a 30TB raster blows past it.

  4. _regions_dask_cupy (line 1858): data.compute() on the full dask+cupy array. No size guard at all. (A toy sketch of this failure mode follows the list.)
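
For the regions paths the failure mode is easy to demonstrate in isolation. The shapes below are hypothetical; the point is that calling .compute() on a dask array allocates the full result as a single in-memory ndarray, regardless of chunking:

```python
import dask.array as da

# Lazy ~32 TB array: cheap to build, nothing is materialized yet.
data = da.zeros((2_000_000, 2_000_000), chunks=(4096, 4096), dtype="float64")
print(data.nbytes / 1e12)  # 32.0 (TB)

# The pattern in _regions_dask / _regions_dask_cupy: ask dask for the
# entire array as one ndarray. Allocation alone fails long before any
# connected-component labeling happens.
arr = data.compute()  # MemoryError on any realistic machine
```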

Benchmarks (512x512 array, 19 zones)

| Backend | Function | Wall time (ms) | Peak tracemalloc (MB) | RSS delta (KB) |
|---|---|---|---|---|
| numpy | stats | 18.9 | 25.46 | 78,544 |
| dask+numpy | stats | 1,120.67 | 37.54 | 45,980 |
| numpy | regions | 16,334.96 | 5.62 | 0 |
| dask+numpy | regions | 15,436.37 | 7.23 | 11,520 |

Dask stats uses more memory than numpy (37.5 MB vs 25.5 MB) on this small array and runs 59x slower. Dask regions just calls .compute() and falls through to the numpy path.
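
The issue doesn't include the measurement harness; a sketch of one way to collect comparable numbers (wall time, tracemalloc peak, RSS delta via psutil), assuming a single unwarmed run:

```python
import time
import tracemalloc

import psutil

def measure(fn, *args, **kwargs):
    """Run fn once; report wall time (ms), tracemalloc peak (MB),
    and RSS delta (KB). One-shot and unwarmed -- indicative only."""
    proc = psutil.Process()
    rss_before = proc.memory_info().rss
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    wall_ms = (time.perf_counter() - t0) * 1e3
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    rss_delta_kb = (proc.memory_info().rss - rss_before) / 1024
    return result, wall_ms, peak_bytes / 2**20, rss_delta_kb
```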

Expected behavior

Dask backends should bound memory by chunk size. stats() and crosstab() can gather unique zone/category values per-chunk and merge them. regions() should either do chunked connected-component labeling or raise MemoryError early for oversized inputs.
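
A minimal sketch of both ideas, assuming nothing about the project's internals (function names here are hypothetical):

```python
import dask.array as da
import numpy as np

def unique_finite_chunked(arr: da.Array) -> np.ndarray:
    """Gather unique finite values one chunk at a time.

    Peak memory is one materialized chunk plus the small running set
    of uniques, never the whole array.
    """
    seen = set()
    for idx in np.ndindex(*arr.numblocks):
        block = arr.blocks[idx].compute()        # one chunk in RAM at a time
        vals = np.unique(block[np.isfinite(block)])
        seen.update(vals.tolist())
    return np.array(sorted(seen))

def regions_guard(arr: da.Array, available_bytes: int) -> np.ndarray:
    """Fail fast instead of letting .compute() exhaust RAM."""
    if arr.nbytes > 0.5 * available_bytes:
        raise MemoryError(
            f"regions() would materialize {arr.nbytes / 2**30:.1f} GiB "
            f"against {available_bytes / 2**30:.1f} GiB available"
        )
    return arr.compute()
```

Computing blocks one at a time serializes the work and can recompute shared upstream tasks; a tree reduction (per-chunk np.unique via map_blocks, then a merge step) would keep it parallel at the cost of more plumbing.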

Impact

30TB dataset, 16GB machine: all four paths OOM before doing any work.


Labels: bug, high-priority, oom, zonal tools
