Describe the bug
The dask aggregate path calls a numba kernel once per output pixel. In _agg_block_np (xrspatial/resample.py:471-498), the inner loop runs:
out[lo_y, lo_x] = func(sub, 1, 1)[0, 0]
for every output cell. func is one of _agg_mean, _agg_min, _agg_max, _agg_median, or _agg_mode. Each call dispatches into numba and allocates a fresh (1, 1) output. A 1000x1000 dask aggregate is roughly 1M kernel dispatches plus 1M tiny allocations, so the dask aggregate path is much slower than it needs to be.
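The slow pattern can be sketched as follows (a simplified pure-Python stand-in; `agg_mean` and `agg_block_slow` are hypothetical names mimicking the `func(sub, 1, 1)[0, 0]` shape, not the actual xrspatial code):

```python
import numpy as np

def agg_mean(sub, out_h, out_w):
    # stand-in for a jitted kernel like _agg_mean: collapses the whole
    # window into an (out_h, out_w) == (1, 1) mean
    out = np.empty((out_h, out_w))
    out[0, 0] = sub.mean()
    return out

def agg_block_slow(data, out_h, out_w):
    # one kernel call, and one fresh (1, 1) allocation, per output pixel
    in_h, in_w = data.shape
    out = np.empty((out_h, out_w))
    for lo_y in range(out_h):
        y0 = lo_y * in_h // out_h
        y1 = (lo_y + 1) * in_h // out_h
        for lo_x in range(out_w):
            x0 = lo_x * in_w // out_w
            x1 = (lo_x + 1) * in_w // out_w
            sub = data[y0:y1, x0:x1]
            out[lo_y, lo_x] = agg_mean(sub, 1, 1)[0, 0]
    return out
```

With a jitted `agg_mean`, every iteration of the inner loop pays numba dispatch overhead plus a tiny heap allocation, which dominates the actual aggregation work.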
Expected behavior
One numba call per chunk, walking the chunk's full output region in a single jitted loop and writing into a pre-allocated output buffer. The eager _run_numpy path already does this; the dask helper should too.
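A minimal sketch of that batched shape (pure-Python stand-in; in the real code the function would carry numba's @njit decorator and the explicit scalar loops are written that way so numba can compile them):

```python
import numpy as np

def agg_mean_block(data, out):
    # one call per chunk: walk the full output grid in a single loop,
    # writing into the pre-allocated `out` buffer -- no per-pixel
    # dispatch, no per-pixel allocation
    in_h, in_w = data.shape
    out_h, out_w = out.shape
    for lo_y in range(out_h):
        y0 = lo_y * in_h // out_h
        y1 = (lo_y + 1) * in_h // out_h
        for lo_x in range(out_w):
            x0 = lo_x * in_w // out_w
            x1 = (lo_x + 1) * in_w // out_w
            s = 0.0
            for y in range(y0, y1):
                for x in range(x0, x1):
                    s += data[y, x]
            out[lo_y, lo_x] = s / ((y1 - y0) * (x1 - x0))
    return out
```

The dask helper would then make exactly one such call per chunk, matching what the eager path already does.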
Fix
Add per-method block kernels that take the global geometry (global_in_h, global_out_h, cum_in_y, cum_out_y, in_y0, in_x0) as parameters and use int(go * global_in_h / global_out_h) - in_y0 for window bounds. Replace the inner func(sub, 1, 1)[0, 0] loop with one call into the new kernel. _agg_block_cupy already round-trips to CPU, so it picks up the speedup for free.
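The window-bound arithmetic for the y axis can be sketched like this (parameter names follow the issue text; the helper itself is illustrative, since in the jitted kernel the expression would likely be inlined):

```python
def window_y_bounds(go, global_in_h, global_out_h, in_y0):
    # map a *global* output row index `go` to input-row bounds in this
    # chunk's local coordinates, offsetting by the chunk origin in_y0
    y0 = int(go * global_in_h / global_out_h) - in_y0
    y1 = int((go + 1) * global_in_h / global_out_h) - in_y0
    return y0, y1
```

Using the global geometry keeps chunk boundaries consistent: every chunk computes the same window edges it would in the eager single-array case, then shifts them into local coordinates.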
Additional context
The eager numpy aggregate path is unchanged. Only the dask block helper is touched. Reference: xrspatial/resample.py:471-498.