Guard kriging() against unbounded memory allocations (#1307) by brendancol · Pull Request #1309 · xarray-contrib/xarray-spatial

brendancol · 2026-04-29T13:55:47Z

Summary

Adds a memory guard at the top of kriging(). It estimates the worst case across the three large allocations (variogram pair arrays at N*(N-1)/2, the (N+1)x(N+1) kriging matrix, and the (grid_pixels, N+1) prediction matrix) and raises MemoryError before any of them run.
Same pattern as the earlier guards (#1287 for kde, #1295 for resample(), #1296 for sieve(), #1299 for sky_view_factor(), #1302 for landforms()): a 0.8 * _available_memory_bytes() threshold, with the helper imported from xrspatial.zonal.
The error message names which allocation drove the estimate so a user knows whether to shrink N or the template grid.
Tests in TestKrigingMemoryGuard monkeypatch _available_memory_bytes to a small number. Covers the prediction-matrix path, the kriging-matrix path, a small-input pass-through, and a direct unit test of the helper.
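A minimal sketch of the guard described in this summary, under stated assumptions. The function name `check_kriging_memory_sketch`, its signature, the explicit `available_bytes` parameter, the 8-byte element sizes, the factor of two on the pair arrays and on the matrix plus inverse, and the message wording are all illustrative and not taken from the merged code. The real guard reads available memory from `xrspatial.zonal._available_memory_bytes`, which the tests monkeypatch; here the value is passed in so the sketch runs without xrspatial installed.

```python
def check_kriging_memory_sketch(n_points, grid_pixels, available_bytes):
    """Illustrative guard: raise MemoryError when the largest of the three
    estimated allocations exceeds 80% of available memory."""
    estimates = {
        # Assumed: two int64 index arrays covering the N*(N-1)/2 point pairs.
        "variogram pair arrays": n_points * (n_points - 1) // 2 * 8 * 2,
        # Assumed: float64 (N+1) x (N+1) matrix plus its inverse.
        "kriging matrix": (n_points + 1) ** 2 * 8 * 2,
        # Assumed: float64 (grid_pixels, N+1) matrix.
        "prediction matrix": grid_pixels * (n_points + 1) * 8,
    }
    # Pick the allocation that drives the estimate so the message can name it.
    name, needed = max(estimates.items(), key=lambda item: item[1])
    limit = 0.8 * available_bytes
    if needed > limit:
        raise MemoryError(
            f"kriging() would need about {needed / 1e9:.1f} GB for the {name} "
            f"(N={n_points}, grid_pixels={grid_pixels}), "
            f"above the {limit / 1e9:.1f} GB limit"
        )
    return name, needed


# Small inputs pass through; a large template trips the prediction-matrix path.
check_kriging_memory_sketch(10, 100, available_bytes=10**9)
try:
    check_kriging_memory_sketch(100, 5000 * 5000, available_bytes=10**9)
except MemoryError as exc:
    print(exc)
```

Taking the maximum rather than the sum is one reading of "worst case across the three large allocations"; it also makes it straightforward to name a single driving allocation in the message.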

Test plan

pytest xrspatial/tests/test_interpolation.py (34 passed locally)
CI passes
No regressions on existing kriging numpy/dask tests

kriging() takes an arbitrary point count N and template grid size with no upper bound. Three eager allocations scale with these inputs:

- np.triu_indices(N) in _experimental_variogram (O(N^2) int64 pairs)
- the (N+1) x (N+1) kriging matrix and its inverse
- the (grid_pixels, N+1) prediction matrix in _kriging_predict

A caller passing 50k points or a 5000x5000 template silently triggers tens of GB of allocation before any guard.

Add _check_kriging_memory() that estimates the worst case of these three and raises MemoryError when the estimate exceeds 80% of available memory (using xrspatial.zonal._available_memory_bytes, same pattern as balanced_allocation). The error message names which allocation drove the estimate so the user knows whether to reduce N or the grid size.
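The "tens of GB" figure can be checked with back-of-envelope arithmetic. The sketch below assumes 8-byte elements, two index arrays for the point pairs, and, for the template case, a hypothetical N of 100 points; none of these constants are taken from the merged code.

```python
GB = 1e9

# Case 1: 50k input points.
n = 50_000
pairs = n * (n - 1) // 2                 # number of distinct point pairs
pair_bytes = pairs * 8 * 2               # assumed: two int64 index arrays
matrix_bytes = (n + 1) ** 2 * 8          # one float64 (N+1) x (N+1) matrix

# Case 2: 5000x5000 template with a hypothetical N of 100 points.
grid_pixels = 5000 * 5000
n_small = 100
prediction_bytes = grid_pixels * (n_small + 1) * 8   # float64 (grid_pixels, N+1)

print(f"pair arrays:       {pair_bytes / GB:.1f} GB")
print(f"kriging matrix:    {matrix_bytes / GB:.1f} GB")
print(f"prediction matrix: {prediction_bytes / GB:.1f} GB")
```

Under these assumptions each of the three allocations comes out at roughly 20 GB, and the kriging matrix doubles if its inverse is held at the same time.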

The k-nearest path in `idw()` calls `cKDTree.query(query_pts, k=k)`, which returns a `(grid_pixels, k)` float64 distance array and an int64 index array. Peak allocation is `grid_pixels * k * 16` bytes before any IDW arithmetic runs. A 50000 x 50000 template with k=12 needs about 480 GB and OOMs the process with no message naming the inputs that caused it.

Add `_check_idw_memory(grid_pixels, k)` and call it at the top of the public `idw()` entrypoint when k is set on a numpy-backed template. Dask templates dispatch `_idw_knearest_numpy` per chunk via `map_blocks`, so chunk size already bounds the per-chunk allocation; the guard skips dask paths to avoid refusing legitimate chunked workloads. GPU backends reject k early.

Same shape as the kriging guard from #1309.

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>

github-actions bot added the "performance" label (PR touches performance-sensitive code) on Apr 29, 2026

brendancol merged commit fe755a4 into main Apr 29, 2026
11 checks passed

brendancol deleted the issue-1307 branch May 4, 2026 13:05
